From intra-modal to inter-modal space: Multi-task learning of shared representations for cross-modal retrieval

Jaeyoung Choi; Martha Larson; Gerald Friedland; Alan Hanjalic

doi:10.1109/BigMM.2019.00-48

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

1 Citation (Scopus)

Abstract

Learning a robust shared representation space is critical for effective multimedia retrieval, and is increasingly important as multimodal data grows in volume and diversity. The labeled datasets necessary for learning such a space are limited in size and also in coverage of semantic concepts. These limitations constrain performance: a shared representation learned on one dataset may not generalize well to another. We address this issue by building on the insight that, given limited data, it is easier to optimize the semantic structure of a space within a modality than across modalities. We propose a two-stage shared representation learning framework with intra-modal optimization and subsequent cross-modal transfer learning of semantic structure that produces a robust shared representation space. We integrate multi-task learning into each step, making it possible to leverage multiple datasets, annotated with different concepts, as if they were one large dataset. Large-scale systematic experiments demonstrate improvements over previously reported state-of-the-art methods on cross-modal retrieval tasks.

Original language	English
Title of host publication	Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
Pages	1-10
Number of pages	10
ISBN (Electronic)	9781728155272
DOIs	https://doi.org/10.1109/BigMM.2019.00-48
Publication status	Published - 1 Sept 2019
Event	5th IEEE International Conference on Multimedia Big Data, BigMM 2019 - Singapore, Singapore Duration: 11 Sept 2019 → 13 Sept 2019

Publication series

Name	Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019

Conference

Conference	5th IEEE International Conference on Multimedia Big Data, BigMM 2019
Country/Territory	Singapore
City	Singapore
Period	11/09/19 → 13/09/19

Keywords

Cross-modal retrieval
Image retrieval
Multi-task learning
Video retrieval

Access to Document

10.1109/BigMM.2019.00-48

Cite this

Choi, J., Larson, M., Friedland, G., & Hanjalic, A. (2019). From intra-modal to inter-modal space: Multi-task learning of shared representations for cross-modal retrieval. In Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019 (pp. 1-10). Article 8919383 (Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/BigMM.2019.00-48

Choi, Jaeyoung ; Larson, Martha ; Friedland, Gerald et al. / From intra-modal to inter-modal space : Multi-task learning of shared representations for cross-modal retrieval. Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019. Institute of Electrical and Electronics Engineers (IEEE), 2019. pp. 1-10 (Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019).

@inproceedings{a9eac0f80bd8429b98f8e91e056c707a,
  title = "From intra-modal to inter-modal space: Multi-task learning of shared representations for cross-modal retrieval",
  abstract = "Learning a robust shared representation space is critical for effective multimedia retrieval, and is increasingly important as multimodal data grows in volume and diversity. The labeled datasets necessary for learning such a space are limited in size and also in coverage of semantic concepts. These limitations constrain performance: a shared representation learned on one dataset may not generalize well to another. We address this issue by building on the insight that, given limited data, it is easier to optimize the semantic structure of a space within a modality, than across modalities. We propose a two-stage shared representation learning framework with intra-modal optimization and subsequent cross-modal transfer learning of semantic structure that produces a robust shared representation space. We integrate multi-task learning into each step, making it possible to leverage multiple datasets, annotated with different concepts, as if they were one large dataset. Large-scale systematic experiments demonstrate improvements over previously reported state-of-the-art methods on cross-modal retrieval tasks.",
  keywords = "Cross-modal retrieval, Image retrieval, Multi-task learning, Video retrieval",
  author = "Jaeyoung Choi and Martha Larson and Gerald Friedland and Alan Hanjalic",
  year = "2019",
  month = sep,
  day = "1",
  doi = "10.1109/BigMM.2019.00-48",
  language = "English",
  series = "Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019",
  publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
  pages = "1--10",
  booktitle = "Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019",
  address = "United States",
  note = "5th IEEE International Conference on Multimedia Big Data, BigMM 2019 ; Conference date: 11-09-2019 Through 13-09-2019",
}

Choi, J, Larson, M, Friedland, G & Hanjalic, A 2019, From intra-modal to inter-modal space: Multi-task learning of shared representations for cross-modal retrieval. in Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019., 8919383, Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019, Institute of Electrical and Electronics Engineers (IEEE), pp. 1-10, 5th IEEE International Conference on Multimedia Big Data, BigMM 2019, Singapore, Singapore, 11/09/19. https://doi.org/10.1109/BigMM.2019.00-48

From intra-modal to inter-modal space: Multi-task learning of shared representations for cross-modal retrieval. / Choi, Jaeyoung; Larson, Martha; Friedland, Gerald et al.
Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019. Institute of Electrical and Electronics Engineers (IEEE), 2019. p. 1-10 8919383 (Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019).

TY - GEN

T1 - From intra-modal to inter-modal space

T2 - 5th IEEE International Conference on Multimedia Big Data, BigMM 2019

AU - Choi, Jaeyoung

AU - Larson, Martha

AU - Friedland, Gerald

AU - Hanjalic, Alan

PY - 2019/9/1

Y1 - 2019/9/1

N2 - Learning a robust shared representation space is critical for effective multimedia retrieval, and is increasingly important as multimodal data grows in volume and diversity. The labeled datasets necessary for learning such a space are limited in size and also in coverage of semantic concepts. These limitations constrain performance: a shared representation learned on one dataset may not generalize well to another. We address this issue by building on the insight that, given limited data, it is easier to optimize the semantic structure of a space within a modality, than across modalities. We propose a two-stage shared representation learning framework with intra-modal optimization and subsequent cross-modal transfer learning of semantic structure that produces a robust shared representation space. We integrate multi-task learning into each step, making it possible to leverage multiple datasets, annotated with different concepts, as if they were one large dataset. Large-scale systematic experiments demonstrate improvements over previously reported state-of-the-art methods on cross-modal retrieval tasks.

KW - Cross-modal retrieval

KW - Image retrieval

KW - Multi-task learning

KW - Video retrieval

UR - http://www.scopus.com/inward/record.url?scp=85077054017&partnerID=8YFLogxK

U2 - 10.1109/BigMM.2019.00-48

DO - 10.1109/BigMM.2019.00-48

M3 - Conference contribution

T3 - Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019

SP - 1

EP - 10

BT - Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019

PB - Institute of Electrical and Electronics Engineers (IEEE)

Y2 - 11 September 2019 through 13 September 2019

ER -

Choi J, Larson M, Friedland G, Hanjalic A. From intra-modal to inter-modal space: Multi-task learning of shared representations for cross-modal retrieval. In Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019. Institute of Electrical and Electronics Engineers (IEEE). 2019. p. 1-10. 8919383. (Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019). doi: 10.1109/BigMM.2019.00-48
