LGM3A 2024: The 2nd Workshop on Large Generative Models Meet Multimodal Applications

Shihao Xu, Yiyang Luo, Justin Dauwels, Andy Khong, Zheng Wang, Qianqian Chen, Chen Cai, Wei Shi, Tat-Seng Chua

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

This workshop aims to explore the potential of large generative models to revolutionize how we interact with multimodal information. A Large Language Model (LLM) is a sophisticated form of artificial intelligence engineered to comprehend and produce natural language text, exemplified by technologies such as GPT, LLaMA, Flan-T5, ChatGLM, and Qwen. These models are trained on extensive text datasets and exhibit strong capabilities, including robust language generation, zero-shot transfer, and In-Context Learning (ICL). With the recent surge in multimodal content (images, videos, audio, and 3D models), Large MultiModal Models (LMMs) have seen significant enhancements. These improvements extend conventional LLMs to accommodate multimodal inputs or outputs, as seen in BLIP, Flamingo, KOSMOS, LLaVA, Gemini, GPT-4, etc. Concurrently, certain research initiatives target specific modalities, with Kosmos-2 and MiniGPT-5 focusing on image generation and SpeechGPT on speech production. There are also endeavors to integrate LLMs with external tools to achieve a near "any-to-any" multimodal comprehension and generation capacity, illustrated by projects such as Visual-ChatGPT, ViperGPT, MM-REACT, HuggingGPT, and AudioGPT. Collectively, these models, spanning not only text and image generation but also other modalities, are referred to as large generative models. This workshop will allow researchers, practitioners, and industry professionals to explore the latest trends and best practices in the multimodal applications of large generative models.
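For readers unfamiliar with In-Context Learning as mentioned in the abstract, the sketch below illustrates the core idea: the model is steered by a few labeled demonstrations placed directly in the prompt, with no weight updates. This is a minimal illustration only; the Hugging Face transformers pipeline API and the small "gpt2" checkpoint are illustrative assumptions, not systems discussed by the workshop.

```python
# Minimal In-Context Learning (ICL) sketch: a few-shot prompt steers the
# model toward a sentiment-labeling task without any fine-tuning.
# Assumption: the Hugging Face `transformers` package is installed and
# "gpt2" is used purely as a small, publicly available stand-in model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Two labeled demonstrations followed by an unlabeled query.
prompt = (
    "Review: The film was a delight from start to finish. Sentiment: positive\n"
    "Review: A dull, overlong mess. Sentiment: negative\n"
    "Review: I would happily watch it again. Sentiment:"
)

# Greedy decoding of a few tokens; the continuation is the model's label.
result = generator(prompt, max_new_tokens=3, do_sample=False)
print(result[0]["generated_text"])
```

Larger instruction-tuned models handle such prompts far more reliably than the toy checkpoint used here; the pattern itself (demonstrations in the prompt, prediction by continuation) is the same one the abstract refers to as ICL.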
Original language: English
Title of host publication: LGM3A '24
Subtitle of host publication: Proceedings of the 2nd Workshop on Large Generative Models Meet Multimodal Applications
Place of Publication: New York, NY
Publisher: Association for Computing Machinery (ACM)
Pages: 1-3
Number of pages: 3
ISBN (Electronic): 979-8-4007-1193-0
DOIs
Publication status: Published - 2024
Event: 2nd Workshop on Large Generative Models Meet Multimodal Applications, LGM3A 2024 - Melbourne, Australia
Duration: 28 Oct 2024 - 1 Nov 2024
https://lgm3a.github.io/LGM3A2024/

Conference

Conference: 2nd Workshop on Large Generative Models Meet Multimodal Applications, LGM3A 2024
Abbreviated title: LGM3A 2024
Country/Territory: Australia
City: Melbourne
Period: 28/10/24 - 1/11/24
Internet address: https://lgm3a.github.io/LGM3A2024/

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise, as indicated in the copyright section: the publisher is the copyright holder of this work, and the author uses Dutch legislation to make this work public.

Keywords

  • generative models
  • large language models
  • multimodal applications
