In February 2026, the team's work was accepted and published by Information Processing & Management (IPM), an FMS Category B journal.
Basic Information
Title: A disentangled multimodal neural topic model
Authors: 熊迎秋, 刘业政, 钱洋*, 姜元春, 柴一栋, 凌海峰
Journal/Source: Information Processing & Management
Link: https://doi.org/10.1016/j.ipm.2026.104683
Abstract
This study focuses on multimodal topic modeling and aims to separate public topics (shared across modalities) from private topics (unique to each modality) hidden in text and image data. To this end, we propose a novel Disentangled Multimodal Neural Topic Model (DMNTM). Specifically, we design modality-specific encoders with an independence constraint to capture private topics, and a public encoder with a product-of-experts module to extract cross-modal shared topics. We conduct extensive experiments on six public datasets, including multimodal online reviews from Amazon, posts from Flickr, tweets from Twitter, and webpages from Wikipedia. DMNTM significantly outperforms the best state-of-the-art baseline in terms of perplexity, coherence, diversity, and overall topic quality. In two downstream tasks, recommendation and sentiment classification, DMNTM further improves performance. These results show that disentangling public and private topics effectively enhances both the quality and utility of multimodal representations.
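For readers unfamiliar with the product-of-experts (PoE) fusion mentioned in the abstract, the sketch below shows how per-modality Gaussian posteriors are typically merged into a single shared posterior in VAE-style multimodal models. This is a minimal illustration under assumed conventions: the function name product_of_experts, the PyTorch framing, and the toy dimensions are our own, not taken from the DMNTM paper or its released code.

# Minimal PoE sketch: fuse per-modality Gaussian experts q_m(z) = N(mu_m, sigma_m^2).
# Illustrative only -- names and shapes are hypothetical, not the authors' implementation.
import torch

def product_of_experts(mus, logvars):
    # For Gaussians, the (renormalized) product of densities is again Gaussian:
    # precisions add, and the joint mean is the precision-weighted mean.
    mu = torch.stack(mus)            # (M, batch, K): one expert per modality
    logvar = torch.stack(logvars)    # (M, batch, K)
    precision = torch.exp(-logvar)   # 1 / sigma_m^2
    joint_precision = precision.sum(dim=0)
    joint_mu = (precision * mu).sum(dim=0) / joint_precision
    joint_logvar = -torch.log(joint_precision)
    return joint_mu, joint_logvar

# Toy usage: fuse text and image posteriors over K shared topic dimensions.
K, batch = 20, 4
mu_text, lv_text = torch.randn(batch, K), torch.zeros(batch, K)
mu_img, lv_img = torch.randn(batch, K), torch.zeros(batch, K)
mu_z, lv_z = product_of_experts([mu_text, mu_img], [lv_text, lv_img])
print(mu_z.shape, lv_z.shape)  # torch.Size([4, 20]) torch.Size([4, 20])

A common variant also treats the standard-normal prior as an extra expert, which keeps the fusion well defined when a modality is missing. The independence constraint on the private topics is a separate ingredient of the model and is not shown here.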
