Previous: Jia Li, Yin Chen, Xuesong Zhang, et al. Multimodal feature extraction and fusion for emotional reaction intensity estimation and expression classification in videos with transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. 2023: 5837-5843.
Next: Xiao J, Hu Z, Li J, et al. Text proxy: Decomposing retrieval from a 1-to-N relationship into N 1-to-1 relationships for text-video retrieval[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2025, 39(8): 8655-8663.