Sign Language Translation and Production
Research in this area covers the understanding and analysis of sign language videos, the translation of sign language visual content into textual sentences, and the generation of sign language visual expressions from textual sentences.
Sign language video translation takes a video as input; the network performs action recognition and sequence alignment and ultimately outputs a textual sentence. Research in this direction focuses on modeling the spatio-temporal relationships in video, aligning visual feature sequences with textual word sequences across modalities, and automatically translating into colloquial sentences. Related tasks include the classification and recognition of human motion videos and the automatic generation of video captions (descriptions).
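To make this pipeline concrete, the following is a minimal PyTorch sketch of the encoder-decoder structure described above: a per-frame spatial encoder, a temporal transformer that models spatio-temporal relationships, and an autoregressive text decoder whose cross-attention performs the cross-modal alignment between visual features and word tokens. All module names, dimensions, and the vocabulary size are illustrative assumptions rather than a specific published system.

# Minimal sketch of a sign language translation model (illustrative only).
import torch
import torch.nn as nn

class SignTranslationModel(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # Per-frame spatial encoder; a real system would typically use a
        # pretrained 2D/3D CNN backbone instead of this toy stack.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, d_model),
        )
        # Temporal encoder models relationships across frames
        # (positional encodings omitted for brevity).
        enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(enc, n_layers)
        # Autoregressive decoder: cross-attention aligns visual features
        # with word tokens, i.e., the cross-modal alignment step.
        self.token_embed = nn.Embedding(vocab_size, d_model)
        dec = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec, n_layers)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, frames, tokens):
        # frames: (batch, time, 3, H, W); tokens: (batch, seq_len) word ids
        b, t = frames.shape[:2]
        feats = self.frame_encoder(frames.flatten(0, 1)).view(b, t, -1)
        memory = self.temporal_encoder(feats)
        tgt = self.token_embed(tokens)
        seq_len = tokens.size(1)
        # Causal mask so each output word attends only to earlier words.
        causal = torch.triu(
            torch.full((seq_len, seq_len), float('-inf')), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out_proj(out)  # (batch, seq_len, vocab_size) logits

model = SignTranslationModel()
video = torch.randn(2, 16, 3, 112, 112)   # two 16-frame clips
prefix = torch.randint(0, 1000, (2, 8))   # teacher-forced token prefix
logits = model(video, prefix)
print(logits.shape)  # torch.Size([2, 8, 1000])

At inference time the decoder would be run step by step, feeding back its own predictions; training typically combines this cross-entropy objective with auxiliary alignment losses (e.g., CTC over sign glosses).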
Sign language video generation can be further subdivided into sign language animation synthesis, sign language pose video generation, and photorealistic sign language video generation. This task is closely linked to AI-generated content (AIGC), a current hot topic. Sign language pose video generation, often an intermediate step in producing photorealistic sign language videos, has also attracted wide attention from researchers in visual generation. Research in this direction focuses on mining and representing textual semantics, modeling human pose and motion in detail, aligning cross-modal sequences, and controlling the realism and coherence of the generated videos.
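As a companion sketch, the following PyTorch snippet illustrates the text-to-pose intermediate step mentioned above: a text encoder mines the sentence semantics, and learned per-frame queries attend to that encoding and are regressed to skeleton keypoints. Again, every name, dimension, and the fixed frame count are illustrative assumptions, not a specific published model.

# Minimal sketch of text-to-pose generation (illustrative only).
import torch
import torch.nn as nn

class TextToPoseModel(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, n_joints=50, n_heads=4):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(enc, num_layers=2)
        # Learned queries, one per output frame, cross-attend to the text
        # encoding (cross-modal alignment); 64 frames is an arbitrary choice.
        self.frame_queries = nn.Parameter(torch.randn(1, 64, d_model))
        dec = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.pose_decoder = nn.TransformerDecoder(dec, num_layers=2)
        # Regress each frame query to 2D coordinates of every joint.
        self.to_pose = nn.Linear(d_model, n_joints * 2)

    def forward(self, tokens):
        # tokens: (batch, seq_len) word ids of the input sentence
        memory = self.text_encoder(self.text_embed(tokens))
        queries = self.frame_queries.expand(tokens.size(0), -1, -1)
        frames = self.pose_decoder(queries, memory)
        b, t, _ = frames.shape
        return self.to_pose(frames).view(b, t, -1, 2)  # (b, 64, joints, 2)

model = TextToPoseModel()
sentence = torch.randint(0, 1000, (2, 12))
poses = model(sentence)
print(poses.shape)  # torch.Size([2, 64, 50, 2])

The predicted pose sequence would then condition a rendering stage, for example a GAN or diffusion model, to produce the final photorealistic sign language video.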