Sign Language Translation and Production
Research in this area covers the understanding and analysis of sign language videos, the translation of sign language visual content into textual sentences, and the generation of sign language visual expressions from textual sentences.
Sign language video translation takes a video as input; the network performs action recognition and sequence alignment and ultimately outputs a textual sentence. Research in this direction focuses on modeling the spatio-temporal relationships in video, aligning visual feature sequences with textual word sequences across modalities, and automatically translating into colloquial sentences. Related tasks include the classification and recognition of human motion videos and the automatic generation of video captions (descriptions).
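To make this pipeline concrete, the following is a minimal PyTorch sketch of the encoder-decoder structure described above: a per-frame spatial encoder, a temporal transformer that models spatio-temporal relationships, and an autoregressive text decoder whose cross-attention performs the cross-modal alignment between visual features and word tokens. All module names, dimensions, and the vocabulary size are illustrative assumptions rather than a specific published system.

# Minimal sketch of a sign language translation model (illustrative only).
import torch
import torch.nn as nn

class SignTranslationModel(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # Per-frame spatial encoder; a real system would typically use a
        # pretrained 2D/3D CNN backbone instead of this toy stack.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, d_model),
        )
        # Temporal encoder models relationships across frames
        # (positional encodings omitted for brevity).
        enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(enc, n_layers)
        # Autoregressive decoder: cross-attention aligns visual features
        # with word tokens, i.e., the cross-modal alignment step.
        self.token_embed = nn.Embedding(vocab_size, d_model)
        dec = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec, n_layers)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, frames, tokens):
        # frames: (batch, time, 3, H, W); tokens: (batch, seq_len) word ids
        b, t = frames.shape[:2]
        feats = self.frame_encoder(frames.flatten(0, 1)).view(b, t, -1)
        memory = self.temporal_encoder(feats)
        tgt = self.token_embed(tokens)
        seq_len = tokens.size(1)
        # Causal mask so each output word attends only to earlier words.
        causal = torch.triu(
            torch.full((seq_len, seq_len), float('-inf')), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out_proj(out)  # (batch, seq_len, vocab_size) logits

model = SignTranslationModel()
video = torch.randn(2, 16, 3, 112, 112)   # two 16-frame clips
prefix = torch.randint(0, 1000, (2, 8))   # teacher-forced token prefix
logits = model(video, prefix)
print(logits.shape)  # torch.Size([2, 8, 1000])

At inference time the decoder would be run step by step, feeding back its own predictions; training typically combines this cross-entropy objective with auxiliary alignment losses (e.g., CTC over sign glosses).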
Sign language video generation can be further subdivided into sign language animation synthesis, sign language pose video generation, and photorealistic sign language video generation. This task is closely linked to AI-generated content (AIGC), a current hot topic. Sign language pose video generation, often an intermediate step in producing photorealistic sign language videos, has also attracted wide attention from researchers in visual generation. Research in this direction focuses on mining and representing textual semantics, modeling human pose and motion in detail, aligning cross-modal sequences, and controlling the realism and coherence of the generated videos.
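As a companion sketch, the following PyTorch snippet illustrates the text-to-pose intermediate step mentioned above: a text encoder mines the sentence semantics, and learned per-frame queries attend to that encoding and are regressed to skeleton keypoints. Again, every name, dimension, and the fixed frame count are illustrative assumptions, not a specific published model.

# Minimal sketch of text-to-pose generation (illustrative only).
import torch
import torch.nn as nn

class TextToPoseModel(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, n_joints=50, n_heads=4):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(enc, num_layers=2)
        # Learned queries, one per output frame, cross-attend to the text
        # encoding (cross-modal alignment); 64 frames is an arbitrary choice.
        self.frame_queries = nn.Parameter(torch.randn(1, 64, d_model))
        dec = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.pose_decoder = nn.TransformerDecoder(dec, num_layers=2)
        # Regress each frame query to 2D coordinates of every joint.
        self.to_pose = nn.Linear(d_model, n_joints * 2)

    def forward(self, tokens):
        # tokens: (batch, seq_len) word ids of the input sentence
        memory = self.text_encoder(self.text_embed(tokens))
        queries = self.frame_queries.expand(tokens.size(0), -1, -1)
        frames = self.pose_decoder(queries, memory)
        b, t, _ = frames.shape
        return self.to_pose(frames).view(b, t, -1, 2)  # (b, 64, joints, 2)

model = TextToPoseModel()
sentence = torch.randint(0, 1000, (2, 12))
poses = model(sentence)
print(poses.shape)  # torch.Size([2, 64, 50, 2])

The predicted pose sequence would then condition a rendering stage, for example a GAN or diffusion model, to produce the final photorealistic sign language video.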