Sign Language Processing

Sign languages are fully natural, visually expressed languages with rich grammatical structure. Unlike spoken languages, they unfold in three-dimensional space through the coordinated use of hands, arms, body posture, and facial expressions. Each country or region typically has its own sign language: Norwegian Sign Language (NTS), British Sign Language (BSL), Sign Language of the Netherlands (NGT), and American Sign Language (ASL) are among the hundreds worldwide, and they are largely mutually unintelligible. ASL and BSL, for example, are unrelated languages despite both being used in English-speaking countries.

Sign Language Processing (SLP) is the subfield of AI concerned with the automatic analysis and generation of sign language. It sits at the intersection of computer vision, natural language processing, and sign language linguistics.


Core Tasks

Sign Language Recognition (SLR) maps signed video or pose data to linguistic representations. Isolated Sign Recognition (ISR) classifies individual signs from segmented clips. Continuous Sign Language Recognition (CSLR) transcribes connected signing into a gloss sequence without predefined boundaries — substantially harder due to coarticulation, signer variation, and unsegmented input.
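To make the ISR framing concrete, the sketch below treats isolated sign recognition as classification of a pose-keypoint sequence: each clip is mean-pooled over time into a single feature vector and assigned to the nearest class centroid. This is a deliberately naive stand-in for the learned temporal models (e.g. recurrent or transformer encoders over pose or video features) used in practice; the class names and pose data are entirely synthetic.

```python
import numpy as np

def pool_pose_sequence(poses: np.ndarray) -> np.ndarray:
    """Collapse a (frames, keypoints, 2) pose clip into one feature vector
    by mean-pooling over time (a deliberately crude temporal model)."""
    return poses.reshape(poses.shape[0], -1).mean(axis=0)

class NearestCentroidISR:
    """Toy isolated-sign recognizer: one pooled-feature centroid per sign."""

    def fit(self, clips, labels):
        feats = {}
        for clip, label in zip(clips, labels):
            feats.setdefault(label, []).append(pool_pose_sequence(clip))
        self.centroids = {lbl: np.mean(fs, axis=0) for lbl, fs in feats.items()}
        return self

    def predict(self, clip):
        f = pool_pose_sequence(clip)
        return min(self.centroids,
                   key=lambda lbl: np.linalg.norm(self.centroids[lbl] - f))

# Synthetic demo: two "signs" whose keypoints cluster around different regions.
rng = np.random.default_rng(0)
sign_a = [rng.normal(0.0, 0.05, (20, 5, 2)) for _ in range(4)]
sign_b = [rng.normal(1.0, 0.05, (20, 5, 2)) for _ in range(4)]
model = NearestCentroidISR().fit(sign_a + sign_b, ["HELLO"] * 4 + ["THANKS"] * 4)
print(model.predict(rng.normal(1.0, 0.05, (20, 5, 2))))  # prints THANKS
```

Note that this pooled representation discards exactly the temporal structure that makes CSLR hard; it only works here because the toy classes are separable frame by frame.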

Sign Language Translation (SLT) maps a sign language utterance to a spoken or written language sentence, requiring both cross-modal and cross-lingual transfer. The task was historically framed as a recognition-then-translation pipeline (sign to gloss, then gloss to text); recent work increasingly pursues gloss-free, end-to-end approaches that model sign-to-text directly.

Sign Language Production (SLP/G) generates sign language from text, typically via pose synthesis or photorealistic video generation. This encompasses motion synthesis, avatar animation, and, increasingly, diffusion-based generation. Despite the name, it is in effect translation in the opposite direction from SLT.
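A common low-tech baseline for pose-based production is gloss-to-pose concatenation: look up a prerecorded pose clip for each gloss and interpolate between clip boundaries to smooth the motion. The sketch below illustrates that idea with a hypothetical two-entry pose lexicon (the gloss names, clip lengths, and keypoint counts are invented for the example); learned systems instead generate the motion directly and must handle transitions, prosody, and non-lexical structure.

```python
import numpy as np

# Hypothetical per-gloss pose lexicon: gloss -> (frames, keypoints, 2) clip.
POSE_LEXICON = {
    "HELLO": np.linspace(0.0, 1.0, 10)[:, None, None] * np.ones((10, 5, 2)),
    "WORLD": np.linspace(1.0, 0.0, 10)[:, None, None] * np.ones((10, 5, 2)),
}

def interpolate(start, end, steps):
    """Linearly blend between two poses to smooth a clip boundary."""
    ts = np.linspace(0.0, 1.0, steps + 2)[1:-1]
    return np.stack([(1 - t) * start + t * end for t in ts])

def glosses_to_pose(glosses, transition_frames=4):
    """Concatenate per-gloss clips, inserting interpolated transitions."""
    segments = []
    for g in glosses:
        clip = POSE_LEXICON[g]
        if segments:
            segments.append(interpolate(segments[-1][-1], clip[0],
                                        transition_frames))
        segments.append(clip)
    return np.concatenate(segments)

seq = glosses_to_pose(["HELLO", "WORLD"])
print(seq.shape)  # (24, 5, 2): two 10-frame clips plus a 4-frame transition
```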

Other tasks include sign segmentation (identifying temporal boundaries in continuous signing), sign spotting (locating specific signs within a stream), signer anonymisation, and sign-text alignment for corpus annotation.
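Of these, sign spotting has the simplest classical formulation: slide a window over the incoming feature stream and flag positions that match a template of the target sign. The sketch below does this with cosine similarity over flattened pose windows on synthetic data; real spotters use learned embeddings and alignment methods such as DTW rather than raw keypoints.

```python
import numpy as np

def spot_sign(stream, template, threshold=0.99):
    """Slide a window the length of `template` over `stream` and return the
    start frames where cosine similarity with the template exceeds the
    threshold. A strict threshold avoids hits from windows that only
    partially overlap the target occurrence."""
    t = template.reshape(-1)
    t = t / np.linalg.norm(t)
    w = template.shape[0]
    hits = []
    for start in range(stream.shape[0] - w + 1):
        window = stream[start:start + w].reshape(-1)
        sim = window @ t / (np.linalg.norm(window) + 1e-9)
        if sim > threshold:
            hits.append(start)
    return hits

# Synthetic demo: embed a 12-frame template in a 100-frame noise stream.
rng = np.random.default_rng(1)
stream = rng.normal(0.0, 0.1, (100, 5, 2))
template = rng.normal(0.0, 1.0, (12, 5, 2))
stream[30:42] = template
print(spot_sign(stream, template))  # finds the occurrence at frame 30
```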


Beyond the Lexicon

A key challenge — and active research frontier — is that sign languages are not purely lexical. Corpus studies of spontaneous signing suggest that roughly 40% of signs are non-lexical: productive constructions that exploit three-dimensional space, iconicity, and discourse context. These include classifier (depicting) constructions, in which handshapes represent entities moving and located in space; constructed action, in which the signer enacts a referent's behaviour; and pointing signs that establish and retrieve referents at loci in the signing space.

Most current benchmarks and models treat signing as a linear sequence of glosses, implicitly assuming a discrete and finite lexicon. This misses the spatial grammar and simultaneity that are central to how sign languages actually work.



Key Challenges