What’s the Point? Spatial Grammar & Index Resolution for Sign Language Recognition

What's the Point? Spatial Grammar & Index Resolution for Sign Language Recognition is available on arXiv.

Venue: arXiv preprint
Authors: Ranum, O., Hadfield, S., Bowden, R.

🌐 Project Page  ·  📄 arXiv

Abstract: Sign language models are predominantly trained with gloss-sequence or text supervision, thereby under-modeling non-lexical and productive constructions. One comparatively tractable instance is spatial indexing: pointing gestures that assign discourse entities to spatial loci for subsequent co-reference, which lexicon-centric objectives largely fail to capture. We present a targeted evaluation of indexing in Sign Language Recognition, showing that despite comprising 10-15% of signing content, indexing is poorly recovered. We introduce a framework for training and evaluating indexing experts, establishing a baseline for index-aware sign language modeling. Our approach decomposes spatial reference resolution into index detection and discourse entity linking. The resulting mention representations enable automatic annotation and non-lexical structure modeling, and serve as an auxiliary indexing expert that augments a frozen SLR model at inference time.