
Semantic Knowledge in Word Completion

Jianhua Li, Graeme Hirst · 2005 · Proceedings of the 7th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '05) · doi:10.1145/1090785.1090809

Summary

This paper proposes an integrated word-completion system that combines semantic knowledge with traditional n-gram statistical models to produce more contextually appropriate predictions for users with linguistic or physical disabilities. Standard word-completion systems use n-gram probabilities to predict likely words from the typed letter prefix and the preceding text, but these models capture long-distance semantic relationships between words poorly. The authors address this by building a semantic knowledge base from the British National Corpus (BNC, 83 million words). Pointwise mutual information (PMI) identifies co-occurring word pairs, and a Lesk-like algorithm then checks whether the two words appear in each other's WordNet glosses, so that retained pairs reflect genuine semantic relatedness rather than mere statistical co-occurrence. The resulting knowledge base holds relationship data for 3,031 distinct nouns, covering most common English nouns. During prediction, the system computes a semantic association score between each completion candidate and the content words in the preceding context, then merges this score with the n-gram probability using an expert-combination formula. The system also incorporates two additional features: automatic learning of salient terms (words that are uncommon in general but recur frequently in the current text) and an out-of-vocabulary named-entity strategy that caches proper nouns for reuse.
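The pipeline described above (PMI pair extraction, a gloss-overlap filter, then scoring candidates against the context) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the co-occurrence window is one sentence, the corpus is a six-sentence toy, hand-written glosses stand in for WordNet, and the helper names (`pmi_pairs`, `gloss_related`, `semantic_scores`, `combine`) are hypothetical. The final step uses a plain linear interpolation with an assumed weight `lam`, which is a stand-in for, not a reproduction of, the paper's expert-combination formula.

```python
import math
import re
from collections import Counter
from itertools import combinations


def pmi_pairs(sentences, min_count=2, threshold=0.0):
    """Return {(w1, w2): PMI} for word pairs co-occurring in the same sentence.

    Sketch assumption: the window is one sentence and probabilities are
    estimated as the fraction of sentences containing the word(s).
    """
    word_counts, pair_counts = Counter(), Counter()
    for sent in sentences:
        words = set(sent)
        word_counts.update(words)
        pair_counts.update(tuple(sorted(p)) for p in combinations(words, 2))
    n = len(sentences)
    scores = {}
    for (a, b), c_ab in pair_counts.items():
        if c_ab < min_count:
            continue  # too rare to trust
        pmi = math.log2((c_ab / n) / ((word_counts[a] / n) * (word_counts[b] / n)))
        if pmi > threshold:
            scores[(a, b)] = pmi
    return scores


def gloss_tokens(word, glosses):
    return set(re.findall(r"[a-z]+", glosses.get(word, "").lower()))


def gloss_related(a, b, glosses):
    """Lesk-like filter: keep a pair only if one word occurs in the other's gloss."""
    return b in gloss_tokens(a, glosses) or a in gloss_tokens(b, glosses)


def semantic_scores(candidates, context, kb):
    """Summed association of each candidate with the context words, normalised to sum to 1."""
    raw = {c: sum(kb.get(tuple(sorted((c, w))), 0.0) for w in context) for c in candidates}
    total = sum(raw.values())
    return {c: (s / total if total > 0 else 0.0) for c, s in raw.items()}


def combine(ngram_probs, sem, lam=0.7):
    """Hypothetical linear interpolation; NOT the paper's combination formula."""
    return {c: lam * p + (1 - lam) * sem.get(c, 0.0) for c, p in ngram_probs.items()}


# Toy corpus and glosses (invented for illustration; glosses stand in for WordNet).
corpus = [
    "the doctor examined the patient in the hospital",
    "the nurse helped the patient at the hospital",
    "the doctor and the nurse work at the hospital",
    "the bank raised the interest rate",
    "the bank cut the interest rate again",
    "the patient paid the bank",
]
stop = {"the", "in", "at", "and", "again"}
sentences = [[w for w in s.split() if w not in stop] for s in corpus]
glosses = {
    "doctor": "a licensed medical practitioner who treats a patient",
    "patient": "a person who requires medical care from a doctor or nurse",
    "nurse": "one skilled in caring for the sick in a hospital",
    "hospital": "a building where a doctor or nurse treats a patient",
    "bank": "a financial institution that pays interest on deposits",
    "interest": "a fixed charge for borrowing money from a bank",
    "rate": "amount of a charge such as interest relative to some basis",
}

kb = {pair: s for pair, s in pmi_pairs(sentences).items() if gloss_related(*pair, glosses)}
ngram = {"news": 0.5, "number": 0.3, "nurse": 0.2}  # made-up n-gram probabilities for prefix "n"
context = ["doctor", "hospital"]  # content words from the preceding text
scores = combine(ngram, semantic_scores(ngram, context, kb))
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['nurse', 'news', 'number']
```

In this toy run, ('bank', 'rate') has positive PMI but is dropped by the gloss check, while the semantic score lifts "nurse" above the more frequent "news" because "hospital" appears in the context.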

Key findings

The integrated semantic model achieved a 65% keystroke saving rate, compared with 59% for the syntax-and-n-gram baseline. This amounts to a 14.63% reduction in the keystrokes still required (35% rather than 41% of characters; (41 − 35)/41 ≈ 14.63%). This was evaluated on a test set of 3,700 nouns (22,854 characters) drawn from randomly selected BNC text. The semantic model's contribution was particularly strong for content words (nouns), reducing the keystrokes needed from 9,654 to 7,888. The out-of-vocabulary (OOV) named-entity strategy contributed more than half of the total improvement: proper nouns are often long and absent from the n-gram vocabulary, so n-gram models handle them poorly, but they tend to recur within a document. Varying the number of seed words (10 to 80) used to build semantic relatives had minimal effect on performance, suggesting the approach is robust to this parameter. Extending the context from one to two sentences improved keystroke saving by nearly 1%, but adding a third or fourth sentence showed diminishing returns. The authors note an important caveat for cognitively disabled users: higher-quality prediction lists might actually slow users down if they must evaluate more candidates that are semantically plausible but incorrect, so for this population the system must reject all but the very best predictions.
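To make the keystroke-saving measure and the effect of caching recurring proper nouns concrete, here is a small Python simulation. It is a sketch under assumptions that the review does not state: keystroke saving is taken as 1 − keystrokes / characters (spaces ignored), the completion list is refreshed after every typed letter and shows the top `k` words, selecting a listed word costs one keystroke, and a word is treated as a named entity if it is capitalised and absent from the vocabulary. The cache-first ordering, the vocabulary frequencies, and the sample sentence are all invented for illustration; the figures bear no relation to the paper's BNC results.

```python
def keystrokes_for_word(word, predict, k):
    """Keystrokes needed to enter `word` with a top-k completion list.

    Assumed interaction model: after each typed letter the list is refreshed;
    choosing a listed word costs one extra keystroke.
    """
    for i in range(1, len(word)):
        if word in predict(word[:i])[:k]:
            return i + 1  # i letters typed plus one selection keystroke
    return len(word)  # never offered early enough: typed in full


def simulate(text, vocab, k=5, use_cache=True):
    """Return keystroke saving, 1 - keystrokes / characters, over `text`."""
    cache = []  # hypothetical OOV name cache, most recent first

    def predict(prefix):
        # Cached names are listed ahead of vocabulary words (an assumption).
        from_cache = [w for w in cache if w.startswith(prefix)]
        from_vocab = sorted((w for w in vocab if w.startswith(prefix)),
                            key=vocab.get, reverse=True)
        return from_cache + from_vocab

    keys = chars = 0
    for word in text.split():
        keys += keystrokes_for_word(word, predict, k)
        chars += len(word)
        # Crude named-entity test: capitalised and absent from the vocabulary.
        if use_cache and word[0].isupper() and word.lower() not in vocab and word not in cache:
            cache.insert(0, word)
    return 1 - keys / chars


vocab = {"the": 50, "they": 30, "then": 20, "me": 15, "met": 10, "mind": 8,
         "minister": 5, "and": 40, "an": 35, "last": 9, "later": 6,
         "thank": 4, "thanked": 3}
text = "Kowalczyk met the minister and later Kowalczyk thanked the minister"

print(f"{simulate(text, vocab, k=3, use_cache=False):.3f}")  # 0.345
print(f"{simulate(text, vocab, k=3, use_cache=True):.3f}")   # 0.466
```

The only difference between the two runs is the second "Kowalczyk", which drops from nine keystrokes to two once the name is cached. This mirrors, on a toy scale, why a long proper noun that recurs in a document yields a large share of the savings.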

Relevance

Word completion is a critical assistive technology for people with physical disabilities (for whom every keystroke is effortful), learning disabilities, and cognitive disabilities. This research demonstrates that adding semantic awareness to prediction models can meaningfully reduce the number of keystrokes required, which translates directly into faster communication and reduced fatigue. The 14.63% reduction in required keystrokes relative to the statistical baseline is substantial for users who may type thousands of characters daily using single-switch scanning, head pointers, or other slow input methods. For accessibility practitioners, the paper's observation about cognitive disability is particularly important: better prediction lists are not automatically better for all users, because evaluating candidates requires cognitive effort. This tension between prediction accuracy and cognitive load remains relevant in modern AAC and text-entry systems. The work is part of a larger project developing spelling and grammar aids for users with cognitive disabilities, connecting word completion to the broader goal of supporting written communication for people with diverse linguistic abilities.

Tags: word prediction · word completion · natural language processing · text entry · AAC · cognitive disability · physical disability · semantic analysis · language model · keystroke saving