Glossary

Terms used in accessibility research and practice. Each entry has a definition, common aliases, and category tags.

Search results

Part-of-Speech Tagging (also: POS Tagging, Grammatical Tagging): Part-of-speech tagging is the natural-language-processing task of labelling each word in a text with its grammatical category (noun, verb, adjective, and so on) using context from surrounding words. Classical approaches use hidden Markov models with the Viterbi algorithm;…
Perplexity (also: Language Model Perplexity): A standard metric for evaluating language models that measures how well the model predicts a sample of text. Mathematically, perplexity is the inverse probability of the test set, normalised by the number of words; a lower perplexity indicates that the model assigns higher…
Pointwise Mutual Information (also: PMI): A statistical measure used in natural language processing to quantify the strength of association between two words based on how much more frequently they co-occur in a corpus than would be expected by chance. PMI is calculated as the logarithm of the ratio of the observed…
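Polysemy (also: Polysemous Words): The property of a word having multiple related meanings or senses. For example, the word "head" can refer to a part of the body or to the leader of a group. (Unrelated senses that share one form, as in "bank" for a financial institution or the edge of a river, are strictly a case of homonymy, although the two phenomena are often treated together.) Polysemy creates particular challenges for text simplification and accessibility tools because choosing an appropriate…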
Prompt Chaining (also: Chained Prompting, Sequential Prompting): A technique for interacting with large language models where multiple prompts are issued in sequence, with each prompt building on the output of the previous one to achieve a more refined or accurate result. In accessibility and bias mitigation contexts, prompt chaining enables…
Pronominal Reference (also: Pronoun Reference, Anaphoric Reference): The use of pronouns or pronoun-like expressions to refer back to entities previously introduced in a discourse. In spoken and written languages this is typically achieved with words such as "he," "she," "it," or "they"; in American Sign Language and other signed languages,…
SARI (also: System output Against References and against the Input sentence): An automatic evaluation metric for text simplification systems that compares a system’s output against both the original input sentence and a set of human-written simplification references, rewarding the system for adding appropriate words, keeping important words, and deleting…
Semantic Disambiguation (also: Word Sense Disambiguation): Semantic disambiguation is the process of determining the intended meaning of a word, symbol, or input when multiple interpretations are possible. In accessibility and assistive technology contexts, semantic disambiguation is important in communication aids, predictive text…
Semantic Relatedness (also: Semantic Similarity, Semantic Association): A measure of how closely related two words or concepts are in meaning, encompassing various types of relationships including synonymy, hyponymy, meronymy, and general topical association. In assistive technology, semantic relatedness is used to improve word prediction and…
Semantic Distance (also: Semantic Similarity, Word Embedding Distance): A computational measure of how different two words are in meaning, typically derived from word embedding models like word2vec that represent words as vectors in a high-dimensional space. In caption evaluation for DHH users, semantic distance between an ASR error and the intended…
Sentiment Analysis (also: Opinion Mining): A natural language processing technique that identifies and extracts subjective information from text, classifying it as positive, negative, or neutral. In accessibility research, sentiment analysis can be applied to social media posts, product reviews, and online discussions to…
Sequence-to-Sequence (also: Seq2Seq, Encoder-Decoder): A neural network architecture designed for tasks where both input and output are sequences of variable length, such as machine translation, speech recognition, and video captioning. A seq2seq model consists of an encoder that processes the input sequence into a fixed-length…
Sign Language Generation (also: Sign Language Synthesis, Signing Generation): The automatic production of sign language content, typically through computer-generated animations of signing avatars or video synthesis. Sign language generation systems convert text or symbolic representations of signs into visual output, often using motion-capture data,…
Sign Language Machine Translation (also: English-to-ASL Translation, Sign Language MT, Text-to-Sign Translation): The automatic translation of written or spoken text into a signed language (or vice versa) using computational methods, typically producing output as an animated signing avatar or, less commonly, as recorded video clips. Because signed languages such as American Sign Language…
Speech Repair (also: Self-Correction, Speech Self-Repair, Command Correction): Speech repair is the process of correcting or modifying a spoken utterance after it has been produced, either within the same turn or in a subsequent one. In natural conversation, speakers commonly interrupt themselves to fix errors, change wording, or update information using…
Spoken Dialog System (also: SDS, Voice Dialog System, Conversational AI): A computer system that uses speech as both input and output to conduct goal-oriented conversations with users. Unlike simple voice command systems, spoken dialog systems can handle multi-turn exchanges, track conversation context, manage misunderstandings, and adapt to user…
Stemming (also: Word Stemming, Suffix Stripping): Stemming is a natural-language-processing technique that reduces inflected or derived words to a common stem: 'running' and 'runs' both map to 'run', whereas an irregular form such as 'ran' generally needs lemmatization rather than suffix stripping. The Porter stemmer (1980) is the canonical example for English. Stemming helps information-retrieval…
Syntactic Parse Tree (also: Parse Tree, Syntactic Tree): A tree-shaped data structure that represents the grammatical structure of a sentence according to a formal grammar. Internal nodes correspond to phrases (noun phrase, verb phrase, clause, sentence) and leaves correspond to individual words or signs. Parse trees are produced…
Syntactic Simplification (also: Sentence Simplification): A form of text simplification that restructures complex sentences into simpler grammatical forms, such as splitting compound sentences, converting passive voice to active voice, or resolving relative clauses. Syntactic simplification reduces the cognitive load of parsing…
Text Alignment (also: Sequence Alignment, Transcript Alignment): The process of matching corresponding segments between two or more text sequences that represent the same content but may differ in timing, wording, or structure. In captioning systems, text alignment is used to synchronize parallel transcription streams, such as…
Text Simplification (also: Automatic Text Simplification, Content Simplification): The process of transforming complex written text into simpler, more understandable versions while preserving the essential meaning. Text simplification can be performed manually by content authors following plain language guidelines, or automatically using natural language…
Topic Modeling (also: LDA, Latent Dirichlet Allocation): A machine learning technique that automatically discovers abstract themes or topics within a collection of documents by analyzing patterns of word co-occurrence. Latent Dirichlet Allocation (LDA) is the most widely used topic modeling algorithm. In accessibility research, topic…
Topic Segmentation (also: Text Segmentation, Topicalisation): A natural language processing technique that automatically divides a document into coherent sections based on changes in topic or subject matter. Topic segmentation algorithms detect boundaries where the semantic content of adjacent sentences or paragraphs shifts significantly,…
Toxicity Detection (also: Content Toxicity Scoring, Toxic Speech Detection): An NLP-based content moderation technique that assigns scores to text indicating the likelihood it is rude, disrespectful, or likely to make someone leave a conversation. Research has shown that toxicity detection models encode disability bias, scoring innocuous sentences that…
Transfer Machine Translation (also: Transfer MT, Rule-Based Transfer Translation): A rule-based machine-translation paradigm that analyses the source text into a syntactic or semantic structure, applies a set of transfer rules to produce a corresponding structure in the target language, and then generates the target surface form. Transfer MT sits between…
Transformer (also: Transformer Model, Transformer Architecture): A deep learning architecture introduced by Vaswani et al. in 2017 that relies entirely on attention mechanisms rather than recurrence (RNNs) or convolution for sequence modeling tasks. Transformers process entire input sequences in parallel using "self-attention" to weigh the…
Trigram (also: 3-gram): A sequence of three consecutive words used in statistical language modeling for word prediction. Trigram models predict the next word based on the two preceding words, capturing more context than simpler unigram (single word) or bigram (two word) models. In AAC word prediction,…
Visual Dialogue (also: Visual Dialog, VisDial): Visual dialogue is an AI task that involves holding a multi-turn natural language conversation about visual content such as an image or video frame. Unlike single-turn visual question answering (VQA), visual dialogue systems maintain context across multiple exchanges, using…
Visualization Question Answering (also: Chart QA, Visualization QA, VisQA): A class of interactive systems that let users ask natural-language questions about a data visualization (a chart, graph, or map) and receive direct textual or spoken answers rather than having to interpret the visualization themselves. Visualization QA systems typically…
Viterbi Algorithm: The Viterbi algorithm is a dynamic-programming procedure for finding the most likely sequence of hidden states in a hidden Markov model given a sequence of observations. It is a standard solution for HMM-based part-of-speech tagging, many speech-recognition tasks, and decoding problems…
Word Embedding (also: Word Vector, Distributed Word Representation): A technique in natural language processing that represents words as numerical vectors in a multi-dimensional space, where words with similar meanings are positioned closer together. Word embeddings enable computers to understand semantic relationships between words, which…
Word Frequency (also: Lexical Frequency): A measure of how often a word occurs in a given language or text corpus. High-frequency words like common function words are encountered regularly and recognized quickly, while low-frequency words are rarer and require more cognitive effort to process. Word frequency…
Word Lattice (also: Recognition Lattice, Speech Lattice): A graph data structure produced by a speech recognizer that represents multiple competing word hypotheses explored during recognition, along with their acoustic and language model scores. Each path through the lattice represents a possible transcription of the spoken input. Word…
Word Prediction (also: Predictive Text, Word Completion): A text input feature that suggests complete words based on the characters already typed, using language models to anticipate the most likely intended word. In assistive technology, word prediction is particularly valuable for single-switch users and people with motor…
Word Importance (also: Lexical Importance, Information Content): A measure of how critical a specific word is to the overall meaning of a sentence, typically computed using neural language models that estimate how predictable a word is from its context. In captioning evaluation, word importance helps determine the impact of ASR errors:…
Word Sense Disambiguation (also: WSD): A natural language processing task that determines which meaning of a word is being used in a given context when the word has multiple possible meanings (polysemy). In accessibility applications, particularly automatic text simplification, WSD is essential for lexical…
WordNet: A large-scale electronic lexical database of the English language developed at Princeton University, in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets) linked by semantic and lexical relations. WordNet is designed to model how…
iVector (also: Identity Vector, i-vector): A low-dimensional representation of voice characteristics widely used in speaker recognition and verification systems. iVectors capture many acoustic aspects of a speaker's voice in a compact form, making them useful for automatically estimating speech intelligibility in people…
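
Illustrative example: the Perplexity and Pointwise Mutual Information entries above each describe a short formula. The Python sketch below uses the usual textbook forms, PMI(x, y) = log2(p(x, y) / (p(x) p(y))) and perplexity as the exponential of the average negative log-probability per token. The function names and the toy probabilities are assumptions chosen for illustration and are not taken from the glossary entries.

```python
import math
from typing import Sequence


def pointwise_mutual_information(p_xy: float, p_x: float, p_y: float) -> float:
    """PMI in bits: log2 of observed co-occurrence probability over the
    probability expected if the two words were independent."""
    return math.log2(p_xy / (p_x * p_y))


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity of a text, given the probability a language model
    assigned to each of its tokens (all probabilities must be > 0)."""
    n = len(token_probs)
    total_log_prob = sum(math.log(p) for p in token_probs)
    # exp(-average log-probability) equals the inverse probability of the
    # text normalised by its length (the N-th root of 1 / P(text)).
    return math.exp(-total_log_prob / n)


if __name__ == "__main__":
    # Hypothetical numbers: two words that co-occur exactly as often as
    # chance predicts (PMI of zero) and twice as often (PMI of one bit).
    print(pointwise_mutual_information(0.25, 0.5, 0.5))
    print(pointwise_mutual_information(0.5, 0.5, 0.5))
    # A model that gives every token probability 1/4 behaves like a
    # uniform choice among four words at each step.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

A PMI of zero means the two words co-occur no more often than chance, and positive values indicate association. For perplexity, a model that is uniformly uncertain among four words scores about 4, so lower values mean the model found the text more predictable.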
