Toxicity detection

Also known as: Content toxicity scoring, Toxic speech detection

An NLP-based content moderation technique that assigns text a score indicating how likely it is to be rude, disrespectful, or likely to make someone leave a conversation. Research has shown that toxicity detection models can encode disability bias: they score innocuous sentences that mention disability (e.g., "I am a person with mental illness") as significantly more toxic than equivalent sentences without disability references. Such scoring can suppress legitimate disability-related discourse online.
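
The comparison described above, scoring a sentence with and without a disability reference and looking at the difference, can be sketched in a few lines of Python. This is an illustration only, not the method or API of any particular system: `toy_toxicity_score`, its keyword weights, the templates, and the helper names are all hypothetical, and the toy scorer is deliberately biased so that the gap is visible. In practice the `score` argument would be a wrapper around a real model.

```python
from statistics import mean
from typing import Callable, Iterable


def toy_toxicity_score(text: str) -> float:
    """Hypothetical stand-in for a real model; returns a score in [0, 1].

    It deliberately over-weights the phrase "mental illness" to mimic the
    kind of learned bias described above. It is not a real classifier.
    """
    weights = {"idiot": 0.8, "stupid": 0.7, "mental illness": 0.4}
    lowered = text.lower()
    score = 0.05 + sum(w for term, w in weights.items() if term in lowered)
    return min(score, 1.0)


def score_gap(score: Callable[[str], float], template: str,
              neutral: str, marked: str) -> float:
    """Score of the marked sentence minus the score of the neutral one."""
    marked_score = score(template.format(phrase=marked))
    neutral_score = score(template.format(phrase=neutral))
    return marked_score - neutral_score


def mean_score_gap(score: Callable[[str], float], templates: Iterable[str],
                   neutral: str, marked: str) -> float:
    """Average the gap over several templates to reduce wording effects."""
    return mean(score_gap(score, t, neutral, marked) for t in templates)


# Assumed example templates; {phrase} is filled with each variant in turn.
templates = ["I am {phrase}.", "My neighbour is {phrase}."]
gap = mean_score_gap(
    toy_toxicity_score,
    templates,
    neutral="a person",
    marked="a person with mental illness",
)
print(f"mean score gap: {gap:.2f}")
```

A gap near zero would suggest the scorer treats the two variants alike; a consistently positive gap on innocuous sentences is the pattern the research above describes.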

Category: artificial intelligence · natural language processing · content moderation

Related: Algorithmic bias · Ableism · Stigma

Sources

https://doi.org/10.1145/3386296.3386305
https://perspectiveapi.com/