Twitter A11y: A Browser Extension to Make Twitter Images Accessible
Cole Gleason, Amy Pavel, Emma McCamey, Christina Low, Patrick Carrington, Kris M. Kitani, Jeffrey P. Bigham · 2020 · CHI Conference on Human Factors in Computing Systems · doi:10.1145/3313831.3376728
Summary
This paper presents Twitter A11y, a browser extension designed to address the widespread lack of alternative (alt) text on images posted to Twitter. The authors note that while around 12% of Twitter content consists of images, only 0.1% of those images include user-provided alt text, creating a significant accessibility barrier for people who are blind or have low vision.
Twitter A11y combines six methods for generating or retrieving alt text: URL following (extracting descriptions from linked external pages), text recognition via OCR (for the screenshot-style images common on Twitter), tweet matching (identifying screenshots of other tweets and retrieving the original text), scene description using computer vision APIs, reverse image search via Caption Crawler (finding existing alt text for the same image elsewhere on the web), and crowdsourcing through Amazon Mechanical Turk as a fallback.
The extension operates as a client-server system: the browser extension detects images in a user's timeline and sends requests to a backend server, which tries each method in sequence and returns a result as soon as one succeeds. Faster and cheaper methods are tried first, and crowdsourcing is reserved for images that no automatic method can handle.
The researchers evaluated coverage and quality through a static analysis of timelines from 50 self-identified blind Twitter users, sampling 1,198 images, and conducted a week-long deployment study with 10 participants who have visual impairments.
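The sequential fallback described above can be pictured as a chain of methods, each of which either returns a description or declines. What follows is a minimal Python sketch under stated assumptions, not Twitter A11y's implementation. The `Image` record, its fields, and every function name are hypothetical, and the method bodies are stubs standing in for OCR, computer vision, reverse image search, and crowdsourcing services. The initial check for author-provided alt text is also an assumption. The order mostly follows the summary's listing, except that tweet matching is placed before plain text recognition so that a matched tweet wins over raw OCR text for the same screenshot. That ordering is our assumption as well.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Image:
    """Hypothetical image record. Each optional field stands in for what a
    real service (OCR, vision API, reverse image search) would return."""
    url: str
    author_alt: Optional[str] = None          # alt text supplied by the poster
    linked_page_text: Optional[str] = None    # description scraped from a linked page
    matched_tweet_text: Optional[str] = None  # text of the original tweet, if screenshotted
    ocr_text: Optional[str] = None            # text recognized in the image
    vision_caption: Optional[str] = None      # caption from a computer vision API
    web_alt_text: Optional[str] = None        # alt text found for the same image elsewhere


# A method returns a description, or None if it cannot describe the image.
Method = Callable[[Image], Optional[str]]


def author_alt_text(img: Image) -> Optional[str]:
    return img.author_alt


def url_following(img: Image) -> Optional[str]:
    return img.linked_page_text


def tweet_matching(img: Image) -> Optional[str]:
    if img.matched_tweet_text:
        return f"Screenshot of a tweet: {img.matched_tweet_text}"
    return None


def text_recognition(img: Image) -> Optional[str]:
    if img.ocr_text:
        return f"Text in image: {img.ocr_text}"
    return None


def scene_description(img: Image) -> Optional[str]:
    return img.vision_caption


def caption_crawler(img: Image) -> Optional[str]:
    return img.web_alt_text


def crowdsourcing(img: Image) -> Optional[str]:
    # Stub: a real system would post a task to human workers and wait for an answer.
    return f"[pending human description for {img.url}]"


# Hypothetical order, roughly cheapest and fastest first; crowdsourcing is the last resort.
PIPELINE: List[Tuple[str, Method]] = [
    ("author_alt_text", author_alt_text),
    ("url_following", url_following),
    ("tweet_matching", tweet_matching),
    ("text_recognition", text_recognition),
    ("scene_description", scene_description),
    ("caption_crawler", caption_crawler),
    ("crowdsourcing", crowdsourcing),
]


def describe(img: Image, pipeline: List[Tuple[str, Method]] = PIPELINE) -> Tuple[str, str]:
    """Try each method in order; return (method name, description) from the first success."""
    for name, method in pipeline:
        description = method(img)
        if description:  # treat None and empty strings as failure
            return name, description
    raise RuntimeError(f"no method described {img.url}")
```

For example, `describe(Image(url="a", ocr_text="Hello world"))` returns `("text_recognition", "Text in image: Hello world")`. An image with no usable signals falls through to the crowdsourcing stub. Ordering the list from cheap to expensive means the slow human step only runs after every automatic method has declined.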
Key findings
In the static analysis of 1,198 images, Twitter A11y increased alt text coverage from 7.6% to 78.5% using automatic methods alone, with crowdsourcing available for the remaining 21.5%. Scene description covered the most images (67.3%), while text recognition and Caption Crawler each covered about 19%. In quality ratings, crowdsourcing produced the highest percentage of "Great" descriptions (62.5%), followed by original author-provided alt text (50.0% "Great") and text recognition (44.9% "Great"). Scene description produced mostly "Good" descriptions (53.1%) but only 14.3% "Great." Overall, an estimated 57.5% of automatically generated descriptions were rated "Good" or "Great."
During the week-long deployment with 10 participants, Twitter A11y provided descriptions for 82.4% of the images participants encountered, with crowdsourcing covering the remainder. Participants' perceived image accessibility rose from 12.1% to 72.3%. Participants preferred text recognition and scene description for their speed (about 2.5 seconds) and descriptiveness. They valued crowdsourcing for its accuracy but criticized its slow response time (about 2 minutes). Participants expressed strong interest in combining multiple methods and in having the tool integrated into their preferred third-party Twitter clients.
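The per-method coverage figures overlap. If they are measured over the same image sample, scene description (67.3%), text recognition (about 19%), and Caption Crawler (about 19%) alone sum to more than the combined 78.5%, so many images must have been covered by more than one method. Combined coverage is therefore the size of the union of each method's covered images, not the sum of the individual percentages. The minimal Python sketch below illustrates this with ten made-up image IDs. The data and the `coverage` function are hypothetical and are not taken from the study.

```python
from typing import Dict, Set


def coverage(covered_by: Dict[str, Set[str]], all_images: Set[str]) -> Dict[str, float]:
    """Per-method coverage plus combined coverage (the union across methods)."""
    n = len(all_images)
    result = {name: len(ids & all_images) / n for name, ids in covered_by.items()}
    covered_by_any = set().union(*covered_by.values()) & all_images
    result["any_method"] = len(covered_by_any) / n
    return result


# Toy data (hypothetical, not from the study): ten images with overlapping method coverage.
images = {f"img{i}" for i in range(10)}
toy = {
    "scene_description": {f"img{i}" for i in range(7)},  # img0..img6
    "text_recognition": {"img5", "img7"},                 # img5 also has a scene description
    "caption_crawler": {"img0", "img6"},                  # both also have scene descriptions
}

print(coverage(toy, images))
```

Running it prints `{'scene_description': 0.7, 'text_recognition': 0.2, 'caption_crawler': 0.2, 'any_method': 0.8}`. The per-method shares add up to 1.1, yet only 80% of the toy images are covered by at least one method.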
Relevance
This research demonstrates that a combination of existing automatic and human-powered methods can dramatically improve social media image accessibility without requiring action from content posters. The finding that only 0.1% of Twitter images had alt text at the time of the study underscores how platform design failures place the burden of accessibility on users with disabilities. The paper makes a practical case that platforms should integrate text recognition and scene description by default, because these methods are fast, inexpensive, and produce descriptions of acceptable quality. For accessibility practitioners, the study shows that missing alt text on social media is solvable with current technology, and that users with visual impairments strongly prefer imperfect automated descriptions over no descriptions at all. The work also raises the question of who should bear the cost of making user-generated content accessible. The authors argue that platforms, not end users, must take responsibility.
Tags: social media accessibility · alternative text · screen readers · image accessibility · optical character recognition · crowdsourcing · browser extensions · blindness · computer vision