AutoChemplete - Making Chemical Structural Formulas Accessible

Merlin Knaeble, Gabriel Sailer, Zihan Chen, Thorsten Schwarz, Kailun Yang, Mario Nadj, Rainer Stiefelhagen, Alexander Maedche · 2023 · Proceedings of the 20th International Web for All Conference (W4A) · doi:10.1145/3587281.3587293

Summary

This paper presents AutoChemplete, an interactive labeling tool that makes chemical structural formulas accessible to blind and low-vision (BLV) students. Chemical structural formulas (visual diagrams showing atoms, bonds, and molecular geometry) are fundamental to studying chemistry but are inherently inaccessible as bitmap images in PDF textbooks and papers. The current process of annotating these images requires expert chemistry knowledge, is time-consuming (accessibility centers annotate 5-10 documents per month), and is frustrating. AutoChemplete combines machine learning with human-in-the-loop correction using an autocomplete paradigm. The tool ingests a bitmap image of a structural formula and uses an ensemble of five encoder-decoder ML models (combining EfficientNet encoders with LSTM or Transformer decoders) to predict a SMILES string, a linear text notation representing molecular structure. It then performs a similarity search in the PubChem database using Tanimoto similarity on molecular fingerprints and presents the top four matching molecules as suggestions. Users can accept a suggestion directly, open it in a built-in chemical editor (based on the open-source kekule.js), or manually correct the prediction. From the confirmed SMILES string, AutoChemplete generates multiple accessible output formats: IUPAC names, colloquial names, sum formulas, configurable vector graphics (with Latin or Braille lettering), and formats for tactile display or embossed paper.
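The similarity-search step can be illustrated with a minimal Python sketch. It is not AutoChemplete's implementation: fingerprints are modeled as plain sets of "on" bit indices, the database is a small in-memory dictionary with invented entries rather than PubChem, and the names `tanimoto`, `top_suggestions`, and `toy_database` are hypothetical. It only shows how Tanimoto similarity (shared bits divided by bits set in either fingerprint) can rank candidates and keep the top four.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modeled here as the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared on-bits divided by on-bits in either fingerprint."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch when both are empty
    return len(a & b) / union


def top_suggestions(
    query: Fingerprint, database: Dict[str, Fingerprint], k: int = 4
) -> List[Tuple[str, float]]:
    """Return the k database entries most similar to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    # Sort by descending similarity; break ties by name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Toy fingerprints invented for illustration only (not real PubChem data).
toy_database: Dict[str, Fingerprint] = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 3}),
    "mol_c": frozenset({1, 2, 7, 8}),
    "mol_d": frozenset({5, 6}),
    "mol_e": frozenset({1, 2, 3, 4, 9}),
}

# Stand-in for the fingerprint of the molecule predicted by the ML ensemble.
predicted = frozenset({1, 2, 3, 4})
suggestions = top_suggestions(predicted, toy_database)

for name, score in suggestions:
    print(f"{name}: {score:.2f}")
```

In this toy run the exact match `mol_a` ranks first, followed by near matches, which mirrors the idea that a slightly wrong prediction can still surface the correct molecule among the four suggestions.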

Key findings

The ML model achieves 83.36% exact-match accuracy on SMILES prediction, surpassing prior work (67-79% and 77-83%). When combined with the similarity search, the correct molecule appears in the top four suggestions for 93.76% of all molecules in the test set, with a median placement of 1 for correctly predicted molecules. In think-aloud sessions with 15 participants of varying chemistry expertise (from a chemistry professor to a law school student), participants achieved approximately 97% correct annotations. The majority of molecules (57%) were annotated by simply accepting a suggestion, while 43% involved the editor in some capacity. Crucially, the tool supports users of all expertise levels: experts use names and substructure recognition to quickly identify molecules, while novices rely on visual side-by-side comparison of suggestions. Participants across expertise levels found AutoChemplete entertaining, describing it as "a puzzle," "a game," and "fun," which turned what was previously considered tedious annotation work into an engaging activity. Interviews with three BLV chemists and four accessibility professionals revealed five key requirements: exact annotations (100% accuracy needed for exam materials), reasonable speed, skill support for non-expert annotators, multiple output formats, and integration with existing document accessibility workflows.
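To make the two retrieval metrics concrete, the following Python sketch computes a top-four hit rate and a median placement on a handful of invented examples. It is an illustration under assumptions, not the paper's evaluation code: the helper names `rank_of_truth` and `top_k_hit_rate` are hypothetical, the data is made up, and molecules are compared as plain strings, which presumes all SMILES are already in one canonical form.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple


def rank_of_truth(suggestions: Sequence[str], truth: str) -> Optional[int]:
    """1-based position of the true molecule in the ranked list, or None if absent."""
    for position, candidate in enumerate(suggestions, start=1):
        if candidate == truth:  # assumes canonical SMILES strings
            return position
    return None


def top_k_hit_rate(ranks: Sequence[Optional[int]], k: int = 4) -> float:
    """Fraction of test items whose true molecule appears within the first k suggestions."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Invented (suggestions, ground truth) pairs for illustration only.
toy_results: List[Tuple[List[str], str]] = [
    (["CCO", "CCN", "CCC", "CO"], "CCO"),                # found at rank 1
    (["CCN", "CCO", "CCC", "CO"], "CCO"),                # found at rank 2
    (["c1ccccc1", "C1CCCCC1", "CC", "C"], "c1ccccc1"),   # found at rank 1
    (["CC", "C", "CCC", "CCCC"], "CCO"),                 # not among the suggestions
]

ranks = [rank_of_truth(s, t) for s, t in toy_results]
hit_rate = top_k_hit_rate(ranks)
# Median placement is taken over the items that were found at all.
median_rank = median(r for r in ranks if r is not None)

print(f"top-4 hit rate: {hit_rate:.2%}, median placement: {median_rank}")
```

Whether the paper computes its median placement over exactly this subset of molecules is not stated in this summary, so the choice to restrict the median to found items is an assumption of the sketch.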

Relevance

This research addresses a critical barrier to STEM education for BLV students. Among US students with vision impairments, three-quarters are more than a full grade level behind their sighted peers in mathematics, and while 69% show interest in STEM during high school, only 8% pursue related college degrees, a gap largely attributed to inaccessible learning materials. AutoChemplete demonstrates a generalizable human-AI interaction pattern (imperfect model plus solution space search plus human intervention) that could extend to other STEM accessibility challenges like physics diagrams, mathematical plots, and engineering schematics. For accessibility practitioners, the tool offers a practical way to lower the expertise barrier for annotating chemistry content, enabling student assistants without chemistry backgrounds to produce accurate accessible materials. The open-source release on GitHub makes it immediately available for accessibility centers. The finding that BLV chemists need individualized representations, not a one-size-fits-all solution, reinforces the importance of generating multiple output modalities from a single annotation.

Tags: STEM accessibility · blind and low vision · chemistry · machine learning · interactive labeling · document accessibility · alt text · tactile graphics