
Uncovering the New Accessibility Crisis in Scholarly PDFs: Publishing Model and Platform Changes Contribute to Declining Scholarly Document Accessibility in the Last Decade

Anukriti Kumar, Lucy Lu Wang · 2024 · ASSETS '24: Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility · doi:10.1145/3663548.3675634

Summary

This paper presents a large-scale analysis of 20,000 scholarly PDFs published between 2014 and 2023, aiming to characterise the scope and trajectory of document accessibility in academic publishing. Most scholarly works are distributed as PDFs, a format that can present severe barriers for blind and low-vision readers who depend on screen readers. The authors assessed compliance against six criteria derived from PDF/UA standards and the SIGACCESS Guide for Accessible PDFs: Default Language (the document language is declared so screen readers pronounce text correctly), Appropriate Nesting (headings follow a correct hierarchy), Tagged PDF (semantic structure tags enable screen reader navigation), Table Headers (header cells in data tables are marked as such), Tab Order (a logical reading order that matches the visual layout), and Alt-Text (images and figures carry alternative descriptions).

The sample spanned both open-access and closed-access papers across broad fields, including computer science, medicine, biology, engineering, the social sciences, and the humanities. The methodology combined automated checking with Adobe Acrobat's accessibility checker and PAC (PDF Accessibility Checker), manual evaluation of alt text quality (whether descriptions are meaningful, not merely present), and manual testing with the JAWS and NVDA screen readers to validate the automated findings. This multi-method design addresses a known limitation of automated-only testing, which can report alt text as present even when it is auto-generated and meaningless.
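Two of the six criteria can be read almost directly from a PDF's document catalog: a tagged file normally declares `/MarkInfo << /Marked true >>` together with a `/StructTreeRoot`, and the default language is the catalog's `/Lang` entry (for example `/Lang (en-US)`). The following is a minimal, stdlib-only sketch of that idea; it is not the checker used in the study and is no substitute for Acrobat or PAC. It assumes an unencrypted file, scans the raw bytes plus any streams it can inflate with zlib, and only recognises literal-string `/Lang` values. Because it pattern-matches bytes rather than parsing the object graph, a `/Lang` on a structure element can be mistaken for the catalog entry. The names `quick_catalog_check` and `_searchable_chunks` are hypothetical.

```python
import re
import zlib
from typing import Iterator

# Heuristic byte patterns for catalog entries tied to two of the six criteria.
MARKED_RE = re.compile(rb"/Marked\s+true")
STRUCT_RE = re.compile(rb"/StructTreeRoot\b")
LANG_RE = re.compile(rb"/Lang\s*\(([^)]*)\)")  # literal strings only, not <hex> strings
STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)endstream", re.DOTALL)


def _searchable_chunks(data: bytes) -> Iterator[bytes]:
    """Yield the raw file, then every stream body that inflates with zlib.

    In PDF 1.5+ the catalog may live inside a compressed object stream,
    so scanning the raw bytes alone would miss it.
    """
    yield data
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj() tolerates trailing bytes (they end up in .unused_data)
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded (e.g. a JPEG image), so skip it
        yield inflated


def quick_catalog_check(data: bytes) -> dict:
    """Rough Tagged PDF / Default Language check on raw PDF bytes."""
    marked = struct_tree = False
    lang = None
    for chunk in _searchable_chunks(data):
        marked = marked or bool(MARKED_RE.search(chunk))
        struct_tree = struct_tree or bool(STRUCT_RE.search(chunk))
        if lang is None:
            found = LANG_RE.search(chunk)
            if found and found.group(1).strip():
                lang = found.group(1).decode("latin-1")
    return {"tagged_pdf": marked and struct_tree, "default_language": lang}


# Usage (the file name is illustrative):
# with open("paper.pdf", "rb") as fh:
#     print(quick_catalog_check(fh.read()))
```

A real audit would resolve the catalog through a proper PDF parser; byte scanning is used here only to keep the example dependency-free.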

Key findings

The results revealed a stark accessibility crisis: only 3.2% of tested PDFs satisfied all six criteria, while 74.9% failed to meet any of them. Most alarmingly, the study documented a significant decline in PDF accessibility since 2019, reversing the modest improvements of earlier years. The decline was concentrated among open-access papers, suggesting that the rapid growth of open-access publishing, while democratising access to scholarship generally, has inadvertently worsened accessibility for disabled readers.

The primary driver identified was a change in platforms and tools: a shift from traditional publisher-typeset PDFs (often produced with Adobe InDesign, which supports tagged PDF output) to author-produced PDFs generated with LaTeX and distributed through preprint platforms such as arXiv, which typically yield untagged PDFs with no semantic structure. LaTeX-generated PDFs were consistently the worst performers across all criteria: the standard LaTeX toolchain does not produce tagged PDFs by default, and the available accessibility packages are immature and rarely used. PDFs typeset by major publishers such as Springer and Elsevier performed better on tagging but still rarely carried alt text. Computer science papers performed worst overall, likely because of heavy LaTeX use and the prevalence of arXiv preprints.

Alt text was virtually nonexistent: across the entire dataset, under 2% of figures had meaningful alt text, and even among the small share of PDFs that were tagged, fewer than 1% contained meaningful image descriptions. The manual screen reader evaluation confirmed that the automated checker results matched real-world experience, with untagged PDFs being essentially unusable with a screen reader.
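The headline numbers (the share of PDFs passing all six criteria, the share passing none, and year-by-year trends split by access model) are straightforward aggregations over per-document pass/fail results. A minimal sketch, assuming a hypothetical `Result` record per PDF; the toy rows are made up for illustration and are not data from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass

CRITERIA = (
    "default_language",
    "appropriate_nesting",
    "tagged_pdf",
    "table_headers",
    "tab_order",
    "alt_text",
)


@dataclass(frozen=True)
class Result:
    """Hypothetical per-PDF record: year, access model, criteria passed."""
    year: int
    open_access: bool
    passed: frozenset  # names from CRITERIA that this PDF satisfied


def summarise(results):
    """Share of PDFs passing all six criteria and share passing none."""
    total = len(results)
    if total == 0:
        return {"all": 0.0, "none": 0.0}
    all_six = sum(1 for r in results if r.passed >= set(CRITERIA))
    none = sum(1 for r in results if not r.passed)
    return {"all": all_six / total, "none": none / total}


def share_passing_all_by_year(results, open_access=None):
    """Per-year share passing every criterion, optionally filtered by access model."""
    counts = defaultdict(lambda: [0, 0])  # year -> [passing_all, total]
    for r in results:
        if open_access is not None and r.open_access != open_access:
            continue
        counts[r.year][1] += 1
        if r.passed >= set(CRITERIA):
            counts[r.year][0] += 1
    return {year: ok / n for year, (ok, n) in sorted(counts.items())}


# Toy rows for illustration only.
toy = [
    Result(2018, False, frozenset(CRITERIA)),
    Result(2018, True, frozenset({"default_language"})),
    Result(2022, True, frozenset()),
    Result(2022, False, frozenset()),
]
print(summarise(toy))                  # {'all': 0.25, 'none': 0.5}
print(share_passing_all_by_year(toy))  # {2018: 0.5, 2022: 0.0}
```

Keeping each criterion as a separate flag, rather than collapsing it into a single score, is what lets both the "all six" and the "none" figures come from the same data.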

Relevance

This research documents what the authors call a "new accessibility crisis" in scholarly publishing, one that is getting worse rather than better despite growing awareness of digital accessibility. The findings have immediate implications for several stakeholders. For publishers, the data make clear that accessibility must be built into production workflows rather than treated as an optional afterthought: publisher toolchains need to produce tagged, accessible PDFs by default. For the LaTeX community, the lack of accessible PDF output is a critical gap that affects millions of scholarly documents, and investment in LaTeX accessibility tooling (such as the tagpdf project) is urgent. For conference organisers and journals (including ASSETS itself), requiring accessible camera-ready submissions could drive change, but only if authors are given usable tools and guidance. For the open access movement, the finding that open access has inadvertently harmed accessibility creates an ironic equity tension: papers made freely available to all are, at the same time, inaccessible to blind researchers. For accessibility practitioners more broadly, the paper demonstrates the value of large-scale quantitative auditing for identifying systemic trends that individual evaluations miss, and the importance of combining automated and manual testing methods.
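The point about combining automated and manual testing is easiest to see with alt text: a presence check accepts "image" or "fig3.png" as readily as a real description. One way to combine the two is an automated pre-filter that sets aside obvious placeholders and routes only plausible descriptions to human reviewers. The sketch below is hypothetical; the placeholder vocabulary, the filename pattern, and the four-word threshold are illustrative assumptions rather than criteria from the paper, and a human still makes the final judgement on anything it passes through.

```python
import re
from typing import Optional

# Illustrative assumptions, not a published standard.
PLACEHOLDER_WORDS = {"image", "figure", "fig", "picture", "graphic", "photo", "chart", "untitled"}
FILENAME_RE = re.compile(r"^[\w\-]+\.(png|jpe?g|gif|pdf|eps|svg)$", re.IGNORECASE)
MIN_WORDS = 4


def triage_alt_text(alt: Optional[str]) -> str:
    """Classify alt text as 'missing', 'placeholder', or 'needs_review'.

    Only 'needs_review' items go to a human, who judges whether the
    description actually conveys what the figure shows.
    """
    if alt is None or not alt.strip():
        return "missing"
    text = alt.strip()
    if FILENAME_RE.match(text):
        return "placeholder"  # e.g. "fig3.png" auto-filled from the source file
    words = re.findall(r"[a-z]+", text.lower())
    content = [w for w in words if w not in PLACEHOLDER_WORDS]
    if len(words) < MIN_WORDS or not content:
        return "placeholder"  # e.g. "Figure 2" or "image"
    return "needs_review"


for alt in [None, "fig3.png", "Figure 2", "Line chart of tagged PDF share falling after 2019"]:
    print(repr(alt), "->", triage_alt_text(alt))
```

The filter only narrows the manual workload; passing it says nothing about whether a description is accurate, which is exactly the judgement automated checkers cannot make.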

Tags: PDF accessibility · scholarly publishing · document accessibility · screen readers · tagged PDF · alt text · open access · automated testing · large-scale analysis

Standards referenced: PDF/UA · WCAG 2.1