PDF Readability Enhancement on Mobile Devices

Zachary Shelton, Chen-Hsiang Yu · 2020 · Proceedings of the 17th International Web for All Conference (W4A) · doi:10.1145/3371300.3383352

Summary

This paper presents PDFroggy, an Android application that applies content transformation techniques to PDF documents on mobile devices to enhance readability. The research addresses a specific gap: previous work showed that content transformation methods such as Visual-Syntactic Text Formatting (VSTF) and Jenga format can improve reading comprehension for non-native English readers on web pages and paper documents, but these techniques had not been applied to PDF files. PDFs are a particular challenge because they use fixed layouts in which elements stay at the same position regardless of screen size or platform, which makes them especially hard to read on small mobile screens. The system parses a PDF using Apache PDFBox (ported to Android), extracts document structure from character-level data with a custom algorithm that reconstructs paragraph, sentence, and word boundaries, and then renders the content using native Android TextView components. The content transformation method is abstracted into a separate module, so different transformation approaches can be plugged in. The authors demonstrate the system using Jenga format, which restructures text presentation to aid comprehension by visually grouping syntactic units, and they envision the framework supporting multiple transformation methods tailored to different user populations.
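The pluggable transformation module described above can be pictured with a minimal Python sketch. It is an illustration only: PDFroggy itself is an Android app, and the registry, the `render` function, and the `clause_cascade` transformation are hypothetical names invented here. In particular, `clause_cascade` is a placeholder that breaks after commas and semicolons; it is not the Jenga format rules, which this summary does not specify.

```python
import re
from typing import Callable, Dict, List

# A transformation takes one sentence and returns the lines to display for it.
Transformation = Callable[[str], List[str]]

# Hypothetical registry: the renderer only knows transformations by name.
_REGISTRY: Dict[str, Transformation] = {}


def register(name: str) -> Callable[[Transformation], Transformation]:
    """Decorator that stores a transformation in the registry under `name`."""
    def decorator(fn: Transformation) -> Transformation:
        _REGISTRY[name] = fn
        return fn
    return decorator


@register("identity")
def identity(sentence: str) -> List[str]:
    """No transformation: one sentence, one line."""
    return [sentence]


@register("clause_cascade")
def clause_cascade(sentence: str) -> List[str]:
    """Placeholder (NOT Jenga format): break after commas/semicolons
    and indent each successive chunk two more spaces."""
    chunks = [c.strip() for c in re.split(r"(?<=[,;])\s+", sentence) if c.strip()]
    return ["  " * depth + chunk for depth, chunk in enumerate(chunks)]


def render(sentences: List[str], method: str) -> str:
    """Apply the named transformation to every sentence and join the result,
    leaving a blank line between sentences."""
    transform = _REGISTRY[method]
    lines: List[str] = []
    for sentence in sentences:
        lines.extend(transform(sentence))
        lines.append("")
    return "\n".join(lines).rstrip()
```

Because `render` looks up the transformation by name, swapping in a different method (for example one aimed at a different reader population) would not require changes to the parsing or rendering code in this sketch.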

Key findings

The research demonstrated that content transformation can be applied to fixed-layout PDF documents on mobile devices, which the authors describe as a first in the literature. The custom algorithm extracts document structure in two passes: the first pass builds a hierarchy of Document, Page, Paragraph, Sentence, and Word objects from character-level data using punctuation detection and position analysis; the second pass refines this hierarchy into sub-sentence structures (LineSplices) suitable for transformation. The system applied Jenga format transformations at both the sentence and paragraph levels. However, the current implementation has notable limitations: it works best with text-heavy single-column documents and cannot yet handle multi-column layouts (common in academic papers), embedded multimedia, tables, footnotes, headers, or footers. According to the authors, PDFBox does not provide sufficient information to reconstruct multi-column documents. Additionally, embedded fonts that are not available on the mobile OS fall back to system fonts, and the authors observed unexplained lag spikes during PDFBox processing.
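The two-pass extraction can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the authors' algorithm: the `Glyph` input type, the `PARAGRAPH_GAP` threshold, and the rule for cutting LineSplices (after a comma, semicolon, or colon, or after `max_words` words) are all hypothetical, and the Document and Page levels are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Glyph:
    """One character with its position (hypothetical stand-in for parser output)."""
    char: str
    x: float
    y: float  # vertical position, increasing down the page


@dataclass
class Sentence:
    words: List[str]


@dataclass
class Paragraph:
    sentences: List[Sentence]


@dataclass
class LineSplice:
    words: List[str]


PARAGRAPH_GAP = 20.0  # assumed: a vertical jump larger than this starts a paragraph
SENTENCE_END = ".!?"  # naive punctuation detection; abbreviations will mis-split


def first_pass(glyphs: List[Glyph]) -> List[Paragraph]:
    """Build Paragraph > Sentence > word structure from positioned characters."""
    paragraphs: List[Paragraph] = []
    sentences: List[Sentence] = []
    words: List[str] = []
    word = ""
    prev_y = None

    def close_word() -> None:
        nonlocal word
        if word:
            words.append(word)
            word = ""

    def close_sentence() -> None:
        nonlocal words
        close_word()
        if words:
            sentences.append(Sentence(words))
            words = []

    def close_paragraph() -> None:
        nonlocal sentences
        close_sentence()
        if sentences:
            paragraphs.append(Paragraph(sentences))
            sentences = []

    for g in glyphs:
        if prev_y is not None and g.y - prev_y > PARAGRAPH_GAP:
            close_paragraph()          # large vertical gap: new paragraph
        elif prev_y is not None and g.y != prev_y:
            close_word()               # ordinary line break ends the current word
        prev_y = g.y
        if g.char.isspace():
            close_word()
        else:
            word += g.char
            if g.char in SENTENCE_END:
                close_sentence()
    close_paragraph()
    return paragraphs


def second_pass(sentence: Sentence, max_words: int = 6) -> List[LineSplice]:
    """Cut a sentence into sub-sentence chunks (assumed splitting rule)."""
    splices: List[LineSplice] = []
    current: List[str] = []
    for w in sentence.words:
        current.append(w)
        if w[-1] in ",;:" or len(current) == max_words:
            splices.append(LineSplice(current))
            current = []
    if current:
        splices.append(LineSplice(current))
    return splices


def glyphs_from_lines(lines: List[Tuple[float, str]]) -> List[Glyph]:
    """Test helper: lay out (y, text) lines as glyphs, one unit of x per character."""
    return [Glyph(ch, float(i), y) for y, text in lines for i, ch in enumerate(text)]
```

The sketch reads glyphs in a single top-to-bottom stream, which is one way to see why a multi-column page is a problem: position analysis based only on vertical gaps cannot tell where one column ends and the next begins.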

Relevance

This research is relevant to the broader challenge of making PDF documents, one of the web's most common document formats, accessible to diverse populations on the devices they actually use. With the number of mobile internet subscribers projected to grow by 1.7 billion between 2017 and 2025, and over half of web content in English, the intersection of PDF readability and mobile access is increasingly important. The framework's pluggable transformation architecture means it could potentially support different accessibility needs: VSTF for children, Jenga format for non-native English readers, and possibly other transformations for people with dyslexia or cognitive disabilities. The work highlights a persistent tension in document accessibility: PDF's fixed-layout design preserves visual fidelity but undermines adaptability and accessibility, especially on mobile screens. PDFroggy is a prototype with significant limitations (single-column only, no multimedia support, standalone app rather than browser integration), but it establishes a technical foundation for making PDF content dynamically transformable on mobile devices rather than treating PDFs as static, take-it-or-leave-it artifacts.

Tags: PDF accessibility · readability · mobile accessibility · content transformation · document accessibility · dyslexia · ESL readers