Proving the validity and accessibility of dynamic web-pages

R. G. Stone, J. Dhiensa · 2004 · Proceedings of the 2004 International Cross-Disciplinary Workshop on Web Accessibility (W4A) · doi:10.1145/990657.990665

Summary

This paper from Loughborough University proposes a formal verification approach to proving that dynamically generated web pages will always produce valid and accessible output, regardless of which code path executes at runtime. The authors draw a direct analogy to program correctness proofs in computer science: testing a dynamic page by checking one output is like testing a program with one input — passing the test provides no guarantee about other runs. The paper addresses this by introducing the concept of a "generalized output" — a representation using regular expression notation that captures all possible outputs a server-side script (PHP, ASP) can produce. Conditional statements (if/else) map to choice meta-tags, while loops map to list meta-tags representing zero-or-more or one-or-more repetitions. The key insight is that the control flow structures in a script (conditionals, loops) correspond to the structural operators in a Document Type Definition (alternation, optionality, repetition), making it possible to verify that every possible output path produces valid markup. The authors built a prototype using YACC/LEX parser generator tools, creating a modified PHP parser that emits generalized output and extended XHTML/WML validators that accept meta-tagged expressions.

Key findings

The paper demonstrates the approach through concrete examples. A PHP script with three conditional branches (admin user, office hours, default) each outputting an image tag is transformed into a generalized output containing all three alternatives wrapped in CHOICES meta-tags. A validator can then check that every branch produces valid markup and that each image has an alt attribute — providing a once-and-for-all guarantee rather than a per-execution check. The authors also identify cases where proofs require algebraic simplification: sequences like pp*, p*p, pp+, or p+p can all be rewritten as p+ to match DTD requirements. They acknowledge a key limitation: while loops generate zero-or-more (LIST0) output, but the DTD may require one-or-more (p+), creating an unprovable case unless more sophisticated analysis of the loop condition is performed. The biggest practical obstacle to scalability is extracting generalized output from full scripting languages where tags may be constructed through string concatenation or function calls rather than appearing as literal constants in echo statements.

Relevance

This paper is notable as one of the earliest attempts to apply formal methods to web accessibility verification. While the specific PHP/DTD context feels dated, the underlying problem remains highly relevant: modern web applications generate HTML dynamically through templating engines, component frameworks, and server-side rendering, and accessibility testing still predominantly relies on checking individual page instances rather than proving properties across all possible states. The concept of a "generalized page" was subsequently built upon by Freire and Fortes (2005) for XSLT-based systems. The paper anticipated modern interests in static analysis for accessibility — tools that analyse source code or component definitions rather than rendered output. For practitioners, the paper underscores that dynamic content generation is a significant blind spot in accessibility testing: any content that varies based on user state, data, or conditions needs testing across those variations, not just one snapshot.

Tags: automated testing · dynamic web pages · formal verification · PHP · server-side scripting · web accessibility evaluation

Standards referenced: WCAG 1.0