Lived User Testing

Two lines of work, related but separable: a CNIB-owned production tool that turns lived-experience tester recordings into structured, WCAG-mapped accessibility reports; and a Bob-owned open-source companion line, currently developing a vision-AI analysis path using open-weights models on think-aloud recordings. Audio-and-video analysis of real sessions — the territory commercial automated scanners do not reach.

Dictaphone (CNIB-owned)

At CNIB Access Labs I lead development of Dictaphone (codenamed pythonAudioA11y), an audio-and-video accessibility analyser that takes MP4 recordings of lived-experience and audit sessions and produces structured, WCAG-mapped reports with time-indexed callouts linked to the source video. The tool has been in production at CNIB for over two years and has been presented at a11yTO. Dictaphone is a CNIB product; the underlying intellectual approach is mine. Practice, not portfolio.

The tagline from the public deck: AI-Powered Accessibility Analysis from Recordings — Transcription | Speaker Identification | WCAG Analysis | Integrated with Auto A11y. Closing slide: From Recording to Report.

The five-stage pipeline

  1. Video Input: MP4 recordings from lived-experience or audit sessions.
  2. Audio Extraction: FFmpeg splits the audio at natural silence points (not fixed intervals). Roughly 10-minute segments preserve speaker context; 44.1 kHz quality is maintained throughout.
  3. Transcription: Deepgram Nova-2 with speaker diarisation and word-level timestamp precision.
  4. Speaker Identification: pyannote.audio voice embeddings plus ML clustering for consistent speaker identity across long recordings. Cross-segment speaker remapping is non-trivial audio ML; most accessibility tools don’t touch this territory at all.
  5. AI Analysis: Claude with WCAG heuristics and context-aware prompts; extended-context support up to 1M tokens for long sessions; prompt caching for cost optimisation.
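To make stage 2 concrete, here is a minimal sketch of choosing cut points near the 10-minute mark while snapping each one to a detected silence. It is illustrative only and not the production implementation: the function name, the 120-second tolerance, and the fallback to a hard cut are all assumptions. The silence intervals are assumed to come from something like FFmpeg's silencedetect filter, parsed elsewhere into (start, end) pairs in seconds.

```python
from typing import List, Tuple


def choose_cut_points(
    silences: List[Tuple[float, float]],
    duration: float,
    target: float = 600.0,     # roughly 10-minute segments
    tolerance: float = 120.0,  # hypothetical: how far a cut may drift from the target
) -> List[float]:
    """Return cut times (seconds), each snapped to the middle of a nearby silence."""
    midpoints = sorted((start + end) / 2 for start, end in silences)
    cuts: List[float] = []
    last_cut = 0.0
    # Stop once the remaining audio fits in one (slightly long) final segment.
    while duration - last_cut > target + tolerance:
        goal = last_cut + target
        nearby = [m for m in midpoints if abs(m - goal) <= tolerance]
        # Prefer the silence closest to the goal; fall back to a hard cut if none is near.
        cut = min(nearby, key=lambda m: abs(m - goal)) if nearby else goal
        cuts.append(cut)
        last_cut = cut
    return cuts


if __name__ == "__main__":
    detected = [(595.0, 597.0), (1210.0, 1212.0), (1500.0, 1500.4)]
    print(choose_cut_points(detected, duration=1900.0))
```

Cutting at the midpoint of a silence keeps whole utterances inside one segment, which is the property that matters for diarisation in the next stages.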

The five-stage pipeline is publicly named in the CNIB Access Labs commercial deck; the description here uses that same vocabulary at the same abstraction level.
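The cross-segment remapping named in stage 4 can be illustrated with a deliberately simplified sketch. It swaps the production approach (pyannote.audio embeddings plus ML clustering) for a greedy nearest-match on cosine similarity over plain lists of floats. Every name, the 0.75 threshold, and the toy two-dimensional vectors are assumptions made for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def remap_segment(
    local: Dict[str, Vector],   # per-segment diarisation label -> voice embedding
    known: Dict[str, Vector],   # global speaker id -> reference embedding (mutated)
    threshold: float = 0.75,    # hypothetical similarity cut-off
) -> Dict[str, str]:
    """Map a segment's local speaker labels onto stable global speaker ids."""
    mapping: Dict[str, str] = {}
    for label, embedding in local.items():
        best = max(known, key=lambda g: cosine(embedding, known[g]), default=None)
        if best is not None and cosine(embedding, known[best]) >= threshold:
            mapping[label] = best
        else:
            # No close match: register a new global speaker.
            new_id = f"speaker_{len(known) + 1}"
            known[new_id] = embedding
            mapping[label] = new_id
    return mapping


if __name__ == "__main__":
    speakers: Dict[str, Vector] = {}
    print(remap_segment({"A": [1.0, 0.0], "B": [0.0, 1.0]}, speakers))
    # In the next segment the diariser happens to swap its local labels.
    print(remap_segment({"A": [0.1, 0.9], "B": [0.95, 0.05]}, speakers))
```

The point of the sketch is the problem shape: diarisation labels are only meaningful within one segment, so some identity-matching step is needed before a long recording can be read as one conversation. A greedy match like this can assign two local labels to the same global speaker, which is one reason a real system would cluster instead.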

Three productised analysis contexts

Each recording can be analysed under one of three first-class contexts in the production tool:

  • Audit: WCAG compliance focus, structured issue reporting with success-criteria mapping, designed for professional accessibility auditors.
  • Lived Experience: user impact and pain points from the disabled-user perspective. Extracts user quotes, assertions, and key takeaways; designed for lived-experience testing programmes.
  • NaviLens: wayfinding and QR navigation. Specialised detection of QR-code navigation issues; physical environment and signage accessibility; designed for indoor and outdoor navigation testing.

The NaviLens context is the public-deck evidence that CNIB Access Labs has built specific tooling for QR-code navigation evaluation, rather than just observing the product from outside. That anchors the related NaviLens framing on /maps in actual productised evaluation capability.

Four-category structured output

Each recording produces four distinct kinds of evidence, kept separate rather than collapsed into a single “findings” bucket:

  • Key Takeaways: analyst-narrative top-level findings.
  • User Pain Points: discrete moments of friction with severity ratings.
  • User Assertions: direct statements and observations from testers, with full quotes.
  • Accessibility Issues: WCAG-mapped with remediation guidance and timecoded video references.

The four-way split is its own design move. Most tools collapse four different things into one undifferentiated stream: what to tell stakeholders, where the user struggled, what the user said, and what is structurally wrong. Keeping them apart makes the report usable as four different kinds of deliverable for four different audiences.
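One way to picture the four-way split is as a container with four separately typed lists rather than one list of generic findings. The sketch below is an assumption about shape only: the class names, field names, and sample values are hypothetical and do not describe the production data model.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class PainPoint:
    description: str
    severity: str   # rating scale is an assumption
    timecode: str


@dataclass
class Assertion:
    speaker: str
    quote: str      # kept verbatim, never paraphrased


@dataclass
class SessionReport:
    key_takeaways: List[str] = field(default_factory=list)
    user_pain_points: List[PainPoint] = field(default_factory=list)
    user_assertions: List[Assertion] = field(default_factory=list)
    accessibility_issues: List[Dict[str, str]] = field(default_factory=list)

    def deliverable(self, category: str) -> list:
        """Return one category on its own, so each audience gets only its slice."""
        if category not in {f.name for f in fields(self)}:
            raise ValueError(f"unknown category: {category}")
        return getattr(self, category)


if __name__ == "__main__":
    report = SessionReport()
    # Made-up sample data, for illustration only.
    report.user_assertions.append(Assertion("Tester", "I can't tell where focus went."))
    report.user_pain_points.append(PainPoint("Lost focus after dialog closed", "High", "00:04:12"))
    print(report.deliverable("user_assertions"))
```

Because the categories never share a list, there is no generic "findings" view to fall back on, which is the design move described above.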

Per-issue fields, outputs, captions

Per issue: title, description, what, why, who, how-to-fix, WCAG 2.2 success-criterion mapping, impact (Low / Medium / High), and precise timecodes linking back into the source video. Built-in heuristics for screen reader, screen magnifier, keyboard navigation, and assistive-technology compatibility.
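As a rough sketch, the per-issue fields listed above could be modelled as a single record with a validated impact level and helpers that turn timecodes into links. The field names, the sample issue, and the use of a `#t=` media-fragment suffix for deep-linking into the video are assumptions for illustration, not the tool's actual schema.

```python
from dataclasses import dataclass
from typing import List

IMPACT_LEVELS = ("Low", "Medium", "High")


def format_timecode(seconds: float) -> str:
    """Render seconds as HH:MM:SS for display in a report."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


@dataclass
class Issue:
    title: str
    description: str
    what: str
    why: str
    who: str
    how_to_fix: str
    wcag_criterion: str      # WCAG 2.2 success criterion, e.g. "2.4.7 Focus Visible"
    impact: str              # one of IMPACT_LEVELS
    timecodes: List[float]   # seconds into the source video

    def __post_init__(self) -> None:
        if self.impact not in IMPACT_LEVELS:
            raise ValueError(f"impact must be one of {IMPACT_LEVELS}")

    def video_links(self, video_url: str) -> List[str]:
        """Build links that open the source video at each timecode (whole seconds)."""
        return [f"{video_url}#t={int(t)}" for t in self.timecodes]


if __name__ == "__main__":
    # Made-up sample issue, for illustration only.
    issue = Issue(
        title="Focus indicator missing on menu button",
        description="Keyboard focus is not visible on the main menu button.",
        what="No visible focus ring",
        why="Keyboard users cannot tell where they are",
        who="Keyboard and screen magnifier users",
        how_to_fix="Restore a visible focus style on the button",
        wcag_criterion="2.4.7 Focus Visible",
        impact="High",
        timecodes=[83.2, 3725.4],
    )
    print([format_timecode(t) for t in issue.timecodes])
    print(issue.video_links("session.mp4"))
```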

Outputs: JSON (machine-consumable), HTML (human-review, bilingual), VTT captions, and an optional enhanced video with callouts overlaid at precise moments and chapter markers. Callouts are positioned using word-level timing so they stay precisely in sync with the speech; chapter markers are compatible with VLC, QuickTime, and web players. The companion video editor keeps editing non-destructive.
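For the VTT output, a minimal sketch of turning word-level timestamps into WebVTT cues might look like the following. It assumes each word arrives as a dict with `word`, `start`, and `end` keys (times in seconds); that input shape, the function names, and the seven-words-per-cue grouping are assumptions, and the production captioning logic may differ.

```python
from typing import Dict, List, Union

Word = Dict[str, Union[str, float]]


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def words_to_vtt(words: List[Word], max_words: int = 7) -> str:
    """Group words into fixed-size cues and return the WebVTT file content."""
    lines = ["WEBVTT", ""]
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = vtt_timestamp(chunk[0]["start"])
        end = vtt_timestamp(chunk[-1]["end"])
        lines.append(f"{start} --> {end}")
        lines.append(" ".join(str(w["word"]) for w in chunk))
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"word": "the", "start": 0.0, "end": 0.25},
        {"word": "button", "start": 0.25, "end": 0.75},
    ]
    print(words_to_vtt(sample))
```

Each cue starts at its first word and ends at its last, so caption timing inherits the precision of the transcription rather than being estimated per sentence.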

Bilingual-native

The analysis is performed natively in each language — not machine-translated after the fact. Three modes: English-only, French-only, or both in parallel (the default). Distinct per-language output files (issues.fr.json, issues.fr.html, etc.). Canadian-government-grade discipline at the data-model level, not as a post-hoc translation pass.
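The three language modes and the per-language files can be sketched as a small mapping from mode to output file names. The `issues.fr.json` and `issues.fr.html` names come from the description above; the matching `issues.en.*` names, the enum, and the function are assumptions made for illustration.

```python
from enum import Enum
from typing import List, Sequence


class LanguageMode(Enum):
    ENGLISH = "en"
    FRENCH = "fr"
    BOTH = "both"


def output_files(
    mode: LanguageMode = LanguageMode.BOTH,  # both in parallel is the default
    stem: str = "issues",
    extensions: Sequence[str] = ("json", "html"),
) -> List[str]:
    """List the per-language output files a given mode would produce."""
    languages = ["en", "fr"] if mode is LanguageMode.BOTH else [mode.value]
    return [f"{stem}.{lang}.{ext}" for lang in languages for ext in extensions]


if __name__ == "__main__":
    print(output_files(LanguageMode.FRENCH))
    print(output_files())
```

Keeping language as a dimension of the output set, not a transformation applied to one file, mirrors the idea that each language gets its own analysis pass.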

The capability gap this addresses

No commercial automated accessibility tool currently analyses recorded user behaviour against the screen being recorded. axe handles static HTML. Lighthouse handles runtime DOM. LLM scanners increasingly handle code and markup. None of them analyse the interaction between user and interface across time, with the user’s own spoken commentary as evidence.

That territory has been human-led usability research; bringing AI assistance to it has research-grade significance even at present-day tooling maturity. Dictaphone makes the analysis tractable at scale — the kind of audit that would take a human auditor a day per recording can be drafted by the tool in minutes, then reviewed and corrected by the auditor in considerably less than a day.

Important workflow note

The output is always manually checked. Humans take responsibility for conformance statements; machines do not. The automation accelerates the review process; it does not replace the reviewer. This is the right framing for AI-assisted accessibility audit work and is worth being explicit about whenever the tool is mentioned.

Where this sits relative to the other tools

Dictaphone is integrated with autoA11y, CNIB’s commercial accessibility-testing platform. The recordings dashboard, the WCAG Issues view filterable by level, and the lived-experience results all sit alongside automated and manual findings in a single unified report. Together the tools cover what the practice describes publicly as the Three Pillars: automated testing, manual inspection, lived experience.

Dictaphone is the lived-experience pillar made tractable at scale. autoA11y is the automated pillar, productised. Manual inspection remains human-led. The three pillars run as one integrated audit pipeline rather than as three separate workstreams.

Bob-owned: the vision-AI line

A Bob-owned proof-of-concept in development that adds Qwen 3.5 (Alibaba’s vision-language model, accessed via API) to the audio-and-video pipeline. Qwen is used exclusively for the vision side: analysis of screen recordings of user interaction. It works as the video/vision counterpart to what Claude does for audio in the same pipeline: two specialised models for two modalities, not alternatives to each other.

Targeting two classes of issue that vision-on-video can reach: visual accessibility problems a sighted analyst would normally catch from watching, and user-interaction problems (hesitation, abandonment, target misidentification — the usability dimension of accessibility audits). Licence: GPL-3.0, matching the rest of the Bob-owned tooling.

The home-version source lives at bobdodd/lived-user-testing. The repo is currently a placeholder — content will appear there as the home and production versions diverge sufficiently to be safely separable.

Reading on

  • Carnforth: runtime DOM testing; the open-source companion to the Bob-owned tooling line.
  • automated-testing: AI-driven text-and-HTML testing PoCs.
  • Paradise: source-level multi-model analysis.
  • /work: the CNIB Access Labs framing for the production tooling.