Why Documents Are 80% of Higher-Ed Accessibility Work
Here's the conversation I've had more times than any other.
An institution runs a web scan, gets a few thousand findings, and builds a remediation plan around them. Six months in, someone finally asks about the 40,000 PDFs in the LMS. The room goes quiet. The plan was built around the 20% of the problem the scanner could see.
I spent 21 years inside higher education, including a long run at Quinsigamond Community College, and the pattern never changed: documents are roughly 80% of the remediation surface, and they're the part nobody scoped.
Why documents dominate the math
Walk through where content actually lives at a college:
- A four-year institution has tens of thousands of course documents — syllabi, lecture slides, problem sets, scanned readings, lab handouts
- Every one was created by a different person, in a different tool, to a different standard
- New ones are produced every single term, faster than any backlog gets cleared
- Most are PDFs and PowerPoint files, which is exactly what automated web scanners are worst at evaluating
The marketing website is a few hundred pages on a handful of templates. Fix the template, fix the pages. Documents have no template. Each one is its own small remediation job, and the pile grows every semester.
That's the asymmetry. Web is bounded and template-driven. Documents are unbounded and human-driven.
Why the scanner makes it worse
A web accessibility scanner is good at what it does — and what it does is crawl HTML. It will tell you about color contrast and missing alt text on pages it can reach.
It will not meaningfully tell you that:
- A 200-page scanned course reader is an image with no text layer at all
- A faculty slide deck has reading-order problems no automated tool flags
- A fillable PDF form can't be completed with a keyboard
So the scanner produces a confident report about 20% of the problem, the plan gets built around that report, and the 80% stays invisible until someone trips over it. The tool didn't lie. It just answered a smaller question than the one you were actually asking.
The danger isn't that scanners are wrong. It's that they're precise about the small part and silent about the large part — and precision reads as completeness.
What this means for your plan
If documents are 80% of the work, then a compliance plan that spends most of its timeline on the website is sequenced backward, no matter how good it looks.
Three practical moves:
- Scope documents explicitly. Estimate the volume across LMS exports, shared drives, and the public site. Whether the rough count is 3,000 or 30,000 changes the entire plan.
- Triage by use, not by age. The reading assigned to 400 students this term outranks a 2014 committee memo nobody opens. Fix what's in front of students now.
- Stop the inflow. Remediating the backlog while faculty add 5,000 new inaccessible files a term is bailing a boat without patching it. Training the people who create documents is not optional — it's the patch.
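The first and third moves are mostly arithmetic, and a few lines of Python make the point concrete. Treat this as a rough sketch, not a tool: `count_documents` and `project_backlog` are hypothetical names, the extension list is an assumption about which file types matter on your campus, and the numbers in the usage are illustrative, not measured.

```python
from collections import Counter
from pathlib import Path

# Assumed set of document types worth counting; adjust for your campus.
DOC_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"}


def count_documents(root):
    """Count files under root by extension, keeping only document types."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in DOC_EXTENSIONS:
            counts[path.suffix.lower()] += 1
    return counts


def project_backlog(backlog, new_per_term, fixed_per_term, terms):
    """Return the backlog size after each term, never dropping below zero."""
    sizes = []
    for _ in range(terms):
        backlog = max(0, backlog + new_per_term - fixed_per_term)
        sizes.append(backlog)
    return sizes


if __name__ == "__main__":
    # Illustrative numbers only: 30,000 files, 5,000 added and 4,000 fixed per term.
    print(project_backlog(30_000, 5_000, 4_000, terms=4))
    # Same remediation capacity after training cuts the inflow to 1,000 per term.
    print(project_backlog(30_000, 1_000, 4_000, terms=4))
```

With these made-up numbers, the backlog grows by 1,000 files a term in the first case and shrinks by 3,000 a term in the second. The remediation capacity is identical in both runs; only the inflow changed.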
Where institutions should actually start
Not with the backlog. Start with the documents assigned this term in required courses. That is the highest-harm, highest-reach slice, and it's a finite, fundable first phase. It also produces the thing budget committees respond to: a visible win for real students before the ADA Title II compliance date, not a percentage on a dashboard.
Then work outward — high-enrollment courses, then the long tail — while training closes the tap so the pile stops growing.
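That ordering can be written down as a sort key. This is a minimal sketch under stated assumptions: the `Document` fields and the three-phase split are hypothetical, and a real inventory would pull enrollment and course data from the LMS or registrar instead of hand-entered values.

```python
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    assigned_this_term: bool  # is it in front of students right now?
    required_course: bool     # does it belong to a required course?
    enrollment: int           # students expected to use it


def triage_key(doc):
    """Sort key: current required-course material first, then by reach."""
    if doc.assigned_this_term and doc.required_course:
        phase = 0  # first phase: required courses, this term
    elif doc.assigned_this_term:
        phase = 1  # work outward: other courses running this term
    else:
        phase = 2  # the long tail
    # Within a phase, larger enrollment sorts first; name breaks ties.
    return (phase, -doc.enrollment, doc.name)


def triage(docs):
    """Return documents in the order they should be remediated."""
    return sorted(docs, key=triage_key)


if __name__ == "__main__":
    # Hypothetical inventory for illustration.
    docs = [
        Document("2014 committee memo", False, False, 0),
        Document("Elective seminar slides", True, False, 25),
        Document("Intro course reading", True, True, 400),
    ]
    for doc in triage(docs):
        print(doc.name)
```

Note what the key leaves out: file age. The 2014 memo lands last because nobody is assigned to read it, not because it is old.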
The short version
Your scanner reports on the 20% it can see. Documents are the other 80%, they have no template, and they regenerate every semester. Any plan that doesn't scope documents first isn't conservative — it's just incomplete with good production values. Count the documents before you trust the dashboard.