Improving our website's accessibility with static analysis + CI

Author: Matt Wang
Date Published: 06 August 2024

Over the past year (and … mostly right before this post was published), I’ve been working with our wonderful website chair Yihong to make small web accessibility improvements to the PLSE website. I wanted to spend a bit of time writing up what we’ve done for a couple of reasons, which will coincidentally also sketch out the structure of this post:

serve as a record for other folks going through similar first steps (maybe you!)
talk some nuts and bolts of semantic HTML & web accessibility basics
write down some future goals for us (which I’ll continue to work on)

To be upfront: I am not claiming that the PLSE website is “perfectly accessible” (this is probably not possible) nor that we’re done with our accessibility work — far from it! Rather, I think this is a good place to get started — especially as this sort of (static analysis-style) work is more familiar to software folks. It also builds momentum for larger changes.

I am also not claiming to be an expert on web accessibility. I’m constantly in awe of the experts that we do have at UW (and UW CSE), including folks like Jen Mankoff, Richard Ladner, and Jacob Wobbrock, among many others (frequently through groups like AccessComputing, CREATE, and DO-IT). You should check out their work!

This post is primarily targeted towards folks who have some understanding of HTML & CSS but not much (or any) experience implementing web accessibility. My hope is that you’ll learn a bit more about web standards and accessibility. Folks who have quite a bit of experience with accessibility work may get more out of the last few sections of this post (under Future Work).

Motivation & Scope

Web accessibility is important, both to us at PLSE and more broadly as computer scientists, public employees, and humans. But, we’re historically awful at web accessibility: easily over 95% of the top million most-visited web pages have at least one core accessibility failure (“The WebAIM Million”, 2024). The web should be more accessible, and we as developers should do better.

From my perspective, a few things hinder small web development groups from making their websites more accessible:

many website editors (developers or otherwise) are unfamiliar with accessibility; it is rarely a “first-class” concern in web design, development, or content writing.
other editors broadly know it’s important, but do not know how to act on it.
even fewer editors are familiar with auditing web accessibility (whether it be automated testing or manually using a screenreader, keyboard navigation, etc.).
when accessibility work is done, it’s usually done after-the-fact, which leads to a large backlog; if it’s too large to resolve in one go, it usually languishes.

I don’t have a silver bullet to fix all of these issues. However, I do have some low-hanging fruit: add accessibility tests to your continuous integration setup. And, most importantly, do not let people push updates that introduce an error! I argue that this addresses each of the concerns (though I’ll leave the details as an exercise for the reader).

There’s a vast array of ways to approach this, but the simplest starting point is picking a subset of accessibility guidelines that can be statically evaluated just by looking at HTML (and not simulating a browser). This still catches a sizeable chunk of common web accessibility issues, such as missing alternative text for images, inaccessible link text, and incorrectly structured markup. It also has the side effect of making sure that your website conforms with the HTML spec (which is harder to ensure than it sounds).

Importantly, this is also a reasonably-scoped problem that others have worked on! Auditing accessibility can be quite challenging; not having to reinvent the wheel (and relying on others’ domain-expertise) means that we can focus on making accessibility improvements on the margin. It also means that we aren’t maintaining a bespoke system, which is particularly relevant given how quickly web standards evolve and the fundamentally cyclical nature of academic groups.

I want to stress that this approach is not complete. Most accessibility guidelines are contextual in nature: the best alternative text for an image depends on what it’s being used for (decoratively, as an icon, as a diagram, or the subject of design critique) just as much as the content of the image. Validating some guidelines requires implementing a CSS and layout engine (e.g. color contrast) and thus falls outside the umbrella of static analysis. Practical limitations (and a high emphasis on avoiding false positives) mean that these tools tend to be conservative.

Given that we’re a small team and that this is a first attempt, we also tightly limited our scope to changes that do not affect the visual look of the site (at a pixel-by-pixel/image diff level). This is very restrictive! We’ll talk about other changes in next steps.

What We Did

In short, we:

ran the Nu HTML Validator against the PLSE website
fixed (almost) all of the errors that Nu caught, most of them manually
in addition, fixed some of the errors we came across while touching basically all the files in the codebase
audited our changes manually with a screenreader (I use VoiceOver on macOS)
now, run the validator as part of our CI pipeline on every push

We fixed somewhere in the ballpark of 100-1000 errors (depending on how you count). Generally speaking, the issues we found were under a few broad categories:

missing accessible text — most commonly alt text for images, but also links with no screenreader-accessible text, icons, etc.
incorrect use of headings (i.e. h1-h6), section, and other “semantic” HTML elements and/or markup
syntactically-invalid HTML that does not cause an accessibility issue (usually, because browsers tend to be very forgiving and/or lenient)
typos! (with functional impacts on the site)

Future pushes to the website will always trigger the validator. Authors are now automatically notified if they push inaccessible code — and we can prevent folks from pushing inaccessible changes!
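To make the “depending on how you count” point a bit more concrete, here is a minimal Python sketch that tallies errors from a validator report. It assumes the report is JSON with a top-level `messages` list whose entries have `type` and `message` fields (my understanding of the shape of the Nu validator’s JSON output); the `summarize_report` helper and the sample data are made up for illustration.

```python
import json
from collections import Counter


def summarize_report(report_json):
    """Return (number of errors, Counter of errors grouped by message text)."""
    report = json.loads(report_json)
    # Assumption: each entry has a "type" field, and errors use type == "error".
    errors = [m for m in report.get("messages", []) if m.get("type") == "error"]
    by_message = Counter(m.get("message", "") for m in errors)
    return len(errors), by_message


# Hypothetical sample report (not real output from the PLSE website).
SAMPLE_REPORT = json.dumps({
    "messages": [
        {"type": "error", "url": "index.html", "message": "missing alt"},
        {"type": "error", "url": "blog.html", "message": "missing alt"},
        {"type": "info", "url": "index.html", "message": "consider a heading"},
    ]
})

total, grouped = summarize_report(SAMPLE_REPORT)
print(total, "errors across", len(grouped), "distinct messages")
```

Counting raw errors versus distinct messages gives quite different numbers, which is one reason the total is hard to pin down. A CI step could fail the build whenever `total` is greater than zero.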

Some Examples

This is a non-exhaustive list of changes we made that were flagged by the validator or manual testing. Most of these are relatively surface-level and only require a passing understanding of HTML.

`alt` text (and why it’s hard)

A good chunk of images didn’t have alt text. For example, here’s an e-graph (from The Theoretical Aspect of Equality Saturation (Part II)), including some of the preceding text:

Ruleset 8
\[\{ h(f(x), y) \rightarrow h(x, g(y)), h(x, g(y)) \rightarrow h(x, y), f(x) \rightarrow x \}\]
[…] Consider the following e-graph containing term \(\{ g(a, b)\}\) and identity \(f(a)\approx a\).

An e-graph with three e-classes: the parent h, its left-child a = f(a), and its right-child g(b).

Without an annotation, a blind or low-vision reader using a screenreader would encounter “image”. Not very helpful! Instead, it’s on us to add alternative text (or, “alt text”) that the screenreader will dictate. This also has other benefits (including acting as a placeholder if the image doesn’t load properly).

Writing good alt text is quite challenging! You might be tempted to write something like:

This is an image diagram of an e-graph, a graph data structure that represents equivalence relations between terms in a language. This e-graph has three broad equivalence-classes (or e-classes): a top parent with just h, a left child with f and a, and a right child with g(b). The h parent has an arrow that points to each of the children. The f in the left child also points to itself. The e-classes are represented with a yellow box and dotted outlines; each term is a white box with solid outlines.

However, this:

is way too long! Alt text should be concise: a common rule of thumb is that it should be less than 200 characters.
contains a redundant description (“this is an image…”): the screenreader already announces this!
contains irrelevant information (does it matter that the e-classes are yellow?)
is potentially redundant with the surrounding text (this is debatable, especially depending on the audience of the post).

To be clear, I don’t know what the perfect alt text is here, especially as I’m not a leading expert on e-graphs (though … many folks here are) and I’m restricted by not changing the layout of the page. After workshopping, I came up with:

An e-graph with three e-classes: the parent {h}, its left-child {a, f(a)}, and its right-child {g(b)}.

From my perspective, this captures the important portions of the figure without being redundant with the surrounding context. In an ideal world, I’d consider adding:

a <figcaption>, which would help all users
a much longer description (closer to the earlier alt text), hidden by default but toggleable by the user

Static analysis can catch <img> tags with no alt text (and the PLSE website now has no images that are missing alt text). However, it can’t distinguish “good” alt text from bad alt text; that’s too context-dependent. While I wrote some preliminary alt text for all images, it can certainly be improved with more refinement. Later, I’ll touch on some alternatives to simple alt text for graphs.

I’ll also just mention: a common misconception is that every image must have non-empty alt text. That’s not true! If an image is purely decorative and/or provides no extra context to a user, making the alt text empty (via alt="") is a viable practice. WebAIM’s alt text tutorial has a section on decorative images that’s a good primer.
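To illustrate the kind of check a static tool can make here, below is a small Python sketch using the standard library’s `html.parser`. It is not how the Nu validator works internally; the `AltTextChecker` class and the sample markup are hypothetical. The point is the distinction between a missing alt attribute (an error) and an intentionally empty one (fine for decorative images).

```python
from html.parser import HTMLParser


class AltTextChecker(HTMLParser):
    """Separate <img> tags with no alt attribute from those with alt=""."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []  # src of images with no alt attribute at all
        self.decorative = []   # src of images with an empty alt attribute

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "")
        if "alt" not in attributes:
            self.missing_alt.append(src)
        elif not attributes["alt"]:
            self.decorative.append(src)


def check_alt_text(html):
    checker = AltTextChecker()
    checker.feed(html)
    checker.close()
    return checker


# Hypothetical sample markup.
SAMPLE = """
<img src="egraph.png" alt="An e-graph with three e-classes.">
<img src="divider.png" alt="">
<img src="mystery.png">
"""

result = check_alt_text(SAMPLE)
print(result.missing_alt)  # ['mystery.png']
print(result.decorative)   # ['divider.png']
```

Note what this cannot do: it has no way of telling whether the first image’s alt text is any good, or whether the second image really is decorative.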

Screenreader-Accessible Text: Links

While images are a common example of the necessity of screenreader-accessible text, they’re not the only example. Screenreaders form a structured list of all links on a page. A common navigation pattern is to cycle through links present on a page (or <section>, etc.); the screenreader reads text content of a link. Ideally, you want the link text alone to provide sufficient context on where the link takes the user.

However, many content anti-patterns create inaccessible link text, including:

links of the form “here”, “click here”, “read more”, “more”
multiple links with the same text, but different URLs (often in post lists, project lists, etc.)
links with only an image or icon as a body, with either no alt text or alt text that describes the image itself (e.g. “link icon”)

When I teach web accessibility, I frequently do a live demo of navigating a page with many “click here” links: the screenreader essentially just reads “click here” tens or hundreds of times. It’s incredibly annoying for the user and drives the point home.

Nevertheless, these issues are really common in websites — so much so that I wrote an ESLint plugin rule + blog post on “ambiguous anchor text” when I interned at CZI. And, there were over 100 instances of this on the PLSE website.
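As a rough sketch of how these anti-patterns can be flagged automatically, here is a Python version built on the standard library’s `html.parser`. This is not the ESLint rule mentioned above; the `AMBIGUOUS` list, the class, and the sample links are hypothetical, and the check ignores details like `aria-hidden` and `aria-label`.

```python
from html.parser import HTMLParser

# Hypothetical starter list, taken from the examples above.
AMBIGUOUS = {"here", "click here", "read more", "more"}


class LinkTextCollector(HTMLParser):
    """Collect (href, text) pairs, counting <img alt> as part of the text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_link = False
        self._href = ""
        self._parts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            self._in_link = True
            self._href = attributes.get("href", "")
            self._parts = []
        elif tag == "img" and self._in_link:
            self._parts.append(attributes.get("alt") or "")

    def handle_data(self, data):
        if self._in_link:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Collapse runs of whitespace so nested spans read naturally.
            text = " ".join(" ".join(self._parts).split())
            self.links.append((self._href, text))
            self._in_link = False


def ambiguous_links(html):
    collector = LinkTextCollector()
    collector.feed(html)
    collector.close()
    return [
        (href, text)
        for href, text in collector.links
        if not text or text.lower().strip(" .»") in AMBIGUOUS
    ]


# Hypothetical sample links.
SAMPLE = (
    '<a href="a.html">More »</a>'
    '<a href="b.html">More <span class="sr-only">projects</span> »</a>'
    '<a href="c.html"><img src="icon.png"></a>'
)
print(ambiguous_links(SAMPLE))  # [('a.html', 'More »'), ('c.html', '')]
```

The second link passes because its text includes the extra word “projects”, even though that word is visually hidden.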

In many cases, I’d rewrite the link text itself (since clearer link text helps everybody — not just screenreader users). But, given the limitation of changing as little web content as possible, I instead mostly opted to use screenreader-only text. A common (if hacky) way to do this was popularized by Bootstrap:

.sr-only {
    position: absolute;
    width: 1px;
    height: 1px;
    padding: 0;
    margin: -1px;
    overflow: hidden;
    clip: rect(0, 0, 0, 0);
    border: 0;
}

This essentially creates an element in the DOM that has zero size, but valid content — which a screenreader can then dictate. I use this aggressively for many parts of this blog post (including later sections).

For example, the home page’s Projects section has a link with the text “More”. “More” what? I’ve added an .sr-only annotation clarifying that it’s “More Projects”:

<a href="projects.html">More <span class="sr-only">projects</span> »</a>

Another big source is the “Read More” link under each entry on the blog homepage; I’ve appended the post’s title, making the text unique & more descriptive.

Screenreader-Accessible Text: Labels

Sometimes, key descriptive context is non-textual. For example, consider this “post metadata” component:

Author: Matt Wang
Date Published: 06 August 2024

The icons are key: the pencil hints that the following text is the author, while the calendar hints that the following text is a date (perhaps, the date published). The corresponding markup used to be:

<div>
  <span class="fa fa-pencil" aria-hidden="true"></span>
  <span>Matt Wang</span>
  <span class="fa fa-calendar" aria-hidden="true"></span>
  <time>06 August 2024</time>
</div>

The screenreader-accessible text ends up being:

Matt Wang 06 August 2024

This is not very helpful! Is this the start of a sentence or quote? What does the date represent (date published, last edited, etc.)?

Without changing the visual look of the component, I:

used a description list (<dl>) to represent the metadata, with a clear semantic link between the label title (e.g. “author”) and the label value (e.g. “Matt Wang”)
added screenreader-only text to explain the semantic purpose of the icons
added the datetime attribute to <time>, machine-encoding the datetime so the client can interact with it however they’d like

The code now looks like this:

<dl>
  <dt>
    <span class="sr-only">Author</span>
    <span class="fa fa-pencil" aria-hidden="true"></span>
  </dt>
  <dd>Matt Wang</dd>
  <dt>
    <span class="sr-only">Date Published</span>
    <span class="fa fa-calendar" aria-hidden="true"></span>
  </dt>
  <dd>
    <time datetime="2024-08-06T00:00:00+00:00">
        06 August 2024
    </time>
  </dd>
</dl>

Page Structure: Headings, Sections, and Lists

A commonly misunderstood part of HTML is the purpose of elements like <h1>, <section>, and <ul>. Many beginners use elements like <h1> and <ul> for styling (e.g. “I want big text” or “I want bullet points”). However, the primary purpose of these elements is not the styling associated with them but rather their semantic meaning:

section headings (i.e. <h1> to <h6>) should hierarchically describe sections of a page, regardless of intended styling:
- headings should descend in order and should not be skipped (e.g. a page shouldn’t have an <h1> followed by an <h3>)
- a page should have exactly one <h1> that describes the entire page (though this isn’t strictly a part of the spec)
a <ul> should describe any unordered list of items (regardless of how the list is styled)
a <section> should represent a generic (but not self-contained) section of a page, with a corresponding section heading
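The heading rules above are mechanical enough to check with a few lines of code. Here is a minimal Python sketch (the function and class names are hypothetical, and this is not what the validator does internally) that reports skipped heading levels and pages without exactly one <h1>:

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Record heading levels (1 to 6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.levels.append(int(tag[1]))


def heading_problems(html):
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()

    problems = []
    h1_count = collector.levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one <h1>, found {h1_count}")

    previous = 0  # 0 means "no heading seen yet"
    for level in collector.levels:
        if level > previous + 1:
            if previous == 0:
                problems.append(f"first heading is <h{level}>, not <h1>")
            else:
                problems.append(
                    f"<h{level}> appears after <h{previous}>, skipping a level"
                )
        previous = level
    return problems


print(heading_problems("<h1>Blog</h1><h3>Post</h3>"))
# ['<h3> appears after <h1>, skipping a level']
```

Going back up the hierarchy (e.g. an <h2> after an <h3>) is fine and is not reported; only descending by more than one level counts as a skip.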

Other commonly misused tags include <table> (often used for formatting but not describing tabular data), tags used primarily for their style effect (e.g. <small>, <code>), and the admittedly confusing interplay between <b>, <strong>, <em>, and <i>.

These changes are the least exciting to talk about/show but also the ones where I arguably spent the most time. Fixes included:

adding an <h1> to pages without them (and, in keeping with the no-visual-change constraint, making them .sr-only)
restarting the heading hierarchy to avoid “skipping” headings (and, in keeping with the no-visual-change constraint, restyling headings to “borrow” other heading styles)
pairing section grouping tags (e.g. <section>, <article>) with a section heading
converting list items to use <ul> and <li> (but keeping the original styling)
removing accidentally-nested section grouping tags
removing extraneous tags and replacing them with CSS styles

Worked Example: Topic Tags

Here’s a seemingly simple example that ties together a few of the previous sections! Consider the set of topic tags for this post. They come in two flavours:

Coloured:

Plain:

Previously, their markup was:

<!-- old coloured markup -->
<span class="keywords-container">
    <a href="/tag/accessibility">
        <code>accessibility</code>
    </a>
</span>

<!-- old plain markup -->
<small>
    <a href="/tag/website">
        <code>#website</code>
    </a>
</small>

This markup is alright, but there are a few notable weak areas:

the link text is not terribly descriptive: what is “website” a link to (another website??)
<small> is being used for its styling property, not its semantic meaning (as a “side comment” element); same for <code>
the # is mostly a visual aid (treating a category like a hashtag), but is a non-trivial symbol that gets read out!
the enveloping <span> isn’t necessary

So, let’s try to address those, by:

adding screenreader-only text that provides the minimal context for the link
using CSS for the styles applied by <small> (font-size: smaller) and <code> (font-family: monospace)
hiding the # from the screenreader
removing unnecessary <span>s

<!-- new coloured markup -->
<a class="keywords-container" href="/tag/accessibility">
  <span class="sr-only">keyword: </span>
  accessibility
</a>

<!-- new plain markup -->
<a class="post-keyword-plain" href="/tag/website">
  <span class="sr-only">keyword: </span>
  <span aria-hidden="true">#</span>website
</a>

Then, since these tags are actually elements of lists, it would make sense to make them <li> elements:

<ul class="list-unstyled">
  <li>
    <a class="keywords-container" href="/tag/accessibility">
      <span class="sr-only">keyword: </span>
      accessibility
    </a>
  </li>
  <!-- more items... -->
</ul>

Future Work

All of this is just a first step — there’s much more to do.

One category of future work is “do what we did, but better”. This includes:

writing better alternative text — particularly by involving domain experts (usually: the blog post author) and tailoring content to the audience
more generally, writing content with better accessible names (e.g. link text that doesn’t require a .sr-only workaround)
reducing more extraneous markup & better using semantic HTML (e.g. we could better use role)

However, there are also other classes of work that are independent of this static analysis & validation-driven code. Here’s a non-exhaustive sketch!

Color Contrast

Color contrast is one of the more well-known and quantitatively-defined accessibility metrics. The common standard for this is WCAG 2.1’s Contrast Success Criterion — which, among other things, is cited by the ADA.

There are a handful of areas of the website that don’t meet this standard, most notably the use of gray text for affiliation under the homepage’s Ph.D. & Master’s Alumni — though I haven’t done an in-depth audit.

Fixing this is trivially out of scope of this first project (since it changes the look of the website), but it’s a piece of low-hanging fruit that we should work on next. It is a bit harder to audit automatically with CI-style tools, since to truly evaluate this you need to actually render HTML & CSS — but this is also easily auditable with tools like WAVE or browser dev tools (e.g. Chrome DevTools’ contrast tool).

Interestingly, there’s a lot of discussion on the specific metric that defines color contrast. In short, WCAG 2.x’s contrast metric is a simple ratio of relative luminances; it doesn’t fully model how humans perceive contrast and can lead to unintuitive results. One popular potential replacement is the Advanced Perceptual Contrast Algorithm (APCA), which attempts to take into account human perception of contrast. At this moment, it’s unclear if it will become the WCAG 3.x standard (there is active debate in the standards committee).
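For reference, the WCAG 2.x contrast ratio itself is simple to compute once you know the two colors; the hard part for CI is figuring out which colors actually end up on screen. Here is a small Python sketch of the WCAG 2.x formula. It assumes colors are given as six-digit hex strings; the function names are my own.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color like '#767676' (0 is black, 1 is white)."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """WCAG 2.x contrast ratio, from 1 (identical) to 21 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground, background):
    """Level AA asks for at least 4.5:1 for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5


print(round(contrast_ratio("#000000", "#ffffff"), 2))  # 21.0
print(passes_aa_normal_text("#767676", "#ffffff"))     # True
print(passes_aa_normal_text("#777777", "#ffffff"))     # False
```

The last two lines show how sharp the threshold is: two nearly indistinguishable grays land on opposite sides of the 4.5:1 cutoff.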

Copyable Code

Many of our blog posts (including this one!) include code snippets:

<!-- new coloured markup -->
<a class="keywords-container" href="/tag/accessibility">
  <span class="sr-only">keyword: </span>
  accessibility
</a>

Under the hood, our static site generator (Jekyll) uses Rouge to produce the nice code highlighting. However, this has a side effect of polluting the markup. Here’s the HTML for the above highlighted snippet:

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">&lt;!-- new coloured markup --&gt;</span>
<span class="nt">&lt;a</span> <span class="na">class=</span><span class="s">"keywords-container"</span> <span class="na">href=</span><span class="s">"/tag/accessibility"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;span</span> <span class="na">class=</span><span class="s">"sr-only"</span><span class="nt">&gt;</span>keyword: <span class="nt">&lt;/span&gt;</span>
  accessibility
<span class="nt">&lt;/a&gt;</span>
</code></pre></div></div>

Tough to read! Screenreaders can interact with this text (with some potential minor hiccups); however, IDEs and plugins designed for blind & low-vision developers can provide better affordances to interact with this code (e.g. navigating via AST or using an LSP). For an example, take a look at recent UW CSE PhD grad Venkatesh Potluri’s work on CodeTalk¹.

Reimplementing an IDE on a website seems like overkill (and non-customizable). Instead, we could add a “copy code” button to each code snippet. This:

lets blind & low-vision users engage with the code using their editor of choice, without dealing with code highlighting artifacts & escaped characters in HTML
helps all users — this is just a helpful feature to have!

Adding this is a bit more involved, since it breaks our constraint on not changing the layout of any page and requires some Ruby code (since we’re hooking into the markdown to HTML conversion part of the static site generator). But, I’d like to get around to this eventually!
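To give a sense of what a “copy code” button would need to hand to the clipboard, here is a Python sketch that recovers the plain source from highlighted markup like the snippet above. This is only an illustration using the standard library’s `html.parser`; the real implementation would likely live in Ruby or client-side JavaScript, and the `PlainCodeExtractor` class is hypothetical.

```python
from html.parser import HTMLParser


class PlainCodeExtractor(HTMLParser):
    """Collect the text inside <code> elements, dropping highlighting <span>s."""

    def __init__(self):
        # convert_charrefs=True turns &lt; and &gt; back into < and >.
        super().__init__(convert_charrefs=True)
        self._depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "code" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)

    def text(self):
        return "".join(self._parts)


def plain_code(html):
    extractor = PlainCodeExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.text()


# A shortened version of the highlighted snippet shown earlier.
HIGHLIGHTED = (
    '<pre class="highlight"><code>'
    '<span class="nt">&lt;a</span> <span class="na">href=</span>'
    '<span class="s">"/tag/accessibility"</span><span class="nt">&gt;</span>'
    'accessibility<span class="nt">&lt;/a&gt;</span>\n'
    '</code></pre>'
)

print(plain_code(HIGHLIGHTED))
```

The output is the original one-line link, with the highlighting spans removed and the escaped angle brackets restored.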

Open-Ended: Accessible Figures and Visualizations

Most of our blog posts use MathJax for typesetting; for example, here’s a classic definition of the Y Combinator:

\[Y = \lambda f. \ (\lambda x.f\ (x\ x))\ (\lambda x.f\ (x\ x))\]

From an accessibility perspective, this is much better than a screenshot of some LaTeX; the folks who work on MathJax have put a lot of work into accessibility! Screenreaders can interact with the above equation: when they first tab over it, the screenreader will dictate the equation left-to-right, including reading the \(\lambda\) (though the parentheses get tricky to keep track of). The screenreader can also process the math term-by-term and in larger units (e.g. by sub-expression), and there are many accessibility options (try right-clicking the above equation).

Unfortunately, anything that’s more than just an equation (or a table) becomes much more challenging. For example, let’s revisit this e-graph (from The Theoretical Aspect of Equality Saturation (Part II)):

An e-graph with three e-classes: the parent h, its left-child a = f(a), and its right-child g(b).

The new alt text of

An e-graph with three e-classes: the parent {h}, its left-child {a, f(a)}, and its right-child {g(b)}.

is better than nothing, but it certainly doesn’t capture the nuance of the e-graph. It also doesn’t have a sense of abstraction — I’m not able to “list all e-classes”, or follow a specific derivation, etc.

There are various tools that render graphviz, tikz, and related libraries on the web via Canvas or SVG. But this doesn’t solve the problem: these diagrams aren’t “accessible by default” (i.e. do not come with accessible text), and by default don’t have the depth that MathJax has (nor is there a standard with the same level of traction as MathML).

One common suggestion is Mermaid, a popular library that’s now used in GitHub’s markdown renderer. However, their accessibility features are quite light — as of writing, they only support basic descriptive text (e.g. aria-roledescription), but no interactivity. D3 provides more control, but you need to annotate elements individually — which can also be overwhelming for users.

Accessible visualizations and diagrams are still an open problem in the HCI research space. Some folks at UW CSE have done some great work in this field! Ather Sharif (advised by Jacob Wobbrock) has a long line of work in this area; a CHI ’22 paper² introduces VoxLens, a JavaScript plugin that, in their words:

[…] enables screen-reader users to obtain a holistic summary of presented information, play sonified versions of the data, and interact with visualizations in a “drill-down” manner using voice-activated commands.

The “drill-down” feature is particularly neat! The repository has a few demos that you can give a spin; you can also read a UW press release on the paper.

I’m curious how we can use tools like VoxLens to make all sorts of our figures accessible. To be clear, our baseline goal should be ensuring that all of our diagrams and figures have accessible text. Beyond that, there are many interesting design and implementation questions:

PLSE (and more broadly, PL & SE) has all sorts of bespoke data visualizations (e.g. e-graphs, flamegraphs, control flow graphs). Broadly speaking, how should we represent this data in artifacts?
do general-purpose accessible visualization tools work in this use-case? e.g. is it sufficient to describe an e-graph with just its nodes and edges?
how should the “accessible version” of a figure change depending on the target audience? e.g. should we present different accessible text to a researcher and a student?
how closely tied should a data structure’s accessible representation be with visual intuition?
separately, can we do better than a “copy code” button for code snippets? what about annotated code snippets (very common in PL papers)?

In the process of answering these questions, I think we can also make more generally-effective figures & visualizations!

Next Steps

In the short-term, there are a few other changes I want to work on:

fixing the color contrast issues I outlined
continuing to improve alternative text
adding a copy code button
adding more static analysis checks (e.g. Alfa or Axe)
doing more manual audits!

In the long-term, I’m curious how we can extend some of this work outside of our website — can we translate some of this into accessible artifacts in publications? Or apply the same approach more broadly (to our project sites, across UW CSE, etc.)?

A potential deadline for some of this work is the updated interpretation of Title II of the ADA (the “web rule”), which, among other things, requires our digital course materials to be accessible by April 24, 2026. Ideally we’ll be significantly ahead of this deadline :)

Separately, I’m excited to see the digital accessibility landscape evolve. I think there are some wonderful intersections between PLSE’s research interests and accessibility, from improved static analysis techniques to DSL design to PL + CHI applied to developer interfaces.

In the meantime, I hope that this was a helpful blog post! Open to any thoughts, suggestions, or concerns. See you in the next one :)

¹ Venkatesh Potluri, Priyan Vaithilingam, Suresh Iyengar, Y. Vidya, Manohar Swaminathan, and Gopal Srinivasa. 2018. CodeTalk: Improving Programming Environment Accessibility for Visually Impaired Developers. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18). Association for Computing Machinery, New York, NY, USA, Paper 618, 1–11. https://doi.org/10.1145/3173574.3174192
² Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock. 2022. VoxLens: Making Online Data Visualizations Accessible with an Interactive JavaScript Plug-In. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 478, 1–19. https://doi.org/10.1145/3491102.3517431
