Accessibility health scoring

A plain-language guide to how the scan report health gauge score is calculated, what it means, and why it sometimes disagrees with the violation count.


Sometimes a scan report shows a site with hundreds of violations but a score in the 90s, and that looks like a contradiction. This guide explains how Cascadia Marquee arrives at the accessibility health score, the number in the middle of the gauge on every scan report.

It is written for anyone who looks at a scan report, whether that is your own site's report or one you are helping someone else read. You do not need to be a developer to follow it. By the end, the score should make sense to you, including the cases that feel backwards at first.

What the score is

The health score is a number from 0 to 100 that answers one specific question:

Of all the automated accessibility checks we ran on the pages we sampled, what share of them (weighted by how serious they are) passed?

In short, higher is better. A higher number means more of the important automated checks came back clean.

It is just as useful to know what the score is not:

  • It is not "violations per page." A page can have a few violations and still score well.
  • It is not a count of every element on the site.
  • It is not a measure of whether we scanned the whole site. We only sample pages.
  • It is not a WCAG compliance grade, and it is not a legal or ADA pass/fail.

One more caveat, which the gauge itself shows on finished reports: automated tools cannot catch everything. A human tester reviewing the site can still find issues that no automated scan will ever see. The score reflects the automated surface, not the full accessibility picture.

For the technically curious: if you have access to the codebase, the math lives in src/lib/accessibility-scan-analysis.ts, in a function called computeHealthIndex. The report UI runs analyzeScanFindings on a finished scan's results when it renders the report; see scan-report-view.tsx and accessibility-portal-data.ts. You can skip this note if code is not your thing.

The three numbers people mix up

Most confusion about scores comes from treating three different numbers as if they were the same thing. They are not; each one counts something different:

Metric | What it is really counting | Where it comes from
Health score | How much of the weighted automated work passed, across every check on every sampled page | Calculated when the report renders, from each page's axeViolations and axePasses
totalViolations | The number of failing rules, counted once per page | apps/scan-worker/src/scan.ts sets violationsCount = axeViolations.length per page; queue.ts adds those up
Flagged elements | The actual DOM nodes (buttons, links, images, and so on) that a rule flagged | aggregateFindings in accessibility-scan-analysis.ts

Why the violation count balloons but the score barely moves

Imagine a single styling problem, such as low-contrast text in your theme's footer. That same problem shows up on every page. If 200 pages were sampled, totalViolations jumps by 200, because it counts the failure once per page.

But it is really one issue. Fix it once in the theme, and all 200 "violations" disappear together.

The health score does not work that way. Instead of counting how many pages a rule failed on, it weighs failing elements against passing elements, adjusted for how serious each check is, across everything that was scanned. So that single footer problem might add a big number to the violation badge while barely moving the score, because each of those 200 pages also passed many other checks, usually covering far more elements than the failure did.

For example, a color-contrast failure on every product page looks alarming in the violation count, but it may flag only a handful of elements per page while the rest of the page passes fine.
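To make the difference between the counts concrete, here is a small TypeScript sketch over hypothetical page data. The PageResult shape and the function names are illustrative assumptions, not the scan worker's real types; the per-page rule count only mirrors the documented violationsCount = axeViolations.length behavior.

```typescript
// Hypothetical, simplified page rows: each failing rule records how many elements it flagged.
interface PageResult {
  url: string;
  violations: { ruleId: string; elementsFlagged: number }[];
}

// totalViolations-style count: one per failing rule per page, summed across pages.
function countViolations(pages: PageResult[]): number {
  return pages.reduce((sum, page) => sum + page.violations.length, 0);
}

// Flagged-elements count: every DOM node any rule flagged.
function countFlaggedElements(pages: PageResult[]): number {
  return pages.reduce(
    (sum, page) => sum + page.violations.reduce((s, v) => s + v.elementsFlagged, 0),
    0,
  );
}

// Distinct failing rules: roughly, how many underlying problems there are to fix.
function countDistinctRules(pages: PageResult[]): number {
  const ids = new Set<string>();
  for (const page of pages) {
    for (const v of page.violations) ids.add(v.ruleId);
  }
  return ids.size;
}

// Hypothetical site: 200 sampled pages, each with the same footer contrast failure on 3 elements.
const pages: PageResult[] = Array.from({ length: 200 }, (_, i) => ({
  url: `/products/${i}`,
  violations: [{ ruleId: "color-contrast", elementsFlagged: 3 }],
}));

console.log(countViolations(pages));      // 200
console.log(countFlaggedElements(pages)); // 600
console.log(countDistinctRules(pages));   // 1
```

One theme-level fix would take all three numbers to zero at once.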

How the score is calculated, step by step

This section covers modern scans. Older scans work a little differently; see the fallback section below.

Newer scans save a lightweight summary of everything that passed on each page, stored as axePasses. Whenever that pass data is present, we score using a weighted pass ratio, which is basically good stuff divided by all stuff. We build it like this.

Step 1: Give every check a weight based on how serious it is

Not all problems are equal. A missing button label that breaks navigation matters more than a minor cosmetic nit. So each passing or failing check earns a weight based on its impact level:

Impact | Weight
critical | 10
serious | 7
moderate | 3
minor | 1

Passes and failures use the same weights, defined in the IMPACT_WEIGHT table in accessibility-scan-analysis.ts. A check's weight is also multiplied by how many elements it touched, so a rule affecting 50 elements counts more than one affecting 2. For efficiency, the worker compacts the pass data first, in compactAxePasses in apps/scan-worker/src/scan.ts.
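A minimal sketch of the weighting, assuming the values in the table above. The real IMPACT_WEIGHT table lives in accessibility-scan-analysis.ts and may be declared differently; checkWeight is a hypothetical helper, not a function from the codebase.

```typescript
// Impact levels as listed in the weight table.
type Impact = "critical" | "serious" | "moderate" | "minor";

// Mirrors the documented weights (illustrative copy, not the source file).
const IMPACT_WEIGHT: Record<Impact, number> = {
  critical: 10,
  serious: 7,
  moderate: 3,
  minor: 1,
};

// A check's contribution: impact weight times the number of elements it touched.
// The same formula applies to passing and failing checks.
function checkWeight(impact: Impact, elementCount: number): number {
  return IMPACT_WEIGHT[impact] * elementCount;
}

console.log(checkWeight("serious", 50)); // 350
console.log(checkWeight("serious", 2));  // 14
console.log(checkWeight("minor", 50));   // 50
```

Note that 50 minor elements weigh about as much as seven serious ones, which is why impact matters as much as volume.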

Step 2: Sort each check into a category

Every check gets filed into one of six categories that map to the rings on the gauge. For example, color-contrast goes into Text & contrast, and image-alt goes into Images & media. When we hit a rule we do not recognize, we fall back to some sensible guesses (categorizeAxeRule).

Step 3: Do the actual division

With weights assigned, the overall score is one ratio:

score = round( passed weight / (passed weight + failed weight) × 100 )

In words: of all the weighted check-work, how much of it passed? The two pieces break down like this:

  • Failed weight: for every finding, multiply its impact weight by the number of elements it flagged, then add them all up.
  • Passed weight: for every page's pass summary, multiply each rule's impact weight by how many nodes it covered, then add them all up.
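Steps 1 and 3 can be sketched together as follows. Finding, PassEntry, and the helper names are hypothetical simplifications of the real report data, and the result for a scan with no weight at all is an assumption, because the guide does not cover that case.

```typescript
type Impact = "critical" | "serious" | "moderate" | "minor";

const IMPACT_WEIGHT: Record<Impact, number> = {
  critical: 10, serious: 7, moderate: 3, minor: 1,
};

// Hypothetical shapes: one failing finding, and one rule's entry in a page's pass summary.
interface Finding { impact: Impact; elementsFlagged: number; }
interface PassEntry { impact: Impact; nodesCovered: number; }

// Failed weight: impact weight x flagged elements, summed over all findings.
function failedWeight(findings: Finding[]): number {
  return findings.reduce((sum, f) => sum + IMPACT_WEIGHT[f.impact] * f.elementsFlagged, 0);
}

// Passed weight: impact weight x covered nodes, summed over every page's pass summary.
function passedWeight(passSummaries: PassEntry[][]): number {
  let sum = 0;
  for (const page of passSummaries) {
    for (const p of page) sum += IMPACT_WEIGHT[p.impact] * p.nodesCovered;
  }
  return sum;
}

// Raw ratio before the Step 4 adjustments.
// Assumption: with no weight at all, report 100 instead of dividing by zero.
function rawScore(passed: number, failed: number): number {
  const total = passed + failed;
  if (total === 0) return 100;
  return Math.round((passed / total) * 100);
}

const failed = failedWeight([{ impact: "serious", elementsFlagged: 2 }]);
const passed = passedWeight([
  [{ impact: "minor", nodesCovered: 40 }, { impact: "serious", nodesCovered: 10 }],
  [{ impact: "moderate", nodesCovered: 20 }],
]);
console.log(failed, passed, rawScore(passed, failed)); // 14 170 92
```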

Step 4: A few final adjustments

After the raw ratio, three small tweaks finish the job:

  1. Never show a perfect 100 if anything is still broken. If there is any failing weight left at all, the score is capped at 99. A site with open issues should not look flawless.
  2. No accessibility statement means subtract 8 points. If the scan did not find a dedicated accessibility statement page (statementFound === false), we take 8 points off. This is a common reason a score looks lower than expected.
  3. Keep it in bounds. Finally, clamp the result to the 0 to 100 range.
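The Step 4 adjustments, applied in the order listed, might look like this. finalizeScore is a hypothetical name; the sketch assumes statementFound is true, false, or missing (unknown), and only a definite false triggers the deduction.

```typescript
// Applies the three documented adjustments to a raw 0-100 ratio score, in order.
function finalizeScore(
  raw: number,
  failedWeight: number,
  statementFound: boolean | null | undefined,
): number {
  let score = raw;
  // 1. Never show a perfect 100 while any failing weight remains.
  if (failedWeight > 0) score = Math.min(score, 99);
  // 2. A definite "no statement found" costs 8 points; an unknown result costs nothing.
  if (statementFound === false) score -= 8;
  // 3. Clamp to the 0-100 range.
  return Math.max(0, Math.min(100, score));
}

console.log(finalizeScore(100, 5, true));       // 99
console.log(finalizeScore(100, 5, false));      // 91
console.log(finalizeScore(5, 5, false));        // 0
console.log(finalizeScore(100, 0, undefined));  // 100
```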

How the individual ring scores work

Each of the six categories on the gauge gets its own score using the same pass-ratio math, just limited to the checks in that category. Like the overall score, any category with flagged elements is capped at 99.
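The ring scores reuse the same math, grouped by category. This sketch assumes a hypothetical flattened CheckResult record that already carries its category (the job categorizeAxeRule does in the real code); skipping categories that have no weight at all is also an assumption.

```typescript
type Impact = "critical" | "serious" | "moderate" | "minor";

const IMPACT_WEIGHT: Record<Impact, number> = {
  critical: 10, serious: 7, moderate: 3, minor: 1,
};

// Hypothetical record: one passing or failing rule result, already categorized.
interface CheckResult {
  category: string;
  impact: Impact;
  elements: number;
  passed: boolean;
}

// Pass ratio per category; any category with failing weight is capped at 99.
function categoryScores(checks: CheckResult[]): Map<string, number> {
  const totals = new Map<string, { passed: number; failed: number }>();
  for (const c of checks) {
    const t = totals.get(c.category) ?? { passed: 0, failed: 0 };
    const w = IMPACT_WEIGHT[c.impact] * c.elements;
    if (c.passed) t.passed += w;
    else t.failed += w;
    totals.set(c.category, t);
  }
  const scores = new Map<string, number>();
  totals.forEach((t, category) => {
    const total = t.passed + t.failed;
    if (total === 0) return; // assumption: no weight, no score
    let score = Math.round((t.passed / total) * 100);
    if (t.failed > 0) score = Math.min(score, 99);
    scores.set(category, score);
  });
  return scores;
}

const scores = categoryScores([
  { category: "Text & contrast", impact: "minor", elements: 97, passed: true },
  { category: "Text & contrast", impact: "serious", elements: 1, passed: false },
  { category: "Images & media", impact: "serious", elements: 10, passed: true },
]);
console.log(scores.get("Text & contrast")); // 93
console.log(scores.get("Images & media"));  // 100
```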

There is also a seventh segment, Accessibility statement, scored on its own simple scale: 100 if a statement was found, 35 if it definitely was not, and 70 if we could not tell. This segment affects the center score only through the 8-point adjustment above. It is not blended into the main pass-ratio math.
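The statement segment's scale fits in a few lines. statementSegmentScore is a hypothetical name, and treating null or undefined as "could not tell" is an assumption about how the unknown case is represented.

```typescript
// Separate "Accessibility statement" segment. It is not averaged into the pass ratio;
// the center score only feels it through the 8-point deduction.
function statementSegmentScore(statementFound: boolean | null | undefined): number {
  if (statementFound === true) return 100;
  if (statementFound === false) return 35;
  return 70; // could not tell
}

console.log(statementSegmentScore(true));      // 100
console.log(statementSegmentScore(false));     // 35
console.log(statementSegmentScore(undefined)); // 70
```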

When there's no pass data

Some older scans ran before we started saving axePasses, so there is nothing to build a pass ratio from. In those cases we fall back, in order:

  1. If we have the six category scores, the center score is simply their average. The statement segment is left out of this average.
  2. If we do not have those either, we use a density fallback: densityToHealthScore(totalFailedWeight / pagesScanned). This turns "how much weighted failure per page" into a 0 to 100 score.
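The fallback order can be sketched like this. The guide does not describe the curve inside densityToHealthScore, so the sketch takes it as a parameter rather than guessing; rounding the average and guarding against zero pages are also assumptions.

```typescript
// Stand-in signature for the real densityToHealthScore, whose formula is not described here.
type DensityToScore = (failedWeightPerPage: number) => number;

function fallbackCenterScore(
  categoryScores: number[] | undefined, // the six ring scores, statement segment excluded
  totalFailedWeight: number,
  pagesScanned: number,
  densityToHealthScore: DensityToScore,
): number {
  // 1. Average the category scores when they exist (rounding is an assumption).
  if (categoryScores && categoryScores.length > 0) {
    const sum = categoryScores.reduce((a, b) => a + b, 0);
    return Math.round(sum / categoryScores.length);
  }
  // 2. Otherwise, turn weighted failure per page into a score.
  return densityToHealthScore(totalFailedWeight / Math.max(1, pagesScanned));
}

// Illustrative stand-in curve only, not the real one.
const stand_in: DensityToScore = (d) => Math.max(0, 100 - d);

console.log(fallbackCenterScore([90, 80, 70, 100, 60, 80], 0, 1, stand_in)); // 80
console.log(fallbackCenterScore(undefined, 300, 10, stand_in));              // 70
```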

We prefer the pass-ratio method whenever the data is there, because it reflects how much of the automated work actually passed, not just how many failures we happened to find.

What the bands mean

Once there is a final number, it lands in one of four bands. This is what drives the label and color you see on the report:

Score | Band | UI label
90-100 | Excellent | Excellent
75-89 | Good | Good
50-74 | NeedsWork | Needs work
0-49 | Poor | Poor
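The band boundaries translate directly into a lookup. The Band type, BAND_LABEL, and bandFor are an illustrative sketch of the table above, not the report's actual code.

```typescript
type Band = "Excellent" | "Good" | "NeedsWork" | "Poor";

// UI labels as listed in the table.
const BAND_LABEL: Record<Band, string> = {
  Excellent: "Excellent",
  Good: "Good",
  NeedsWork: "Needs work",
  Poor: "Poor",
};

// Assumes an integer score already clamped to 0-100.
function bandFor(score: number): Band {
  if (score >= 90) return "Excellent";
  if (score >= 75) return "Good";
  if (score >= 50) return "NeedsWork";
  return "Poor";
}

console.log(BAND_LABEL[bandFor(88)]); // Good
console.log(BAND_LABEL[bandFor(60)]); // Needs work
```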

Two situations that look backwards (and why they aren't)

These are the cases that come up most often. Both make sense once you remember the score is a pass ratio, not a violation count.

A big site scores in the 90s despite hundreds of violations

This is normal. The usual story:

  • The site has lots of sampled pages, sometimes hundreds.
  • One or a few rules repeat on most of them, which pushes up totalViolations.
  • But each failure flags only a few elements per page, often at moderate or minor weight.
  • Meanwhile, every page runs many checks, and each passing check can cover many elements (all the links, buttons, text, and so on).

So the pool of passed weight is huge compared to the failures. The ratio stays high, and the score does too.

A small site scores around 88 with hardly any violations

Also normal, for the opposite reasons:

  • With fewer pages, the pass pool is smaller, so the same handful of failures moves the ratio more.
  • A serious or critical failure weighs 7 to 10 times as much as a minor passing check.
  • Even one type of issue can hit many elements across pages. Think of a single problem in the global header.
  • A missing accessibility statement on its own accounts for 8 points, which can turn a 96 into an 88.
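Both effects can be checked with hypothetical numbers. centerScore folds the pass ratio and the Step 4 adjustments into one illustrative function, and the weights are made up for the example, not taken from any real scan.

```typescript
// Pass ratio plus the Step 4 adjustments (illustrative, not the real computeHealthIndex).
function centerScore(
  passedWeight: number,
  failedWeight: number,
  statementFound: boolean | undefined,
): number {
  let score = Math.round((passedWeight / (passedWeight + failedWeight)) * 100);
  if (failedWeight > 0) score = Math.min(score, 99);
  if (statementFound === false) score -= 8;
  return Math.max(0, Math.min(100, score));
}

// Hypothetical: both sites carry the same failing weight of 100
// (for example, 10 serious elements = 70 plus 30 minor elements = 30).
const failed = 100;

console.log(centerScore(20000, failed, true)); // big site: 99.5 rounds to 100, capped to 99
console.log(centerScore(2400, failed, true));  // small site with a statement: 96
console.log(centerScore(2400, failed, false)); // small site without one: 88
```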

A common gotcha: whenever a score looks "too low" for the violation badge, the first thing to check is whether the scan found an accessibility statement. The 8-point deduction explains a surprising number of off-looking scores on its own.

What the score leaves out

A few things do not move the number at all:

  • Pages we did not scan. The gap between discoveredUrls and sampledUrls shows up in the report's explanatory text, but it does not change the numeric score.
  • IBM Equal Access (ACE) results, even when that engine is enabled. Scoring today is based on axe.
  • Manual audit checklist items. Those are a separate workflow that happens after the automated scan.
  • Legal or ADA compliance. The score is not a pass or fail verdict.

Making sense of a surprising score

If a score looks off, whether you are reviewing your own report or helping someone else with theirs, these steps untangle almost every case. The first few work for anyone. The last one needs internal admin or code access, so it is mainly for staff digging deeper.

  1. Open the scan report gauge and note three things: the health score, the total violations count, and the impact breakdown (how many critical, serious, moderate, and minor issues there are).
  2. Compare pages scanned against pages discovered in the report. A score only reflects the pages that were actually sampled.
  3. Check whether the scan found an accessibility statement. If it definitely did not, the score is 8 points lower than it otherwise would be, which is often the whole mystery.
  4. In the findings, look at the element counts and impact levels for the top rules, not just the raw violation number. A high count of low-impact issues affects the score far less than a few serious ones.
  5. Ask whether the repeated rules are global or theme-level, such as a single header or footer problem showing up everywhere. If so, one fix can clear violations across many pages at once.
  6. For staff doing a deeper dive: confirm whether axePasses exists on the page rows. That tells you whether the report used the modern pass-ratio scoring or the older fallback method, which can explain differences between otherwise similar sites.