Automated Accessibility Testing on the Web: Possibilities and Limitations
Tools can help check the accessibility of web applications – but human understanding is required in many areas.
- Maria Korneeva
With the European Accessibility Act (EAA) and its national implementation through the Accessibility Strengthening Act (Barrierefreiheitsstärkungsgesetz, BFSG), binding requirements for numerous digital products and services have been in effect in Germany since June 2025. In parallel, the organizational prerequisites have been created: the responsible market surveillance authority has been established and is gradually commencing its work. This brings accessibility into the concrete focus of compliance, risk assessment, and product responsibility for many companies for the first time.
And the desire for efficient, scalable solutions is growing. Many organizations hope that automated accessibility checks will provide a path to conformity that is as quick and as complete as possible. Linters, browser extensions, CI/CD integrations, and AI-supported testing tools are correspondingly popular. Automation is an important and sensible tool, but it has clear technical limitations.
Yet numerous barriers cannot be detected by machines. They arise from missing context, unclear meaning, complex interactions, or a lack of understandability – aspects that require human judgment. This article shows what types of accessibility tools exist, which tasks they can meaningfully perform, and why a significant portion of barriers remains invisible even with the most modern automation.
Testing Tools and Their Limitations
To understand the possibilities and limitations of automated accessibility checks, it is first worthwhile to look at the different types of tools and what aspects of accessibility they can each capture.
Linters are static analysis tools for source code. They detect syntactic or structural errors, such as whether an alt attribute is missing or a button has no label. However, they have no knowledge of how pages behave in the browser, how focus flows work, or whether interactive components react correctly. Static tools only see code – not usage.
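What this looks like in practice can be sketched with a small, invented React/TypeScript component; the rule name refers to the widely used eslint-plugin-jsx-a11y package. The linter reports the structurally missing alt attribute, but it cannot assess anything that only materializes at runtime:

```tsx
// Invented component, used only to illustrate what a static linter
// such as eslint-plugin-jsx-a11y can and cannot report.
type Props = { label?: string };

export function IconButton({ label }: Props) {
  return (
    <>
      {/* Reported by the rule jsx-a11y/alt-text: the alt attribute is missing. */}
      <img src="/icon.svg" />

      {/* Not reported: aria-label is present in the source code, but whether
          `label` ever arrives with a meaningful value only shows at runtime. */}
      <button aria-label={label}>
        <img src="/icon.svg" alt="" />
      </button>
    </>
  );
}
```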
Browser extensions, on the other hand, analyze the Document Object Model (DOM) in its rendered state and can therefore detect more than static analyzers. Nevertheless, they remain "snapshot tools": they evaluate a state, but not the interaction over multiple steps. Complex focus changes, keyboard traps, or dynamically updated content typically remain invisible.
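Many of these extensions are based on rule engines such as the open source library axe-core. A minimal sketch of how such a snapshot check can be triggered programmatically against the rendered DOM (the result handling here is purely illustrative):

```ts
// Sketch: an axe-core scan of the page in its currently rendered state -
// essentially what a browser extension does when the scan button is clicked.
import axe from 'axe-core';

async function scanCurrentPage(): Promise<void> {
  // Evaluates one snapshot of the DOM, not an interaction across several steps.
  const results = await axe.run(document);

  for (const violation of results.violations) {
    console.warn(`${violation.id}: ${violation.help}`, violation.nodes.length, 'affected node(s)');
  }
}

void scanCurrentPage();
```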
Unit test plug-ins are useful for checking specific individual components for barriers. However, unit tests only cover the functionality (e.g., keyboard operability) of a single component and typically do not represent complete user flows.
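A sketch of what such a component test can look like with the jest-axe plug-in and Testing Library – the form markup under test is invented:

```tsx
// Sketch: accessibility check of a single component with jest-axe;
// the markup under test is illustrative.
import { render } from '@testing-library/react';
import { axe, toHaveNoViolations } from 'jest-axe';

expect.extend(toHaveNoViolations);

test('search form has no automatically detectable violations', async () => {
  const { container } = render(
    <form role="search">
      <label htmlFor="query">Search</label>
      <input id="query" type="text" />
      <button type="submit">Search</button>
    </form>,
  );

  // Covers only this isolated snippet - not the flow users take through the page.
  expect(await axe(container)).toHaveNoViolations();
});
```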
End-to-end test tools offer broader coverage. They can simulate more complex interactions, such as focus control when opening and closing a modal – a dialog that overlays the page content and usually requires an action before users can continue. However, developers must come up with such test scenarios themselves. If accessibility plug-ins are integrated, some aspects can be checked automatically. The most comprehensive result comes from writing test cases for the important flows yourself and additionally having automated plug-ins check various states of the website. But even then, a fundamental problem remains: end-to-end tests do not know whether an operation is "logical" or "understandable." They execute commands – but they do not "experience" usage the way a human does.
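A sketch of such a combination with Playwright and the @axe-core/playwright plug-in – URL, selectors, and the expected focus targets are assumptions about a fictional application:

```ts
// Sketch: an end-to-end test that combines a hand-written focus scenario
// for a modal with an automated axe scan of the open state.
// URL, roles, and names are placeholders.
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('modal manages keyboard focus and passes the automated scan', async ({ page }) => {
  await page.goto('https://example.org/checkout');

  // Hand-written scenario: opening the dialog must move focus into it ...
  await page.getByRole('button', { name: 'Delete account' }).click();
  const dialog = page.getByRole('dialog');
  await expect(dialog).toBeVisible();
  await expect(dialog.getByRole('button', { name: 'Cancel' })).toBeFocused();

  // ... and closing it must return focus to the trigger.
  await page.keyboard.press('Escape');
  await expect(page.getByRole('button', { name: 'Delete account' })).toBeFocused();

  // Automated plug-in: scans the current state for rule violations,
  // but cannot judge whether the flow itself is logical or understandable.
  const results = await new AxeBuilder({ page }).analyze();
  expect(results.violations).toEqual([]);
});
```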
CI/CD scanners automate checks in the build or deployment process. They are particularly well-suited for detecting typical error patterns early and preventing regressions. However, their limitations are the same as those of the underlying tools. Whether linters, browser extensions, unit tests, or end-to-end tests are integrated: they check code, structure, and simple interactions – but not complex navigation flows or content meanings.
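As an illustration, a minimal Node script using the pa11y library that a pipeline could run against a preview deployment and that fails the build on findings – URL and error handling are assumptions:

```ts
// Sketch: CI step that scans a preview URL with pa11y and fails the build
// if issues are found; the URL is a placeholder.
import pa11y from 'pa11y';

async function main(): Promise<void> {
  const result = await pa11y('https://preview.example.org/', {
    standard: 'WCAG2AA', // check against the WCAG 2 level AA ruleset
  });

  if (result.issues.length > 0) {
    for (const issue of result.issues) {
      console.error(`${issue.code}: ${issue.message} (${issue.selector})`);
    }
    process.exit(1); // let the pipeline fail so regressions are caught early
  }
}

void main();
```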
All these tools make valuable contributions to the development process. But how much testing effort do they still leave?
You can learn more about Accessibility on the Web at enterJS 2026 on June 16 and 17 in Mannheim. The conference revolves around JavaScript/TypeScript development in the enterprise sector. Discounted early bird tickets are available in the online ticket shop.
How Much Automation Can Really Achieve: Insights from Studies and Practice
Several studies have investigated the proportion of barriers that automated tools can actually detect. A comprehensive analysis by accessibility software provider Deque Systems found in 2024 that its automated tests could identify approximately 57 percent of all accessibility problems in real audits. With AI support, the company claims to have achieved as much as 80 percent.
Accessibility practitioners see the effectiveness of automated tools as significantly more limited, estimating that only 20 to 40 percent of potential barriers can be technically detected. Several experts, including Adrian Roselli and Steven Faulkner, report from extensive field tests that automated checks detect only 4 to 10 percent of actual problems.
What explains this discrepancy in the estimates? Of course, the figures from a vendor's marketing department and from independent accessibility consultancies differ because they pursue different goals. The test pages also differ in the bugs intentionally inserted into them, and so do the results. Different WCAG versions (Web Content Accessibility Guidelines), different tools – all of this leads to high variance in the estimates.
Despite the differences, these figures clearly show that existing tools cannot yet fully assess whether a website is accessible. Even formal checking of WCAG criteria is not yet 100 percent automatable.
Typical Limitations and Case Studies
Even though accessibility requires much more than just blindly adhering to WCAG success criteria, these guidelines provide a solid checklist for getting started. The requirements they contain relate to both the technical and semantic aspects of content, meaning how it is programmatically made available and how clearly it is formulated.
Automated accessibility tools can primarily check structures, patterns, and technical properties. They detect missing attributes, incorrect roles, or syntactic errors – but they do not understand what content means, how users interact with an application, or how logically an interface is structured.
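An invented snippet illustrates the difference: all the attributes that typical rulesets check for are present, so automated tools find nothing to complain about – yet the content helps no one:

```tsx
// Invented markup: formally complete, semantically useless. Automated rules
// check that alt texts and labels exist - not whether they help anyone.
export function ReportTeaser() {
  return (
    <section>
      {/* alt text is present, but it says nothing about what the chart shows */}
      <img src="/chart.png" alt="chart" />

      {/* "here" as link text carries no meaning when read out of context */}
      <p>
        The quarterly report is available <a href="/report.pdf">here</a>.
      </p>

      {/* the label exists, so the rule is satisfied - its wording explains nothing */}
      <label htmlFor="field1">Field 1</label>
      <input id="field1" type="text" />
    </section>
  );
}
```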
Therefore, it is worthwhile to look at the WCAG criteria from the following perspective: Which requirements relate not only to structures and the formal presence of certain elements, but also to aspects such as intuitive use, interpretation, context, and relevance? The focus is on the criteria of conformance levels A and AA (see info box), as they are legally required by all accessibility laws. The WCAG guidelines are based on the fundamental principles of web content accessibility – perceivable, operable, understandable, and robust. The examples in this article are grouped around these principles.
The Web Content Accessibility Guidelines (WCAG) are an international standard from the World Wide Web Consortium (W3C) for making web content accessible. They define testable success criteria based on four guiding principles: perceivable, operable, understandable, and robust.
WCAG distinguishes three conformance levels, which describe different levels of impact, effort, and technical complexity of the requirements:
- Level A: Requirements with a basic impact on accessibility and relatively low implementation effort. Without their fulfillment, usage is difficult or impossible for many people with disabilities (e.g., alternative texts for images, keyboard operability).
- Level AA: Requirements with a high impact for a broad user group, addressing central barriers in the usage context, but requiring a higher level of design, editorial, and technical accessibility know-how (e.g., sufficient color contrast, understandable labels, consistent navigation). This level is considered the benchmark legal standard in practice and is required by almost all accessibility laws.
- Level AAA: Requirements with a very specific impact for individual user groups, involving high conceptual, technical, or organizational effort and therefore not realistically achievable for all content (e.g., sign language versions).