
A Beginner’s Guide to Automated Accessibility Testing: Open Source SDKs

What’s Going On

When we began building the accessibility team at Ally Group, I was immediately drawn to the potential of automation and how it could help our delivery team. There are many claims in the industry that teams can achieve over 80% automation coverage. It’s an enticing idea, and the thought of catching almost every barrier with the click of a button made me very curious. As I dug deeper into custom automation, I found that while 80% might be possible in very specific, isolated scenarios, a consistent range of 40% to 60% is a much more realistic, and still highly effective, goal for most organizations.

Where Teams May Get Stuck

The struggle for most teams isn't a lack of tools, but a lack of clarity on how they work together. We often see teams get overwhelmed trying to find a "silver bullet" solution. Many wonder if they should build their own internal testing engine to gain more control or buy a high-priced external tool.

However, building in-house carries a heavy maintenance burden and is rarely feasible for teams without the headcount to invest in internal tooling. Accessibility standards also evolve constantly, and a custom tool can quickly become outdated if it is treated as a lower priority.

When an in-house tool misses an issue or flags a "false positive," it creates doubt in the entire process. Furthermore, the documentation-heavy nature of accessibility can make it difficult for anyone but a specialized developer to maintain a custom codebase. This leads to a gap between the goal of accessibility and the daily reality of the development workflow.

How to Think About It Instead

Instead of looking for a single tool to reach 80% coverage, we should think of it as building a "safety net" made of different layers. No single open-source SDK is perfect, but when you layer them, they complement each other’s strengths. By combining these foundations with AI-integrated code reviews, we can create a robust system that catches common errors early. This allows your team to stop worrying about simple "typos" and misses in the code and start focusing on the actual human experience of the product.


A Three-Layered Approach

I looked into some of the open-source SDKs in detail and integrated them into our code check-in process. Here are my early observations:

  1. Axe-core 

    Axe-core is widely considered the most reliable SDK available. Its greatest strength is its accuracy since it produces almost no false positives. This makes it the perfect tool to integrate into a CI/CD workflow, as it reduces the back-and-forth between developers and QA.

    • The Input: You can set evaluation criteria like WCAG 2.2 AA, WCAG 2.1 AA, etc.
    • The Output: It provides detailed feedback, including the severity of the issue and specific DOM node information.
    • The Benefit: It can even be used as a linter (axe-linter) in your IDE, which can flag certain issues before the code is even committed, thereby reducing iterations during the code review and QA phases.
  2. IBM Equal Access Toolkit

    The IBM Node SDK often identifies specific issues that other tools might overlook. What makes this tool unique is that the structure of the output is organized around clear accessibility checkpoints. It also utilizes IBM’s own accessibility policy, which can provide a different perspective if needed.

    • The Input: Accepts specific policies and "fail levels" (violations vs. warnings). You can also set evaluation criteria.
    • The Output: Reports can be generated in various formats like JSON, CSV, and even PDF, which is helpful for official documentation. It does, however, lack severity levels in the report.
    • The Benefit: It provides excellent snippets of the problematic code, making it very easy to present for legal requirements.
  3. Pa11y 

    Pa11y acts as a helpful coordinator, wrapping around Axe-core and HTML_CodeSniffer to fill in the gaps. While it lacks some of the detailed severity levels found in Axe, it is an excellent fit for testing workflows.

    • The Input: You can pick your engine (Axe, HTMLCS, or both) and configure specific user actions to test certain workflows.
    • The Output: A simpler response format that is easy to parse, though it lacks severity information.
    • The Benefit: It helps ensure the site remains accessible during complex events, like a user navigating a multi-step form.

After all, our goal isn't just to clear a project dashboard of minor flags, but to leverage automation to solve the problems that impact the most users.

The Role of AI-Integrated Code Reviews

To push your coverage closer to that 60% mark, I recommend pairing these SDKs with AI-driven code reviews. While SDKs are great at finding "hard" errors (like a missing label), AI can help identify "soft" errors—such as whether a label actually makes sense in context. This combination bridges the gap between technical compliance and actual usability. AI can also be woven into your PR lifecycle; I will cover that aspect in detail in an upcoming post.

A Simple Strategy for Success

In our experience, the most effective setup uses Axe-core as your primary guardrail, the IBM Toolkit as your secondary auditor, and Pa11y to support specific user journeys. You may use any testing framework like Playwright, Cypress, or Selenium for this purpose.
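As a sketch of the layered setup, suppose each tool's report has already been normalized into `{ rule, severity, source }` records (the normalization step is tool-specific and omitted here; the data and field names are assumptions for illustration). Merging then becomes de-duplicating by rule and sorting by severity:

```javascript
// Sketch: merging findings from layered tools into one prioritized list.
// Assumes each report was normalized to { rule, severity, source } records;
// the sample data below is hypothetical.
const SEVERITY_RANK = { critical: 0, serious: 1, moderate: 2, minor: 3 };

function mergeFindings(...reports) {
  const seen = new Map();
  for (const finding of reports.flat()) {
    const existing = seen.get(finding.rule);
    // Keep one entry per rule, at the highest severity any tool reported.
    if (!existing || SEVERITY_RANK[finding.severity] < SEVERITY_RANK[existing.severity]) {
      seen.set(finding.rule, finding);
    }
  }
  return [...seen.values()].sort(
    (a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]
  );
}

const axeReport = [{ rule: 'image-alt', severity: 'critical', source: 'axe' }];
const ibmReport = [
  { rule: 'image-alt', severity: 'serious', source: 'ibm' },
  { rule: 'label', severity: 'serious', source: 'ibm' },
];

console.log(mergeFindings(axeReport, ibmReport).map((f) => f.rule));
// → ['image-alt', 'label']
```

A merged, severity-ordered list like this gives the Product, QA, Design, and Development teams a single queue to review rather than three separate reports.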

You don't need to be an expert to start; you simply need a clear strategy for how these tools work together to protect your users and your brand. This provides an initial level of automated defense prior to in-depth manual testing, helping teams move toward compliance with fewer iterations.

It is also important to remember that manual testing remains a critical component for achieving full accessibility.

The core process remains unchanged: the Product, QA, Design, and Development teams must review these automated results to make informed decisions on what to prioritize.