Accessibility
10 min

Game UI Is About 15 Years Behind the Web

Modern video games are among the most technically advanced products we build, yet when it comes to accessibility infrastructure for interfaces, many of them still operate like the web did in the late 2000s.

When I say "15 years," I do not mean it as a random provocation. WAI-ARIA 1.0 reached Candidate Recommendation on 18 January 2011 and became a W3C Recommendation on 20 March 2014. That was the period when accessibility on the web stopped being mostly good intentions and scattered implementation patterns and started becoming standardized, machine-readable infrastructure that browsers, assistive technologies, and tooling could rely on. Games, meanwhile, still largely keep the equivalent interaction metadata locked inside proprietary engines.

That gap is starting to matter. Accessibility is exposing it first. AI will expose it next.


Accessibility in games has improved dramatically

To be clear, accessibility in games has improved dramatically. Naughty Dog said that The Last of Us Part II shipped with more than 60 accessibility settings, including expanded fine-motor and hearing options as well as completely new features for low-vision and blind players. Santa Monica Studio announced more than 60 accessibility features for God of War Ragnarök before launch, and Sony's current official accessibility page for the game now says it offers over 70. That matters because it shows real progress, but it also reveals something deeper: once a game can guide you to an objective, announce interactables, reshape puzzle support, or expose deep subtitle and readability controls, it is already depending on structured internal knowledge about the game world and its interface.


The hidden structure inside modern games

Take The Last of Us Part II. Naughty Dog’s official accessibility breakdown includes navigation assistance, traversal and combat audio cues, enhanced listen mode, high-contrast display, and a wider set of options explicitly aimed at low-vision and blind players. Those features only work if the engine already knows what objects exist, which of them matter, where the player is, what is interactable, and what state those systems are currently in. The player sees a feature. The engine sees structured state.

God of War Ragnarök points in the same direction. Sony and Santa Monica Studio describe subtitle and caption improvements, critical gameplay captions, remapping, puzzle and navigation assistance, HUD adjustments, and broader readability and interaction options. Again, those are not just cosmetic settings bolted on at the end. They imply that the game is already modeling important parts of its world and UI in ways that can be surfaced, filtered, and translated into player support. In other words, modern game engines already contain a large amount of machine-readable information about the world and the interface. Accessibility features do not magically create that structure. They reveal that it is already there.


The web solved this problem fifteen years ago

The web solved a version of this problem roughly fifteen years ago. A modern web application is not just pixels rendered to a screen. It exposes semantics through the accessibility tree, which in turn can be consumed by assistive technologies, browser tooling, and increasingly by automation and AI systems. That structured layer is one of the quiet reasons the web became something machines can navigate with much less guesswork than raw visual analysis alone.
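To make the idea concrete, here is a minimal sketch of how markup semantics become a tree that assistive technology can consume. The shapes and names are illustrative only; real browsers derive roles and accessible names from the HTML spec's implicit role mappings and the accessible-name computation, not from a lookup table this small.

```typescript
// Illustrative model of an accessibility tree node.
type A11yNode = {
  role: string;          // e.g. "button", "navigation", "link"
  name: string;          // accessible name exposed to assistive tech
  children: A11yNode[];
};

// A toy "DOM" element, standing in for real markup.
type Element = {
  tag: string;
  ariaLabel?: string;
  text?: string;
  children?: Element[];
};

// A tiny subset of the implicit role mappings browsers apply.
const implicitRoles: Record<string, string> = {
  button: "button",
  nav: "navigation",
  a: "link",
};

function buildA11yTree(el: Element): A11yNode {
  return {
    role: implicitRoles[el.tag] ?? "generic",
    name: el.ariaLabel ?? el.text ?? "",
    children: (el.children ?? []).map(buildA11yTree),
  };
}

// What a screen reader conceptually receives for this fragment:
// <nav aria-label="Main"><a>Blog</a><button>Search</button></nav>
const tree = buildA11yTree({
  tag: "nav",
  ariaLabel: "Main",
  children: [
    { tag: "a", text: "Blog" },
    { tag: "button", text: "Search" },
  ],
});
```

The point is that the rendered pixels are a projection of this structure, not the source of truth, which is exactly what games currently lack at their boundary.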

Games, by contrast, still mostly expose pixels. Menus, inventories, dialogue options, HUD elements, and interaction prompts are rendered visually, but from outside the runtime there is usually no public, standardized, machine-consumable interface comparable to the web’s accessibility stack. That is the real gap. The issue is not that games have no structure internally. They clearly do. The issue is that external tools usually cannot consume that structure in a standard way. That makes accessibility support, automation, and AI interaction far more engine-specific and brittle in games than on the web.


The engines are moving

To be fair, the engines are moving. Unity in particular has moved further than many people realize. Unity’s current documentation says its Assistive Support API lets developers create an active accessibility hierarchy, notify the screen reader about changes in the UI, and respond to events based on user actions. Its scripting API defines AccessibilityHierarchy as the hierarchy data model that the screen reader uses to navigate and interact with a Unity application, and AccessibilityNode as a node representing a UI element that the screen reader can read, focus, and execute actions on. That is not just mobile settings support. It is a tree-like internal model with labels, roles, states, and hierarchical relationships.
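Unity's actual API is C#, so the following is only a hedged, engine-agnostic analog in TypeScript of what "a hierarchy the screen reader navigates" means in practice: nodes carry labels, roles, and state, and linear navigation is a depth-first walk that skips inactive nodes. All names here are invented for illustration, not Unity's.

```typescript
// Hypothetical analog of an engine-side accessibility hierarchy.
type GameA11yNode = {
  label: string;           // what the screen reader announces
  role: string;            // e.g. "menu", "button", "slider"
  isActive: boolean;       // inactive nodes are skipped in navigation
  children: GameA11yNode[];
};

// Depth-first flatten into screen-reader reading order, skipping
// inactive nodes, which is roughly how linear navigation works.
function readingOrder(root: GameA11yNode): GameA11yNode[] {
  const out: GameA11yNode[] = [];
  const walk = (n: GameA11yNode): void => {
    if (!n.isActive) return;
    out.push(n);
    n.children.forEach(walk);
  };
  walk(root);
  return out;
}

const pauseMenu: GameA11yNode = {
  label: "Pause menu", role: "menu", isActive: true,
  children: [
    { label: "Resume", role: "button", isActive: true, children: [] },
    { label: "Load game", role: "button", isActive: false, children: [] },
    { label: "Settings", role: "button", isActive: true, children: [] },
  ],
};

// Successive "move to next element" gestures would announce:
const announced = readingOrder(pauseMenu).map((n) => n.label);
```

Once a menu is modeled this way, announcing it through Narrator or VoiceOver is a translation problem, not a reinvention problem.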

Unity 6.3 then pushed this further by extending its screen reader support APIs to Windows and macOS, enabling compatibility with Narrator and VoiceOver on desktop, and Unity’s own 6.3 materials present this as native desktop screen reader support using the same unified APIs across Android, iOS, Windows, and macOS. That is meaningful progress and it deserves to be described fairly. But it still strengthens the main point rather than weakening it: Unity is moving toward something tree-like internally, yet it remains Unity infrastructure, not a cross-engine standard for machine-readable game interfaces.

Unreal has also made real moves. Epic’s documentation says Unreal includes APIs that allow third-party screen readers to read UI text and that this supports a number of common UMG widgets. Epic also positions Common UI as a framework for advanced cross-platform interfaces with layered navigation, submenus, and popups. Again, that is real infrastructure. But it is still Unreal infrastructure. It does not amount to a public, cross-engine equivalent of the web’s accessibility tree.


The proprietary silo phase

That is why the best description of the current moment is not “games have no structure.” It is that we are in the proprietary silo phase. Big studios and engine vendors are building increasingly impressive internal systems: accessibility layers, interaction tagging, navigation graphs, gameplay metadata, screen reader support, and specialized UI frameworks. But the pipes are not shared. The structure exists, yet most of it remains locked inside particular engines, toolchains, and platform ecosystems.


Two forces are starting to apply pressure

Two forces are now starting to apply pressure to that model.

The first is regulation, but this part needs precision. The European Accessibility Act is real and the European Commission states that it came into effect in June 2025, aiming to improve accessibility for certain products and services across the EU. But that does not mean gameplay itself is now regulated in the same blanket way as the web. In the United States, the FCC is explicit that the CVAA is relevant to accessibility of communications in video games. That matters because modern games are often communication spaces and service layers, not just isolated products, but it is still not the same as saying all gameplay is now comprehensively regulated. The honest version of the argument is that regulation is not yet forcing game accessibility in the same broad way it has shaped the web, but the pressure is no longer hypothetical. Parts of the game ecosystem are already in scope, and the general direction is toward greater accessibility expectations for digital services and communication features.

The second force is efficiency, especially around AI and automation. Studios increasingly want more help from automation in testing, playthrough analysis, and content validation. But AI cannot reliably test a system it cannot understand. If the only thing an automated system gets is pixels, then it has to infer interface state, object importance, and interaction possibilities through vision. That is slower, more brittle, and more expensive than consuming structured state directly. This is partly an inference, but it is a grounded one: systems that can read explicit state generally do less guessing than systems that have to reconstruct it from rendered output. The moment it becomes cheaper to expose gameplay structure than to keep teaching AI to infer it from pixels, the industry will pivot.
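The cost difference is easy to see in miniature. Below is a sketch of an automated check against a hypothetical exposed UI state; the state shape is invented and does not correspond to any real engine API. The pixel-only alternative to this one-line assertion would be a screenshot, a vision model or OCR pass, and a guess about which rendered blob is the button.

```typescript
// Invented shape for UI state a game might expose to test tooling.
type UiState = {
  screen: string;
  focusedId: string | null;
  widgets: { id: string; kind: string; enabled: boolean }[];
};

// With structured state, a test is a direct, cheap assertion.
function canStartNewGame(state: UiState): boolean {
  const btn = state.widgets.find((w) => w.id === "new-game");
  return state.screen === "main-menu" && btn !== undefined && btn.enabled;
}

const snapshot: UiState = {
  screen: "main-menu",
  focusedId: "new-game",
  widgets: [
    { id: "new-game", kind: "button", enabled: true },
    { id: "continue", kind: "button", enabled: false },
  ],
};
```

Nothing about this requires a standard, which is the current situation: each studio that wants it builds its own version behind closed doors.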


Imagine an AI assistant for players

Push that one step further and the connection to accessibility becomes even more obvious. Imagine a blind player, or a player with cognitive impairments, using an AI assistant during play. If the assistant only gets pixels, it has to guess. If it gets structured information about nearby objects, threats, objectives, and interaction states, it can explain the environment, guide the player, warn about risks, and reduce cognitive load in a much more reliable way. Conceptually, that is very close to what the accessibility tree did for the web. The blocker is not imagination. The blocker is infrastructure.
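A sketch of what that assistant could do with structured input, under an entirely hypothetical entity schema standing in for whatever a game might expose about nearby objects:

```typescript
// Hypothetical schema for nearby entities a game might expose.
type Entity = {
  name: string;
  kind: "threat" | "objective" | "interactable";
  distanceMeters: number;
};

// Prioritize threats, then objectives, then interactables, and
// produce one short spoken line per entity, nearest first within
// each category, to keep cognitive load low.
function narrate(nearby: Entity[]): string[] {
  const rank = { threat: 0, objective: 1, interactable: 2 };
  return [...nearby]
    .sort(
      (a, b) =>
        rank[a.kind] - rank[b.kind] || a.distanceMeters - b.distanceMeters,
    )
    .map((e) => `${e.name}, ${e.kind}, ${e.distanceMeters} meters`);
}

const lines = narrate([
  { name: "Supply crate", kind: "interactable", distanceMeters: 3 },
  { name: "Patrol guard", kind: "threat", distanceMeters: 12 },
  { name: "Exit gate", kind: "objective", distanceMeters: 25 },
]);
```

A pixel-only assistant would have to reconstruct every one of those facts from rendered frames before it could even begin to prioritize them.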


How standards actually win

This is the part people usually avoid saying out loud. Accessibility infrastructure rarely wins because an industry reaches moral consensus. It wins when the ecosystem makes it unavoidable. On the web, standardized accessibility became normal because standards matured, browsers implemented platform APIs, regulations applied pressure, and platform expectations hardened over time. It did not happen because everybody suddenly chose virtue.

Games will likely follow the same pattern. Studios will not expose structured interface metadata because it is ethically beautiful. They will do it when it becomes the easiest way to pass platform certification, integrate AI testing pipelines, satisfy accessibility obligations in the parts of the ecosystem that fall in scope, and support cross-platform tooling without rebuilding everything from scratch for every stack. That is the historical pattern, not just in accessibility but in software more broadly.


The real question

The real trigger probably will not be a pure open standard appearing out of nowhere. More likely, one of three things happens first. A dominant engine or platform creates a de facto standard. Certification requirements from major platform holders start demanding more machine-readable accessibility support. Or AI testing economics make exposed structure cheaper than vision-first guesswork. That middle scenario matters more than people think. If Sony, Microsoft, or another major platform holder ever makes machine-readable accessibility support part of what it takes to move cleanly through certification, adoption will accelerate fast. Not because the industry suddenly agreed, but because the incentives changed.

So the real question is not whether games will move toward something functionally similar to an accessibility tree. Game engines already know a lot about their interfaces. They know what is interactable, what state systems are in, where the player is supposed to go, and what can be surfaced as prompts, guidance, warnings, and accessibility features. The real question is who forces that transition first: engine vendors, platform holders, regulators, or the economics of AI itself.

The web went through this pressure point roughly fifteen years ago. Games are now approaching the same one. Accessibility is the first signal. AI will likely be the second.


The short version

Games already know a huge amount about their interfaces. They just do not expose that knowledge in a shared, machine-readable way. Accessibility is starting to crack that shell. AI might be what finally breaks it open.


Let's build something accessible.

Whether you need a full audit, remediation of an existing app, or team training, get in touch and we will find a solution.