AWS WorkSpaces: AI agents control legacy apps

AWS lets AI agents control its WorkSpaces desktops – via clicks, keyboard, and image analysis, without application APIs.

AWS is expanding its virtual desktop service WorkSpaces with a feature for AI agents. In the public preview, agents can access managed cloud desktops and operate applications there – without companies having to retrofit interfaces or modernize legacy software. Access takes place within the existing WorkSpaces environment. There are no additional costs during the preview phase.

According to AWS, the background is that business processes often depend on legacy applications, specialized clients, or other desktop interfaces for which no modern APIs exist. AWS cites a Gartner report according to which this applies to 75 percent of all companies; 71 percent of Fortune 500 companies are said to run critical processes on mainframe systems without sufficient programmatic access. Instead of applications being redeveloped, agents are meant to operate the existing interface, much as a human user clicks, types, scrolls, and reads what is on the screen.

Technically, AWS builds agent access on the existing WorkSpaces infrastructure. The agents authenticate via AWS Identity and Access Management (IAM) and connect to a managed MCP endpoint. The Model Context Protocol (MCP) serves as the standardized layer between the agent and its tools. In practice, an agent requests a screenshot, interprets the visible interface, and then triggers mouse or keyboard input.
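
The screenshot, interpret, act cycle can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `FakeDesktop` is an in-memory stand-in rather than an AWS interface, the screen is represented by a text label instead of an image, and `decide` is a hard-coded placeholder for the model that would interpret the capture.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """In-memory stand-in for a remote desktop (hypothetical, not an AWS API)."""
    screen: str = "login form"
    actions: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would receive image bytes; a text label keeps the sketch runnable.
        return self.screen

    def type_text(self, text: str) -> None:
        self.actions.append(("type", text))
        self.screen = "form filled"

    def click(self, target: str) -> None:
        self.actions.append(("click", target))
        if target == "submit":
            self.screen = "done"


def decide(observation: str):
    """Hypothetical policy standing in for the model that interprets the screenshot."""
    if observation == "login form":
        return ("type", "alice")
    if observation == "form filled":
        return ("click", "submit")
    return None  # nothing left to do


def run_agent(desktop: FakeDesktop, max_steps: int = 10) -> list:
    """Observe, decide, act; stop when the policy returns no action."""
    for _ in range(max_steps):
        action = decide(desktop.screenshot())
        if action is None:
            break
        kind, value = action
        if kind == "type":
            desktop.type_text(value)
        else:
            desktop.click(value)
    return desktop.actions


if __name__ == "__main__":
    print(run_agent(FakeDesktop()))
```

The `max_steps` bound is a design choice of the sketch: an agent that misreads the screen could otherwise loop indefinitely.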

AWS names Computer Input, Computer Vision, and Screenshot Storage as central functions. Computer Input includes the actual inputs on the virtual desktop, such as clicks, text entries, and scrolling. Computer Vision means the agent does not read the application via an API but “sees” it through screen captures – for example, forms, buttons, or tables in an existing specialized application. Screenshot Storage allows these captures to be saved for audits and troubleshooting.
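
How stored captures could support an audit can be sketched as follows. This is not the format AWS uses, which the announcement does not describe; the function `audit_record` and all field names are hypothetical. The idea shown is simply to link each input action to a digest of the capture it was based on.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(step: int, action: dict, screenshot: bytes) -> dict:
    """Build one audit entry linking an input action to the capture it was based on."""
    return {
        "step": step,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        # The digest identifies the stored capture without embedding the image itself.
        "screenshot_sha256": hashlib.sha256(screenshot).hexdigest(),
    }


# Placeholder bytes stand in for a real screen capture.
record = audit_record(1, {"type": "click", "x": 120, "y": 48}, b"fake image bytes")
print(json.dumps(record, indent=2))
```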

Governance is likely to matter to companies as well. Because the agents run in the managed WorkSpaces environment rather than on local systems or directly on backend systems, existing security controls continue to apply. AWS also points to audit trails via CloudTrail and CloudWatch, which can be used to log and track the agents' activities.

The feature is configured via a WorkSpaces Applications stack, which now offers an option to enable AI agents. Agent functions, screenshot storage, and display parameters such as resolution and image format can be set there. Resolution is more than a display detail: dense interfaces with many UI elements benefit from more image information, while simple terminal-like environments need less.
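
A quick calculation shows how much the resolution setting changes the amount of image information per capture. The two resolutions are illustrative assumptions, not values taken from the AWS configuration.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels in one uncompressed capture."""
    return width * height


low = pixel_count(1280, 720)    # 921,600 pixels
high = pixel_count(1920, 1080)  # 2,073,600 pixels
ratio = high / low              # 2.25

print(f"{low:,} vs. {high:,} pixels, factor {ratio}")
```

Each capture at the higher setting therefore carries 2.25 times as many pixels, which is the trade-off between legibility of dense interfaces and the volume of image data the agent has to process.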

To demonstrate the new function, AWS shows an agent processing a repeat prescription in a sample pharmacy system. The agent retrieves the patient data, searches for the medication, initiates the order, and confirms completion. The process is intended to illustrate that the application itself did not need to be adapted – the agent works with the existing interface.

According to AWS, WorkSpaces supports the Model Context Protocol and can thus be connected to common agent frameworks such as LangChain, CrewAI, and Strands Agents. The public preview is available in several regions, including Frankfurt.
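
What the MCP layer looks like on the wire can be sketched with a tool invocation. MCP messages are JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The tool name `click` and its arguments are assumptions; the names a server actually offers are reported in its `tools/list` response.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry a unique id per request


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# "click" and its arguments are hypothetical, chosen for illustration only.
message = tools_call("click", {"x": 120, "y": 48})
print(message)
```

Agent frameworks with MCP support typically generate such messages themselves, so application code rarely builds them by hand.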

(fo)

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.