Beyond chat: solving real user problems with LLMs

In addition to chat interfaces, there are other ways of interacting with LLMs that developers can integrate into their applications to offer added value.

By Sascha Lehmann

Since the release of ChatGPT, the chat window has been the central user interface for interacting with artificial intelligence. But is a chat really the best way to interact – or are there possibly better ways to integrate AI into applications?

Sascha Lehmann

With his first PC, it was already clear to Sascha Lehmann which direction his journey would take. Through desktop and backend development in the .NET world, he found his way over the years to his true passion, web development. As an expert in Angular and UI/UX, he helps customers at Thinktecture AG in Karlsruhe with their challenges and projects every day.

In recent years, AI tools have taken the world by storm. AI functions have found their way into everyday software – be it in development environments (IDEs), office programs or even in the preparation of tax returns. And you can chat with the software almost everywhere. But why actually?

The strengths of large language models lie in their ability to process different types of information and communicate with users in natural language. To do this, they also require input – in natural language. So what could be more obvious than interacting with them via text input?

From a user experience (UX) point of view, chat is also an obvious interface. Almost everyone is familiar with this mental model, i.e. the basic functionality and appearance of a chat window and can use it intuitively without prior training. This low threshold was one of the decisive factors for the resounding success of ChatGPT and comparable applications.

On closer inspection, however, the chat interaction model cannot simply be transferred with the same success to other areas of application. As helpful as it can be to discuss open questions with an AI in a free-form chat, this model quickly loses its appeal as soon as it is used in a clearly defined application context. The scope there is usually much narrower, which raises new challenges, for example:

  • How can a chat be meaningfully integrated into the application context?
  • What specific added value does the AI function offer compared to established workflows?
  • How can subject-specific information be integrated in context?

Without targeted support, such as hints on possible interactions or on the available domain knowledge and how to use it in the chat, many users quickly feel overwhelmed. If initial interactions are also unsuccessful, this often leads to frustration, and the advertised AI feature is used only hesitantly or not at all. The impression arises that the new technology has been integrated for its own sake.

Such a user experience must be avoided at all costs. AI functions, like all other features, must offer clear added value, be it by expanding the range of functions or by simplifying previously tedious tasks.

Just like the infamous blank page when writing a term paper, a blank chat creates too much cognitive load, i.e. a kind of overload or paralysis. To counteract this, suggestions can be helpful: small containers with concrete prompt hints.

"Suggestion cards" (here for chat GPT) help to reduce the initial overwhelm and provide interaction tips.

(Image: Shape of AI)

These suggestions are part of a collection of UX patterns (Shape of AI) relating to the use of AI and chat integrations. As artificial intelligence is still a young field, more and more such design patterns will emerge in the coming years for developers to draw on when designing and building applications. It is nevertheless advisable to use such patterns today to give users an easy and intuitive start.
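
A minimal sketch of this pattern as an Angular component might look like the following; selector, component name and the sample prompts are purely illustrative and not part of the pattern collection itself:

import { NgFor } from '@angular/common';
import { Component } from '@angular/core';

// Minimal sketch of the "suggestions" pattern: a handful of predefined prompt
// hints that users can click instead of facing a blank chat input.
@Component({
  selector: 'app-prompt-suggestions',
  standalone: true,
  imports: [NgFor],
  template: `
    <button *ngFor="let suggestion of suggestions" (click)="select(suggestion)">
      {{ suggestion }}
    </button>
  `,
})
export class PromptSuggestionsComponent {
  suggestions = [
    'Summarize this document in three bullet points',
    'Extract all deadlines from the text',
    'Draft a reply to this e-mail',
  ];

  select(suggestion: string) {
    // In a real application, this would pre-fill or submit the chat input.
    console.log('Selected prompt:', suggestion);
  }
}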

The cognitive load is not the only weak point of chat-based interfaces. During longer conversations, the context window of the language model currently in use (its short-term memory for retaining information from the conversation, so to speak) may become exhausted. In such cases, users have to switch to a new chat. However, as LLMs do not have a permanent memory, a summary of what has been said so far must be carried over during this switch. This is the only way to build on previous results.
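
A minimal sketch of such a hand-over, using the OpenAI SDK that also appears later in this article (the model name and the previousMessages array are placeholders), might look like this:

import OpenAI from "openai";

const openai = new OpenAI();

// Hypothetical previous conversation that no longer fits into the context window.
const previousMessages = [
  { role: "user", content: "..." },
  { role: "assistant", content: "..." },
];

// Ask the model to condense the conversation; the summary then seeds the new chat.
const summary = await openai.responses.create({
  model: "gpt-4o-mini", // example model name
  input: [
    {
      role: "system",
      content:
        "Summarize the following conversation in a few sentences. Keep all decisions and open questions.",
    },
    {
      role: "user",
      content: previousMessages.map((m) => `${m.role}: ${m.content}`).join("\n"),
    },
  ],
});

const newChatSeed = summary.output_text;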

In addition, LLMs occasionally tend to hallucinate during conversations or get lost in an inefficient back-and-forth if their input is imprecise. This becomes particularly problematic if the user already has a clear idea of the desired result. The challenge lies in formulating your own intentions so clearly that the model interprets them correctly – in line with the motto: "Do what I want – not what I say."

So, beyond the classic chat interface, are there cleverer ways to make AI functions accessible to users – preferably in small, easily digestible units so that they are not overwhelming in the first place?

A closer look at the strengths of large language models reveals skills that can be particularly helpful in everyday life:

  • Understanding and processing natural language
  • Extensive knowledge of the world
  • Diverse areas of application and enormous adaptability
  • Multimodality: processing of text, audio and image data (without changing models)
  • Real-time language processing
  • Recognition and analysis of patterns

There are always application scenarios in which data needs to be extracted from documents, images or videos and processed in a structured form. Forms are a prime example: filling out long forms is not usually among the most popular tasks in everyday life.

This is precisely where there is significant potential to improve the user experience. But what could an optimized "filling workflow" look like in concrete terms?

Extensive interfaces (Application Programming Interfaces, APIs) are available on the web and in established frameworks for working with forms. The underlying structure of a form is often defined in the form of a JSON object (JavaScript Object Notation).

The listing shows an example declaration of a FormGroup (including validators) within an Angular application.

personalData: this.fb.group({
  firstName: ['', Validators.required],
  lastName: ['', Validators.required],
  street: ['', Validators.required],
  zipCode: ['', Validators.required],
  location: ['', Validators.required],
  insuranceId: ['', Validators.required],
  dateOfBirth: [null as Date | null, Validators.required],
  telephone: ['', Validators.required],
  email: ['', [Validators.required, Validators.email]],
  licensePlate: ['', Validators.required],
}),

This form definition represents the first building block of the workflow and at the same time defines the target structure into which the system transfers the extracted information. The second building block is the source data in the form of text, images or audio. To keep the presentation simple, in the following scenario it is available as text and is transferred to the system via the clipboard.

A third aspect remains: developers must instruct the language model with a precise task description so that it performs the desired processing step correctly. This instruction takes place in the background, hidden from the user.

Even if developers deliberately forgo a chat interface, language models still work on the basis of instructions in natural language. To spare users the work of formulating and constraining these instructions, they can be stored in advance in the program code as so-called system messages or system prompts.

The advantage of this approach is that the instructions reach the LLM in a standardized form and with consistent quality. These prompts can also be equipped with guards, supplementary instructions that curb hallucinations or counteract potential misuse.

The following is an example of a system prompt with a specific task for the LLM:

Each response line must follow this format:
FIELD identifier^^^value

Provide a response consisting only of the following lines and values derived from USER_DATA:

${fieldString}END_RESPONSE

Do not explain how the values are determined.
For fields without corresponding information in USER_DATA, use the value NO_DATA.
For fields of type number, use only digits and an optional decimal separator.
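
The ${fieldString} placeholder stands for the list of expected fields. A conceivable way to assemble it and embed it in the system prompt, assuming the field names come from the Angular FormGroup shown above, is:

// Hypothetical helper: derive the expected field lines for the system prompt
// from the keys of the FormGroup defined above (names are assumptions).
const fields = Object.keys(this.personalData.controls); // ["firstName", "lastName", ...]
const fieldString = fields.map((field) => `FIELD ${field}^^^\n`).join('');

const systemPrompt = `
Each response line must follow this format:
FIELD identifier^^^value

Provide a response consisting only of the following lines and values derived from USER_DATA:

${fieldString}END_RESPONSE

Do not explain how the values are determined.
For fields without corresponding information in USER_DATA, use the value NO_DATA.
For fields of type number, use only digits and an optional decimal separator.`;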

In modern front-end applications, it is common for interfaces to provide their responses in JSON format, as this data structure can be easily processed further.

For the most precise and reliable results, the expected target structure can be defined via the model's JSON mode in the form of a JSON schema. It describes the fields not only structurally, but also with precise type information. This spares detailed textual explanations and makes it easier to process the results in the front end.

To ensure type safety in the application, Zod is often used – a validation library geared towards TypeScript, with which data structures, from simple strings to complex nested objects, can be declaratively defined and reliably checked.

The following listing from OpenAI shows an example call to the OpenAI API to extract data in a specific JSON format.

import OpenAI from "openai";
import { zodTextFormat } from "openai/helpers/zod";
import { z } from "zod";

const openai = new OpenAI();

// JSON schema definition using Zod
const CalendarEvent = z.object({
  name: z.string(),
  date: z.string(),
  participants: z.array(z.string()),
});

const response = await openai.responses.parse({
  model: "gpt-4o-2024-08-06",
  input: [
    { role: "system", content: "Extract the event information." },
    {
      role: "user",
      content: "Alice and Bob are going to a science fair on Friday.",
    },
  ],
  text: {
    format: zodTextFormat(CalendarEvent, "event"),
  },
});

const event = response.output_parsed;

Depending on the provider, various SDKs (Software Development Kits) are available for transferring system prompts and source data to an LLM. The listing above, for example, uses the OpenAI SDK. Other leading providers include Anthropic and Google. Their SDKs each offer extensive functionality, high performance and a developer experience that makes them straightforward to use.
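
For comparison, a roughly equivalent call with the Anthropic SDK might look like this; the model name and the prompt are examples and not taken from the listing above:

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();

// The same extraction task sent via Anthropic's Messages API.
const message = await anthropic.messages.create({
  model: "claude-3-5-sonnet-latest", // example model name
  max_tokens: 1024,
  system: "Extract the event information and answer with JSON only.",
  messages: [
    { role: "user", content: "Alice and Bob are going to a science fair on Friday." },
  ],
});

// The response content is a list of blocks; the first block carries the text.
const firstBlock = message.content[0];
const event = firstBlock.type === "text" ? JSON.parse(firstBlock.text) : null;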

Of course, the use of AI models is not limited to web-based offerings from large providers. Those who can manage with smaller models for their tasks can also use locally running models or models executed directly in the browser, for example via WebLLM.
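
As a sketch of the in-browser variant, a call via WebLLM could look roughly like this; the model ID is an example, and small local models are typically less reliable at producing structured output:

import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Downloads the model weights into the browser and runs inference locally.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC"); // example model ID

const reply = await engine.chat.completions.create({
  messages: [
    { role: "system", content: "Extract the event information and answer with JSON only." },
    { role: "user", content: "Alice and Bob are going to a science fair on Friday." },
  ],
});

console.log(reply.choices[0].message.content);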

Once the SDK calls have been successfully implemented and abstracted, a three-liner is sufficient for complete parsing.

The following is an example of how an extraction process works using a FormGroup defined in Angular:

/* User message: the data source from which the data to fill the form is to be extracted. This text is copied to the clipboard.

Max Mustermann
77777 Musterstadt
Kfz-Kennzeichen: KA-SL-1234
Versicherungsnummer: VL-123456

*/

// Angular FormGroup for capturing personal data
personalData: this.fb.group({
  firstName: ['', Validators.required],
  lastName: ['', Validators.required],
  street: ['', Validators.required],
  zipCode: ['', Validators.required],
  location: ['', Validators.required],
  insuranceId: ['', Validators.required],
  dateOfBirth: [null as Date | null, Validators.required],
  telephone: ['', Validators.required],
  email: ['', [Validators.required, Validators.email]],
  licensePlate: ['', Validators.required],
}),

// JSON schema created with Zod from the FormGroup
{
    "firstName": {
        "type": "string"
    },
    "lastName": {
        "type": "string"
    },
    "street": {
        "type": "string"
    },
    "zipCode": {
        "type": "string"
    },
    "location": {
        "type": "string"
    },
    "insuranceId": {
        "type": "string"
    },
    "dateOfBirth": {
        "type": "object"
    },
    "telephone": {
        "type": "string"
    },
    "email": {
        "type": "string"
    },
    "licensePlate": {
        "type": "string"
    }
}

// Response from the LLM
[
    {
        "key": "firstName",
        "value": "Max"
    },
    {
        "key": "lastName",
        "value": "Mustermann"
    },
    {
        "key": "location",
        "value": "Musterstadt"
    },
    {
        "key": "zipCode",
        "value": "77777"
    },
    {
        "key": "licensePlate",
        "value": "KA-SL-1234"
    },
    {
        "key": "insuranceId",
        "value": "VL-123456"
    }
]

// Filling the form with the results (here an Angular FormGroup --> personalData)
try {
  const text = await navigator.clipboard.readText();
  const completions = await this.openAiBackend.getCompletions(fields, text);
  completions.forEach(({ key, value }) => this.personalData.get(key)?.setValue(value));
} catch (err) {
  console.error(err);
}
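
How the openAiBackend call works behind the scenes is left open above. A minimal sketch of such an abstraction, reusing the structured-output approach from the earlier listing, could look like this (service name, schema and model are assumptions):

import { Injectable } from '@angular/core';
import OpenAI from 'openai';
import { zodTextFormat } from 'openai/helpers/zod';
import { z } from 'zod';

// Sketch of a service that hides the SDK call behind a single getCompletions() method.
@Injectable({ providedIn: 'root' })
export class OpenAiBackendService {
  // Note: calling the OpenAI API directly from the browser exposes the key;
  // in production, this call would typically be routed through a backend.
  private openai = new OpenAI({ apiKey: '...', dangerouslyAllowBrowser: true });

  async getCompletions(
    fields: string[],
    text: string
  ): Promise<{ key: string; value: string }[]> {
    // Expected result: one key/value pair per form field found in the text.
    const Extraction = z.object({
      entries: z.array(z.object({ key: z.string(), value: z.string() })),
    });

    const response = await this.openai.responses.parse({
      model: 'gpt-4o-2024-08-06',
      input: [
        {
          role: 'system',
          content: `Extract values for the following form fields from the user text: ${fields.join(', ')}. Use NO_DATA for fields without corresponding information.`,
        },
        { role: 'user', content: text },
      ],
      text: { format: zodTextFormat(Extraction, 'extraction') },
    });

    return response.output_parsed?.entries ?? [];
  }
}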

Time-consuming fill-in work is now a thing of the past and can be completed effortlessly thanks to skillfully deployed AI support.

This example shows an extraction process in action: first, the text containing the information is copied to the clipboard; then the extraction is started; finally, the form fields are filled automatically based on the information in the text.

Illustration of the extraction process from the user's perspective (in three steps, from top to bottom).

This integration alone improves the UX enormously. However, from a UX designer's perspective, a closer look reveals even more possibilities:

What about traceability, for example? Currently, the form fields are filled in automatically based on the submitted text or image. Users can then adjust and edit the form as they wish. In most cases this may be sufficient and unproblematic. In certain contexts, however, it is not enough: for legally binding topics such as insurance or banking, it may be necessary to show which fields were filled in by a human and which with the help of AI. For UX reasons, it also makes sense to communicate transparently to users how individual field values came about.

A look at the big players shows: When it comes to the visualization of AI-generated content, color gradients, glow and glitter effects are often used. The following examples show the visual design of AI content based on the design language of Apple and Google.

Examples of the design languages of Apple (above) and Google (below) in relation to their AI products.

(Image: Apple; Google)

So why not pick up on this pattern and use it in your own integrations? The large providers have UI/UX research budgets that smaller companies can only dream of. It makes sense to take inspiration from them, especially as their wide reach is already shaping new visual standards: users are increasingly familiar with these kinds of displays.

An exemplary implementation in the form scenario shown could be to provide automatically filled fields with a glowing frame (glow effect). This simple measure creates a clear visual distinction – and improves the user experience at the same time.

Automatically filled fields are highlighted by a glowing frame (glow effect).
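
Translated into the form scenario, one conceivable implementation is to remember which controls the model has filled and to bind that state to a glow style. The following is a rough sketch using an Angular standalone component; selector, class name and colors are illustrative:

import { Component, inject } from '@angular/core';
import { FormBuilder, ReactiveFormsModule, Validators } from '@angular/forms';

@Component({
  selector: 'app-personal-data-form',
  standalone: true,
  imports: [ReactiveFormsModule],
  template: `
    <form [formGroup]="personalData">
      <input formControlName="firstName"
             [class.ai-filled]="aiFilledFields.has('firstName')"
             (input)="aiFilledFields.delete('firstName')" />
      <!-- further fields analogous -->
    </form>
  `,
  styles: [`
    .ai-filled {
      box-shadow: 0 0 6px 2px rgba(120, 80, 255, 0.6); /* simple glow effect */
    }
  `],
})
export class PersonalDataFormComponent {
  private fb = inject(FormBuilder);

  // Controls filled by the LLM; a manual edit removes the marker again.
  aiFilledFields = new Set<string>();

  personalData = this.fb.group({
    firstName: ['', Validators.required],
    // ... remaining fields as shown above
  });

  applyCompletions(completions: { key: string; value: string }[]) {
    completions.forEach(({ key, value }) => {
      this.personalData.get(key)?.setValue(value);
      this.aiFilledFields.add(key); // mark the field as AI-filled
    });
  }
}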

To further improve traceability, developers can add a history function: it shows which automatic extractions happened when, including the sources used (text, speech or images). This gives users an overview at all times and, if necessary, they can undo or redo steps to return to a previous state.
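
Such a history can be kept quite simply, for example as a list of extraction runs that also stores the previous form state so that a run can be rolled back. The following sketch uses assumed type and function names:

import { FormGroup } from '@angular/forms';

// Each automatic extraction is recorded together with its source and the form
// state it overwrote, so that the run can be undone later.
interface ExtractionRun {
  timestamp: Date;
  source: string;                          // clipboard text, transcript, OCR result, ...
  previousValues: Record<string, unknown>; // form state before the run
  appliedValues: Record<string, string>;   // values written by the model
}

const history: ExtractionRun[] = [];

function recordRun(form: FormGroup, source: string, applied: Record<string, string>) {
  history.push({
    timestamp: new Date(),
    source,
    previousValues: form.getRawValue(),
    appliedValues: applied,
  });
}

function undoLastRun(form: FormGroup) {
  const last = history.pop();
  if (last) {
    form.patchValue(last.previousValues); // restore the earlier form state
  }
}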

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.