Practical use of artificial intelligence in software architecture

The memory of the machine


The LLM's neural network has been trained once. On the one hand, this is good, because it draws on a large part of human knowledge. On the other hand, the model is no longer learning: it does not know the latest technologies or my preferences in architecture work, and it does not learn from dialogs or from its own mistakes.

The knowledge base on which most LLMs are trained is a black box: it is unclear what information the training data actually contains and what it does not. A few assumptions can be made, however. For example, the majority of open source projects (code, issues, discussions, tutorials, and guides) are most likely included. In theory, this is a good basis for architecture work.

However, outdated information also flows into the model's training, and the model will reproduce it. It does not have to fall into the infamous hallucination to produce bad answers. Another interesting effect is the bias created by the training data: popular programming languages and frameworks appear more frequently in it, while commercial software whose code is not public and whose documentation sits behind a login will be unknown to the model.

On the one hand, AI expands the solution space for decisions and problems with its vast knowledge; on the other hand, architects must always validate its answers and take the model's knowledge gaps into account.

Every dialog with the machine starts from scratch. The context is empty, the model has forgotten all previous conversations. But there are some approaches that work against this.

  • In-context learning: Within a session, the model learns from its context, which is why it is important to work with the context correctly. It learns when users give it precise instructions on what they expect, and it learns from feedback on its responses. As mentioned earlier, it makes more sense to formulate a query more precisely than to expand the context via feedback, so as not to overload it. The model also learns from its own answers, so it pays to work on a problem together with it instead of just giving instructions and hoping for a good answer. With the chain-of-thought approach, you ask the model to first formulate a solution strategy and then work out the individual solution steps; this develops a solution step by step and gives the model more time to think.
  • System prompt: Things the model should remember beyond the limits of a single dialog can be given to it in the system prompt. This is a special prompt that precedes every dialog and is normally hidden. In ChatGPT, however, it can be changed via the "Customize ChatGPT" dialog, which allows information and context to be carried over into a kind of long-term memory (Figure 3). Outside Europe, ChatGPT additionally offers a memory function: users can ask the machine within a dialog to remember something, and the LLM then independently formulates its findings and saves them across dialogs. Constantly expanding the system prompt by hand thus becomes unnecessary; it happens almost incidentally in dialog. Both the system prompt and the memory function, where available, permanently convey preferences regarding architectural styles or technologies to the model (see the API sketch after the arc42 example below). However, users should not be surprised if the model's answers then lose variation.

The system prompt that precedes each individual prompt can be varied via "Customize ChatGPT" (Fig. 3).
  • System prompt for arc42: In many cases it makes sense to dedicate part of the system prompt to the arc42 documentation template (see the box "Example of a system prompt related to arc42"). The prompt is structured so that it can be part of a larger system prompt. It exploits the model's ability to ask questions rather than simply follow instructions, which helps with targeted brainstorming. The result will not be a perfect architecture description, but it is a start that helps architects overcome the fear of the blank page. For many LLMs it seems to make little difference whether this prompt is formulated in German or English; the results are similarly good.
Example of a system prompt related to arc42

When asked to help with arc42, you use the following rules:

You are an expert software architect who knows how to work with arc42 and how to create quality-driven architectures with architecture decision records (ADRs) which are based on quality goals and scenarios.

Help me to create an arc42 architecture for my current project by asking me the right questions.

Use my answers to create an arc42 asciidoc document chapter by chapter.

Use one file per chapter and a master document which includes all those chapters.

Use PlantUML diagrams to visualize ideas.

Do not fill the arc42 document chapter by chapter, but collect the information in a logical way and add it to the right chapter of the template whenever you have collected it.

Start with the most important chapters and ask me the questions one by one.

Chapters 1, 2, and 3 are the most important ones.

Continue with the Quality Goals and Scenarios.

Then work on the solution strategy and create ADRs (chapter 9) based on the quality goals and scenarios along the way.

Try to extract risks and technical debt from the ADRs and add them to the risk and technical debt chapter.

Fill the remaining chapters with information whenever you get it.

End every output with a question or a recommended next step.
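
The system prompt is not tied to the chat front end, by the way: when talking to the model via an API, the same long-term preferences travel as a system message at the start of every conversation. A minimal sketch with the OpenAI Python client follows; the model name, the architect persona, and the prompts are illustrative placeholders:

    from openai import OpenAI

    client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat-capable model works
        messages=[
            # The system message takes the role of the normally hidden system
            # prompt and is sent again at the start of every new dialog.
            {"role": "system",
             "content": "You are an expert software architect. Prefer simple, "
                        "well-documented solutions and the arc42 template."},
            {"role": "user",
             "content": "Which architecture style suits a small internal "
                        "reporting tool?"},
        ],
    )
    print(response.choices[0].message.content)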

In this context, two further advantages of the LLM stand out. First, it formulates better than many users: the model transforms scraps of thought into clear sentences. Second, it thinks along. When formulating an Architecture Decision Record (ADR), architects can ask the model whether it sees risks or technical debt and what the consequences are from its point of view.

However, caution is required here: the model does not have a complete overview of the project and will assess risks, technical debt, and consequences only on the basis of the context it has been given and its broad but generalized knowledge from the training data. The architect remains responsible for verifying the proposals, finding errors, and contributing their own project knowledge.

It is often argued that trained models are very good at generating credible, well-formulated, but incorrect answers. As an expert, I see this as a challenge rather than a problem: with every answer the model gives, I have the ambition to improve on it with my expertise and beat the model. That is often not an easy task, but it is a good exercise.

Architecture work usually involves developing new ideas and concepts, and one risk is that an idea does not deliver the desired results. This is where validating it through a short proof of concept (PoC) with executable code helps. If the model gets the chance to validate its own statements by executing the generated code, it can iterate over its own ideas and improve them until they work. This produces working code and saves frustration (Figure 4).

With self-correction, ChatGPT can check and improve its own architecture concepts (Fig. 4).
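
Conceptually, this self-correction is a plain loop: generate code, execute it together with its tests, and feed any failure back into the next prompt. A minimal sketch of such a loop follows; the llm object and its complete() method are hypothetical stand-ins for whatever client is in use, and a real setup would sandbox the exec call:

    def self_correcting_poc(llm, task: str, tests: str, max_rounds: int = 5) -> str:
        """Let the model iterate on its own code until the tests pass."""
        code = llm.complete(task + "\nAnswer with plain Python code only.")
        for _ in range(max_rounds):
            try:
                # Run the generated code together with its tests (sandbox this!).
                exec(code + "\n" + tests, {})
                return code  # all assertions passed
            except Exception as failure:
                # Hand the failure back verbatim so the model can fix its own code.
                code = llm.complete(f"The code failed with {failure!r}. Fix it:\n{code}")
        raise RuntimeError("No working version within the given number of rounds")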

In ChatGPT, this approach can be demonstrated with a simple prompt (see the box "Prompt example for self-correction through tests"). If the AI does not immediately recognize that the lists are sorted not alphabetically but by the number of letters, the first tests will fail. The AI thus becomes a valuable sparring partner when creating a PoC to validate the architecture.

Prompt example for self-correction through tests

Create a method for sorting a list of strings in Python. Test the method using the following examples. Execute the tests with the code interpreter. Correct the code if the tests fail.

Examples:

Input: banana, orange, kiwi, apple, tomato. Expected output: kiwi, apple, banana, orange, tomato

Input: grapes, nuts, melon, apricot. Expected output: nuts, melon, grapes, apricot
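
For reference, the solution the model should converge on is a one-liner. Python's sorted() is stable, so strings of equal length keep their input order; the function name below is illustrative:

    def sort_by_length(strings: list[str]) -> list[str]:
        # Sort by the number of letters, not alphabetically.
        return sorted(strings, key=len)

    assert sort_by_length(["banana", "orange", "kiwi", "apple", "tomato"]) == [
        "kiwi", "apple", "banana", "orange", "tomato"]
    assert sort_by_length(["grapes", "nuts", "melon", "apricot"]) == [
        "nuts", "melon", "grapes", "apricot"]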

This principle can also be applied to other architecture tasks. For example, it can be helpful to have the AI create an initial draft of an architecture diagram. This works quite well with tools such as PlantUML, which convert a textual description in a domain-specific language (DSL) into a diagram. Although LLMs know the PlantUML syntax, they occasionally make mistakes when generating it. With ChatGPT as the front end, you have to copy the textual diagram to PlantUML manually for rendering and then point any error out to the model, a slow and often frustrating cycle. A better way is to let the LLM render the diagram itself via the PlantUML API: syntax errors are returned directly to the model, which can then correct them itself (Figure 5).

ChatGPT generates a simple C4 diagram that can then be varied further (Fig. 5).
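
What such a rendering check can look like is shown in the following minimal sketch against the public PlantUML server, using its hex text encoding (the ~h URL prefix); in an agent setup, the raised error would be fed straight back to the model:

    import urllib.error
    import urllib.request

    def render_plantuml(source: str) -> bytes:
        """Render PlantUML source to PNG via the public server."""
        url = "https://www.plantuml.com/plantuml/png/~h" + source.encode("utf-8").hex()
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()  # PNG bytes on success
        except urllib.error.HTTPError as failure:
            # The server answers invalid diagrams with an error status; this
            # message is what the model gets to see for self-correction.
            raise ValueError(f"PlantUML rejected the diagram (HTTP {failure.code})")

    png = render_plantuml("@startuml\nAlice -> Bob: hello\n@enduml")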