Create simple UX for domain-specific languages with VS Code

Tax returns, processes, or construction planning: Many things can be represented with a domain-specific language. Editors with LSP make implementation easy.

By Georg Hinkel

Is it worth creating your own programming language, complete with syntax highlighting, code completion, and other editor features, for just a single project? What sounds like a lot of work at first glance has become significantly easier thanks to the Language Server Protocol (LSP). Because text editors like Visual Studio Code offer such rich possibilities for interaction, domain-specific languages (DSLs) are a fully fledged and often underestimated way to design a UX.

Georg Hinkel conducts research and teaches at RheinMain University of Applied Sciences, focusing on software engineering, model-driven software development, and distributed systems. He is also the maintainer of the open-source project NMF.

Users outside of computer science frequently specify requirements using forms. The more complex the requirements, the more complicated the implementation via form becomes. Figure 1 shows a grotesque example: specifying instructions for a compiler via a form.

Programming via form looks grotesque (Fig. 1)

Several reasons make this idea absurd:

  • The specification is bloated: the form's still incomplete implementation of Heron's method for calculating square roots would fit in just three lines of code.
  • As a direct consequence, the form is significantly harder to understand.
  • For experienced users, a programming language is much more efficient because they can write significantly more specifications in the same amount of time.
  • It is difficult to imagine how a form-based program specification could be versioned. Text-based diff and merge algorithms, like those used by Git, probably won't work here.
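
For comparison, here is a minimal TypeScript sketch of Heron's method as it might look in a textual language. The function name heronSqrt, the starting guess, and the fixed iteration count are assumptions made for this illustration; the form in Figure 1 is not reproduced here.

```typescript
// Heron's method: repeatedly replace the guess x by the average of x and n / x.
// Sketch for positive n only; the default of 20 iterations is an arbitrary choice.
function heronSqrt(n: number, iterations: number = 20): number {
  let x = n > 1 ? n : 1; // starting guess
  for (let i = 0; i < iterations; i++) x = (x + n / x) / 2;
  return x;
}
```

The body consists of three statements, which is roughly the size the first bullet point refers to.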

Complex specifications are not unique to computer science; other fields also have difficult problems that are increasingly being solved with the help of computers. Examples include workflow automation, the planning of components or buildings, the documentation of manual processes, and much more. One reason textual languages succeeded in programming is surely that programs are abstract anyway, whereas physical components and building plans have always lent themselves to graphical representation. However, the example of process automation shows that abstract concepts occur in many other areas as well.

Within computer science, textual languages have proven their worth beyond programming. Admins have long since stopped creating cloud infrastructure via forms (a practice often derided as ClickOps) and instead describe it as text using Infrastructure as Code. The reasons are the same: the specification is more compact, easier to understand, easier to version, and faster to write.

Generic formats like JSON or XML are easier to version than forms. However, they were not originally intended to be human-readable but machine-readable. Since many applications require a machine-readable language anyway, it is convenient to offer it as an alternative interface.

But even such configuration files do not achieve the conciseness, readability, and efficiency of a domain-specific language, especially since validation technologies like XML Schema or JSON Schema only validate the structure of the documents. They contribute little to the domain analysis.

Complex matters that people want to express concisely and readably, and keep under version control, are good candidates for a domain-specific language with the same editor support that programming languages enjoy. Examples include travel expense reports, ancillary cost statements, tax returns, sales queries, process automations, or construction planning.

LLM-based chat functions are not a substitute for domain-specific languages; conversely, domain-specific languages can make the intention of an AI system easier to verify. Instead of letting agent-based systems perform critical actions (which is strictly regulated, at least in the EU by the AI Act), AI systems can also be made to reformulate a description in natural language into a DSL.

Developing a domain-specific language is not just about creating a parser: users are accustomed to the convenience of syntax highlighting, code completion, and easily jumping to definitions or references, which are essential for good UX and increased productivity. In recent years, this has become significantly easier through the Language Server Protocol (LSP) developed by Microsoft. This principle has expanded to other areas, such as graphical languages with the Graphical Language Server Protocol (GLSP). LSP and GLSP allow developers to largely forgo the development of UI components and focus instead on the semantics of a language.

For this reason, editors like Visual Studio Code can handle any programming language: while they always use the same UI as the frontend (in the case of Visual Studio Code, the Monaco editor), an LSP server provides the language-specific editor support. Since LSP is based on JSON-RPC, which does not define a transport layer, LSP can be operated either via stdin/stdout or via WebSockets. This allows for many deployment options: the client can run in a desktop application (in the case of Visual Studio Code via Electron) or in the browser, while the server can be embedded in the IDE or run on a remote machine.
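
To make the transport independence concrete: the LSP base protocol frames every JSON-RPC message with a Content-Length header, followed by an empty line and the JSON payload. The following TypeScript sketch builds such a frame for an initialize request. The helper names frameMessage and utf8ByteLength are made up for this illustration; a real client would normally rely on an existing LSP library rather than framing messages by hand.

```typescript
// Counts the bytes a string occupies in UTF-8, because Content-Length is given in bytes.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff) { bytes += 4; i++; } // surrogate pair
    else bytes += 3;
  }
  return bytes;
}

// Wraps a JSON-RPC message in the header part used by the LSP base protocol.
function frameMessage(message: object): string {
  const json = JSON.stringify(message);
  return "Content-Length: " + utf8ByteLength(json) + "\r\n\r\n" + json;
}

// A minimal initialize request; the id and the empty capabilities are example values.
const initializeRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "initialize",
  params: { processId: null, rootUri: null, capabilities: {} },
};
```

The string returned by frameMessage(initializeRequest) could then be written to the server's stdin or sent through a WebSocket; the framing itself does not care which channel carries it.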

There are now several frameworks for developing LSP servers, and the choice depends on the programming language in which you want to implement the DSL. The grammar of the DSL is itself described in a DSL. From this grammar, the framework generates parsers, classes for the abstract syntax tree, and editor support. Developers can supplement or override the result with handwritten code. Examples of such tools include Xtext for Java (see also [1]), Langium for TypeScript, and AnyText for .NET (see also [2]).

The starting point for developing a new DSL with all three mentioned frameworks is a grammar that also expresses the abstract syntax of the language. The abstract syntax here is the definition of the concepts that make up the language. For the grammar, Xtext and Langium use context-free LL(k) grammars, while AnyText uses Parsing Expression Grammars (PEGs). Both classes of grammars work with non-terminals (= placeholders) and production rules that determine how a non-terminal can be replaced. These production rules can be conveniently specified using the Extended Backus-Naur Form (EBNF) metalanguage, which is additionally augmented with assignments to also specify the abstract syntax.

Conveniently, Langium offers the Playground, a way to start developing a DSL directly in the browser without installing any software. For Xtext or AnyText, the tutorials offer good starting points.

For example, to specify the (simplified here) declaration of a class in a programming language, the following fragment is sufficient:

Class: 'class' name=ID '{' members+=ClassMember* '}';

This example assumes that there are other non-terminals ID and ClassMember that define what an identifier looks like and what valid members of a class can be. The postfix operator * allows for any number of members. Alternatively, + or ? can be used to describe at least one or at most one occurrence. The operator | also allows for alternatives, enabling different types of members. If a rule consists exclusively of alternatives, this is represented by inheritance in the abstract syntax.

The assignment operators = and += instruct the system to store the result of the non-terminal ID or ClassMember in the abstract element created for the rule. With =, the result becomes a single-valued property (here, the result of ID becomes the name); with +=, the result is added to a multi-valued property (here, each ClassMember is appended to the list members).

From these assignments, classes can then be derived that represent the abstract elements of the language in memory. Xtext, Langium, and AnyText all support the operator [], which expresses references: at the given position, the text only names another syntactic element instead of defining it.
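
As a rough illustration, the abstract syntax derived from the Class rule might look like the following TypeScript sketch. It is a hand-written approximation and not the actual output of Xtext, Langium, or AnyText. The superClass property, the Reference type, and the function linkSuperClasses are hypothetical additions that show what a reference declared with [] could turn into: a name in the text that is resolved to an element after parsing.

```typescript
// Hypothetical stand-in; the grammar fragment does not define ClassMember further.
interface ClassMember { name: string; }

// A reference keeps the text that was written and, once linked, the element it denotes.
interface Reference<T> { refText: string; target?: T; }

interface Class {
  name: string;                 // from name=ID (single-valued)
  members: ClassMember[];       // from members+=ClassMember* (multi-valued)
  superClass?: Reference<Class>; // assumed extension, e.g. superClass=[Class]
}

// Resolves superClass references by name and returns the names that could not be resolved.
function linkSuperClasses(classes: Class[]): string[] {
  const byName: { [name: string]: Class } = Object.create(null);
  for (const c of classes) byName[c.name] = c;
  const unresolved: string[] = [];
  for (const c of classes) {
    if (!c.superClass) continue;
    c.superClass.target = byName[c.superClass.refText];
    if (!c.superClass.target) unresolved.push(c.superClass.refText);
  }
  return unresolved;
}
```

In the frameworks themselves, this linking step is part of the generated or built-in infrastructure; the sketch only shows the idea behind it.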

While the construction of the parser remains largely hidden from developers, each class of grammar has limitations that DSL developers must keep in mind. For example, the alternatives of a production are unordered in context-free grammars but ordered in PEGs, which is why PEGs are unambiguous by construction. The LL parsers commonly used for context-free grammars do not support left recursion, whereas PEG-based parsers can support it through an extension of packrat parsing.

Effects of whether alternatives for productions are ordered.

Ordered alternatives mean that an alternative is only considered if all preceding alternatives fail. As a result, there are no ambiguities; in C-like languages, for example, an else block is always associated with the innermost if statement. This is often desirable because the language is never ambiguous, but it can also run counter to intuition. For instance, read as a context-free grammar, S: 'a' S 'a' | 'aa' matches every word that consists of an even, non-zero number of the letter a. Read as a PEG, it only matches words of a's whose length is a power of two greater than 1. Especially if you are used to context-free grammars, this behavior is counterintuitive. In my opinion, however, it only affects rather pathological cases.
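
The difference can be checked with a small experiment. The following TypeScript sketch implements the PEG reading of the rule as a plain recursive-descent function with ordered choice, and the context-free reading as a regular expression for the language the grammar generates. All function names are invented for this illustration, and the code is unrelated to how AnyText, Langium, or Xtext implement their parsers.

```typescript
// PEG reading of  S <- 'a' S 'a' / 'aa'.
// Returns the position after the match, or null if S fails at pos.
function pegS(input: string, pos: number): number | null {
  // First alternative: 'a' S 'a'. Once the inner S has succeeded, its result is final.
  if (input.charAt(pos) === "a") {
    const afterInner = pegS(input, pos + 1);
    if (afterInner !== null && input.charAt(afterInner) === "a") return afterInner + 1;
  }
  // Second alternative, only tried because the first one failed: 'aa'.
  if (input.slice(pos, pos + 2) === "aa") return pos + 2;
  return null;
}

// A word is accepted if S matches at position 0 and consumes the whole input.
function matchesAsPeg(input: string): boolean {
  return pegS(input, 0) === input.length;
}

// Context-free reading: the grammar generates exactly the non-empty, even-length words of a's.
function matchesAsCfg(input: string): boolean {
  return /^(aa)+$/.test(input);
}

// Helper that builds a word consisting of n times the letter a.
function repeatA(n: number): string {
  let word = "";
  for (let i = 0; i < n; i++) word += "a";
  return word;
}
```

For word lengths 1 to 16, matchesAsCfg accepts every even length, while matchesAsPeg accepts only the lengths 2, 4, 8, and 16. For a word of six a's, for example, the inner S greedily consumes four letters after the first one, so the closing 'a' is still found, but the match ends after five letters... more precisely, the outermost call ends up matching only a prefix, and the overall match fails because the input is not consumed completely.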

Support for left recursion is particularly important for expressions, as it allows formulas to be specified very intuitively. Binary expressions in particular are themselves expressions, but they also begin with an expression. For PEGs, Warth et al. [3] have developed an extension of packrat parsers that can parse left recursion while maintaining the linear runtime behavior of packrat parsers.

In AnyText, a simple expression grammar can therefore be implemented as shown in Listing 1 below. Here, the keyword returns specifies which class of the abstract syntax the non-terminal returns. In the example, it serves to avoid introducing a class for each individual non-terminal. The keyword enum is used to represent a fixed set of values. Furthermore, the keyword parantheses describes a non-terminal for parentheses, and terminal describes a terminal, represented by a regular expression.

grammar Expressions (exp)
root Expression

Expression:
  AdditiveBinary | Multiplicative;
AdditiveBinary returns BinaryExpression:
  left=Expression operator=AdditiveOperator right=Expression;
enum AdditiveOperator returns BinaryOperator:
  Add => '+'
  Subtract => '-';
Multiplicative returns Expression:
  MultiplicativeBinary | LiteralExpression | VariableExpression | ParanthesisExpression;
MultiplicativeBinary returns BinaryExpression:
  left=Multiplicative operator=MultiplicativeOperator right=Multiplicative;
enum MultiplicativeOperator returns BinaryOperator:
  Multiply => '*'
  Divide => '/';
LiteralExpression:
  value=Number;
VariableExpression:
  variable=Identifier;
parantheses ParanthesisExpression:
  '(' Expression ')';
terminal Number returns nmeta.Integer:
  /\d+/;
terminal Identifier:
  /[a-zA-Z]\w*/;
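
To show what can be done with the resulting abstract syntax, the following TypeScript sketch mirrors the concepts named in the grammar (BinaryExpression, BinaryOperator, LiteralExpression, VariableExpression) and evaluates an expression tree. This is an assumption-laden illustration: AnyText generates .NET classes, the kind discriminator and the evaluate function are invented here, and parentheses are assumed not to need a node of their own because they only influence the shape of the tree.

```typescript
type BinaryOperator = "Add" | "Subtract" | "Multiply" | "Divide";

interface BinaryExpression {
  kind: "binary";
  left: Expression;
  operator: BinaryOperator;
  right: Expression;
}
interface LiteralExpression { kind: "literal"; value: number; }
interface VariableExpression { kind: "variable"; variable: string; }
type Expression = BinaryExpression | LiteralExpression | VariableExpression;

// Evaluates an expression tree; variables are looked up in env.
function evaluate(e: Expression, env: { [name: string]: number }): number {
  if (e.kind === "literal") return e.value;
  if (e.kind === "variable") {
    const value = env[e.variable];
    if (value === undefined) throw new Error("Unknown variable: " + e.variable);
    return value;
  }
  const left = evaluate(e.left, env);
  const right = evaluate(e.right, env);
  switch (e.operator) {
    case "Add": return left + right;
    case "Subtract": return left - right;
    case "Multiply": return left * right;
    default: return left / right; // "Divide"
  }
}
```

A tree for 2 + 3 * x, in which the multiplication forms the right operand of the addition, evaluates to 14 for x = 4. The split into additive and multiplicative rules in the grammar is what lets a parser build the tree in this shape.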

LSP is not limited to parsing text but also allows for practical interactions with text in the IDE. Features like Code Lenses or Code Fixes are familiar to programmers. They allow analysis results, such as the number of references or the author and date of the last modification of a method, to be easily displayed in the viewport. Interactions such as refactorings can also be initiated directly from the code.

Features of this type include:

  • Diagnostics: Only in rare cases can all restrictions be expressed through the grammar alone; domain-specific analyses can report further errors. For example, a DSL for bank transfers could analyze the recipient's name and report an error if the name does not match the IBAN.
  • Code Lenses: Code Lenses can display any string at any position in the text. This allows many types of analyses to be displayed. Code Lenses can also offer actions; for example, clicking on the display of references usually opens a window with details. Some editors also offer to start unit tests via Code Lens. In a very generic way, Code Lenses could be used to apply the currently described state in a DSL to a modeled system.
  • Code Actions: Represented by editors like the Visual Studio family with a lightbulb icon, Code Actions allow context-appropriate interactions to be performed.
  • Inlays: Inlays display text in the editor that is not actually there. While this feature is primarily used in programming languages to display inferred type signatures or parameter names, it can be used in principle for any analysis.
  • Hover: If a user hovers the mouse over a token, an LSP server can provide context information. A very clever use, for example, is to offer a hover text for keywords with explanations. Especially because a DSL is intended for a limited user group, such support can make the application more accessible.
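
The keyword hover mentioned in the last item can be sketched in a few lines of TypeScript. The interfaces Position, MarkupContent, and Hover are hand-written, reduced mirrors of the structures of the same name in the LSP specification. The keyword table, the explanations in it, and the functions wordAt and hoverFor are assumptions for this illustration and do not correspond to the API of AnyText, Langium, or Xtext.

```typescript
interface Position { line: number; character: number; } // zero-based, as in LSP
interface MarkupContent { kind: "plaintext" | "markdown"; value: string; }
interface Hover { contents: MarkupContent; }

// Explanations for some keywords of the grammar language shown above (example texts).
const keywordDocs: { [keyword: string]: string } = {
  returns: "Specifies which class of the abstract syntax the rule returns.",
  enum: "Declares a rule that represents a fixed set of values.",
  terminal: "Declares a terminal, described by a regular expression.",
};

// Finds the word that surrounds the given character offset in a line.
function wordAt(line: string, character: number): string {
  const isWordChar = (ch: string): boolean => /\w/.test(ch);
  let start = character;
  while (start > 0 && isWordChar(line.charAt(start - 1))) start--;
  let end = character;
  while (end < line.length && isWordChar(line.charAt(end))) end++;
  return line.slice(start, end);
}

// Returns hover information if the position is on a documented keyword, otherwise null.
function hoverFor(lines: string[], position: Position): Hover | null {
  const line = lines[position.line];
  if (line === undefined) return null;
  const word = wordAt(line, position.character);
  if (!Object.prototype.hasOwnProperty.call(keywordDocs, word)) return null;
  return { contents: { kind: "markdown", value: "**" + word + "**: " + keywordDocs[word] } };
}
```

A language server would return such an object in response to a textDocument/hover request. A real implementation would also distinguish keywords from identifiers that merely happen to have the same spelling, which this sketch does not attempt.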

AnyText makes developing these features particularly easy by generating a class for each rule in the grammar, where developers usually only need to override a corresponding method to activate the respective feature. However, these features can also be easily used with Langium or Xtext.

Domain-specific languages, alongside forms or AI agents, deserve their own place as a way for end-users to specify the problems the computer should solve. DSLs are primarily aimed at experts who need to create such specifications frequently, for whom productivity, easy versioning, and interoperability of textual languages are particularly advantageous.

The development of new DSLs has been dramatically simplified by innovations such as the Language Server Protocol and frameworks like Langium, Xtext, or AnyText. Where developers previously had to write parsers by hand, they can now largely derive them from a grammar specification with editor support, which greatly reduces the effort. Therefore, DSLs should certainly be considered as an alternative UX technology.

[1] M. Eysholdt and H. Behrens, “Xtext: implement your language faster than the quick and dirty way,” in Proceedings of the ACM International Conference Companion on Object Oriented Programming Systems Languages and Applications Companion, Reno/Tahoe, Nevada, USA, Association for Computing Machinery, 2010, pp. 307–309.

[2] G. Hinkel, A. Hert, N. Hettler, and K. Weinert, “AnyText: Incremental, left-recursive Parsing and Pretty-Printing from a single Grammar Definition with first-class LSP support,” Proceedings of the 18th ACM SIGPLAN International Conference on Software Language Engineering, SLE 2025, pp. 98–111, June 12–13, 2025.

[3] A. Warth, J. R. Douglass, and T. Millstein, “Packrat parsers can support left recursion,” in Proceedings of the 2008 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, San Francisco, California, USA, Association for Computing Machinery, 2008, pp. 103–110.

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.