Web development: The new TypeScript compiler in Go
Page 2: Details on the rewrite
The TypeScript implementation in Go is not a rewrite in the strict sense but a 1:1 port of the existing source code. In this way, the development team wants to ensure that existing features remain available in the same form in the new version and, above all, that error messages and edge cases behave identically. The reason for moving away from JavaScript lies in its runtime: JavaScript engines are designed primarily for use in the browser and are less suited to CPU-intensive tasks such as those in TypeScript's compile process. In addition, true parallelization and shared memory are possible in JavaScript only to a very limited extent.
In contrast to the combination of TypeScript and JavaScript, Go offers a whole range of advantages. Go is a statically compiled language whose source code is translated directly into machine code ahead of time. JavaScript, on the other hand, is compiled just in time: the JavaScript engine converts the source code into bytecode only at runtime and then interprets it. Go also has automatic memory management with a garbage collector, supports parallel execution of tasks and offers efficient mechanisms for data exchange such as channels and shared memory.
According to Anders Hejlsberg, there are two reasons why the Go implementation of TypeScript is roughly ten times faster. Firstly, the native Go code is significantly faster than the JavaScript code of the original implementation. Secondly, the new architecture allows the compiler to parallelize certain tasks.
To achieve this, the new compiler changes how the individual compile phases are executed. The parser, binder and emitter process individual files, so the compiler can easily parallelize their work. The responsibilities of these three components are:
- Parser: The parser performs the first step of the TypeScript compile process. It reads the TypeScript source code and, following the rules of the language, converts it into a tree structure, the Abstract Syntax Tree (AST). The AST represents syntactic structures such as functions, classes and statements.
- Binder: The binder links related declarations such as different parts of an interface definition, functions and modules with the same name and uses symbols for this purpose. With this information, the system can collect and analyze type information about the declarations to ensure consistency and correctness.
- Emitter: The emitter is the last step in the processing chain and is responsible for converting the code processed by the binder and checked by the checker into JavaScript code.
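The division of labor between the three components can be illustrated with a deliberately tiny model. The following sketch is not the real compiler API: the toy "language" only knows interface declarations, and all names (parseFile, bindNodes, emitSummary, ToySymbol) are hypothetical. It only shows that parsing works file by file, while the binder links declarations with the same name to one symbol.

```typescript
// Toy model of the three per-file stages; not the real compiler API.
// The toy "language" only knows: interface Name { a: string; b: number }

interface InterfaceNode {
  kind: "InterfaceDeclaration";
  name: string;
  members: string[];
  file: string;
}

interface ToySymbol {
  name: string;
  declarations: InterfaceNode[];
}

// Parser: turns the source text of one file into AST nodes.
function parseFile(file: string, source: string): InterfaceNode[] {
  const nodes: InterfaceNode[] = [];
  const pattern = /interface\s+(\w+)\s*\{([^}]*)\}/g;
  let match = pattern.exec(source);
  while (match !== null) {
    const members = match[2]
      .split(";")
      .map((member) => member.trim())
      .filter((member) => member.length > 0);
    nodes.push({ kind: "InterfaceDeclaration", name: match[1], members, file });
    match = pattern.exec(source);
  }
  return nodes;
}

// Binder: links all declarations that share a name to one symbol.
function bindNodes(nodes: InterfaceNode[]): Record<string, ToySymbol> {
  const symbols: Record<string, ToySymbol> = Object.create(null);
  for (const node of nodes) {
    if (!symbols[node.name]) {
      symbols[node.name] = { name: node.name, declarations: [] };
    }
    symbols[node.name].declarations.push(node);
  }
  return symbols;
}

// Emitter: produces output text, here one summary line per symbol.
function emitSummary(symbols: Record<string, ToySymbol>): string[] {
  return Object.keys(symbols).map((symbolName) => {
    const members: string[] = [];
    for (const declaration of symbols[symbolName].declarations) {
      for (const member of declaration.members) members.push(member);
    }
    return symbolName + " { " + members.join("; ") + " }";
  });
}

const sources: Record<string, string> = {
  "a.ts": "interface User { name: string }",
  "b.ts": "interface User { age: number } interface Order { id: number }",
};

// Each file is parsed independently of the others; this independence
// is what makes the stage easy to parallelize.
let allNodes: InterfaceNode[] = [];
for (const file of Object.keys(sources)) {
  allNodes = allNodes.concat(parseFile(file, sources[file]));
}
const summary = emitSummary(bindNodes(allNodes));
console.log(summary.join("\n"));
```

In this model, the two User declarations from different files end up in a single symbol, which mirrors how the binder merges the parts of an interface definition.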
Between the binder and the emitter is the type checker, which checks the type consistency of the source code. If type annotations are incorrect or variables or functions with the wrong types are used, it issues an error message. This element of the compiler cannot be parallelized like the other components. Instead, the development team takes a different approach here and processes the source code with several instances of the checker. Each instance is responsible for a part of the code. Although this leads to overlaps during checking with some memory overhead, it speeds up the process overall.
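The trade-off of running several checker instances can be made tangible with a small simulation. The sketch below is a simplified model under stated assumptions, not the actual typescript-go design: the round-robin distribution of files, the class ToyChecker and the per-instance cache are all hypothetical. It runs the instances one after another and only counts how often a type has to be resolved.

```typescript
// Toy model: each checker instance keeps its own cache of resolved types.
interface CheckedFile {
  name: string;
  usedTypes: string[];
}

class ToyChecker {
  // Null-prototype object so that type names never collide with built-in keys.
  private resolved: Record<string, boolean> = Object.create(null);
  resolutions = 0;

  checkFile(file: CheckedFile): void {
    for (const typeName of file.usedTypes) {
      if (!this.resolved[typeName]) {
        this.resolved[typeName] = true; // cached only inside this instance
        this.resolutions++;
      }
    }
  }
}

// Distributes the files round-robin (an assumption made for this sketch)
// and returns the number of type resolutions per instance.
function checkWithInstances(files: CheckedFile[], instanceCount: number): number[] {
  const checkers: ToyChecker[] = [];
  for (let i = 0; i < instanceCount; i++) checkers.push(new ToyChecker());
  files.forEach((file, index) => checkers[index % instanceCount].checkFile(file));
  return checkers.map((checker) => checker.resolutions);
}

const projectFiles: CheckedFile[] = [
  { name: "a.ts", usedTypes: ["User", "Order"] },
  { name: "b.ts", usedTypes: ["User", "Invoice"] },
  { name: "c.ts", usedTypes: ["Order", "Invoice"] },
  { name: "d.ts", usedTypes: ["User"] },
];

const oneInstance = checkWithInstances(projectFiles, 1);
const twoInstances = checkWithInstances(projectFiles, 2);
console.log("one instance:", oneInstance, "two instances:", twoInstances);
```

With one instance, each of the three types is resolved exactly once. With two instances, the first resolves three types and the second two, so five resolutions are cached in total. This duplicated work corresponds to the overlap and memory overhead described above, which is accepted because the instances can work at the same time.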
The first part of TypeScript that the developers have ported to Go is the command line compiler tsc, which checks the TypeScript code of an application and translates it into JavaScript. This tool is primarily used during the build process of an application.
However, TypeScript does not only consist of tsc. One of the most important other components is the Language Service. This part is jointly responsible for the convenience features in modern development environments. The Language Service is a long-lived background process that initially reads and processes the entire source code. Based on the information collected, it provides the basis for the tooltips when hovering over code structures with the mouse, helps with navigation in the code and ensures that errors in the code are highlighted.
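The working principle, reading everything once and then answering many small queries, can be sketched in a few lines. This is a toy model and not the interface of the real Language Service: the class ToyLanguageService and its methods hover, definition and diagnostics are hypothetical names, and the index is a plain lookup table of declarations.

```typescript
// Toy model of a long-lived service: index once, answer many queries.
interface ToyDeclaration {
  name: string;
  type: string;
  file: string;
  line: number;
}

class ToyLanguageService {
  private index: Record<string, ToyDeclaration> = Object.create(null);

  // Expensive step, performed once when the project is opened.
  constructor(declarations: ToyDeclaration[]) {
    for (const declaration of declarations) {
      this.index[declaration.name] = declaration;
    }
  }

  // Tooltip text for an identifier.
  hover(identifier: string): string | undefined {
    const declaration = this.index[identifier];
    return declaration ? declaration.name + ": " + declaration.type : undefined;
  }

  // Navigation target for "go to definition".
  definition(identifier: string): { file: string; line: number } | undefined {
    const declaration = this.index[identifier];
    return declaration ? { file: declaration.file, line: declaration.line } : undefined;
  }

  // Error highlighting: report identifiers that are used but not declared.
  diagnostics(usedIdentifiers: string[]): string[] {
    return usedIdentifiers
      .filter((identifier) => !this.index[identifier])
      .map((identifier) => "Cannot find name '" + identifier + "'.");
  }
}

const service = new ToyLanguageService([
  { name: "add", type: "(a: number, b: number) => number", file: "index.ts", line: 1 },
  { name: "result", type: "number", file: "index.ts", line: 4 },
]);

console.log(service.hover("add"));
console.log(service.definition("result"));
console.log(service.diagnostics(["add", "subtract"]));
```

Because every query is answered from the index, the cost of building and updating that index determines how responsive the editor feels.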
The Language Service is of paramount importance for a good developer experience. Slow processing means that projects take a long time to open in the IDE, delayed responses limit the usefulness of IDE features, and excessive memory consumption slows down both the IDE and the overall system.
The Language Service follows directly after the command line compiler in the order of priority for porting. A prototype of the Language Service is already available for Visual Studio Code. The first implemented features include the display of errors, support for tooltips and jumping to the definition of a variable, function or type.
In addition to the command line compiler and the Language Service, the developers are working on an interface via which other processes can communicate with the Language Service using interprocess communication and query the type information. There is still some conceptual work to be done on this interface. What is certain is that it should support the LSP, the Language Server Protocol.
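To give an impression of what communication via the LSP looks like on the wire, the following sketch builds a hover request. The message format (JSON-RPC 2.0 with a Content-Length header) and the method name textDocument/hover come from the LSP specification. The helper frameMessage, the file URI and the position are assumptions chosen for illustration, and nothing here describes how typescript-go will eventually expose its interface.

```typescript
// A hover request as defined by the Language Server Protocol (JSON-RPC 2.0).
interface HoverRequest {
  jsonrpc: "2.0";
  id: number;
  method: "textDocument/hover";
  params: {
    textDocument: { uri: string };
    position: { line: number; character: number }; // both zero-based
  };
}

// Hypothetical helper: the LSP base protocol prefixes every JSON body with a
// Content-Length header that counts the bytes of the UTF-8 encoded body.
function frameMessage(message: object): string {
  const body = JSON.stringify(message);
  const byteLength = new TextEncoder().encode(body).length;
  return "Content-Length: " + byteLength + "\r\n\r\n" + body;
}

const hoverRequest: HoverRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "textDocument/hover",
  params: {
    // Example URI and position, chosen for illustration only.
    textDocument: { uri: "file:///app/src/index.ts" },
    position: { line: 3, character: 15 },
  },
};

const framed = frameMessage(hoverRequest);
console.log(framed);
```

A client such as an editor would write this text to the server's input stream and receive a response with the same id that contains the tooltip contents.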
Try out TypeScript Corsa for yourself
The team is developing the Go version of TypeScript on GitHub in a separate repository called typescript-go. It is publicly available and makes it possible to test the early development version. The current version is explicitly not suitable for production use and is not yet fully feature-compatible with the original TypeScript version. By publishing it this early, the development team hopes to receive early feedback from the community and to involve the maintainers of tools, libraries and frameworks at a very early stage.
At the time of writing, typescript-go has to be built from source. The target system must meet a few requirements for this:
- Git: To build typescript-go, the source code of the project, including its git submodules, is required. This can also be downloaded manually, but it is easier to use Git to clone the repository locally.
- Go: For compilation, the documentation recommends Go version 1.24 or higher.
- Node.js: The JavaScript dependencies are managed with npm. Therefore, current versions of Node.js and npm should be installed on the target system.
- hereby: The project uses the npm package hereby as a task runner, for example to start the build or run the tests.
The official Docker image of Go serves as the basis for the following tests. The advantage of this setup is that no additional software is required on the target system and the experiments are reproducible. The only requirement for this is a container runtime such as Docker.
The commands in Listing 1 ensure that a built and executable version of typescript-go is available in the container. This build can compile TypeScript projects to JavaScript, which can then be executed.
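# On the host: start an interactive container
docker run -it --rm golang:bookworm
# Inside the container: install the tools, fetch the sources and build
apt update && apt install -y nodejs npm
git clone --recurse-submodules https://github.com/microsoft/typescript-go.git
cd typescript-go
npm ci
npx hereby build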
Listing 1: Building typescript-go in a container
The command sequence starts a container based on the official golang image and installs Node.js and npm. The git command then clones the typescript-go repository with all associated submodules; the remaining commands are run from within the cloned directory. npm ci stands for clean install and installs all npm dependencies exactly as specified in the lockfile. Calling npx hereby build starts the build task defined in the hereby configuration and uses it to build typescript-go.
After this procedure, the command tsgo is available in the built/local directory. By default, it behaves like the TypeScript compiler on the command line with the --diagnostics option. It compiles the TypeScript source code based on an existing TypeScript configuration and outputs additional diagnostic information such as the runtime of the individual phases or the memory consumption.
As a first test run, the typescript-go compiler should compile the code from Listing 2.
function add(a: number, b: number): number {
  return a + b;
}
const result = add(1, 2);
console.log(result);
Listing 2: TypeScript source code for the translation
A prerequisite for a working build is a suitable tsconfig.json file. Unlike the original tsc, the Go variant cannot yet generate such a configuration. A call to npx tsc --init uses the original TypeScript compiler to create it. A subsequent call to built/local/tsgo translates the example code into JavaScript so that it can be executed. Even this simple example shows a clear difference between the TypeScript and Go variants of the compiler.
Table 1 compares the results. It shows how much memory the compile process required for the original TypeScript variant as well as for tsgo in single-threaded mode and with parallelization. It also lists the time taken by the individual phases and the total time of the compile process.
| Metric | tsc | tsgo -singleThreaded | tsgo |
| Memory used | 54,411K | 18,536K | 19,494K |
| Parse time | 0.10s | 0.025s | 0.019s |
| Bind time | 0.06s | 0.008s | 0.006s |
| Check time | 0.01s | 0.001s | 0.002s |
| Emit time | 0.00s | 0.00s | 0.00s |
| Total time | 0.17s | 0.034s | 0.027s |
Table 1: Compile process of a simple file
This pattern holds not only for small TypeScript applications, such as a simple Nest.js application initialized with the Nest CLI, but also for large codebases such as Visual Studio Code, which is itself written in TypeScript.
Table 2 contains the results of the compile process of date-fns, a TypeScript library for date and time handling with a total of 334,000 lines. The table breaks down the memory consumption and runtimes for the TypeScript, single-threaded and multi-threaded variants.
| Metric | tsc | tsgo -singleThreaded | tsgo |
| Memory used | 526,871K | 270,145K | 291,984K |
| Parse time | 0.79s | 0.246s | 0.164s |
| Bind time | 0.24s | 0.057s | 0.048s |
| Check time | 1.35s | 0.317s | 0.216s |
| Emit time | 0.36s | 0.184s | 0.096s |
| Total time | 2.75s | 0.804s | 0.523s |
Table 2: Compile process of date-fns
The optimizations of the new compiler therefore affect both the runtime and the memory consumption. The command tsgo -singleThreaded switches off the parallelization of the tasks in the compiler so that only the improvements made by the native Go code are visible. Without this option, the compiler parallelizes the processing of the source code and thus achieves even better processing times. However, this comes at the cost of slightly higher memory consumption.
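The two effects can be separated with a short calculation based on the date-fns measurements from Table 2. The following sketch uses only the total times and memory values from that table; the variable names are chosen for illustration.

```typescript
// Values taken from Table 2 (date-fns).
interface Measurement {
  totalSeconds: number;
  memoryKB: number;
}

const tsc: Measurement = { totalSeconds: 2.75, memoryKB: 526871 };
const tsgoSingle: Measurement = { totalSeconds: 0.804, memoryKB: 270145 };
const tsgoParallel: Measurement = { totalSeconds: 0.523, memoryKB: 291984 };

// Gain from native Go code alone (parallelization switched off).
const nativeSpeedup = tsc.totalSeconds / tsgoSingle.totalSeconds;
// Additional gain from parallelization.
const parallelSpeedup = tsgoSingle.totalSeconds / tsgoParallel.totalSeconds;
// Combined gain compared to the original compiler.
const totalSpeedup = tsc.totalSeconds / tsgoParallel.totalSeconds;
// Extra memory that parallelization costs compared to single-threaded mode.
const memoryOverhead = tsgoParallel.memoryKB / tsgoSingle.memoryKB - 1;

console.log("native code:     " + nativeSpeedup.toFixed(2) + "x");   // 3.42x
console.log("parallelization: " + parallelSpeedup.toFixed(2) + "x"); // 1.54x
console.log("combined:        " + totalSpeedup.toFixed(2) + "x");    // 5.26x
console.log("memory overhead: " + (memoryOverhead * 100).toFixed(1) + "%"); // 8.1%
```

For this project, the native code accounts for a factor of about 3.4 and parallelization for a further factor of about 1.5, while the parallel run needs roughly 8 percent more memory than the single-threaded one.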