Dynamically adjust data with implicit incrementalization – a C# example
How do you change data in a running program? Incremental change propagation and the .NET library NMF Expressions help with this complex problem.
(Image: generated with DALL-E by iX)
- Georg Hinkel
Computer programs receive an input and produce an output. This very old basic principle is still valid, but has always raised the question of how the system should react if the input subsequently changes.
With the large amounts of data and the large number of analyses and transformations that today's systems calculate, many questions arise when the underlying data changes: Is the analysis result still valid? Does the system really have to recalculate everything? Are there intermediate results not affected by the change that can be cached and reused? According to Phil Karlton, cache invalidation is one of the most difficult challenges in computer science. So how do you make sure you've done everything right?
Algorithms and data structures that can adapt to changes in the underlying data are referred to as either incremental or fully dynamic, the difference being whether the data only ever grows and new data is added incrementally or whether existing data is also dropped. Since, in practice, new changes are often only ever made incrementally to the data, the term incremental change propagation is often used. The process of converting an existing algorithm into one that supports incremental change propagation is called incrementalization.
The Budapest University of Technology and Economics has published a rail network model [1] that can be used as an example to explain the problem. It contains elements for rails, switches, and signals as well as routes that run through the rail network. Each route starts and ends at a signal and runs along several switches. A small section of such a model is shown in Figure 1:
(Image: iX)
An important analysis in such a network checks whether all switches along a route set to GO are set as the route intends. In the example, the input signal of the route marked in blue in Figure 1 should rather show STOP, as some switches are not set so that they lead to the destination of the route.
Algorithms without incremental change propagation
If there is a type-safe API for accessing the model elements, it is often easy to formulate a check algorithm without incremental change propagation. The following Listing 1 shows an implementation with the query syntax in C#:
from route in routes
where route.Entry != null && route.Entry.Signal == Signal.GO
from swP in route.Follows
where swP.Switch.CurrentPosition != swP.Position
select swP
The listing first filters the list of relevant routes for those whose input signal is set to GO. It then determines all switch positions for those routes where the referenced switch is set differently than the route intends. There is nothing wrong with such an implementation: it is short, easy to understand, and easy to maintain. The algorithm could be formulated similarly in other programming languages; Java's Streams API, for example, requires only slightly more boilerplate code.
The only problem is that the algorithm does not support incremental change propagation, and since every change to the network model can influence the result, the algorithm must be re-executed after every change. However, since the vast majority of changes only affect very few elements, a complete recalculation is inefficient for larger models.
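To make the cost concrete, the query from Listing 1 can be wrapped in a method and re-run after each change. The following sketch uses deliberately simplified, hypothetical model classes (RailSwitch, SwitchPosition, SemaphoreElem, Route are made-up stand-ins; the real benchmark metamodel is considerably richer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, heavily simplified model classes for illustration.
enum Signal { GO, STOP, FAILURE }
enum Position { LEFT, RIGHT }

class RailSwitch { public Position CurrentPosition; }
class SwitchPosition { public RailSwitch Switch; public Position Position; }
class SemaphoreElem { public Signal Signal; }
class Route
{
    public SemaphoreElem Entry;
    public List<SwitchPosition> Follows = new List<SwitchPosition>();
}

class Demo
{
    // The query from Listing 1, wrapped in a method:
    // every call recomputes the result from scratch.
    public static List<SwitchPosition> Check(List<Route> routes) =>
        (from route in routes
         where route.Entry != null && route.Entry.Signal == Signal.GO
         from swP in route.Follows
         where swP.Switch.CurrentPosition != swP.Position
         select swP).ToList();

    static void Main()
    {
        var sw = new RailSwitch { CurrentPosition = Position.LEFT };
        var route = new Route { Entry = new SemaphoreElem { Signal = Signal.GO } };
        route.Follows.Add(new SwitchPosition { Switch = sw, Position = Position.RIGHT });
        var routes = new List<Route> { route };

        Console.WriteLine(Check(routes).Count); // 1: one misaligned switch
        sw.CurrentPosition = Position.RIGHT;    // a single model change ...
        Console.WriteLine(Check(routes).Count); // 0: ... but only a full re-run notices
    }
}
```

Every call to Check walks all routes and all switch positions again, even though the second run differs from the first in only a single switch position.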
Representation of changes and incrementalization
If developers want to incrementalize this algorithm, the first question that arises is how the system even notices that parts of the model have changed. Depending on the platform, there are more or less standardized interfaces for this. In .NET, the INotifyPropertyChanged and INotifyCollectionChanged interfaces became established with the advent of WPF. For Java, there is the EMF framework, which provides a standardized interface for model changes.
The second question is what the result of incrementalization should look like, i.e., how to describe a value that can change when the input data changes.
Mathematically, incrementalization is best described using a functor from category theory [2]: A functor I consists of two parts. First, it maps a type A to a type I(A), which describes an incremental value of type A. Second, it maps a function f: A ⇾ B to a function I(f): I(A) ⇾ I(B). In addition, a natural transformation val: I ⇾ id is needed, which contains a component val_A: I(A) ⇾ A for each type A and returns the current value of an incremental value, as well as a transformation const: id ⇾ I, which embeds a value of any type as a constant (which never changes).
In many programming languages, functors are implemented with generic interfaces, such as IEnumerable in .NET. The mapping of a function f: A ⇾ B to I(f): IEnumerable<A> ⇾ IEnumerable<B> is called Select in .NET, for example, and often map in other programming languages.
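Expressed in code, this mapping is the familiar Select on IEnumerable:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class FunctorDemo
{
    static void Main()
    {
        // An ordinary function f: int -> string ...
        Func<int, string> f = n => "#" + n;

        // ... lifted by Select to IEnumerable<int> -> IEnumerable<string>.
        IEnumerable<int> numbers = new[] { 1, 2, 3 };
        IEnumerable<string> mapped = numbers.Select(f);

        Console.WriteLine(string.Join(",", mapped)); // #1,#2,#3
    }
}
```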
To represent an incremental value in memory, a generic interface is therefore required, let's call it INotifyValue. For example, it could look like the following in Listing 2:
public interface INotifyValue<out T>
{
    T Value { get; }
    INotifyValue<U> Select<U>(Func<T, U> selector);
    event EventHandler ValueChanged;
}
The interface also has an event that sends a message as soon as the value has changed.
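The interface from Listing 2 can be implemented by hand to see the principle at work. The following sketch is not the NMF implementation but a minimal, hypothetical one: NotifySource (a made-up name) is a mutable leaf node, and SelectNode re-evaluates a selector whenever its source reports a change, propagating onward only if its own value actually changed:

```csharp
using System;

// The interface from Listing 2.
public interface INotifyValue<out T>
{
    T Value { get; }
    INotifyValue<U> Select<U>(Func<T, U> selector);
    event EventHandler ValueChanged;
}

// A mutable leaf node: assigning Value notifies all dependents.
public class NotifySource<T> : INotifyValue<T>
{
    private T current;
    public NotifySource(T initial) { current = initial; }

    public T Value
    {
        get => current;
        set
        {
            if (!Equals(current, value))
            {
                current = value;
                ValueChanged?.Invoke(this, EventArgs.Empty);
            }
        }
    }

    public event EventHandler ValueChanged;

    public INotifyValue<U> Select<U>(Func<T, U> selector) =>
        new SelectNode<T, U>(this, selector);
}

// A derived node: re-evaluates when its source changes and
// propagates onward only if its own value actually changed.
public class SelectNode<TIn, TOut> : INotifyValue<TOut>
{
    private readonly INotifyValue<TIn> source;
    private readonly Func<TIn, TOut> selector;
    private TOut current;

    public SelectNode(INotifyValue<TIn> source, Func<TIn, TOut> selector)
    {
        this.source = source;
        this.selector = selector;
        current = selector(source.Value);
        source.ValueChanged += (s, e) =>
        {
            var newValue = selector(source.Value);
            if (!Equals(current, newValue))
            {
                current = newValue;
                ValueChanged?.Invoke(this, EventArgs.Empty);
            }
        };
    }

    public TOut Value => current;
    public event EventHandler ValueChanged;

    public INotifyValue<U> Select<U>(Func<TOut, U> sel) =>
        new SelectNode<TOut, U>(this, sel);
}

class NodeDemo
{
    static void Main()
    {
        var entrySignal = new NotifySource<string>("STOP");
        var isGo = entrySignal.Select(s => s == "GO");
        var notifications = 0;
        isGo.ValueChanged += (s, e) => notifications++;

        entrySignal.Value = "FAILURE"; // re-evaluated, but isGo stays false
        entrySignal.Value = "GO";      // now the derived value flips
        Console.WriteLine($"{isGo.Value}, {notifications} notification(s)"); // True, 1 notification(s)
    }
}
```

Setting the source from STOP to FAILURE re-evaluates the derived node, but since the derived value does not change, propagation stops there; only the change to GO reaches the subscriber.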
Collections are a special case, as here it is typically not only interesting that the collection has changed, but also how it has changed. In .NET, there is the INotifyCollectionChanged interface for this. However, this is not generic, which is why it should be combined with the existing, type-safe interface IEnumerable. Listing 3 below shows the result with the name INotifyEnumerable:
public interface INotifyEnumerable<out T>
: IEnumerable<T>, INotifyCollectionChanged { }
Manual incrementalization
The third question in incrementalization is what types of changes affect the result of the algorithm, combined with the question of which intermediate results to save and when and how to invalidate them. There are usually many types of changes that affect the algorithm and the intermediate results in different ways. If the position of a switch changes, this is only relevant if the switch lies along a route whose input signal is set to GO. If, on the other hand, the input signal of a route changes, all switch positions along the route must be analyzed.
Anyone implementing the incrementalization manually must explicitly write code for every possible type of change in order to update the final result of the algorithm or the intermediate results. The result is much more extensive than the code in Listing 1 and also far less comprehensible and harder to maintain. In addition, there is the problem that led Phil Karlton to call cache invalidation one of the most difficult problems in computer science: if you discard the intermediate results too often, the hoped-for performance benefits shrink or even reverse. Worse still, if you forget to invalidate the intermediate results at some point, the result is often slightly incorrect, producing a subtle, hard-to-find bug that tests are highly unlikely to detect.
Dynamic dependency graphs
The goal of implicit incrementalization is to achieve incremental change propagation for algorithms without changing the specification of the algorithm. For this purpose, a dynamic dependency graph [3] is used, which represents the calculation steps of the algorithm, together with their current values, as nodes. In the simplest and at the same time most comprehensive case, each instruction executed for the algorithm is a node in the graph. The nodes can then be typed according to the kind of instruction, for example as explicit node types for access to constants, binary operations, property accesses, and so on. Each node also stores its current value.
The edges (i.e., the arrows) indicate data dependencies between the calculation steps. The aim is to enable each individual node to determine when its calculation step must be repeated and what its current value is. The granularity of the individual steps depends on the incrementalization system, but is usually at the abstraction level of a syntax tree: nodes represent individual binary expressions, property accesses, or function calls.
(Image: Georg Hinkel)
Figure 2 shows a dynamic dependency graph at instruction level (in which each node represents the execution of exactly one instruction) for the sub-expression that checks whether the input signal of a route is set to GO. The figure also shows the change propagation when the input signal of the route changes from STOP to FAILURE: The node representing the property access is re-evaluated, reports that its value has changed, and notifies subsequent nodes such as the following equality operator. That node re-checks the equality, and since its value does not change, the change propagation ends there.
Dynamic dependency graphs can reshape themselves while processing a change; for example, the right-hand side of a short-circuit logical operator (i.e., && or ||) can be dropped or newly created, depending on the value of the left-hand side.
While it is clear for binary operators how the result changes when the input data changes, this is initially unclear for function calls. To integrate function calls into dynamic dependency graphs, the system can create templates of dynamic dependency graphs for these functions and copy them whenever the function is called. The placeholders for the parameters are then replaced by the nodes of the arguments.
Dynamic dependency graphs can quickly become very large. At the same time, the best dynamic algorithm for a problem is often very different from an algorithm that solves the problem without taking changes into account. It is therefore advantageous to store a specific dynamic algorithm for frequently used functions to achieve a smaller dynamic dependency graph that uses less memory but processes changes more efficiently.
For example, to dynamically calculate the sum of a collection of numbers, it is more efficient to simply add a newly inserted number to the previous sum than to recompute the entire sum. Looking at queries like the one in Listing 1, it becomes clear for which functions an explicit dynamic algorithm is particularly interesting: the query operators, which make the syntax from Listing 1 possible in the first place.
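The sum example can be sketched in a few lines on top of ObservableCollection; the class name IncrementalSum is made up for illustration:

```csharp
using System;
using System.Collections.ObjectModel;
using System.Collections.Specialized;
using System.Linq;

// Hypothetical incremental Sum: instead of re-adding all elements
// after every change, the cached sum is adjusted by the delta.
public class IncrementalSum
{
    private readonly ObservableCollection<int> source;
    public int Value { get; private set; }

    public IncrementalSum(ObservableCollection<int> source)
    {
        this.source = source;
        Value = source.Sum();                  // computed once up front
        source.CollectionChanged += OnChanged; // afterwards only deltas
    }

    private void OnChanged(object sender, NotifyCollectionChangedEventArgs e)
    {
        switch (e.Action)
        {
            case NotifyCollectionChangedAction.Add:
                foreach (int n in e.NewItems) Value += n;
                break;
            case NotifyCollectionChangedAction.Remove:
                foreach (int n in e.OldItems) Value -= n;
                break;
            default: // Replace, Reset, Move: fall back to full recomputation
                Value = source.Sum();
                break;
        }
    }
}

class SumDemo
{
    static void Main()
    {
        var numbers = new ObservableCollection<int> { 1, 2, 3 };
        var sum = new IncrementalSum(numbers);
        numbers.Add(4);      // O(1) update instead of O(n)
        numbers.Remove(2);
        Console.WriteLine(sum.Value); // 8
    }
}
```

An Add or Remove now costs O(1); unhandled change kinds such as Replace fall back to a full recomputation, which keeps the cached value correct at the expense of speed.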
Implicit incrementalization in C#
Figure 2 also shows that creating a dynamic dependency graph requires knowledge of the structure of a function, ideally in the form of an abstract syntax tree. Many approaches to implicit incrementalization (e.g. Self-adjusting Computation [3]) therefore work as compilers. Others (such as Active Operations [4]) merely provide an API that allows developers to create a dynamic dependency graph by specifying the abstract syntax tree as an object model. C# allows the abstract syntax tree of a function to be obtained at runtime via the Expression API: for a method to receive the expression tree of a function argument instead of a compiled delegate, the parameter must be declared as an expression.
The C# compiler allows lambda expressions to be implicitly converted both into a suitable delegate type and into an expression tree of this delegate type, see Listing 4:
void Process(Func<T, bool> predicate); // no access to the AST possible
void Process(Expression<Func<T, bool>> predicate); // allows access to the AST of predicate
The same language technology is also used in LINQ to translate C# queries to SQL.
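A short, self-contained example of the difference: the compiled delegate can only be invoked, while the Expression variant exposes the syntax tree of the same lambda.

```csharp
using System;
using System.Linq.Expressions;

class ExpressionDemo
{
    static void Main()
    {
        // As a delegate, the lambda is compiled and can only be invoked.
        Func<int, bool> compiled = x => x > 2;
        Console.WriteLine(compiled(3)); // True

        // As an expression, the same lambda exposes its syntax tree.
        Expression<Func<int, bool>> tree = x => x > 2;
        var body = (BinaryExpression)tree.Body;
        Console.WriteLine(body.NodeType); // GreaterThan
        Console.WriteLine(body.Left);     // x
        Console.WriteLine(body.Right);    // 2

        // The tree can still be compiled back into a delegate.
        Console.WriteLine(tree.Compile()(3)); // True
    }
}
```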
NMF Expressions [5] is an open-source library that implements the creation of such dynamic dependency graphs via the Expressions API. For historical reasons, the API is called Observable.Expression. For example, the code in Listing 5 is suitable for creating a dynamic dependency graph that checks whether the input signal is set to GO for a given route:
var expression = Observable.Expression(
() => route.Entry != null && route.Entry.Signal == Signal.GO);
expression.ValueChanged += …
The result of this call is an instance of INotifyValue, which is defined very similarly to Listing 2. In particular, this instance provides a property Value that can be used to retrieve the current value (which the dynamic dependency graph always keeps up to date) and an event that is triggered as soon as the underlying model changes in such a way that evaluating the expression would produce a different result.
To speed up the creation of dynamic dependency graphs and make their use more flexible, NMF Expressions defines a family of classes ObservingFunc, equivalent to the delegate types Func. Each of these receives a function expression and uses it to create a template for a dynamic dependency graph, which in turn retrieves changes to the data model via the INotifyPropertyChanged and INotifyCollectionChanged interfaces and then propagates them incrementally.
The Observe method then replaces the placeholders for parameters in the graph, either with a separate node or with a constant. To simplify the syntax, there is also an API with static methods that improves type inference.
As an alternative to Listing 5, the code from Listing 6 can therefore be used to create a dynamic dependency graph that checks whether the input signal is set to GO. This form also has the advantage that the framework can make optimizations and only has to perform analyses of methods once. Listing 6:
var entryCheck = Observable.Func(
(IRoute route) => route.Entry != null && route.Entry.Signal == Signal.GO);
var expression = entryCheck.Observe(route);
NMF Expressions also implements dynamic algorithms for many of the query operators of the C# query syntax. This makes it possible to retain the example analysis from Listing 1 and to obtain an incremental algorithm from the specification of this query that generates a dynamic dependency graph. This means that changes to the results can not only be calculated quickly, but events can also be triggered as soon as changes to the model affect the result of a query.
Uncertain performance gain
Unfortunately, no general statement can be made about the performance of incrementalized algorithms. Both the runtime and the algorithmic complexity depend largely on the dynamic algorithms used, but also on the type and implications of the changes. Some changes do not affect a dynamic algorithm at all and therefore cause no effort, while others affect a branch that requires recalculating a large part of the result. To estimate the effort, it is therefore necessary to clarify how far-reaching the respective changes are. If a large part of a calculation has to be repeated very frequently, maintaining a dynamic dependency graph may not be worthwhile.
There are many very efficient dynamic algorithms, especially for queries on enumerations where the order is irrelevant. If the filter condition or the selection result changes for one element, this has no effect on the calculation of the filter condition for the other elements, provided the calculation does not generate any side effects. High acceleration rates can often be achieved here because the changes only recalculate the parts of the analysis that are actually affected by the change. If these are only a few elements compared to the total number, there are hardly any limits to the possible speed-ups.
Benchmarks confirm this [2], [5]. For the analysis in Listing 1, speed-ups by a factor of 100 and more were measured. In other analyses, however, butterfly effects can occur, i.e. certain types of changes can force the entire analysis to be recalculated anyway.
The price of these speed-ups is usually higher memory consumption for the many intermediate results. However, since implicit incrementalization requires hardly any development effort, and since memory in particular scales more easily than sequential computing power, the advantages often predominate.
Own functions
NMF Expressions allows developers to define their own functions and incrementalize them manually using suitable dynamic algorithms; many of the existing query operators are implemented in this way. For this purpose, NMF Expressions works with the ObservableProxy attribute. If, for example, the check whether an input signal shows GO is to be moved into a separate function, the ObservableProxy attribute must name a function that incrementalizes this separate function manually. The main reason is that NMF Expressions then no longer has access to the abstract syntax tree. Listing 7 shows an example:
[ObservableProxy(nameof(IsEntrySignalGoInc))]
public bool IsEntrySignalGo(IRoute route)
=> route.Entry != null && route.Entry.Signal == Signal.GO;
public INotifyValue<bool> IsEntrySignalGoInc(IRoute route) { … }
Manual incrementalization is not mandatory, however: if the ObservableProxy attribute is missing from a method, NMF Expressions simply re-evaluates the method whenever its arguments have changed. The only exception is arguments that support the INotifyCollectionChanged interface. This heuristic is sufficient for most of the standard library, as many basic data types such as strings, timestamps, or numbers are immutable in .NET. For custom functions, however, a proxy that incrementalizes the function manually must often be specified via the ObservableProxy attribute.
Conclusion
Implicit incrementalization, for example via dynamic dependency graphs, makes it very easy to implement incremental change propagation by composing the algorithm from dynamic algorithms with a dynamic dependency graph. This makes it possible to implement events that are triggered automatically as soon as a value changes. Incremental change propagation is also often much faster than a complete recalculation. However, these performance advantages depend on the analysis, and developers should consider whether it is worth using. The disadvantage: more memory is required, and custom functions require explicit annotation.
Literature
- G. Szárnyas, O. Semeráth and I. Ráth, "The TTC 2015 Train Benchmark Case for Incremental Model Validation", in Proceedings of the 8th Transformation Tool Contest, L'Aquila, 2015, pp. 129-141.
- G. Hinkel, Implicit Incremental Model Analyses and Transformations, Karlsruhe: KIT Scientific Publishing, 2018.
- U. Acar, "Self-adjusting Computation: An Overview", in Proceedings of the 2009 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, ACM, 2009, pp. 1-6.
- T. Le Calvar, F. Jouault, F. Chhel, and M. Clavreul, "Efficient ATL Incremental Transformations," Journal of Object Technology, vol. 18, no. 3, pp. 1-17, 2019.
- G. Hinkel, R. Heinrich, and R. Reussner, "An extensible approach to implicit incremental model analyses," Software & Systems Modeling, no. 18, pp. 3151-3187, 2019.
(vbr)