Python Programming Language: Developing and Optimizing Performant Algorithms
To determine the shortest route between cities, an algorithm is used. If it does not work efficiently enough, it can be optimized.
(Image: heise medien)
- Michael Inden
If you are planning a trip through several cities and want to find the shortest route, you resort to algorithms, a well-defined sequence of deterministic operations. This article accompanies the development process of an algorithm that finds the shortest paths between cities. It shows step by step the way from the first sketch through tests and visualization with Matplotlib and NetworkX to optimization through suitable data structures. This results in a program that not only works functionally correctly but is also performant.
The goal is to find the paths in a road network that connect cities most directly. Graphs can be used for modeling. In Figure 1, circles with labels represent cities, and the connecting lines with numbers correspond to paths with distances.
For this simplified map, the shortest path from A to D is to be found. While with few cities and connections all possibilities can be tried, the approach becomes more complex the more cities and connections exist. The following connections are possible, with 13 being the worst and 6 being the best:
A -> B -> C -> D => 5 + 1 + 7 = 13
A -> C -> B -> D => 2 + 1 + 3 = 6
A -> C -> D => 2 + 7 = 9
A -> B -> D => 5 + 3 = 8
Understanding efficiency with Big O notation
For good usability of programs, it is relevant how quickly calculations and operations can be executed. This is especially true for large amounts of data. Big O notation allows algorithms to be classified and the growth of an algorithm's runtime (or memory requirements) to be described as the input set grows. Thus, effects can be predicted, for example, when a list no longer contains 10 or 20, but 100,000 or more data records.
Videos by heise
Big O notation helps estimate the runtime of operations. It assigns algorithms and functions to complexity classes. With O(nÂł), the number of steps grows with the third power of the input set. With 100 input data points, the effort for the calculation is 100Âł = 1,000,000 steps. The lower the complexity class, the better. Further classes are shown in the table "Big O notation with algorithms classified into complexity classes"; Figure 2 visualizes the effects.
| Big O notation with algorithms classified into complexity classes | ||
| Notation | Meaning, growth | Example |
| O(1) | constant | Accessing a list element |
| O(log n) | logarithmic | Binary search |
| O(n) | linear | Simple loop through all elements |
| O(n log n) | linear-logarithmic | Efficient sorting algorithms (e.g., Mergesort) |
| O(n²) | quadratic | Doubly nested loop |
| O(nÂł) | cubic | Triply nested loop |
Classification with Big O notation is particularly important for comparing the scaling properties of runtimes independently of hardware, implementation details, and programming languages.
For a simplified assessment, only the dominant term is considered during evaluation, as small constants or lower terms and factors are negligible for large input sizes. In the formula n³ + 4n² + 3n + 7, the runtime class O(n³) results from the simplifications.
From idea to program
A systematic approach is the key to functional, maintainable, and performant code, even for smaller programs and especially for complex software projects.
1. Understand and analyze the problem
- clarify what problem needs to be solved and break it down into sub-problems;
- check if there are already proven solutions for sub-problems, such as binary search for efficient searches in sorted data sets, Dijkstra's algorithm for shortest paths;
- define input and output data;
- consider boundary conditions and special cases.
2. Plan and develop a high-level structure
- break down the problem into sub-problems;
- formulate or sketch workflows in natural language;
- choose suitable data structures (lists, dictionaries, heaps).
3. Implementation
- structure source code into clearly separated functions or classes;
- pay attention to readability and understandability, use meaningful names and (if sensible) supplementary comments;
- use existing libraries to save development time (e.g., Matplotlib for visualization).
4. Testing (Dry-run and unit tests)
- test the functionality;
- write unit tests to check functionality and cover edge and special cases.
5. Measure performance
- perform measurements with small, medium, and large data sets, e.g., with 100, 10,000, and 1,000,000 data records;
- identify bottlenecks – however, these usually only become apparent with very large data sets.
6. Optimize
If weaknesses were identified in step 5, the implementation and the chosen algorithms for sub-problems should be examined more closely again.
- Use Big O notation to formally evaluate complexity: What runs in loops? How and where is a search performed – linear or with binary search? For different actions, this can mean runtimes of O(1)
- use more suitable algorithms or more efficient data structures.
Interweave implementation and testing
In practice, steps 3 and 4 do not always run independently. If the results can be predicted well, it makes sense to start by creating test cases. However, sometimes an idea and a prototype of the implementation are needed first. Especially with larger programming projects, further requirements arise during the implementation and testing phase.
The following process has proven effective in practice and can also be applied when developing an algorithm.