The rise of platform engineering – the next big thing?

Page 2: First attempt at a reference model

A standardized reference model for platform engineering that companies can use does not yet exist. A description therefore remains abstract for the time being. However, there is a consensus as to which parts the platform should comprise. As a rule, it must be integrated into two basic workflow systems: Development (Dev) and Operations (Ops). The former maps the functions and user stories to be developed, while the latter maps the operational workflow with service requests and incidents. Examples of these two systems include Atlassian Jira and the ServiceNow cloud platform.

From a developer's perspective, both workflows provide information for the work to be carried out and require updates as soon as the respective work is completed. Therefore, both must be an integral part of the platform so that updates to the systems can also be fully automated.

The two workflow components usually come together in the IDE on a developer's workstation. Ideally, most interactions with the platform should also take place directly from this workstation or via a dedicated developer portal. In any case, they should be built on a self-service basis and with API support to allow teams and individuals to tune the developer experience.

The interface between the developer platform and the development teams is of crucial importance, as changes to the functions offered must be consumable and communication regarding any changes must be seamless.

In addition to the developer-side aspects of the developer platform described above, the areas of network, storage and computing power are particularly important on the infrastructure side. In a pure cloud environment, the services of the public cloud are available to the developer platform in the so-called landing zone. However, since in practice many companies also have to include their own traditional data centers or a private cloud in the platform concept, it is important to work closely with the owners of the respective services from the outset. This is because the standards and templates used as well as the permitted permutations of the infrastructure services can be limiting factors for a developer platform.

Another component that has become firmly established in software development is containers. They contribute to accelerated application development and provide advanced functions for operation - but containers also require additional management. The developer platform should therefore offer an interface via which both the services provided by the container platform and consistent container templates can be used.

A third important dimension is the data platform, which enables access to the company's data. In production, it serves as the data foundation for all of the company's business processes. As part of the developer platform, it provides the data that is required in non-production environments. This includes functions such as test data management and data masking, which identify and transform the data required for the various aspects of software development and testing.
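Data masking of this kind can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC) as the masking technique, which is only one of several possible approaches; the field names, the key and the record are hypothetical and not taken from any specific product.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets store.
MASKING_KEY = b"example-masking-key"


def pseudonymize(value: str, key: bytes = MASKING_KEY) -> str:
    """Replace a value with a stable token that does not reveal the original."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def mask_customer(record: dict) -> dict:
    """Return a copy of a production record with personal fields masked."""
    masked = dict(record)
    masked["name"] = "Customer-" + pseudonymize(record["name"])
    masked["email"] = pseudonymize(record["email"]) + "@example.invalid"
    return masked


# Hypothetical production row; non-personal fields stay usable for testing.
production_row = {
    "id": 42,
    "name": "Erika Mustermann",
    "email": "erika@example.com",
    "order_total": 199.90,
}
test_row = mask_customer(production_row)
```

Because the same input always yields the same token, relationships between masked records are preserved across tables, which is usually what test data needs.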

In addition, the data platform can also hold data that is operationally relevant within the developer platform, such as logs, metrics and traces, or data about the software development life cycle (SDLC), such as test metrics. However, the SDLC data is usually stored and managed separately in a Software Engineering Intelligence Platform (SEIP), a term coined by the market research company Gartner.

Last but not least, the developer platform must also take into account the components relevant to application operation, such as observability and monitoring, as well as other business-specific functions.

Setting up a successful developer platform requires comprehensive architecture planning and a high degree of collaboration, as it is at the center of all business functions. The core element for orchestrating a developer platform in the reference architecture described here is the software engineering platform (see Figure 3). Its task is to orchestrate the SDLC functions required for software deployment and to provide them in an economical manner. As there can be several tools and providers for almost every function, the software engineering platform must above all make the associated complexity manageable. It should provide at least the capabilities shown in the figure.

Functions available for software engineering in a software platform architecture (Fig. 3).

(Image: Mirco Hering / Accenture)

In practice, these requirements may lead to confusion, as there are different ways of grouping these capabilities and some companies may also use individual terms for some of these capabilities. In any case, the presented framework of a reference architecture can be used to define an initial developer platform in companies and to supplement or adapt it later if necessary. The reference model also makes it clear where and how platform engineering can be used to tame the problem of growing complexity in a structured way.

Two examples illustrate how implementation can succeed or fail in practice. In the first company, the platform engineering team was responsible for setting up all SDLC tools, including security scans, build, deployment and infrastructure. The company relied on a complex technology stack with a mix of Software-as-a-Service (SaaS), packaged and custom applications hosted both in its own data center and in the public cloud. The platform engineering team worked with the application teams to embed the new developer platform into the way the teams worked, and where changes were required, a compromise was sought and found. During this phase, the company was moving to vertically integrated value stream teams that take independent responsibility for different business processes, including aspects such as infrastructure or software support. The developer platform was intended to help make these teams more independent.

As part of the implementation of the developer platform, the status of the SDLC process was made visible so that the teams could track their development progress in a shared dashboard. This gave them greater independence within a complex organization - one of the strategic goals of the developer platform was thus fulfilled. The company was able to create a level of transparency and ownership at the value stream level that was previously not possible. Although progress was slow in practice, the deliberate involvement of the application teams contributed to the stability and success of the developer platform.

In the second example, a company had the ambitious goal of moving all its applications to the cloud. Specifically, this involved several hundred applications consisting of a mixture of individually developed and pre-packaged software. The company created an integrated developer platform based on common open-source tools. In addition, the technical director pushed the use of cloud-native functions. The selected standards were designed for migration to the cloud, including full automation of the infrastructure and all applications. The team's ambitious flagship project included full end-to-end automation for infrastructure, application and testing.

As the project progressed, the standards turned out to have been set too high. The barrier to entry was so high that only a few teams actually used the new platform. Teams that had previously managed their applications without any problems now suddenly had to learn how to use the IaC tool Terraform, deal with Helm charts and interpret the results of security scans. The steep learning curve and the cognitive load proved to be a real obstacle.

Although all the teams were committed to the shared vision, they did not receive the thorough training they needed to implement the plans successfully. The case shows that a solid technological foundation alone is not enough if there is no customer focus. The needs of the internal teams were not sufficiently met to make the product - the developer platform - successful. The company eventually abandoned the platform because the teams did not accept and use it.

The two case studies make it clear that the creation of a developer platform is not an implementation project in the traditional sense. Rather, it is a capability building exercise that needs to be done in appropriate steps and should be guided by the principles of product development. A crucial aspect of this is to "sell" the developer platform to the development teams in the right way. This requires convincing branding and product change management established throughout the organization. Companies that succeed in all of this at least have a better chance of successfully implementing a developer platform and thus mastering the increasing complexity.

Acceptance is a decisive criterion for the success of new approaches in the company - this also applies to a developer platform. If the introduction of a uniform tooling approach steered by a strong central hand has not worked, why should platform development be any different? However, a number of best practices have already emerged that contribute to greater acceptance of platform engineering. The first step is to treat the engineers who use the platform and other stakeholders in the organization as customers, following best practices from product management. The developer platform has an immense influence on how and what the IT department in the company develops. It therefore makes sense to consider it as one of the most important business products.

Companies should therefore first carefully examine which functions their internal customers require. The tension is between helping the engineers on the platform solve their problems and achieving the required security standards at the same time. Maintaining the balance between standardization and flexibility is an ongoing task for the platform development team. Greater standardization reduces the cost and complexity of the platform. At the same time, however, the risk increases that the platform's users feel inadequately supported. More flexibility creates a better customer experience, but increases the cost of building and maintaining the platform.

As there is no optimal solution to this problem, an "inhale, exhale" approach is advisable. This method allows for a limited amount of experimentation across the platform, but requires all non-standard components to be questioned and re-evaluated at regular intervals. This results in phases where the platform grows ("inhale"), followed by phases where it shrinks ("exhale"), when exceptions for components that have not proven themselves are dropped again. The rigor of this process is a crucial factor that ultimately determines the cost of the developer platform and its maintainability.
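The periodic re-evaluation could be supported by a simple register of exceptions. The following is a minimal sketch, not a prescribed process: the review interval of 180 days, the adoption threshold and the component names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class NonStandardComponent:
    name: str
    approved_on: date   # when the exception was granted ("inhale")
    teams_using: int    # hypothetical measure of whether it has proven itself


def review(components, today, interval=timedelta(days=180), min_teams=3):
    """Sort exceptions into: not yet due, worth keeping, and to be dropped."""
    result = {"not_due": [], "keep": [], "drop": []}
    for component in components:
        if today - component.approved_on < interval:
            result["not_due"].append(component.name)  # still experimenting
        elif component.teams_using >= min_teams:
            result["keep"].append(component.name)     # proven in practice
        else:
            result["drop"].append(component.name)     # "exhale"
    return result


components = [
    NonStandardComponent("alternative-ci-runner", date(2024, 1, 10), teams_using=5),
    NonStandardComponent("custom-secrets-tool", date(2024, 2, 1), teams_using=1),
    NonStandardComponent("new-test-framework", date(2024, 9, 1), teams_using=2),
]
outcome = review(components, today=date(2024, 10, 1))
```

In this example, the first component has been in use long enough and by enough teams to be kept, the second is dropped, and the third is not yet due for review.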

Another lesson from product management is branding. The internal developer platform requires a certain level of "sales support" to be perceived positively. Successful platform teams take a true marketing approach by giving the platform a memorable name and launching awareness and retention campaigns. These can be internal podcasts, blog posts or similar media aimed at the platform's stakeholders. Feedback from users to the platform development team is also important for branding (see Figure 4). To this end, the upcoming roadmap should be transparent and, ideally, feature requests should also be collected from all platform users. Carefully selecting functions, together with the improvements that are necessary from the perspective of the platform, security and other stakeholders, contributes to the legitimacy of the platform within the company.

As users, developers have a direct influence on the prioritization of new functions (Fig. 4).

(Image: Accenture)

The second notable challenge is the complexity of the platform. Unfortunately, the tools used in an IT organization do not all follow a standard for data and integration. This is not surprising given the fragmented nature of the IT tooling market where new products and vendors are constantly emerging (see Figure 5). This increases the complexity of integrating all the required tools into the platform to a level comparable to integration efforts for business applications.

New products and providers are making the tool landscape increasingly complex (Fig. 5).

(Image: Sapphire Ventures Blog)

Before all IT tools can be integrated into the developer platform, a suitable process and data model should be defined. The necessary workflows in the various tools can only be planned sensibly if it is defined what the processes look like and what data is to be used and recorded. Maintaining a consistent data and process model is particularly important with a growing platform - but it also entails additional creation and maintenance work. It should also be noted that some commercial tools pre-define data models and processes that need to be aligned with your own processes and data model. The ability to integrate and adapt the data model is therefore an important aspect when making architectural decisions regarding the platform.

A standardized company-wide process and data model for IT processes also offers further advantages: for example, the deployment cycles of SAP and Java applications can be compared, even if they use different tools - as long as both follow the same overarching process and data model. In such a scenario, a central engineering dashboard can also be set up that is available to all agile development teams. This creates a common language that facilitates collaboration between the various technology "tribes" in the company.
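How a shared data model makes such a comparison possible can be sketched as follows. The event formats of the two tools, the field names and the timestamps are hypothetical; the point is only that each tool's records are mapped onto one common structure before any metric is calculated.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass(frozen=True)
class Deployment:
    """Common record that every tool's events are mapped onto."""
    application: str
    technology: str
    change_ready_at: datetime
    deployed_at: datetime

    @property
    def lead_time_hours(self) -> float:
        return (self.deployed_at - self.change_ready_at).total_seconds() / 3600


def from_java_pipeline(event: dict) -> Deployment:
    # Hypothetical CI/CD payload with ISO 8601 timestamps.
    return Deployment(
        application=event["app"],
        technology="Java",
        change_ready_at=datetime.fromisoformat(event["merged"]),
        deployed_at=datetime.fromisoformat(event["deployed"]),
    )


def from_sap_transport(event: dict) -> Deployment:
    # Hypothetical transport log with a different date format.
    fmt = "%d.%m.%Y %H:%M"
    return Deployment(
        application=event["system"],
        technology="SAP",
        change_ready_at=datetime.strptime(event["released"], fmt),
        deployed_at=datetime.strptime(event["imported"], fmt),
    )


def median_lead_time_by_technology(deployments):
    """Median hours from 'change ready' to 'deployed', per technology."""
    grouped = {}
    for deployment in deployments:
        grouped.setdefault(deployment.technology, []).append(deployment.lead_time_hours)
    return {tech: median(hours) for tech, hours in grouped.items()}


deployments = [
    from_java_pipeline({"app": "shop", "merged": "2024-03-01T09:00:00", "deployed": "2024-03-01T15:00:00"}),
    from_java_pipeline({"app": "shop", "merged": "2024-03-04T10:00:00", "deployed": "2024-03-04T12:00:00"}),
    from_sap_transport({"system": "ERP", "released": "01.03.2024 08:00", "imported": "03.03.2024 08:00"}),
]
lead_times = median_lead_time_by_technology(deployments)
```

A central dashboard would then only need to understand the common record, not the formats of the individual tools.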

To keep the complexity of the tool landscape manageable, companies have often stuck to a single tool provider in the past. However, although many providers are constantly adding new functions to their tools, it remains the exception that one provider alone has the right tools for all requirements in the company. It therefore seems sensible to create your own process and data model until an industry standard emerges. Platform engineering has the potential to act as a catalyst on the way to an industry standard.

The final challenge that needs to be overcome is the issue of cost. Who pays for the developer platform and how? Of course, there is always the option of setting aside a separate budget for it, and the most obvious step is to fund the work as a cost component through the overall IT budget. However, this means introducing new costs, which often makes it difficult to secure sufficient funding. On the other hand, if you consider who benefits most from the developer platform, it makes sense to draw at least part of the budget from the areas of technology, infrastructure and security.

From his own practical experience, the author of this article can confirm that this approach works well if the platform is seen as an important factor for active change in the company. Then it also makes sense to see the platform costs as capital costs and not as operating costs. To finance the ongoing development of the platform, a fee could be charged for each project and each new function. This is comparable to the situation in which a technician sits in the agile team and maintains the team's internal DevOps tools, with that person's cost booked to the project. In the case of the developer platform, however, some of the funds go into a central pool to finance the company-wide platform.

However, this model has advantages and disadvantages. On the one hand, with change-based financing, the platform can react dynamically to an increase or decrease in IT activities as a whole, as the inflow of funds rises or falls accordingly. On the other hand, because funding depends on the scope of IT activities, the initial investments in particular may flow too slowly to ensure continuous expansion. If IT activities decline sharply overall, there is even a risk that too little money will flow in to keep the platform running.
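The effect can be made concrete with a small calculation. All figures here are invented for illustration: a levy of 5 percent on each project budget and a fixed annual cost of 45,000 for running the platform.

```python
def levy_income(project_budgets, levy_percent):
    """Funds flowing into the central pool from a per-project levy."""
    return sum(budget * levy_percent // 100 for budget in project_budgets)


def funding_gap(project_budgets, levy_percent, platform_run_cost):
    """Positive: surplus available for expansion. Negative: shortfall."""
    return levy_income(project_budgets, levy_percent) - platform_run_cost


# Hypothetical project budgets in a year with many and with few IT activities.
busy_year = [400_000, 250_000, 350_000]
quiet_year = [200_000, 100_000]

surplus = funding_gap(busy_year, levy_percent=5, platform_run_cost=45_000)
shortfall = funding_gap(quiet_year, levy_percent=5, platform_run_cost=45_000)
```

In the busy year the levy covers the running cost and leaves 5,000 for expansion; in the quiet year it falls 30,000 short, which is the risk described above.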

It is therefore essential for the platform development team to measure the positive effects of the platform. All three aspects discussed so far come together here: Successful measurement can only be achieved based on a standardized data model, the measured values can be used to justify investments in the platform, and they can also help to sell the "Developer Platform" product more effectively within the company.

An obvious metric for the benefits of the platform is the labor and cost savings achieved through automation. For example, it is possible to determine how many working hours have been eliminated as a result of the introduction of the platform. It also makes sense to measure the reduced risk and improved security situation as a result of standard technology templates. After all, every security risk represents a business risk that can be assessed. In order to formulate concrete values for this, close coordination with the risk and security officers is necessary. The third area that should be included in the measurements is the impact on the end customer. The effects of the platform can be reflected at this point in the form of faster delivery, increased reliability in production or generally in the key business figures.
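The first of these metrics, working hours eliminated through automation, lends itself to a simple calculation. The numbers below are assumptions for illustration only and would have to be replaced with measured values from the platform's data model.

```python
def hours_saved(runs, manual_minutes, automated_minutes):
    """Working hours no longer needed because a task was automated."""
    return runs * (manual_minutes - automated_minutes) / 60


def cost_saved(hours, hourly_rate):
    """Monetary value of the saved hours at a given internal rate."""
    return hours * hourly_rate


# Hypothetical example: 1,200 deployments per year, 45 minutes of manual
# effort each before the platform, 5 minutes of remaining effort afterwards.
deployment_hours = hours_saved(runs=1200, manual_minutes=45, automated_minutes=5)
deployment_savings = cost_saved(deployment_hours, hourly_rate=80)
```

With these assumed values, the platform saves 800 working hours per year, which corresponds to 64,000 at an hourly rate of 80.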