Introduction
Many vendor whitepapers, industry publications, blog posts, podcasts, and e-books, extol the best practices in software development and delivery. Best practices include industry-standard concepts, such as Agile, DevOps, TTD, continuous integration, and continuous delivery. Generally, these best practices all strive to improve the process of delivering software enhancements and bug fixes to customers.
Rapidly, reliably and repeatedly push out enhancements and bug fixes to customers at low risk and with minimal manual overhead. – Wikipedia
Most learning resources present one of two idealized environments, ‘applications as islands’ and ‘utopian enterprise’. I am also often guilty of tailoring my own materials to one of these two idealized environments. Neither ‘applications as islands’ or ‘utopian enterprise’, best model the typical enterprise software environments in which many of us work.
Applications as Islands
The ‘applications as islands’ environment is one of completely isolated application stacks. These types of environments have multiple application stacks, consisting of web, mobile, and desktop components, services, data sources, utilities and scripts, messaging and reporting components, and so forth. Unrealistically, each application stack is completely isolated from the other application stacks within the same environment.
The Utopian Enterprise
The ‘utopian enterprise’ environments have multiple application stacks with multiple shared components. However, they are built, unrealistically, using consistent and modern architectural patterns and compatible technology stacks. They are designed from the ground up to be compartmentalized, scalable, and highly risk-tolerant to changes. They often avoid the challenges of monolithic legacy applications. The closest things in the real world are probably industry trendsetters, such as Facebook, Etsy, Amazon, and Twitter. We all probably wish we could evolve our own software environments into one of these Utopias.
Complexity and Risk
As an organization continues to evolve their software, they naturally increase the overall complexity, and thereby the challenge of effectively delivering reliable and performant software. In this post, I will explore the challenges of software delivery, as a software environment grows in complexity. Specifically, I will focus on how to evaluate the level of risk based on software changes made to various components within the software environment.
Sensitivity and Impact
As we examine the level of risk introduced by software changes within the environment, two aspects of risk are inescapable, sensitivity and impact. Sensitivity will be defined as the potential degree of which one component, such as an application, service, or data source, is affected by changes to other components within the same software environment. How sensitive is ‘Application A’ to changes made to other components within the same software environment, on which ‘Application A’ is directly or indirectly dependent?
The impact will be defined as the potential effect a component’s changes have on other components within the software environment. Teams tend to only evaluate the impact of changes to the immediate component or application stack. They do not sufficiently consider how those changes impact those components that are directly and indirectly dependent on them. What level of impact do changes to ‘Service B’ have on all other components within the software environment that are directly and indirectly dependent on ‘Service B’?
Notice I use the word potential. Any change has the potential to introduce risk. The level of risk varies, based on the type and volume of changes. A few simple changes should have a low potential for impact, as opposed to a high number of changes, or more complex changes. For example, changing an internal error message logged by a particular service operation should present a very low risk. This, as opposed to rewriting that operation’s complex algorithm for calculating a customer’s creditworthiness. The potential impact of those two types’ changes to dependent components varies significantly.
Measuring Risk
For both sensitivity to change and impact of change, I will use a color-coded scale to subjectively assign a level of potential risk to each component within a given software environment. The scale ranges from ‘Low’, to ‘Moderate’, to ‘High’, to ‘Very High’. Using the scale, it is possible to ‘heat map’ a software environment, based on the level of risk from changes.
Independent Aspects of Risk
Sensitivity and impact are two independent aspects of risk. Changes to one component may have a ‘Low’ potential impact on all other components within the environment. While at the same time, that same component may have a ‘High’ sensitivity to changes made to other components within the environment. Alternatively, a component may have a ‘Very High’ risk for potential impact on multiple components within the environment. At the same time, that same component may have a ‘Low’ potential sensitivity to changes made to other components. Sensitivity and risk do not parallel each other.
Growing Complexity
Let’s look at how sensitivity and impact change as we increase the software environment’s complexity. In the first example, we will look at one of the two environments I described earlier, isolated applications. Applications may have their own web and mobile components, SOAP or RESTful services, data sources, utilities, scheduled tasks, and so forth. However, the applications do not depend on each other or components outside their own immediate application stack; the applications are self-contained.
When making changes in this type of environment, the real potential impact is to the overall stability, security, and performance of the individual applications, themselves. As long as they are in isolation, the applications will have no impact on each other. Therefore, applications potential sensitivity to changes and their impact on other applications is ‘Low’.
Shared Components
A slightly more complex example is a software environment in which one or more applications have a dependency on a component outside their immediate application stack. For example, a healthcare provider develops a Windows-based application to track their employee’s work schedules (Application A). In addition, they develop a web application to track patient appointments (Application B). Lastly, they offer a client-facing mobile application for patients to track personal fitness and nutrition goals (Application C). Applications B and C share a common set of services and a database for managing patient data.
Software changes made to Applications A, B, and C, should have no effect on other components within the software environment. However, Applications B and C are potentially impacted by changes made to either the Services Layer or Data Layer. The Services Layer has ‘High’ potential impact to the software environment. Lastly, the Data Layer should not be directly impacted by changes made to the Services Layer or Applications. However, the Data Layer has the potential to directly affect the Services Layer, and indirectly affect Applications B and C. Therefore, the Data Layer’s potential impact on other dependent components within the environment is ‘Very High’.
Multiple Shared Components
An even more complex example is a software environment in which multiple applications have one or more dependencies on multiple components outside their immediate application stack (many-to-many).
Take, for example, a small financial institution. They have a ‘legacy’ COBOL-based application for managing their commercial mortgage business (Application A). They also have an older J2EE-based application, they acquired through a business merger, for managing their commercial banking relationships (Application B). Next, they have a relatively new Java EE-based investment banking application to manage their retail customers (Application C). Lastly, they have web-based, client-facing application for secure, online retail banking.
Since both Application A and B serve commercial clients, it is necessary to send financial data between the two application stacks. Since both applications are built on different, older technologies, the development team built a Custom Messaging Middleware component to connect the two applications. The Custom Messaging Middleware component receives, transforms, and delivers messages between the two applications.
Changes made to Applications C and D should have no impact on other components within the software environment. However, changes made to either Application A or B has the potential to indirectly affect the ability to successfully communicate with the other application, via the Custom Messaging Middleware. Changes to the Custom Messaging Middleware have the potential to affect both Applications A and B. The Custom Messaging Middleware has a ‘Moderate’ potential sensitivity to risk, versus ‘Low’, because one could argue that changes to either Application A or Application B’s messaging format could impact the Custom Messaging Middleware’s ability to properly process that application’s messages and successfully deliver them to opposite application.
Applications B, C, and D have a direct dependency on the Services Layer, and indirectly on the Data Layer. Therefore, the potential impact of changes to the Services Layer on other components is arguably higher than in the last example. The Services Layer’s potential impact on other components is ‘Very High’.
Since Application B has a direct dependency on both the Messaging Middleware and the Services Layer, it has a higher sensitivity to changes then the other three applications. Application B’s potential sensitivity to changes by other components is ‘Very High’.
Changes made to the Services Layer or the Applications will not affect the Data Layer. However, the Data Layer has the potential to directly affect the Services Layer, and indirectly affect Applications B, C, and D. Therefore, the Data Layer’s potential impact on the software environment is ‘Very High’.
Small Enterprise
The last example of increasing complexity is an environment in which even more applications are dependent on even more components. Additionally, there may be different types of components, such as a common UI and third-party APIs, which only increase the complexity of the dependencies. Although this example is nowhere near as complex as many enterprise software environments, it does begin to reflect their intricate, inner-dependent structure.
Let’s use an example of a large web-based retailer. The retailer has a standalone ERM application for managing their wholesale purchasing and product distribution (Application A). Next, they have their primary client-facing storefront (Application B). They also have a separate application to handle customer accounts (Application C). Lastly, they have an application that manages their online media retail business and media storage (Application D).
In addition to the Common Services Layer, Common Data Layer, and Custom Messaging Middleware, as seen in earlier examples, the retailer has two other components in their environment, a Common Web User Interface (UI) and a Web API. The Web UI provides the customer with a seamless branded experience, no matter which application they use – Application B, C, or D. The customer enters the Common Web UI and has all three application’s features seamlessly available to them.
The retailer also exposes a RESTful Web API for its marketing affiliates. Third parties can develop a variety of applications that drive sales to the retailer, in return for a sales commission.
In the earlier examples, individual applications had separate points of entry. However, in this example, the Common Web UI provides a single point of entry for users of Applications B, C, and D. Having a single point of entry also introduces a single point of failure for all three applications. Thus, the potential risk to the retailer and their customers is much greater. The Common Web UI’s potential impact on other components is ‘Very High’.
A single point of entry also introduces a single point of failure.
The potential sensitivity of the Common Web UI to changes comes from its direct dependency on the Services Layer, and indirectly on the Data Layer. Additionally, one could argue, since the Common Web UI displays the three Applications, it is also sensitive to changes made by those applications. If one of those applications becomes impaired due to a bad change, that application would seem to affect the Web UI’s functionality. The Common UI’s potential sensitivity to change is ‘High’.
The Web API is similar to the Common Web UI, in terms of potential sensitivity and impact. The potential impact of changes to the Web API is ‘Very High’, since a defect there could result in the potential impairment of the retailer’s affiliate applications. The potential sensitivity of the Web API to changes comes from its direct dependency on the Services Layer, and indirectly on the Data Layer. The Web API’s potential sensitivity to change is ‘High’. There is very little chance of potential impact to the Web API from the retailer’s affiliate applications.
Impact of Key Components
Lastly, as systems grow in complexity, certain components often become so key, they have the potential to impact the entire environment, a true single point of failure. Below, note the potential impact of changes to the Common Services Layer on all other components. As the software environment has grown in complexity, the Common Services Layer sits at the heart of the system. The Services Layer has multiple components directly dependent on it (i.e. Application C), as well as other components indirectly dependent on it (i.e. Third-Party Applications). It is also the only point of access to and from the Common Data Layer.
There are steps organizations can take to mitigate the potential risk caused by changes to key components, like the Services Layer. Areas organizations commonly focus on to reduce risk are higher code quality, increased test coverage, and improved performance, fault tolerance, system redundancy, and rollback capabilities. Additionally, management should more thoroughly scrutinize proposed software changes to key components, balancing new features with the need for stability, availability, and performance.
Management must balance the need for new features with need for stability, availability, and performance.
Specific to services, organizations often look to decouple larger services, creating smaller, more focused services. Better separation of concerns increases the likelihood that potential impairments caused by code defects are isolated to a smaller subset of functionality.
Conclusion
In this brief post, we examined a potential risk to delivering reliable software, the impact of software changes. There are many risks to delivering reliable software. Once all sources of risk are identified and quantified, the overall level of risk to delivering reliable software can be assessed, and steps taken to reduce the potential impact.