Posts Tagged CD

Infrastructure as Code Maturity Model

Systematically Evolving an Organization’s Infrastructure

Infrastructure and software development teams are increasingly building and managing infrastructure using automated tools that have been described as “infrastructure as code.” – Kief Morris (Infrastructure as Code)

The process of managing and provisioning computing infrastructure and their configuration through machine-processable, declarative, definition files, rather than physical hardware configuration or the use of interactive configuration tools. – Wikipedia (abridged)

Convergence of CD, Cloud, and IaC

In 2011, co-authors Jez Humble, formerly of ThoughtWorks, and David Farley, published their ground-breaking book, Continuous Delivery. Humble and Farley’s book set out, in their words, to automate the ‘painful, risky, and time-consuming process’ of the software ‘build, deployment, and testing process.

cd_image_02

Over the next five years, Humble and Farley’s Continuous Delivery made a significant contribution to the modern phenomena of DevOps. According to Wikipedia, DevOps is the ‘culture, movement or practice that emphasizes the collaboration and communication of both software developers and other information-technology (IT) professionals while automating the process of software delivery and infrastructure changes.

In parallel with the growth of DevOps, Cloud Computing continued to grow at an explosive rate. Amazon pioneered modern cloud computing in 2006 with the launch of its Elastic Compute Cloud. Two years later, in 2008, Microsoft launched its cloud platform, Azure. In 2010, Rackspace launched OpenStack.

Today, there is a flock of ‘cloud’ providers. Their services fall into three primary service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Since we will be discussing infrastructure, we will focus on IaaS and PaaS. Leaders in this space include Google Cloud Platform, RedHat, Oracle Cloud, Pivotal Cloud Foundry, CenturyLink Cloud, Apprenda, IBM SmartCloud Enterprise, and Heroku, to mention just a few.

Finally, fast forward to June 2016, O’Reilly releases Infrastructure as Code
Managing Servers in the Cloud
, by Kief Morris, ThoughtWorks. This crucial work bridges many of the concepts first introduced in Humble and Farley’s Continuous Delivery, with the evolving processes and practices to support cloud computing.

cd_image_03

This post examines how to apply the principles found in the Continuous Delivery Maturity Model, an analysis tool detailed in Humble and Farley’s Continuous Delivery, and discussed herein, to the best practices found in Morris’ Infrastructure as Code.

Infrastructure as Code

Before we continue, we need a shared understanding of infrastructure as code. Below are four examples of infrastructure as code, as Wikipedia defined them, ‘machine-processable, declarative, definition files.’ The code was written using four popular tools, including HashiCorp Packer, Docker, AWS CloudFormation, and HashiCorp Terraform. Executing the code provisions virtualized cloud infrastructure.

HashiCorp Packer

Packer definition of an AWS EBS-backed AMI, based on Ubuntu.

{
  "variables": {
    "aws_access_key": "",
    "aws_secret_key": ""
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "region": "us-east-1",
    "source_ami": "ami-fce3c696",
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu",
    "ami_name": "packer-example {{timestamp}}"
  }]
}

Docker

Dockerfile, used to create a Docker image, and subsequently a Docker container, running MongoDB.

FROM ubuntu:16.04
MAINTAINER Docker
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
RUN echo "deb http://repo.mongodb.org/apt/ubuntu" \
$(cat /etc/lsb-release | grep DISTRIB_CODENAME | cut -d= -f2)/mongodb-org/3.2 multiverse" | \
tee /etc/apt/sources.list.d/mongodb-org-3.2.list
RUN apt-get update && apt-get install -y mongodb-org
RUN mkdir -p /data/db
EXPOSE 27017
ENTRYPOINT ["/usr/bin/mongod"]

AWS CloudFormation

AWS CloudFormation declaration for three services enabled on a running instance.

services:
  sysvinit:
    nginx:
      enabled: "true"
      ensureRunning: "true"
      files:
        - "/etc/nginx/nginx.conf"
      sources:
        - "/var/www/html"
    php-fastcgi:
      enabled: "true"
      ensureRunning: "true"
      packages:
        yum:
          - "php"
          - "spawn-fcgi"
    sendmail:
      enabled: "false"
      ensureRunning: "false"

HashiCorp Terraform

Terraform definition of an AWS m1.small EC2 instance, running NGINX on Ubuntu.

resource "aws_instance" "web" {
  connection { user = "ubuntu" }
instance_type = "m1.small"
Ami = "${lookup(var.aws_amis, var.aws_region)}"
Key_name = "${aws_key_pair.auth.id}"
vpc_security_group_ids = ["${aws_security_group.default.id}"]
Subnet_id = "${aws_subnet.default.id}"
provisioner "remote-exec" {
  inline = [
    "sudo apt-get -y update",
    "sudo apt-get -y install nginx",
    "sudo service nginx start",
  ]
 }
}

Cloud-based Infrastructure as a Service

The previous examples provide but the narrowest of views into the potential breadth of infrastructure as code. Leading cloud providers, such as Amazon and Microsoft, offer hundreds of unique offerings, most of which may be defined and manipulated through code — infrastructure as code.

cd_image_05

cd_image_04

What Infrastructure as Code?

The question many ask is, what types of infrastructure can be defined as code? Although vendors and cloud providers have their unique names and descriptions, most infrastructure is divided into a few broad categories:

  • Compute
  • Databases, Caching, and Messaging
  • Storage, Backup, and Content Delivery
  • Networking
  • Security and Identity
  • Monitoring, Logging, and Analytics
  • Management Tooling

Continuous Delivery Maturity Model

We also need a common understanding of the Continuous Delivery Maturity Model. According to Humble and Farley, the Continuous Delivery Maturity Model was distilled as a model that ‘helps to identify where an organization stands in terms of the maturity of its processes and practices and defines a progression that an organization can work through to improve.

The Continuous Delivery Maturity Model is a 5×6 matrix, consisting of six areas of practice and five levels of maturity. Each of the matrix’s 30 elements defines a required discipline an organization needs to follow, to be considered at that level of maturity within that practice.

Areas of Practice

The CD Maturity Model examines six broad areas of practice found in most enterprise software organizations:

  • Build Management and Continuous Integration
  • Environments and Deployment
  • Release Management and Compliance
  • Testing
  • Data Management
  • Configuration Management

Levels of Maturity

The CD Maturity Model defines five level of increasing maturity, from a score of -1 to 3, from Regressive to Optimizing:

  • Level 3: Optimizing – Focus on process improvement
  • Level 2: Quantitatively Managed – Process measured and controlled
  • Level 1: Consistent – Automated processes applied across whole application lifecycle
  • Level 0: Repeatable – Process documented and partly automated
  • Level -1: Regressive – Processes unrepeatable, poorly controlled, and reactive

cd_image_06

Maturity Model Analysis

The CD Maturity Model is an analysis tool. In my experience, organizations use the maturity model in one of two ways. First, an organization completes an impartial evaluation of their existing levels of maturity across all areas of practice. Then, the organization focuses on improving the overall organization’s maturity, attempting to achieve a consistent level of maturity across all areas of practice. Alternately, the organization concentrates on a subset of the practices, which have the greatest business value, or given their relative immaturity, are a detriment to the other practices.

cd_image_01

* CD Maturity Model Analysis Tool available on GitHub.

Infrastructure as Code Maturity Levels

Although infrastructure as code is not explicitly called out as a practice in the CD Maturity Model, many of it’s best practices can be found in the maturity model. For example, the model prescribes automated environment provisioning, orchestrated deployments, and the use of metrics for continuous improvement.

Instead of trying to retrofit infrastructure as code into the existing CD Maturity Model, I believe it is more effective to independently apply the model’s five levels of maturity to infrastructure as code. To that end, I have selected many of the best practices from the book, Infrastructure as Code, as well as from my experiences. Those selected practices have been distributed across the model’s five levels of maturity.

The result is the first pass at an evolving Infrastructure as Code Maturity Model. This model may be applied alongside the broader CD Maturity Model, or independently, to evaluate and further develop an organization’s infrastructure practices.

IaC Level -1: Regressive

Processes unrepeatable, poorly controlled, and reactive

  • Limited infrastructure is provisioned and managed as code
  • Infrastructure provisioning still requires many manual processes
  • Infrastructure code is not written using industry-standard tooling and patterns
  • Infrastructure code not built, unit-tested, provisioned and managed, as part of a pipeline
  • Infrastructure code, processes, and procedures are inconsistently documented, and not available to all required parties

IaC Level 0: Repeatable

Processes documented and partly automated

  • All infrastructure code and configuration are stored in a centralized version control system
  • Testing, provisioning, and management of infrastructure are done as part of automated pipeline
  • Infrastructure is deployable as individual components
  • Leverages programmatic interfaces into physical devices
  • Automated security inspection of components and dependencies
  • Self-service CLI or API, where internal customers provision their resources
  • All code, processes, and procedures documented and available
  • Immutable infrastructure and processes

IaC Level 1: Consistent

Automated processes applied across whole application lifecycle

  • Fully automated provisioning and management of infrastructure
  • Minimal use of unsupported, ‘home-grown’ infrastructure tooling
  • Unit-tests meet code-coverage requirements
  • Code is continuously tested upon every check-in to version control system
  • Continuously available infrastructure using zero-downtime provisioning
  • Uses configuration registries
  • Templatized configuration files (no awk/sed magic)
  • Secrets are securely management
  • Auto-scaling based on user-defined load characteristics

IaC Level 2: Quantitatively Managed

Processes measured and controlled

  • Uses infrastructure definition files
  • Capable of automated rollbacks
  • Infrastructure and supporting systems are highly available and fault tolerant
  • Externalized configuration, no black box API to modify configuration
  • Fully monitored infrastructure with configurable alerting
  • Aggregated, auditable infrastructure logging
  • All code, processes, and procedures are well documented in a Knowledge Management System
  • Infrastructure code uses declarative versus imperative programming model, maybe…

IaC Level 3: Optimizing

Focus on process improvement

  • Self-healing, self-configurable, self-optimizing, infrastructure
  • Performance tested and monitored against business KPIs
  • Maximal infrastructure utilization and workload density
  • Adheres to Cloud Native and 12-Factor patterns
  • Cloud-agnostic code that minimizes cloud vendor lock-in

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , ,

3 Comments

Software Delivery: Evaluating Risk within the Enterprise

As software evolves from less-complex applications to enterprise platforms, how does increasing complexity raise the potential risk of delivering unreliable software?
  

Cover Drawing

 Introduction

Many vendor whitepapers, industry publications, blog posts, podcasts, and e-books, extol the best practices in software development and delivery. Best practices include industry-standard concepts, such as Agile, DevOps, TTD, continuous integration, and continuous delivery. Generally, these best practices all strive to improve the process of delivering software enhancements and bug fixes to customers.

Rapidly, reliably and repeatedly push out enhancements and bug fixes to customers at low risk and with minimal manual overhead. – Wikipedia

Most learning resources present one of two idealized environments, ‘applications as islands’ and ‘utopian enterprise’. I am also often guilty of tailoring my own materials to one of these two idealized environments. Neither ‘applications as islands’ or ‘utopian enterprise’, best model the typical enterprise software environments in which many of us work.

Applications as Islands

The ‘applications as islands’ environment is one of completely isolated application stacks. These types of environments have multiple application stacks, consisting of web, mobile, and desktop components, services, data sources, utilities and scripts, messaging and reporting components, and so forth. Unrealistically, each application stack is completely isolated from the other application stacks within the same environment.

The Utopian Enterprise

The ‘utopian enterprise’ environments have multiple application stacks with multiple shared components. However, they are built, unrealistically, using consistent and modern architectural patterns and compatible technology stacks. They are designed from the ground up to be compartmentalized, scalable, and highly risk-tolerant to changes. They often avoid the challenges of monolithic legacy applications. The closest things in the real world are probably industry trendsetters, such as Facebook, Etsy, Amazon, and Twitter. We all probably wish we could evolve our own software environments into one of these Utopias.

Complexity and Risk

As an organization continues to evolve their software, they naturally increase the overall complexity, and thereby the challenge of effectively delivering reliable and performant software. In this post, I will explore the challenges of software delivery, as a software environment grows in complexity. Specifically, I will focus on how to evaluate the level of risk based on software changes made to various components within the software environment.

Sensitivity and Impact

As we examine the level of risk introduced by software changes within the environment, two aspects of risk are inescapable, sensitivity and impact. Sensitivity will be defined as the potential degree of which one component, such as an application, service, or data source, is affected by changes to other components within the same software environment. How sensitive is ‘Application A’ to changes made to other components within the same software environment, on which ‘Application A’ is directly or indirectly dependent?

The impact will be defined as the potential effect a component’s changes have on other components within the software environment. Teams tend to only evaluate the impact of changes to the immediate component or application stack. They do not sufficiently consider how those changes impact those components that are directly and indirectly dependent on them. What level of impact do changes to ‘Service B’ have on all other components within the software environment that are directly and indirectly dependent on ‘Service B’?

Notice I use the word potential. Any change has the potential to introduce risk. The level of risk varies, based on the type and volume of changes. A few simple changes should have a low potential for impact, as opposed to a high number of changes, or more complex changes. For example, changing an internal error message logged by a particular service operation should present a very low risk. This, as opposed to rewriting that operation’s complex algorithm for calculating a customer’s creditworthiness. The potential impact of those two types’ changes to dependent components varies significantly.

Measuring Risk

For both sensitivity to change and impact of change, I will use a color-coded scale to subjectively assign a level of potential risk to each component within a given software environment. The scale ranges from ‘Low’, to ‘Moderate’, to ‘High’, to ‘Very High’. Using the scale, it is possible to ‘heat map’ a software environment, based on the level of risk from changes.

Independent Aspects of Risk

Sensitivity and impact are two independent aspects of risk. Changes to one component may have a ‘Low’ potential impact on all other components within the environment. While at the same time, that same component may have a ‘High’ sensitivity to changes made to other components within the environment. Alternatively, a component may have a ‘Very High’ risk for potential impact on multiple components within the environment. At the same time, that same component may have a ‘Low’ potential sensitivity to changes made to other components. Sensitivity and risk do not parallel each other.

Growing Complexity

Let’s look at how sensitivity and impact change as we increase the software environment’s complexity. In the first example, we will look at one of the two environments I described earlier, isolated applications. Applications may have their own web and mobile components, SOAP or RESTful services, data sources, utilities, scheduled tasks, and so forth. However, the applications do not depend on each other or components outside their own immediate application stack; the applications are self-contained.


When making changes in this type of environment, the real potential impact is to the overall stability, security, and performance of the individual applications, themselves. As long as they are in isolation, the applications will have no impact on each other. Therefore, applications potential sensitivity to changes and their impact on other applications is ‘Low’.

Shared Components

A slightly more complex example is a software environment in which one or more applications have a dependency on a component outside their immediate application stack. For example, a healthcare provider develops a Windows-based application to track their employee’s work schedules (Application A). In addition, they develop a web application to track patient appointments (Application B). Lastly, they offer a client-facing mobile application for patients to track personal fitness and nutrition goals (Application C). Applications B and C share a common set of services and a database for managing patient data.

Software changes made to Applications A, B, and C, should have no effect on other components within the software environment. However, Applications B and C are potentially impacted by changes made to either the Services Layer or Data Layer. The Services Layer has ‘High’ potential impact to the software environment. Lastly, the Data Layer should not be directly impacted by changes made to the Services Layer or Applications. However, the Data Layer has the potential to directly affect the Services Layer, and indirectly affect Applications B and C. Therefore, the Data Layer’s potential impact on other dependent components within the environment is ‘Very High’.

Multiple Shared Components

An even more complex example is a software environment in which multiple applications have one or more dependencies on multiple components outside their immediate application stack (many-to-many).

Take, for example, a small financial institution. They have a ‘legacy’ COBOL-based application for managing their commercial mortgage business (Application A). They also have an older J2EE-based application, they acquired through a business merger, for managing their commercial banking relationships (Application B). Next, they have a relatively new Java EE-based investment banking application to manage their retail customers (Application C). Lastly, they have web-based, client-facing application for secure, online retail banking.

Since both Application A and B serve commercial clients, it is necessary to send financial data between the two application stacks. Since both applications are built on different, older technologies, the development team built a Custom Messaging Middleware component to connect the two applications. The Custom Messaging Middleware component receives, transforms, and delivers messages between the two applications.

Changes made to Applications C and D should have no impact on other components within the software environment. However, changes made to either Application A or B has the potential to indirectly affect the ability to successfully communicate with the other application, via the Custom Messaging Middleware. Changes to the Custom Messaging Middleware have the potential to affect both Applications A and B. The Custom Messaging Middleware has a ‘Moderate’ potential sensitivity to risk, versus ‘Low’, because one could argue that changes to either Application A or Application B’s messaging format could impact the Custom Messaging Middleware’s ability to properly process that application’s messages and successfully deliver them to opposite application.

Applications B, C, and D have a direct dependency on the Services Layer, and indirectly on the Data Layer. Therefore, the potential impact of changes to the Services Layer on other components is arguably higher than in the last example. The Services Layer’s potential impact on other components is ‘Very High’.

Since Application B has a direct dependency on both the Messaging Middleware and the Services Layer, it has a higher sensitivity to changes then the other three applications. Application B’s potential sensitivity to changes by other components is ‘Very High’.

Changes made to the Services Layer or the Applications will not affect the Data Layer. However, the Data Layer has the potential to directly affect the Services Layer, and indirectly affect Applications B, C, and D. Therefore, the Data Layer’s potential impact on the software environment is ‘Very High’.

Small Enterprise

The last example of increasing complexity is an environment in which even more applications are dependent on even more components. Additionally, there may be different types of components, such as a common UI and third-party APIs, which only increase the complexity of the dependencies. Although this example is nowhere near as complex as many enterprise software environments, it does begin to reflect their intricate, inner-dependent structure.

Let’s use an example of a large web-based retailer. The retailer has a standalone ERM application for managing their wholesale purchasing and product distribution (Application A). Next, they have their primary client-facing storefront (Application B). They also have a separate application to handle customer accounts (Application C). Lastly, they have an application that manages their online media retail business and media storage (Application D).

In addition to the Common Services Layer, Common Data Layer, and Custom Messaging Middleware, as seen in earlier examples, the retailer has two other components in their environment, a Common Web User Interface (UI) and a Web API. The Web UI provides the customer with a seamless branded experience, no matter which application they use – Application B, C, or D. The customer enters the Common Web UI and has all three application’s features seamlessly available to them.

The retailer also exposes a RESTful Web API for its marketing affiliates. Third parties can develop a variety of applications that drive sales to the retailer, in return for a sales commission.

In the earlier examples, individual applications had separate points of entry. However, in this example, the Common Web UI provides a single point of entry for users of Applications B, C, and D. Having a single point of entry also introduces a single point of failure for all three applications. Thus, the potential risk to the retailer and their customers is much greater. The Common Web UI’s potential impact on other components is ‘Very High’.

A single point of entry also introduces a single point of failure.

The potential sensitivity of the Common Web UI to changes comes from its direct dependency on the Services Layer, and indirectly on the Data Layer. Additionally, one could argue, since the Common Web UI displays the three Applications, it is also sensitive to changes made by those applications. If one of those applications becomes impaired due to a bad change, that application would seem to affect the Web UI’s functionality. The Common UI’s potential sensitivity to change is ‘High’.

The Web API is similar to the Common Web UI, in terms of potential sensitivity and impact. The potential impact of changes to the Web API is ‘Very High’, since a defect there could result in the potential impairment of the retailer’s affiliate applications. The potential sensitivity of the Web API to changes comes from its direct dependency on the Services Layer, and indirectly on the Data Layer. The Web API’s potential sensitivity to change is ‘High’. There is very little chance of potential impact to the Web API from the retailer’s affiliate applications.

Impact of Key Components

Lastly, as systems grow in complexity, certain components often become so key, they have the potential to impact the entire environment, a true single point of failure. Below, note the potential impact of changes to the Common Services Layer on all other components. As the software environment has grown in complexity, the Common Services Layer sits at the heart of the system. The Services Layer has multiple components directly dependent on it (i.e. Application C), as well as other components indirectly dependent on it (i.e. Third-Party Applications). It is also the only point of access to and from the Common Data Layer.

There are steps organizations can take to mitigate the potential risk caused by changes to key components, like the Services Layer. Areas organizations commonly focus on to reduce risk are higher code quality, increased test coverage, and improved performance, fault tolerance, system redundancy, and rollback capabilities. Additionally, management should more thoroughly scrutinize proposed software changes to key components, balancing new features with the need for stability, availability, and performance.

Management must balance the need for new features with need for stability, availability, and performance.

Specific to services, organizations often look to decouple larger services, creating smaller, more focused services. Better separation of concerns increases the likelihood that potential impairments caused by code defects are isolated to a smaller subset of functionality.

Conclusion

In this brief post, we examined a potential risk to delivering reliable software, the impact of software changes. There are many risks to delivering reliable software. Once all sources of risk are identified and quantified, the overall level of risk to delivering reliable software can be assessed, and steps taken to reduce the potential impact.

, , , , , , , , , , , , ,

Leave a comment