Posts Tagged Analysis

Infrastructure as Code Maturity Model

Systematically Evolving an Organization’s Infrastructure

Infrastructure and software development teams are increasingly building and managing infrastructure using automated tools that have been described as “infrastructure as code.” – Kief Morris (Infrastructure as Code)

The process of managing and provisioning computing infrastructure and their configuration through machine-processable, declarative, definition files, rather than physical hardware configuration or the use of interactive configuration tools. – Wikipedia (abridged)

Convergence of CD, Cloud, and IaC

In 2011, co-authors Jez Humble, formerly of ThoughtWorks, and David Farley, published their ground-breaking book, Continuous Delivery. Humble and Farley’s book set out, in their words, to automate the ‘painful, risky, and time-consuming process’ of the software ‘build, deployment, and testing process.

cd_image_02

Over the next five years, Humble and Farley’s Continuous Delivery made a significant contribution to the modern phenomena of DevOps. According to Wikipedia, DevOps is the ‘culture, movement or practice that emphasizes the collaboration and communication of both software developers and other information-technology (IT) professionals while automating the process of software delivery and infrastructure changes.

In parallel with the growth of DevOps, Cloud Computing continued to grow at an explosive rate. Amazon pioneered modern cloud computing in 2006 with the launch of its Elastic Compute Cloud. Two years later, in 2008, Microsoft launched its cloud platform, Azure. In 2010, Rackspace launched OpenStack.

Today, there is a flock of ‘cloud’ providers. Their services fall into three primary service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Since we will be discussing infrastructure, we will focus on IaaS and PaaS. Leaders in this space include Google Cloud Platform, RedHat, Oracle Cloud, Pivotal Cloud Foundry, CenturyLink Cloud, Apprenda, IBM SmartCloud Enterprise, and Heroku, to mention just a few.

Finally, fast forward to June 2016, O’Reilly releases Infrastructure as Code
Managing Servers in the Cloud
, by Kief Morris, ThoughtWorks. This crucial work bridges many of the concepts first introduced in Humble and Farley’s Continuous Delivery, with the evolving processes and practices to support cloud computing.

cd_image_03

This post examines how to apply the principles found in the Continuous Delivery Maturity Model, an analysis tool detailed in Humble and Farley’s Continuous Delivery, and discussed herein, to the best practices found in Morris’ Infrastructure as Code.

Infrastructure as Code

Before we continue, we need a shared understanding of infrastructure as code. Below are four examples of infrastructure as code, as Wikipedia defined them, ‘machine-processable, declarative, definition files.’ The code was written using four popular tools, including HashiCorp Packer, Docker, AWS CloudFormation, and HashiCorp Terraform. Executing the code provisions virtualized cloud infrastructure.

HashiCorp Packer

Packer definition of an AWS EBS-backed AMI, based on Ubuntu.

{
  "variables": {
    "aws_access_key": "",
    "aws_secret_key": ""
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "region": "us-east-1",
    "source_ami": "ami-fce3c696",
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu",
    "ami_name": "packer-example {{timestamp}}"
  }]
}

Docker

Dockerfile, used to create a Docker image, and subsequently a Docker container, running MongoDB.

FROM ubuntu:16.04
MAINTAINER Docker
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
RUN echo "deb http://repo.mongodb.org/apt/ubuntu" \
$(cat /etc/lsb-release | grep DISTRIB_CODENAME | cut -d= -f2)/mongodb-org/3.2 multiverse" | \
tee /etc/apt/sources.list.d/mongodb-org-3.2.list
RUN apt-get update && apt-get install -y mongodb-org
RUN mkdir -p /data/db
EXPOSE 27017
ENTRYPOINT ["/usr/bin/mongod"]

AWS CloudFormation

AWS CloudFormation declaration for three services enabled on a running instance.

services:
  sysvinit:
    nginx:
      enabled: "true"
      ensureRunning: "true"
      files:
        - "/etc/nginx/nginx.conf"
      sources:
        - "/var/www/html"
    php-fastcgi:
      enabled: "true"
      ensureRunning: "true"
      packages:
        yum:
          - "php"
          - "spawn-fcgi"
    sendmail:
      enabled: "false"
      ensureRunning: "false"

HashiCorp Terraform

Terraform definition of an AWS m1.small EC2 instance, running NGINX on Ubuntu.

resource "aws_instance" "web" {
  connection { user = "ubuntu" }
instance_type = "m1.small"
Ami = "${lookup(var.aws_amis, var.aws_region)}"
Key_name = "${aws_key_pair.auth.id}"
vpc_security_group_ids = ["${aws_security_group.default.id}"]
Subnet_id = "${aws_subnet.default.id}"
provisioner "remote-exec" {
  inline = [
    "sudo apt-get -y update",
    "sudo apt-get -y install nginx",
    "sudo service nginx start",
  ]
 }
}

Cloud-based Infrastructure as a Service

The previous examples provide but the narrowest of views into the potential breadth of infrastructure as code. Leading cloud providers, such as Amazon and Microsoft, offer hundreds of unique offerings, most of which may be defined and manipulated through code — infrastructure as code.

cd_image_05

cd_image_04

What Infrastructure as Code?

The question many ask is, what types of infrastructure can be defined as code? Although vendors and cloud providers have their unique names and descriptions, most infrastructure is divided into a few broad categories:

  • Compute
  • Databases, Caching, and Messaging
  • Storage, Backup, and Content Delivery
  • Networking
  • Security and Identity
  • Monitoring, Logging, and Analytics
  • Management Tooling

Continuous Delivery Maturity Model

We also need a common understanding of the Continuous Delivery Maturity Model. According to Humble and Farley, the Continuous Delivery Maturity Model was distilled as a model that ‘helps to identify where an organization stands in terms of the maturity of its processes and practices and defines a progression that an organization can work through to improve.

The Continuous Delivery Maturity Model is a 5×6 matrix, consisting of six areas of practice and five levels of maturity. Each of the matrix’s 30 elements defines a required discipline an organization needs to follow, to be considered at that level of maturity within that practice.

Areas of Practice

The CD Maturity Model examines six broad areas of practice found in most enterprise software organizations:

  • Build Management and Continuous Integration
  • Environments and Deployment
  • Release Management and Compliance
  • Testing
  • Data Management
  • Configuration Management

Levels of Maturity

The CD Maturity Model defines five level of increasing maturity, from a score of -1 to 3, from Regressive to Optimizing:

  • Level 3: Optimizing – Focus on process improvement
  • Level 2: Quantitatively Managed – Process measured and controlled
  • Level 1: Consistent – Automated processes applied across whole application lifecycle
  • Level 0: Repeatable – Process documented and partly automated
  • Level -1: Regressive – Processes unrepeatable, poorly controlled, and reactive

cd_image_06

Maturity Model Analysis

The CD Maturity Model is an analysis tool. In my experience, organizations use the maturity model in one of two ways. First, an organization completes an impartial evaluation of their existing levels of maturity across all areas of practice. Then, the organization focuses on improving the overall organization’s maturity, attempting to achieve a consistent level of maturity across all areas of practice. Alternately, the organization concentrates on a subset of the practices, which have the greatest business value, or given their relative immaturity, are a detriment to the other practices.

cd_image_01

* CD Maturity Model Analysis Tool available on GitHub.

Infrastructure as Code Maturity Levels

Although infrastructure as code is not explicitly called out as a practice in the CD Maturity Model, many of it’s best practices can be found in the maturity model. For example, the model prescribes automated environment provisioning, orchestrated deployments, and the use of metrics for continuous improvement.

Instead of trying to retrofit infrastructure as code into the existing CD Maturity Model, I believe it is more effective to independently apply the model’s five levels of maturity to infrastructure as code. To that end, I have selected many of the best practices from the book, Infrastructure as Code, as well as from my experiences. Those selected practices have been distributed across the model’s five levels of maturity.

The result is the first pass at an evolving Infrastructure as Code Maturity Model. This model may be applied alongside the broader CD Maturity Model, or independently, to evaluate and further develop an organization’s infrastructure practices.

IaC Level -1: Regressive

Processes unrepeatable, poorly controlled, and reactive

  • Limited infrastructure is provisioned and managed as code
  • Infrastructure provisioning still requires many manual processes
  • Infrastructure code is not written using industry-standard tooling and patterns
  • Infrastructure code not built, unit-tested, provisioned and managed, as part of a pipeline
  • Infrastructure code, processes, and procedures are inconsistently documented, and not available to all required parties

IaC Level 0: Repeatable

Processes documented and partly automated

  • All infrastructure code and configuration are stored in a centralized version control system
  • Testing, provisioning, and management of infrastructure are done as part of automated pipeline
  • Infrastructure is deployable as individual components
  • Leverages programmatic interfaces into physical devices
  • Automated security inspection of components and dependencies
  • Self-service CLI or API, where internal customers provision their resources
  • All code, processes, and procedures documented and available
  • Immutable infrastructure and processes

IaC Level 1: Consistent

Automated processes applied across whole application lifecycle

  • Fully automated provisioning and management of infrastructure
  • Minimal use of unsupported, ‘home-grown’ infrastructure tooling
  • Unit-tests meet code-coverage requirements
  • Code is continuously tested upon every check-in to version control system
  • Continuously available infrastructure using zero-downtime provisioning
  • Uses configuration registries
  • Templatized configuration files (no awk/sed magic)
  • Secrets are securely management
  • Auto-scaling based on user-defined load characteristics

IaC Level 2: Quantitatively Managed

Processes measured and controlled

  • Uses infrastructure definition files
  • Capable of automated rollbacks
  • Infrastructure and supporting systems are highly available and fault tolerant
  • Externalized configuration, no black box API to modify configuration
  • Fully monitored infrastructure with configurable alerting
  • Aggregated, auditable infrastructure logging
  • All code, processes, and procedures are well documented in a Knowledge Management System
  • Infrastructure code uses declarative versus imperative programming model, maybe…

IaC Level 3: Optimizing

Focus on process improvement

  • Self-healing, self-configurable, self-optimizing, infrastructure
  • Performance tested and monitored against business KPIs
  • Maximal infrastructure utilization and workload density
  • Adheres to Cloud Native and 12-Factor patterns
  • Cloud-agnostic code that minimizes cloud vendor lock-in

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , ,

3 Comments

Operational Readiness Analysis

“Analysis: a detailed examination of the elements or structure of something, typically as a basis for discussion or interpretation.” — Google

Introduction

Recently, I had the opportunity, along with a colleague, to perform an operational readiness analysis of a client’s new application platform. The platform was in late-stage development, only a few weeks from going live. Being relatively new to the project team, our objective was to determine the amount of work remaining to make the platform operational, from a technical perspective. As well, we wanted to ensure there were no gaps in the project’s scope, as it related to technical operational readiness.

There were various approaches we could have taken to establish the amount of unfinished work and any existing gaps. We could have reviewed the state of the project in the client’s Agile project management system. We could have discussed the project’s state with the Product Owner, Project Manager, Team Lead, and Business Analyst. Neither of these methods on their own would have provided us with a complete picture.

The approach we ultimately took was what we’ve coined, an Operational Readiness Analysis, or our trendier title, the ‘State of the DevOps’. This approach is particularly effective and highly interactive. The analysis may be conducted at any stage of the Software Development Lifecycle (SDLC), to review a project’s operational state of readiness. The analysis involves five simple steps:

  1. Categorize
  2. Itemize
  3. Organize
  4. Prioritize
  5. Document

Grab Your Post-It Notes

We began, where all good ThoughtWorkers tend to begin, with a meeting room, whiteboard, dry-erase markers, and lots of brightly colored Post-It notes. ThoughtWorkers do love their Post-It notes. We gathered a small subset of the project team, including the Product Owner, Project Manager, and the team’s DevOps Engineer.

Step 1: Categorize

To start, we broadly categorized the operational requirements into buckets of work. In my experience, categorization typically takes one of two forms, either the use of role-based descriptors, like ‘Security’ and ‘Monitoring’, or the use of the ‘ilities’, such as ‘Scalability’ and ‘Maintainability’.

The ‘ilities’ are often referenced in regards to software architecture and the functional and nonfunctional requirements (NFRs) of a project. Although there are many ‘ilities’, project requirements frequently include Usability, Scalability, Maintainability, Reliability, Testability, Availability, and Serviceability. Although I enjoy referring to the ‘ilities’ when drafting project requirements, I find a broad audience more readily understands the high-level category descriptions.

We collectively agreed on eight categories, in addition to a catch-all ‘Other’ category, and distributed them on the whiteboard.

whiteboard-step-1

Role-based categories tend to include the following:

  • Security and Compliance
  • Monitoring and Alerting
  • Logging
  • Continuous Integration and Delivery
  • Release Management
  • Configuration
  • Environment Management
  • Infrastructure
  • Networking
  • Data Management
  • High Availability and Fault Tolerance
  • Performance
  • Backup and Restoration
  • Support and Maintenance
  • Documentation and Training
  • Technical Debt
  • Other

Step 2: Itemize

Next, each person placed as many Post-It notes as they wanted, below the categories on the whiteboard. Each Post-It represented an item someone felt needed to be addressed as part the analysis. Items might represent requirements currently in progress, or requirements still untouched in the project’s backlog. A Post-It note might contain a new requirement. Maybe there is a question about an aspect of the project’s relative importance, or there were concerns about how an aspect of the project had been implemented.

Participants placed an average of five to ten Post-It notes on the whiteboard. The end product was a collective mind-map of the project’s operational state (the following is only an example and doesn’t represent the actual results of any real client analysis).

whiteboard-step-2

Whereas categories tend to be generic in nature, the items are usually particular to the project, and it’s technology choices. The ‘Monitoring/Logging’ category might include individual items such as ‘Log Rotation Policy’ or ‘Splunk vs. ELK?’. The ‘Security’ category might include individual items such as ‘Need Password Rotation’ or ‘SSL Certificate Management’.

Participants often use one or two words to summarize more complex thoughts. For example, ‘Rollback’ might be stating, ‘we still don’t have a good rollback strategy for the database’. ‘Release Frequency’ may be questioning, ‘why can’t we release more frequently?’. Make sure to discuss and capture the participant’s full thoughts behind each Post-It, the during the Organize, Prioritize, and Document stages.

Don’t let participants give up too soon placing Post-It notes on the board. Often, one participant’s Post-It will spark additional thoughts other team members. Don’t be afraid to suggest a few items, if the group needs some initial motivation. Lastly, fear not, it’s never too late to go back and add missing items, uncovered in later stages of analysis.

Step 3: Organize

Quickly and non-judgmentally, review all items on the whiteboard. Group duplicates together. Ask for clarification on items where necessary. Move miscategorized items, if the author and team agree there is a more appropriate category.

Examine the ‘Other’ category. If there are items in the ‘Other’ category that are similar, consider adding a new category to capture them. Maybe the team missed identifying a key category, earlier. Conversely, there is nothing wrong with leaving a few stragglers in the ‘Other’ category; don’t overthink the exercise.

Participants should leave this stage with a general understanding of each item on the whiteboard.

Step 4: Prioritize

With all items identified, organized, and understood, discuss each item’s status. Does the item represent incomplete requirements? Does everyone agree an item’s requirements are complete? Team members may not agree on the completeness of particular items. Inconsistencies may reflect unclear requirements. More often, I find, disagreements are a result of an inconsistent or incomplete implementation of requirements across environments — Development, QA, Performance, Staging, and Production.

Why are operational requirements frequently implemented inconsistently? Often, a story is played early in the development stage, prior to all environments being available. For example, a story is played to add monitoring to Development and QA. The story is completed, tested, and closed. Afterward, the Performance environment is created. Later yet, the Staging and Production environments are created. But, there are no new stories to address monitoring the new environments.

One way to avoid this common issue with operational requirements is an effective DevOps automation strategy. In this example, an effective strategy would mean all new environments would get monitoring, automatically. The story should broadly address monitoring, and not, short-sightedly, monitoring of specific environments.

Often, only one or two project team members are responsible for ‘all the DevOps stuff’. They alone possess an accurate perspective on an item’s current status.

Gain agreement from the Product Owner, Project Manager, and key team members on the priority of incomplete requirements. Are they ‘Day One’ must-haves, or ‘Day Two’ and can be completed after the application goes live? Flag all high-priority items.

If the status of an item is unknown, such as ‘why is the QA environment always down?!’, note it as an open question and move on. If the item simply requires a quick answer to resolve, like ‘do we backup the database?’, answer it (‘the database is replicated and backed up daily’) and move on.

whiteboard-step-4

Step 5: Document

Snap a picture of the whiteboard and document its contents. We chose the client’s knowledge management system as a vehicle to share the whiteboard’s results. We created a table with columns for Category, Item, Description, Priority, Owner, and Notes. Also, it’s helpful to have a column for the project’s corresponding Story ID or Defect ID. Along with the whiteboard, capture key discussion points, questions, and concerns, raised by the participants.

Step 5: Document

Snap a picture of the whiteboard and document its contents. We chose Confluence to share the whiteboard results, using a table with columns for Category, Item, Description, Priority, Owner, and Notes. We also bulleted all key discussion points, questions, and concerns, raised by the participants.

operational-readiness-spreadsheet

Make the analysis results available to all team members. Let the team ask questions and poke holes in the results. Adjust the results if necessary.

Next Steps

The actions you take, given the results of the analysis, are up to you. In our case, we ensured all high-priority items were called out on the team’s Agile board. Any new items were captured in the client’s Agile project management system. Finally, we ensured open questions and concerns were addressed in a timely fashion. We continue to track each item’s status weekly, throughout the launch and post-launch periods.

We anticipate conducting a follow-up analysis, thirty days after launch. The goal will be to evaluate the effectiveness of the first analysis and identify additional operational needs as the application enters a business-as-usual (BAU) application lifecycle phase.


Postscript: the ‘ilities

The ‘ilities’, courtesy of codesqueeze.com and en.wikipedia.org

  • Scalability
    The capability of a system, network, or process to handle a growing amount of work or its potential to be enlarged to accommodate that growth.
  • Testability
    The practical feasibility of observing a reproducible series of such counterexamples if they do exist.
  • Reliability (Resilience)
    The ability of a system or component to perform its required functions under stated conditions for a specified time.
  • Usability (Performance)
    The degree to which software can be used by specified consumers to achieve quantified objectives with effectiveness, efficiency, and satisfaction in a quantified context of use.
  • Serviceability (Supportability)
    The ability of technical support personnel to install, configure, and monitor computer products, identify exceptions or faults, debug or isolate faults to root cause analysis, and provide hardware or software maintenance in pursuit of solving a problem and restoring the product into service.
  • Availability
    The proportion of time a system is in a functioning condition. Defined in the service-level agreement (SLA).
  • Maintainability
    The ease with which a product can be maintained. Isolate and correct defects or their cause, prevent unexpected breakdowns, maximize a product’s useful life, maximize efficiency, reliability, and safety, meet new requirements, make future maintenance easier, and cope with a changed environment.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , ,

Leave a comment