Posts Tagged DevOps

Effective DevOps Technical Analysis

Building a successful DevOps team is a challenge. Building a successful Agile DevOps team is an enormous challenge. Many Software Developers have experience working on Agile teams, often starting in college. In contrast, most Operations Engineers and System Administrators have not participated on an Agile team. They frequently cut their teeth in highly reactive, support ticket-driven environments. Agile methodologies can seem foreign and uncomfortable to them.

Adding to the challenge, most DevOps teams are expected to execute on a broad range of requirements. DevOps user stories can span the entire software development, delivery, and support lifecycle. Stories often include logging, monitoring, alerting, infrastructure provisioning, networking, security, continuous integration and deployment, and support.

Worst of all, many teams don’t apply the same rigor to developing high-quality user stories for DevOps-related requirements as they do for software requirements. On teams where DevOps Engineers are integrated with Software Developers, I find this is often due to a lack of understanding of DevOps requirements. On stand-alone DevOps teams, I find DevOps Engineers often believe their stories don’t need the same attention to detail as those of their software development peers.

Successful Teams

I’ve been lucky enough to be part of multiple DevOps teams, both integrated and stand-alone. Those teams were a mix of engineers from software development, system administration, and operations backgrounds. One of the reasons these teams were effective was the presence of a Technical Analyst. The Technical Analyst role falls somewhere in-between an Agile Business Analyst and a traditional Systems Analyst; probably closer to the latter, on a DevOps team. The Technical Analyst ensures that user stories are Ready for Development; that is, they meet the Definition of Ready.

There is a common perception that the Technical Analyst on an Agile team must be an experienced technologist. DevOps, however, is a very broad and quickly evolving discipline. While a good understanding of Agile practices and software development is beneficial, I’ve found organization and investigation skills to be most advantageous to the Technical Analyst.

In this post, we will examine one particular methodology for optimizing the effectiveness of the Technical Analyst, through the use of a consistent approach to functional requirement analysis and story development.

Inadequate Requirements

Let’s start with a typical greenfield requirement frequently assigned to DevOps teams, centralized logging.

Title
Centralized Application Logging

User Story
As a Software Developer,
I need to view application logs,
So that I can monitor and troubleshoot running applications.

Acceptance Criterion 1
I can view the latest logs from multiple applications.

Acceptance Criterion 2
I can view application logs without having to log directly into each host on which the applications are running.

Acceptance Criterion 3
I can view the log entries in exact chronological order, based on timestamps.

I know some DevOps engineers who would gladly pick up this user story, make a handful of assumptions based on previous experiences, and manage to hack out a usable centralized application logging system. The ability to meet the end user’s expectations would be based entirely on the experience (and luck) of the engineer, certainly not on the vague and incomplete user story and acceptance criteria, which lack any technical background details.

Multi-Faceted Analysis

Consistently delivering well-defined user stories and comprehensive acceptance criteria, which provide maximum business value with minimal resource utilization, requires a systematic analytical approach.

Based on my experience, I have found certain facets of the software development, delivery, and support lifecycles, affect or are affected by DevOps-related functional requirements. Every requirement should be uniformly analyzed against those facets to improve the overall quality of user stories and acceptance criteria. In no particular order, these facets include:

  1. Logging
  2. Monitoring
  3. Alerting
  4. Ownership
  5. Support
  6. Maintenance
  7. Backup and Recovery
  8. Security and Compliance
  9. Continuous Integration and Delivery
  10. Service Level Agreements (SLAs)
  11. Configuration Management
  12. Environment Management
  13. High Availability and Fault Tolerance
  14. Release Management
  15. Life Expectancy
  16. Architectural Approval
  17. Technology Preferences
  18. Infrastructure
  19. Data Management
  20. Training and Documentation
  21. Computer Networking
  22. Licensing and Legal
  23. Cost
  24. Technical Debt
  25. Other

For simplicity, I prefer to organize these facets into five general categories: Technology, Infrastructure, Delivery, Support, and Organizational.

Let’s take a closer look at each facet, and how they could impact DevOps requirements.

Logging

Do the requirements introduce changes, which would or should produce logs? How should those logs be handled? Do those changes impact existing logging processes and systems?

Monitoring

Do the requirements introduce changes, which would or should be monitored? Do those changes impact existing monitoring processes and systems?

Alerting

Do the requirements introduce changes, which would or should produce alerts? Do the requirements introduce changes, which impact existing alerting processes and systems?

Ownership

Do the requirements introduce changes, which require an owner? Do those changes impact existing ownership models? An example might be deciding who owns the support and maintenance of new infrastructure.

Support

Do the requirements introduce changes, which would or should require internal, external, or vendor support? Do those changes impact existing support models? For example, adding new service categories to a support ticketing system, like ServiceNow.

Maintenance

Do the requirements introduce changes, which would or should require regular and on-going maintenance? Do those changes impact existing maintenance processes and schedules? An example of maintenance might be how upgrading and patching would be handled for a new monitoring system.

Backup and Recovery

Do the requirements introduce changes, which would or should require backup and recovery? Do the requirements introduce changes, which impact existing backup and recovery processes and systems?

Security and Compliance

Do the requirements introduce changes, which have security and compliance implications? Do those changes impact existing security and compliance requirements, processes, and systems? Compliance in IT frequently includes Regulatory Compliance with HIPAA, Sarbanes-Oxley, and PCI-DSS. The Security and Compliance facet is closely associated with Licensing and Legal.

Continuous Integration and Delivery

Do the requirements introduce changes, which require continuous integration and delivery of application or infrastructure code? Do those changes impact existing continuous integration and delivery processes and systems?

Service Level Agreements (SLAs)

Do the requirements introduce changes, which are or should be covered by Service Level Agreements (SLA)? Do the requirements introduce changes, which impact existing SLAs? Often, these requirements are system performance- or support-related, centered around Mean Time Between Failures (MTBF), Mean Time to Repair, and Mean Time to Recovery (MTTR).

Configuration Management

Do the requirements introduce changes, which require the management of new configuration or multiple configurations? Do those changes affect existing configuration or configuration management processes and systems? Configuration management also includes feature management (aka feature switches). Will the changes be turned on or off for testing and release? Configuration Management is closely associated with Environment Management and Release Management.

Environment Management

Do the requirements introduce changes, which affect existing software environments or require new software environments? Do those changes affect existing environments? Software environments typically include Development, Test, Performance, Staging, and Production.

High Availability and Fault Tolerance

Do the requirements introduce changes, which require high availability and fault tolerant configurations? Do those changes affect existing highly available and fault-tolerant systems?

Release Management

Do the requirements introduce changes, which require releases to the Production or other client-facing environments? Do those changes impact existing release management processes and systems?

Life Expectancy

Do the requirements introduce changes, which have a defined lifespan? For example, will the changes be part of a temporary fix, a POC, an MVP, a pilot, or a permanent solution? Do those changes impact the lifecycles of existing processes and systems? For example, do the changes replace an existing system?

Architectural Approval

Do the requirements introduce changes, which require the approval of the Client, Architectural Review Board (ARB), Change Approval Board (CAB), or other governing bodies? Do the requirements introduce changes, which impact other existing or pending technology choices? Architectural Approval is closely related to Technology Preferences, below. Often an ARB or similar governing body will regulate technology choices.

Technology Preferences

Do the requirements introduce changes, which might be impacted by existing preferred vendor relationships or previously approved technologies? Are the requirements impacted by blocked technology, including countries or companies blacklisted by the Government, such as the Specially Designated Nationals List (SDN) or Consolidated Sanctions List (CSL)? Does the client approve of the use of Open Source technologies? What is the impact of those technology choices on other facets, such as Licensing and Legal, Cost, and Support? Do those changes impact other existing or pending technology choices?

Infrastructure

Do the requirements introduce changes, which would or should impact infrastructure? Is the infrastructure new or existing, is it physical, is it virtualized, is it in the cloud? If the infrastructure is virtualized, are there coding requirements about how it is provisioned? If the infrastructure is physical, how will it impact other facets, such as Technology Preferences, Cost, Support, Monitoring, and so forth? Do the requirements introduce changes, which impact existing infrastructure?

Data Management

Do the requirements introduce changes, which produce data or require the preservation of state? Do those changes impact existing data management processes and systems? Is the data ephemeral or transient? How is the data persisted? Is the data stored in a SQL database, NoSQL database, distributed key-value store, or flat-file? Consider how other facets, such as Security and Compliance, Technology Preferences, and Architectural Approval, impact or are impacted by the data.

Training and Documentation

Do the requirements introduce changes, which require new training and documentation? This also includes architectural diagrams and other visuals. Do the requirements introduce changes, which impact training and documentation processes and practices?

Computer Networking

Do the requirements introduce changes, which require networking additions or modifications? Do those changes impact existing networking processes and systems? I have separated Computer Networking from Infrastructure since networking is itself so broad, involving such things as DNS, SDN, VPN, SSL, certificates, load balancing, proxies, firewalls, and so forth. Again, how do networking choices impact other facets, like Security and Compliance?

Licensing and Legal

Do the requirements introduce changes, which require licensing? Should or would the requirements require additional legal review? Are there contractual agreements that need authorization? Do the requirements introduce changes, which impact licensing or other legal agreements or contracts?

Cost

Do the requirements introduce changes, which require the purchase of equipment, contracts, outside resources, or other goods that incur a cost? How and where will the funding come from to purchase those items? Are there licensing or other legal agreements and contracts required as part of the purchase? Who approves these purchases? Are they within acceptable budgetary expectations? Do those changes impact existing costs or budgets?

Technical Debt

Do the requirements introduce changes, which incur technical debt? What is the cost of that debt? How will it impact other requirements and timelines? Do the requirements eliminate existing technical debt? Do those changes impact existing or future technical debt caused by other processes and systems?

Other

Depending on your organization, industry, and DevOps maturity, you may find other facets critical to your requirements.

Closer Analysis

Let’s take a second look at our logging story, applying what we’ve learned. For each facet, we will ask the following three questions.

Is there any facet of _______ that impacts the logging requirement?
Will the logging requirement impact any facet of _______?
If yes, will the impacts be captured as part of this user story, or elsewhere?

For the sake of brevity, we will only look at half the facets.

Logging

Although the requirement is about centralized logging, there is still analysis that should occur around Logging. For example, we should determine if there is a requirement that the centralized logging application should consume its own logs for easier supportability. If so, we should add an acceptance criterion to the original user story around the logging application’s logs being captured.

Monitoring

If we are building a centralized application logging system, we will assume there will be a centralized logging application, possibly a database, and infrastructure on which the application is running. Therefore, it is reasonable to also assume that the logging application, database, and infrastructure need to be monitored. This would require interfacing with an existing monitoring system or creating a new monitoring system. If and how the system will be monitored needs to be clarified.

Interfacing with a current monitoring system will minimally require additional acceptance criteria. Developing a new monitoring system would undoubtedly require additional user story or stories, with their own acceptance criteria.

Alerting

If we are monitoring the system, then we could assume that there should be a requirement to alert someone if monitoring finds the centralized application logging system has become impaired. Similar to monitoring, this would require interfacing with an existing alerting system or creating a new alerting system. A bigger question: who should receive the alerts? These details should be clarified first.

Interfacing with an existing alerting system will minimally require additional acceptance criteria. Developing a new alerting system would require additional user story or stories, with their own acceptance criteria.

Ownership

Who will own the new centralized application logging system? Ownership impacts many facets, such as Support, Maintenance, and Alerting. Ownership should be clearly defined before the requirement’s stories are played. Don’t build something that does not have an owner; it will fail as soon as it is launched.

No additional user stories or acceptance criteria should be required.

Support

We will assume that there are no Support requirements, other than what is already covered in Monitoring, Ownership, and Training and Documentation.

Maintenance

Similarly, we will assume that there are no Maintenance requirements, other than what is already covered in Ownership and Training and Documentation. Requirements regarding automating system maintenance in any way should be an additional user story or stories, with their own acceptance criteria.

Backup and Recovery

Similar to both Support and Maintenance, Backup and Recovery should have a clear owner. Regardless of the owner, integrating backup into the centralized application logging system should be addressed as part of the requirement. To create a critical support system such as logging, with no means of backup or recovery of the system or the data, is unwise.

Backup and Recovery should be an additional user story or stories, with their own acceptance criteria.

Security and Compliance

Since we are logging application activity, we should assume there is the potential of personally identifiable information (PII) and other sensitive business data being captured. Due to the security and compliance risks associated with improper handling of PII, there should be clear details in the logging requirement regarding such things as system access.

At a minimum, there are usually requirements for a centralized application logging system to implement some form of user, role, application, and environment-level authorization for viewing logs. Security and Compliance should be an additional user story or stories, with their own acceptance criteria. Any requirements around the obfuscation of sensitive data in the log entries would also be an entirely separate set of business requirements.

Continuous Integration and Delivery

We will assume that there are no Continuous Integration and Delivery requirements around logging. Any requirements to push the logs of existing Continuous Integration and Delivery systems to the new centralized application logging system would be a separate user story.

Service Level Agreements (SLAs)

A critical, yet commonly overlooked, aspect of most DevOps user stories is Service Level Agreements. Most requirements should have at least a minimal set of performance-related metrics associated with them. Defining SLAs also helps to refine the scope of a requirement. For example, what is the anticipated average volume of log entries to be aggregated and over what time period? From how many sources will logs be aggregated? What is the anticipated average size of each log entry, 2 KB or 2 MB? What is the anticipated average number of concurrent system users?

Given these base assumptions, how quickly are new log entries expected to be available in the new centralized application logging system? What is the expected overall response time of the system under average load?

Directly applicable SLAs should be documented as acceptance criteria in the original user story, and the remaining SLAs assigned to the other new user stories.

Configuration Management

We will also assume that there are no Configuration Management requirements around the new centralized application logging system. However, if the requirements prescribed standing up separate logging implementations, possibly to support multiple environments or applications, then Configuration Management would certainly be in scope.

Environment Management

Based on further clarification, we will assume we are only creating a single centralized application logging system instance. However, we still need to learn which application environments need to be captured by that system. Is this application only intended to support the Development and Test environments? Are the higher environments of Staging and Production precluded due to PII?

We should add an explicit acceptance criterion to the original user story around which application environments should be included and which should not.

High Availability and Fault Tolerance

The logging requirement should state whether or not this system is considered business critical. Usually, if a system is considered critical to the business, then we should assume the system requires monitoring, alerting, support, and backup. Additionally, the system should be highly available and fault tolerant.

Making the system highly available would surely have an impact on facets like Infrastructure, Monitoring, and Alerting. High Availability and Fault Tolerance could be a separate user story or stories, with its own acceptance criteria, as long as the original requirement doesn’t dictate those features.

Other Aspects

We skipped half the facets. They would certainly expand our story list and acceptance criteria. We didn’t address how the end user would view the logs. Should we assume a web browser? Which web browsers? We didn’t determine if the system should be accessible outside the internal network. We didn’t determine how the system would be configured and deployed. We didn’t investigate if there are any vendor or licensing restrictions, or if there are any required training and documentation needs. Possibly, we will need Architectural Approval before we start to develop.

Analysis Results

Based on our limited analysis, we uncovered some further needs:

Further Clarification

We need additional information and clarification regarding Logging, Monitoring, Alerting, and Security and Compliance. We should also determine ownership of the resulting centralized application logging system, before it is built, to ensure Alerting, Maintenance, and Support, are covered.

Additional Acceptance Criteria

We need to add additional acceptance criteria to the original user story around SLAs, Security and Compliance, Logging, Monitoring, and Environment Management.

I prefer to write acceptance criteria in the Given-When-Then format. I find the Given-When-Then format tends to drive a greater level of story refinement and detail than the simpler ‘I expect…’ format. For example, compare:

I expect to see the most current log entries with no more than a two-minute time delay.

With:

Given the centralized application logging system is operating normally and under average load,
When I view log entries for running applications, from my web browser, 
Then I should see the most current log entries with no more than a two-minute time delay.

New User Stories

Following Agile best practices, best illustrated by the INVEST mnemonic, we need to add new, quality user stories, with their own acceptance criteria, around Monitoring, Alerting, Backup and Recovery, Security and Compliance, and High Availability and Fault Tolerance.

Conclusion

Examining all possible facets of each DevOps requirement will improve the overall quality of user stories and acceptance criteria, maximizing business value and minimizing resource utilization.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


Infrastructure as Code Maturity Model

Systematically Evolving an Organization’s Infrastructure

Infrastructure and software development teams are increasingly building and managing infrastructure using automated tools that have been described as “infrastructure as code.” – Kief Morris (Infrastructure as Code)

The process of managing and provisioning computing infrastructure and their configuration through machine-processable, declarative, definition files, rather than physical hardware configuration or the use of interactive configuration tools. – Wikipedia (abridged)

Convergence of CD, Cloud, and IaC

In 2011, co-authors Jez Humble, formerly of ThoughtWorks, and David Farley published their ground-breaking book, Continuous Delivery. Humble and Farley’s book set out, in their words, to automate the ‘painful, risky, and time-consuming process’ of software ‘build, deployment, and testing.’


Over the next five years, Humble and Farley’s Continuous Delivery made a significant contribution to the modern phenomenon of DevOps. According to Wikipedia, DevOps is the ‘culture, movement or practice that emphasizes the collaboration and communication of both software developers and other information-technology (IT) professionals while automating the process of software delivery and infrastructure changes.’

In parallel with the growth of DevOps, Cloud Computing continued to grow at an explosive rate. Amazon pioneered modern cloud computing in 2006 with the launch of its Elastic Compute Cloud. Two years later, in 2008, Microsoft launched its cloud platform, Azure. In 2010, Rackspace launched OpenStack.

Today, there is a flock of ‘cloud’ providers. Their services fall into three primary service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Since we will be discussing infrastructure, we will focus on IaaS and PaaS. Leaders in this space include Google Cloud Platform, RedHat, Oracle Cloud, Pivotal Cloud Foundry, CenturyLink Cloud, Apprenda, IBM SmartCloud Enterprise, and Heroku, to mention just a few.

Finally, fast forward to June 2016, when O’Reilly released Infrastructure as Code: Managing Servers in the Cloud, by Kief Morris of ThoughtWorks. This crucial work bridges many of the concepts first introduced in Humble and Farley’s Continuous Delivery with the evolving processes and practices to support cloud computing.


This post examines how to apply the principles found in the Continuous Delivery Maturity Model, an analysis tool detailed in Humble and Farley’s Continuous Delivery, and discussed herein, to the best practices found in Morris’ Infrastructure as Code.

Infrastructure as Code

Before we continue, we need a shared understanding of infrastructure as code. Below are four examples of infrastructure as code, as Wikipedia defined them, ‘machine-processable, declarative, definition files.’ The code was written using four popular tools, including HashiCorp Packer, Docker, AWS CloudFormation, and HashiCorp Terraform. Executing the code provisions virtualized cloud infrastructure.

HashiCorp Packer

Packer definition of an AWS EBS-backed AMI, based on Ubuntu.

{
  "variables": {
    "aws_access_key": "",
    "aws_secret_key": ""
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "region": "us-east-1",
    "source_ami": "ami-fce3c696",
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu",
    "ami_name": "packer-example {{timestamp}}"
  }]
}
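
As a usage sketch (the template filename and credential values are assumptions), the AMI is built from this definition with a couple of Packer commands:

packer validate packer-example.json
packer build \
  -var 'aws_access_key=YOUR_ACCESS_KEY' \
  -var 'aws_secret_key=YOUR_SECRET_KEY' \
  packer-example.json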

Docker

Dockerfile, used to create a Docker image, and subsequently a Docker container, running MongoDB.

FROM ubuntu:16.04
MAINTAINER Docker
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
RUN echo "deb http://repo.mongodb.org/apt/ubuntu" \
$(cat /etc/lsb-release | grep DISTRIB_CODENAME | cut -d= -f2)/mongodb-org/3.2 multiverse" | \
tee /etc/apt/sources.list.d/mongodb-org-3.2.list
RUN apt-get update && apt-get install -y mongodb-org
RUN mkdir -p /data/db
EXPOSE 27017
ENTRYPOINT ["/usr/bin/mongod"]
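
As a usage sketch (the image and container names are assumptions), the Dockerfile is built into an image and then run as a container, publishing MongoDB’s default port:

docker build -t mongo-example .
docker run -d --name mongodb -p 27017:27017 mongo-example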

AWS CloudFormation

AWS CloudFormation declaration for three services managed on a running instance.

services:
  sysvinit:
    nginx:
      enabled: "true"
      ensureRunning: "true"
      files:
        - "/etc/nginx/nginx.conf"
      sources:
        - "/var/www/html"
    php-fastcgi:
      enabled: "true"
      ensureRunning: "true"
      packages:
        yum:
          - "php"
          - "spawn-fcgi"
    sendmail:
      enabled: "false"
      ensureRunning: "false"

HashiCorp Terraform

Terraform definition of an AWS m1.small EC2 instance, running NGINX on Ubuntu.

resource "aws_instance" "web" {
  connection { user = "ubuntu" }
instance_type = "m1.small"
Ami = "${lookup(var.aws_amis, var.aws_region)}"
Key_name = "${aws_key_pair.auth.id}"
vpc_security_group_ids = ["${aws_security_group.default.id}"]
Subnet_id = "${aws_subnet.default.id}"
provisioner "remote-exec" {
  inline = [
    "sudo apt-get -y update",
    "sudo apt-get -y install nginx",
    "sudo service nginx start",
  ]
 }
}
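
A typical workflow for applying this definition, assuming the referenced variables and resources (aws_amis, aws_key_pair.auth, and so forth) are defined elsewhere in the configuration:

terraform plan    # preview the EC2 instance and remote-exec provisioning steps
terraform apply   # provision the instance and install NGINX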

Cloud-based Infrastructure as a Service

The previous examples provide but the narrowest of views into the potential breadth of infrastructure as code. Leading cloud providers, such as Amazon and Microsoft, offer hundreds of unique services, most of which may be defined and manipulated through code — infrastructure as code.


What Infrastructure Can Be Defined as Code?

The question many ask is, what types of infrastructure can be defined as code? Although vendors and cloud providers have their unique names and descriptions, most infrastructure is divided into a few broad categories:

  • Compute
  • Databases, Caching, and Messaging
  • Storage, Backup, and Content Delivery
  • Networking
  • Security and Identity
  • Monitoring, Logging, and Analytics
  • Management Tooling

Continuous Delivery Maturity Model

We also need a common understanding of the Continuous Delivery Maturity Model. According to Humble and Farley, the Continuous Delivery Maturity Model was distilled as a model that ‘helps to identify where an organization stands in terms of the maturity of its processes and practices and defines a progression that an organization can work through to improve.’

The Continuous Delivery Maturity Model is a 5×6 matrix, consisting of six areas of practice and five levels of maturity. Each of the matrix’s 30 elements defines a required discipline an organization needs to follow, to be considered at that level of maturity within that practice.

Areas of Practice

The CD Maturity Model examines six broad areas of practice found in most enterprise software organizations:

  • Build Management and Continuous Integration
  • Environments and Deployment
  • Release Management and Compliance
  • Testing
  • Data Management
  • Configuration Management

Levels of Maturity

The CD Maturity Model defines five levels of increasing maturity, from a score of -1 to 3, from Regressive to Optimizing:

  • Level 3: Optimizing – Focus on process improvement
  • Level 2: Quantitatively Managed – Process measured and controlled
  • Level 1: Consistent – Automated processes applied across whole application lifecycle
  • Level 0: Repeatable – Process documented and partly automated
  • Level -1: Regressive – Processes unrepeatable, poorly controlled, and reactive


Maturity Model Analysis

The CD Maturity Model is an analysis tool. In my experience, organizations use the maturity model in one of two ways. First, an organization completes an impartial evaluation of its existing levels of maturity across all areas of practice. Then, the organization focuses on improving its overall maturity, attempting to achieve a consistent level of maturity across all areas of practice. Alternatively, the organization concentrates on a subset of the practices, which have the greatest business value, or given their relative immaturity, are a detriment to the other practices.


* CD Maturity Model Analysis Tool available on GitHub.

Infrastructure as Code Maturity Levels

Although infrastructure as code is not explicitly called out as a practice in the CD Maturity Model, many of its best practices can be found in the maturity model. For example, the model prescribes automated environment provisioning, orchestrated deployments, and the use of metrics for continuous improvement.

Instead of trying to retrofit infrastructure as code into the existing CD Maturity Model, I believe it is more effective to independently apply the model’s five levels of maturity to infrastructure as code. To that end, I have selected many of the best practices from the book, Infrastructure as Code, as well as from my experiences. Those selected practices have been distributed across the model’s five levels of maturity.

The result is the first pass at an evolving Infrastructure as Code Maturity Model. This model may be applied alongside the broader CD Maturity Model, or independently, to evaluate and further develop an organization’s infrastructure practices.

IaC Level -1: Regressive

Processes unrepeatable, poorly controlled, and reactive

  • Limited infrastructure is provisioned and managed as code
  • Infrastructure provisioning still requires many manual processes
  • Infrastructure code is not written using industry-standard tooling and patterns
  • Infrastructure code not built, unit-tested, provisioned and managed, as part of a pipeline
  • Infrastructure code, processes, and procedures are inconsistently documented, and not available to all required parties

IaC Level 0: Repeatable

Processes documented and partly automated

  • All infrastructure code and configuration are stored in a centralized version control system
  • Testing, provisioning, and management of infrastructure are done as part of an automated pipeline
  • Infrastructure is deployable as individual components
  • Leverages programmatic interfaces into physical devices
  • Automated security inspection of components and dependencies
  • Self-service CLI or API, where internal customers provision their resources
  • All code, processes, and procedures documented and available
  • Immutable infrastructure and processes

IaC Level 1: Consistent

Automated processes applied across whole application lifecycle

  • Fully automated provisioning and management of infrastructure
  • Minimal use of unsupported, ‘home-grown’ infrastructure tooling
  • Unit-tests meet code-coverage requirements
  • Code is continuously tested upon every check-in to version control system
  • Continuously available infrastructure using zero-downtime provisioning
  • Uses configuration registries
  • Templatized configuration files (no awk/sed magic)
  • Secrets are securely managed
  • Auto-scaling based on user-defined load characteristics

IaC Level 2: Quantitatively Managed

Processes measured and controlled

  • Uses infrastructure definition files
  • Capable of automated rollbacks
  • Infrastructure and supporting systems are highly available and fault tolerant
  • Externalized configuration, no black box API to modify configuration
  • Fully monitored infrastructure with configurable alerting
  • Aggregated, auditable infrastructure logging
  • All code, processes, and procedures are well documented in a Knowledge Management System
  • Infrastructure code uses a declarative versus imperative programming model, maybe…

IaC Level 3: Optimizing

Focus on process improvement

  • Self-healing, self-configurable, self-optimizing, infrastructure
  • Performance tested and monitored against business KPIs
  • Maximal infrastructure utilization and workload density
  • Adheres to Cloud Native and 12-Factor patterns
  • Cloud-agnostic code that minimizes cloud vendor lock-in

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


Operational Readiness Analysis

“Analysis: a detailed examination of the elements or structure of something, typically as a basis for discussion or interpretation.” — Google

Introduction

Recently, I had the opportunity, along with a colleague, to perform an operational readiness analysis of a client’s new application platform. The platform was in late-stage development, only a few weeks from going live. Being relatively new to the project team, our objective was to determine the amount of work remaining to make the platform operational, from a technical perspective. As well, we wanted to ensure there were no gaps in the project’s scope, as it related to technical operational readiness.

There were various approaches we could have taken to establish the amount of unfinished work and any existing gaps. We could have reviewed the state of the project in the client’s Agile project management system. We could have discussed the project’s state with the Product Owner, Project Manager, Team Lead, and Business Analyst. Neither of these methods on its own would have provided us with a complete picture.

The approach we ultimately took was what we’ve coined, an Operational Readiness Analysis, or our trendier title, the ‘State of the DevOps’. This approach is particularly effective and highly interactive. The analysis may be conducted at any stage of the Software Development Lifecycle (SDLC), to review a project’s operational state of readiness. The analysis involves five simple steps:

  1. Categorize
  2. Itemize
  3. Organize
  4. Prioritize
  5. Document

Grab Your Post-It Notes

We began, where all good ThoughtWorkers tend to begin, with a meeting room, whiteboard, dry-erase markers, and lots of brightly colored Post-It notes. ThoughtWorkers do love their Post-It notes. We gathered a small subset of the project team, including the Product Owner, Project Manager, and the team’s DevOps Engineer.

Step 1: Categorize

To start, we broadly categorized the operational requirements into buckets of work. In my experience, categorization typically takes one of two forms: either the use of role-based descriptors, like ‘Security’ and ‘Monitoring’, or the use of the ‘ilities’, such as ‘Scalability’ and ‘Maintainability’.

The ‘ilities’ are often referenced in regards to software architecture and the functional and nonfunctional requirements (NFRs) of a project. Although there are many ‘ilities’, project requirements frequently include Usability, Scalability, Maintainability, Reliability, Testability, Availability, and Serviceability. Although I enjoy referring to the ‘ilities’ when drafting project requirements, I find a broad audience more readily understands the high-level category descriptions.

We collectively agreed on eight categories, in addition to a catch-all ‘Other’ category, and distributed them on the whiteboard.


Role-based categories tend to include the following:

  • Security and Compliance
  • Monitoring and Alerting
  • Logging
  • Continuous Integration and Delivery
  • Release Management
  • Configuration
  • Environment Management
  • Infrastructure
  • Networking
  • Data Management
  • High Availability and Fault Tolerance
  • Performance
  • Backup and Restoration
  • Support and Maintenance
  • Documentation and Training
  • Technical Debt
  • Other

Step 2: Itemize

Next, each person placed as many Post-It notes as they wanted below the categories on the whiteboard. Each Post-It represented an item someone felt needed to be addressed as part of the analysis. Items might represent requirements currently in progress, or requirements still untouched in the project’s backlog. A Post-It note might contain a new requirement. Maybe there was a question about an aspect of the project’s relative importance, or a concern about how an aspect of the project had been implemented.

Participants placed an average of five to ten Post-It notes on the whiteboard. The end product was a collective mind-map of the project’s operational state (the following is only an example and doesn’t represent the actual results of any real client analysis).


Whereas categories tend to be generic in nature, the items are usually particular to the project and its technology choices. The ‘Monitoring/Logging’ category might include individual items such as ‘Log Rotation Policy’ or ‘Splunk vs. ELK?’. The ‘Security’ category might include individual items such as ‘Need Password Rotation’ or ‘SSL Certificate Management’.

Participants often use one or two words to summarize more complex thoughts. For example, ‘Rollback’ might be stating, ‘we still don’t have a good rollback strategy for the database’. ‘Release Frequency’ may be questioning, ‘why can’t we release more frequently?’. Make sure to discuss and capture the participant’s full thoughts behind each Post-It during the Organize, Prioritize, and Document stages.

Don’t let participants give up too soon when placing Post-It notes on the board. Often, one participant’s Post-It will spark additional thoughts from other team members. Don’t be afraid to suggest a few items, if the group needs some initial motivation. Lastly, fear not, it’s never too late to go back and add missing items, uncovered in later stages of analysis.

Step 3: Organize

Quickly and non-judgmentally, review all items on the whiteboard. Group duplicates together. Ask for clarification on items where necessary. Move miscategorized items, if the author and team agree there is a more appropriate category.

Examine the ‘Other’ category. If there are items in the ‘Other’ category that are similar, consider adding a new category to capture them. Maybe the team missed identifying a key category, earlier. Conversely, there is nothing wrong with leaving a few stragglers in the ‘Other’ category; don’t overthink the exercise.

Participants should leave this stage with a general understanding of each item on the whiteboard.

Step 4: Prioritize

With all items identified, organized, and understood, discuss each item’s status. Does the item represent incomplete requirements? Does everyone agree an item’s requirements are complete? Team members may not agree on the completeness of particular items. Inconsistencies may reflect unclear requirements. More often, I find, disagreements are a result of an inconsistent or incomplete implementation of requirements across environments — Development, QA, Performance, Staging, and Production.

Why are operational requirements frequently implemented inconsistently? Often, a story is played early in the development stage, prior to all environments being available. For example, a story is played to add monitoring to Development and QA. The story is completed, tested, and closed. Afterward, the Performance environment is created. Later yet, the Staging and Production environments are created. But, there are no new stories to address monitoring the new environments.

One way to avoid this common issue with operational requirements is an effective DevOps automation strategy. In this example, an effective strategy would mean all new environments would get monitoring, automatically. The story should broadly address monitoring, and not, short-sightedly, monitoring of specific environments.

Often, only one or two project team members are responsible for ‘all the DevOps stuff’. They alone possess an accurate perspective on an item’s current status.

Gain agreement from the Product Owner, Project Manager, and key team members on the priority of incomplete requirements. Are they ‘Day One’ must-haves, or ‘Day Two’ and can be completed after the application goes live? Flag all high-priority items.

If the status of an item is unknown, such as ‘why is the QA environment always down?!’, note it as an open question and move on. If the item simply requires a quick answer to resolve, like ‘do we backup the database?’, answer it (‘the database is replicated and backed up daily’) and move on.


Step 5: Document

Snap a picture of the whiteboard and document its contents. We chose the client’s knowledge management system, Confluence, as the vehicle to share the whiteboard’s results. We created a table with columns for Category, Item, Description, Priority, Owner, and Notes. It is also helpful to have a column for the project’s corresponding Story ID or Defect ID. Along with the whiteboard’s contents, we captured the key discussion points, questions, and concerns raised by the participants.


Make the analysis results available to all team members. Let the team ask questions and poke holes in the results. Adjust the results if necessary.

Next Steps

The actions you take, given the results of the analysis, are up to you. In our case, we ensured all high-priority items were called out on the team’s Agile board. Any new items were captured in the client’s Agile project management system. Finally, we ensured open questions and concerns were addressed in a timely fashion. We continue to track each item’s status weekly, throughout the launch and post-launch periods.

We anticipate conducting a follow-up analysis, thirty days after launch. The goal will be to evaluate the effectiveness of the first analysis and identify additional operational needs as the application enters a business-as-usual (BAU) application lifecycle phase.


Postscript: the ‘ilities’

The ‘ilities’, courtesy of codesqueeze.com and en.wikipedia.org

  • Scalability
    The capability of a system, network, or process to handle a growing amount of work or its potential to be enlarged to accommodate that growth.
  • Testability
    The degree to which a system or component facilitates the establishment of test criteria and the performance of tests to determine whether those criteria have been met.
  • Reliability (Resilience)
    The ability of a system or component to perform its required functions under stated conditions for a specified time.
  • Usability (Performance)
    The degree to which software can be used by specified consumers to achieve quantified objectives with effectiveness, efficiency, and satisfaction in a quantified context of use.
  • Serviceability (Supportability)
    The ability of technical support personnel to install, configure, and monitor computer products, identify exceptions or faults, debug or isolate faults to root cause analysis, and provide hardware or software maintenance in pursuit of solving a problem and restoring the product into service.
  • Availability
    The proportion of time a system is in a functioning condition. Defined in the service-level agreement (SLA).
  • Maintainability
    The ease with which a product can be maintained. Isolate and correct defects or their cause, prevent unexpected breakdowns, maximize a product’s useful life, maximize efficiency, reliability, and safety, meet new requirements, make future maintenance easier, and cope with a changed environment.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.


Spring Music Revisited: Java-Spring-MongoDB Web App with Docker 1.12

Build, test, deploy, and monitor a multi-container, MongoDB-backed, Java Spring web application, using the new Docker 1.12.

Spring Music Infrastructure

Introduction

This post and associated project code were updated 9/3/2016 to use Tomcat 8.5.4 with OpenJDK 8.

This post and the post’s example project represent an update to a previous post, Build and Deploy a Java-Spring-MongoDB Application using Docker. This new post incorporates many improvements made in Docker 1.12, including the use of the new Docker Compose v2 YAML format. The post’s project was also updated to use Filebeat with ELK, as opposed to Logspout, which was used previously.

In this post, we will demonstrate how to build, test, deploy, and manage a Java Spring web application, hosted on Apache Tomcat, load-balanced by NGINX, monitored by ELK with Filebeat, and all containerized with Docker.

We will use a sample Java Spring application, Spring Music, available on GitHub from Cloud Foundry. The Spring Music sample record album collection application was originally designed to demonstrate the use of database services on Cloud Foundry, using the Spring Framework. Instead of Cloud Foundry, we will host the Spring Music application locally, using Docker on VirtualBox, and optionally on AWS.

All files necessary to build this project are stored on the docker_v2 branch of the garystafford/spring-music-docker repository on GitHub. The Spring Music source code is stored on the springmusic_v2 branch of the garystafford/spring-music repository, also on GitHub.

Spring Music Application

Application Architecture

The Java Spring Music application stack contains the following technologies: Java, Spring Framework, AngularJS, Bootstrap, jQuery, NGINX, Apache Tomcat, MongoDB, the ELK Stack, and Filebeat. Testing frameworks include the Spring MVC Test Framework, Mockito, Hamcrest, and JUnit.

A few changes were made to the original Spring Music application to make it work for this demonstration, including:

  • Move from Java 1.7 to 1.8 (including newer Tomcat version)
  • Add unit tests for Continuous Integration demonstration purposes
  • Modify MongoDB configuration class to work with non-local, containerized MongoDB instances
  • Add Gradle warNoStatic task to build WAR without static assets
  • Add Gradle zipStatic task to ZIP up the application’s static assets for deployment to NGINX
  • Add Gradle zipGetVersion task with a versioning scheme for build artifacts
  • Add context.xml file and MANIFEST.MF file to the WAR file
  • Add Log4j RollingFileAppender appender to send log entries to Filebeat
  • Update versions of several dependencies, including Gradle, Spring, and Tomcat

We will use the following technologies to build, publish, deploy, and host the Java Spring Music application: Gradle, git, GitHub, Travis CI, Oracle VirtualBox, Docker, Docker Compose, Docker Machine, Docker Hub, and optionally, Amazon Web Services (AWS).

NGINX
To increase performance, the Spring Music web application’s static content will be hosted by NGINX. The application’s WAR file will be hosted by Apache Tomcat 8.5.4. Requests for non-static content will be proxied through NGINX on the front-end, to a set of three load-balanced Tomcat instances on the back-end. To further increase application performance, NGINX will also be configured for browser caching of the static content. In many enterprise environments, the use of a Java EE application server, like Tomcat, is still not uncommon.

Reverse proxying and caching are configured through NGINX’s default.conf file, in the server configuration section:
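
A minimal sketch of the kind of directives involved (the upstream name, static asset location, and cache duration are illustrative assumptions, not the project’s actual configuration):

server {
  listen 80;

  # serve and cache static assets directly from NGINX
  location ~* \.(css|js|png|jpg|gif|ico)$ {
    root    /usr/share/nginx/html;
    expires 1d;
  }

  # proxy all other requests to the Tomcat upstream
  location / {
    proxy_pass       http://app_servers;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}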

The three Tomcat instances will be manually configured for load-balancing using NGINX’s default round-robin load-balancing algorithm. This is configured through the default.conf file, in the upstream configuration section:
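
A sketch of such an upstream block, using the three Tomcat container names referenced later in this post (the upstream name and ports are assumptions):

upstream app_servers {
  # NGINX defaults to round-robin; no explicit directive is required
  server music_app_1:8080;
  server music_app_2:8080;
  server music_app_3:8080;
}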

Client requests are received through port 80 on the NGINX server. NGINX proxies requests for non-static content to one of the three Tomcat instances on port 8080.

MongoDB
The Spring Music application was designed to work with a number of data stores, including MySQL, Postgres, Oracle, MongoDB, Redis, and H2, an in-memory Java SQL database. Given the choice of both SQL and NoSQL databases, we will select MongoDB.

The Spring Music application, hosted by Tomcat, will store and modify record album data in a single instance of MongoDB. MongoDB will be populated with a collection of album data from a JSON file, when the Spring Music application first creates the MongoDB database instance.

ELK
Lastly, the ELK Stack, with Filebeat, will aggregate NGINX, Tomcat, and Java Log4j log entries, providing debugging and analytics for our demonstration. A similar method for aggregating logs, using Logspout instead of Filebeat, can be found in this previous post.

Kibana 4 Web Console

Continuous Integration

In this post’s example, two build artifacts, a WAR file for the application and ZIP file for the static web content, are built automatically by Travis CI, whenever source code changes are pushed to the springmusic_v2 branch of the garystafford/spring-music repository on GitHub.

Travis CI Output

Following a successful build and a small number of unit tests, Travis CI pushes the build artifacts to the build-artifacts branch on the same GitHub project. The build-artifacts branch acts as a pseudo binary repository for the project, much like JFrog’s Artifactory. These artifacts are used later by Docker to build the project’s immutable Docker images and containers.

Build Artifact Repository

Build Notifications
Travis CI pushes build notifications to a Slack channel, which eliminates the need to actively monitor Travis CI.

Travis CI Slack Notifications

Automation Scripting
The .travis.yaml file, custom gradle.build Gradle tasks, and the deploy_travisci.sh script handle the Travis CI automation described above.

Travis CI .travis.yaml file:
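
A minimal sketch of what such a configuration covers, based on the Gradle tasks and deployment script described above (the JDK choice, branch filter, and use of the Gradle wrapper are assumptions):

language: java
jdk: oraclejdk8
branches:
  only:
    - springmusic_v2
script:
  - ./gradlew clean build warNoStatic zipStatic zipGetVersion
after_success:
  # push the WAR and ZIP build artifacts to the build-artifacts branch
  - sh ./deploy_travisci.sh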

Custom gradle.build tasks:

The deploy_travisci.sh file:

You can easily replicate the project’s continuous integration automation using your choice of toolchains. GitHub or BitBucket are good choices for distributed version control. For continuous integration and deployment, I recommend Travis CI, Semaphore, Codeship, or Jenkins. Couple those with a good persistent chat application, such as Slack or Atlassian’s HipChat.

Building the Docker Environment

Make sure VirtualBox, Docker, Docker Compose, and Docker Machine, are installed and running. At the time of this post, I have the following versions of software installed on my Mac:

  • Mac OS X 10.11.6
  • VirtualBox 5.0.26
  • Docker 1.12.1
  • Docker Compose 1.8.0
  • Docker Machine 0.8.1

To build the project’s VirtualBox VM, Docker images, and Docker containers, execute the build script, using the following command: sh ./build_project.sh. A build script is useful when working with CI/CD automation tools, such as Jenkins CI or ThoughtWorks Go. However, to understand the build process, I suggest first running the individual commands, locally.

Deploying to AWS
By simply changing the Docker Machine driver to AWS EC2 from VirtualBox, and providing your AWS credentials, the springmusic environment may also be built on AWS.

Build Process
Docker Machine provisions a single VirtualBox springmusic VM on which to host the project’s containers. VirtualBox provides a quick and easy solution that can be run locally for initial development and testing of the application.

Next, the script creates a Docker data volume and project-specific Docker bridge network.

Next, using the project’s individual Dockerfiles, Docker Compose pulls base Docker images from Docker Hub for NGINX, Tomcat, ELK, and MongoDB. Project-specific immutable Docker images are then built for NGINX, Tomcat, and MongoDB. While constructing the project-specific Docker images for NGINX and Tomcat, the latest Spring Music build artifacts are pulled and installed into the corresponding Docker images.

Docker Compose builds and deploys (6) containers onto the VirtualBox VM: (1) NGINX, (3) Tomcat, (1) MongoDB, and (1) ELK.
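
For those who prefer to run the steps individually rather than through the build script, the sequence looks roughly like the following (the volume, network, and Compose project names are assumptions):

# provision the VirtualBox VM and point the Docker client at it
docker-machine create --driver virtualbox springmusic
eval $(docker-machine env springmusic)

# create the project-specific data volume and bridge network
docker volume create --name music_data
docker network create music_net

# build the images, then deploy and scale the containers
docker-compose -p music up -d
docker-compose -p music scale app=3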

The NGINX Dockerfile:

The Tomcat Dockerfile:

Docker Compose v2 YAML
This post was recently updated for Docker 1.12, and to use Docker Compose v2 YAML file format. The post’s docker-compose.yml takes advantage of improvements in Docker 1.12 and Docker Compose v2 YAML. Improvements to the YAML file include eliminating the need to link containers and expose ports, and the addition of named networks and volumes.
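
An abbreviated sketch of the v2 file’s shape, showing named networks and volumes and the absence of container links (the service names, build contexts, and ELK image are assumptions):

version: '2'

services:
  proxy:
    build: nginx/
    ports:
      - "80:80"
    networks:
      - net
  app:
    build: tomcat/
    ports:
      - "8080"
    networks:
      - net
  mongodb:
    build: mongodb/
    volumes:
      - music_data:/data/db
    networks:
      - net
  elk:
    image: sebp/elk:latest
    networks:
      - net

networks:
  net:
    driver: bridge

volumes:
  music_data: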

The Results

Spring Music Infrastructure

Below are the results of building the project.

Testing the Application

Below are partial results of the curl test, hitting the NGINX endpoint. Note the different IP addresses in the Upstream-Address field between requests. This test proves NGINX’s round-robin load-balancing is working across the three Tomcat application instances: music_app_1, music_app_2, and music_app_3.

Also, note the sharp decrease in the Request-Time between the first three requests and subsequent three requests. The Upstream-Response-Time to the Tomcat instances doesn’t change, yet the total Request-Time is much shorter, due to caching of the application’s static assets by NGINX.
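
A test along these lines can be reproduced with a short loop against the NGINX endpoint, inspecting the custom response headers (the application path is an assumption):

for i in {1..6}; do
  curl -s -I http://192.168.99.100/spring-music/ \
    | grep -E 'Upstream-Address|Upstream-Response-Time|Request-Time'
done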

Spring Music Application Links

Assuming the springmusic VM is running at 192.168.99.100, the following links can be used to access various project endpoints. Note the (3) Tomcat instances each map to randomly exposed ports. These ports are not required by NGINX, which maps to port 8080 for each instance. The port is only required if you want access to the Tomcat Web Console. The port, shown below, 32771, is merely used as an example.

* The Tomcat user name is admin and the password is t0mcat53rv3r.

Helpful Links

TODOs

  • Automate the Docker image build and publish processes
  • Automate the Docker container build and deploy processes
  • Automate post-deployment verification testing of project infrastructure
  • Add Docker Swarm multi-host capabilities with overlay networking
  • Update Spring Music with latest CF project revisions
  • Include scripting example to stand-up project on AWS
  • Add Consul and Consul Template for NGINX configuration


Automate the Provisioning and Configuration of HAProxy and an Apache Web Server Cluster Using Foreman

Use Vagrant, Foreman, and Puppet to provision and configure HAProxy as a reverse proxy and load-balancer for a cluster of Apache web servers.


Introduction

In this post, we will use several technologies, including Vagrant, Foreman, and Puppet, to provision and configure a basic load-balanced web server environment. In this environment, a single node with HAProxy will act as a reverse proxy and load-balancer for two identical Apache web server nodes. All three nodes will be provisioned and bootstrapped using Vagrant, from a Linux CentOS 6.5 Vagrant Box. Afterward, Foreman, with Puppet, will be used to install and configure the nodes with HAProxy and Apache, using a series of Puppet modules.

For this post, I will assume you already have running instances of Vagrant with the vagrant-hostmanager plugin, VirtualBox, and Foreman. If you are unfamiliar with Vagrant, the vagrant-hostmanager plugin, VirtualBox, Foreman, or Puppet, review my recent post, Installing Foreman and Puppet Agent on Multiple VMs Using Vagrant and VirtualBox. That post demonstrates how to install and configure Foreman. In addition, it demonstrates how to provision and bootstrap virtual machines using Vagrant and VirtualBox. Basically, we will be repeating many of those same steps in this post, with the addition of HAProxy, Apache, and some custom configuration Puppet modules.

All code for this post is available on GitHub. However, it has been updated as of 8/23/2015. Changes were required to fix compatibility issues with the latest versions of Puppet 4.x and Foreman. Additionally, the version of CentOS on all VMs was updated from 6.6 to 7.1 and the version of Foreman was updated from 1.7 to 1.9.

Steps

Here is a high-level overview of our steps in this post:

  1. Provision and configure the three CentOS-based virtual machines (‘nodes’) using Vagrant and VirtualBox
  2. Install the HAProxy and Apache Puppet modules, from Puppet Forge, onto the Foreman server
  3. Install the custom HAProxy and Apache Puppet configuration modules, from GitHub, onto the Foreman server
  4. Import the four new modules' classes into Foreman's Puppet class library
  5. Add the three new virtual machines (‘hosts’) to Foreman
  6. Configure the new hosts in Foreman, assigning the appropriate Puppet classes
  7. Apply the Foreman Puppet configurations to the new hosts
  8. Test that HAProxy is working as a reverse proxy and load-balancer for the two Apache web server nodes

In this post, I will use the terms ‘virtual machine’, ‘machine’, ‘node’, ‘agent node’, and ‘host’ interchangeably, based on each software’s own nomenclature.

Provisioning

First, using the process described in the previous post, provision and bootstrap the three new virtual machines. The new machines’ Vagrant configuration is shown below. This should be added to the JSON configuration file. All code for the earlier post is available on GitHub.

{
  "nodes": {
    "haproxy.example.com": {
      ":ip": "192.168.35.101",
      "ports": [],
      ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    },
    "node01.example.com": {
      ":ip": "192.168.35.121",
      "ports": [],
      ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    },
    "node02.example.com": {
      ":ip": "192.168.35.122",
      "ports": [],
      ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    }
  }
}
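
As in the previous post, a single Vagrant command provisions and bootstraps all three nodes defined in the JSON file; a minimal sketch of the commands is shown below.

# provision and bootstrap the three nodes defined in nodes.json
vagrant up
# confirm all three machines are running
vagrant status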

After provisioning and bootstrapping, observe the three machines running in Oracle’s VM VirtualBox Manager.

Oracle VM VirtualBox Manager View of New Nodes

Installing Puppet Forge Modules

The next task is to install the HAProxy and Apache Puppet modules on the Foreman server. This allows Foreman to have access to them. I chose the puppetlabs-haproxy HAProxy module and the puppetlabs-apache Apache module. Both modules were authored by Puppet Labs, and are available on Puppet Forge.

The exact commands to install the modules onto your Foreman server will depend on your Foreman environment configuration. In my case, I used the following two commands to install the two Puppet Forge modules into my ‘Production’ environment’s module directory.

sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-haproxy
sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-apache

# confirm module installation
puppet module list --modulepath /etc/puppet/environments/production/modules

Installing Configuration Modules

Next, install the HAProxy and Apache configuration Puppet modules on the Foreman server. Both modules are hosted in my GitHub repositories, and both can be downloaded from GitHub and installed on the Foreman server from the command line. Again, the exact commands to install the modules onto your Foreman server will depend on your Foreman environment configuration. In my case, I used the following two commands to install the two modules into my ‘Production’ environment’s module directory. Note that I am downloading version 0.1.0 of both modules, the latest at the time of writing this post. Double-check for newer versions of both modules before running the commands, and modify them if necessary.

# apache config module
wget -N https://github.com/garystafford/garystafford-apache_example_config/archive/v0.1.0.tar.gz && \
sudo puppet module install -i /etc/puppet/environments/production/modules ~/v0.1.0.tar.gz --force

# haproxy config module
wget -N https://github.com/garystafford/garystafford-haproxy_node_config/archive/v0.1.0.tar.gz && \
sudo puppet module install -i /etc/puppet/environments/production/modules ~/v0.1.0.tar.gz --force

# confirm module installation
puppet module list --modulepath /etc/puppet/environments/production/modules

GitHub Repository for Apache Config Example

HAProxy Configuration
The HAProxy configuration module configures HAProxy’s /etc/haproxy/haproxy.cfg file. The single class in the module’s init.pp manifest is as follows:

class haproxy_node_config () inherits haproxy {
  haproxy::listen { 'puppet00':
    collect_exported => false,
    ipaddress        => '*',
    ports            => '80',
    mode             => 'http',
    options          => {
      'option'  => ['httplog'],
      'balance' => 'roundrobin',
    },
  }

  Haproxy::Balancermember <<| listening_service == 'puppet00' |>>

  haproxy::balancermember { 'haproxy':
    listening_service => 'puppet00',
    server_names      => ['node01.example.com', 'node02.example.com'],
    ipaddresses       => ['192.168.35.121', '192.168.35.122'],
    ports             => '80',
    options           => 'check',
  }
}

The resulting /etc/haproxy/haproxy.cfg file will have the following configuration added. It defines the two Apache web server nodes’ hostnames, IP addresses, and HTTP port. The configuration also defines the load-balancing method, ‘round-robin‘ in our example. In this example, we are using layer 7 load-balancing (application layer – http), as opposed to layer 4 load-balancing (transport layer – tcp). Either method will work for this example. The Puppet Labs HAProxy module’s documentation on Puppet Forge and HAProxy’s own documentation are both excellent starting points for understanding how to configure HAProxy. We are barely scratching the surface of HAProxy’s capabilities in this brief example.

listen puppet00
  bind *:80
  mode  http
  balance  roundrobin
  option  httplog
  server node01.example.com 192.168.35.121:80 check
  server node02.example.com 192.168.35.122:80 check

Apache Configuration
The Apache configuration module creates a default web page in Apache’s docroot directory, /var/www/html/index.html. The single class in the module’s init.pp manifest is as follows:
ApacheConfigClass
The resulting /var/www/html/index.html file will look like the following. Observe that the facter variables shown in the module manifest above have been replaced by the individual node’s hostname and IP address during application of the configuration by Puppet (i.e. ${fqdn} became node01.example.com).

ApacheConfigClass

Both of these Puppet modules were created specifically to configure HAProxy and Apache for this post. Unlike published modules on Puppet Forge, these two modules are very simple, and don’t necessarily represent the best practices and patterns for authoring Puppet Forge modules.

Importing into Foreman

After installing the new modules onto the Foreman server, we need to import them into Foreman. This is accomplished from the ‘Puppet classes’ tab, using the ‘Import from theforeman.example.com’ button. Once imported, the module classes are available to assign to host machines.

Importing Puppet Classes into Foreman

Add Host to Foreman

Next, add the three new hosts to Foreman. If you have questions on how to add the nodes to Foreman, start Puppet’s Certificate Signing Request (CSR) process on the hosts, sign the certificates, or complete other first-time tasks, refer to the previous post. That post explains this process in detail.
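
As a quick refresher on that process, the first Puppet Agent run on each new node generates its certificate signing request; the sketch below assumes the same commands used in the previous post.

# from the VirtualBox host, connect to a node (repeat for haproxy, node01, and node02)
vagrant ssh node01.example.com
# inside the VM, generate the CSR and wait for Foreman to sign it
sudo puppet agent --test --waitforcert=60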

Foreman Hosts Tab Showing New Nodes

Configure the Hosts

Next, configure the HAProxy and Apache nodes with the necessary Puppet classes. In addition to the base module classes and configuration classes, I recommend adding git and ntp modules to each of the new nodes. These modules were explained in the previous post. Refer to the screen-grabs below for correct module classes to add, specific to HAProxy and Apache.

HAProxy Node Puppet Classes Tab

Apache Nodes Puppet Classes Tab

Agent Configuration and Testing the System

Once configurations are retrieved and applied by Puppet Agent on each node, we can test our reverse-proxied, load-balanced environment. To start, open a browser and load haproxy.example.com. You should see one of the two pages below. Refresh the page a few times. You should observe HAProxy directing you to one Apache web server node, and then the other, using HAProxy’s round-robin algorithm. You can differentiate the Apache web servers by the hostname and IP address displayed on the web page.

Load Balancer Directing Traffic to Node01

Load Balancer Directing Traffic to Node02

After hitting HAProxy’s URL several times successfully, view HAProxy’s built-in Statistics Report page at http://haproxy.example.com/haproxy?stats. Note below, each of the two Apache nodes has been hit 44 times by HAProxy. This demonstrates the effectiveness of the reverse proxy and load-balancing features of HAProxy.

Statistics Report for HAProxy
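
If you prefer the command line to a browser, a simple loop can generate the traffic behind these statistics; this is only an illustrative sketch.

# generate a batch of requests against HAProxy, then review the Statistics Report page
for i in {1..20}; do curl -s -o /dev/null http://haproxy.example.com/; done
# http://haproxy.example.com/haproxy?stats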

Accessing Apache Directly
If you are testing HAProxy from the same machine on which you created the virtual machines (the VirtualBox host), you will likely be able to directly access either of the Apache web servers (e.g. node02.example.com). The VirtualBox host’s hosts file contains the IP addresses and hostnames of all three hosts. This DNS configuration was done automatically by the vagrant-hostmanager plugin. However, in an actual Production environment, only the HAProxy server’s hostname and IP address would be publicly accessible to a user. The two Apache nodes would sit behind a firewall, accessible only by the HAProxy server. HAProxy acts as a façade to the public side of the network.

Testing Apache Host Failure
The main reason you would likely use a load-balancer is high-availability. With HAProxy acting as a load-balancer, we should be able to impair one of the two Apache nodes, without noticeable disruption. HAProxy will continue to serve content from the remaining Apache web server node.

Log into node01.example.com, using the following command, vagrant ssh node01.example.com. To simulate an impairment on ‘node01’, run the following command to stop Apache, sudo service httpd stop. Now, refresh the haproxy.example.com URL in your web browser. You should notice HAProxy is now redirecting all traffic to node02.example.com.
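
The whole failure test, including restoring the impaired node afterwards, looks roughly like the following sketch; the restore step is an assumption, mirroring the stop command.

# simulate a failure on node01
vagrant ssh node01.example.com
sudo service httpd stop
# refresh http://haproxy.example.com - all traffic now goes to node02
# restore node01 when finished (assumed counterpart to the stop command)
sudo service httpd start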

Troubleshooting

While troubleshooting HAProxy configuration issues for this demonstration, I discovered logging is not configured by default on CentOS. To get logging running, I recommend the article HAProxy: Give me some logs on CentOS 6.5!, by Stephane Combaudon. Once logging is active, you can more easily troubleshoot HAProxy and Apache configuration issues. Here are some example commands you might find useful:

# haproxy
sudo more -f /var/log/haproxy.log
sudo haproxy -f /etc/haproxy/haproxy.cfg -c # check/validate config file

# apache
sudo ls -1 /etc/httpd/logs/
sudo tail -50 /etc/httpd/logs/error_log
sudo less /etc/httpd/logs/access_log

Redundant Proxies

In this simple example, the system’s weakest point is obviously the single HAProxy instance. It represents a single point of failure (SPOF) in our environment. In an actual production environment, you would likely have more than one instance of HAProxy. They may both be in a load-balanced pool, or one may be active and one on standby as a failover, should one instance become impaired. There are several techniques for building in proxy redundancy, often with the use of a Virtual IP and Keepalived. Below is a list of articles that might help you take this post’s example to the next level.



Installing Foreman and Puppet Agent on Multiple VMs Using Vagrant and VirtualBox

Automatically install and configure Foreman, the open source infrastructure lifecycle management tool, and multiple Puppet Agent VMs using Vagrant and VirtualBox.

Foreman - Overview

Introduction

In the last post, Installing Puppet Master and Agents on Multiple VM Using Vagrant and VirtualBox, we installed Puppet Master/Agent on VirtualBox VMs using Vagrant. Puppet Master is an excellent tool, but lacks the ease-of-use of Puppet Enterprise or Foreman. In this post, we will build an almost identical environment, substituting Foreman for Puppet Master.

According to Foreman’s website, “Foreman is an open source project that helps system administrators manage servers throughout their lifecycle, from provisioning and configuration to orchestration and monitoring. Using Puppet or Chef and Foreman’s smart proxy architecture, you can easily automate repetitive tasks, quickly deploy applications, and proactively manage change, both on-premise with VMs and bare-metal or in the cloud.”

Combined with Puppet Labs’ Open Source Puppet, Foreman is an effective solution to manage infrastructure and system configuration. Again, according to Foreman’s website, the Foreman installer is a collection of Puppet modules that installs everything required for a full working Foreman setup. The installer uses native OS packaging and adds necessary configuration for the complete installation. By default, the Foreman installer will configure:

  • Apache HTTP with SSL (using a Puppet-signed certificate)
  • Foreman running under mod_passenger
  • Smart Proxy configured for Puppet, TFTP and SSL
  • Puppet master running under mod_passenger
  • Puppet agent configured
  • TFTP server (under xinetd on Red Hat platforms)

For the average Systems Engineer or Software Developer, installing and configuring Foreman, Puppet Master, Apache, Puppet Agent, and the other associated software packages listed above, is daunting. If the installation doesn’t work properly, you are stuck troubleshooting, or trying to remove and reinstall some or all of the components.

A better solution is to automate the installation of Foreman into a Docker container, or onto a VM using Vagrant. Automating the installation process guarantees accuracy and consistency. The Vagrant VirtualBox VM can be snapshotted, moved to another host, or simply destroyed and recreated, if needed.

All code for this post is available on GitHub. However, it has been updated as of 8/23/2015. Changes were required to fix compatibility issues with the latest versions of Puppet 4.x and Foreman. Additionally, the version of CentOS on all VMs was updated from 6.6 to 7.1 and the version of Foreman was updated from 1.7 to 1.9.

The Post’s Example

In this post, we will use Vagrant and VirtualBox to create three VMs. The VMs in this post will be built from a standard CentOS 6.5 x64 base Vagrant Box, located on Atlas. We will use a single JSON-format configuration file to automatically build all three VMs with Vagrant. As part of the provisioning process, using Vagrant’s shell provisioner, we will execute a bootstrap shell script. The script will install Foreman and its associated software on the first VM, and Puppet Agent on the two remaining VMs (aka Puppet ‘agent nodes’ or Foreman ‘hosts’).

Foreman does have the ability to provision on bare-metal infrastructure and public or private clouds. However, this example would simulate an environment where you have existing nodes you want to manage with Foreman.

The Foreman bootstrap script will also download several Puppet modules. To test Foreman once the provisioning is complete, import those modules’ classes into Foreman and assign the classes to the hosts. The hosts will fetch and apply the configurations. You can then test for the installed instances of those modules’ components on the Puppet Agent hosts.

Vagrant

To begin the process, we will use the JSON-format configuration file to create the three VMs, using Vagrant and VirtualBox.

{
  "nodes": {
    "theforeman.example.com": {
      ":ip": "192.168.35.5",
      "ports": [],
      ":memory": 1024,
      ":bootstrap": "bootstrap-foreman.sh"
    },
    "agent01.example.com": {
      ":ip": "192.168.35.10",
      "ports": [],
      ":memory": 1024,
      ":bootstrap": "bootstrap-node.sh"
    },
    "agent02.example.com": {
      ":ip": "192.168.35.20",
      "ports": [],
      ":memory": 1024,
      ":bootstrap": "bootstrap-node.sh"
    }
  }
}

The Vagrantfile uses the JSON-format configuration file to provision the three VMs, using a single ‘vagrant up‘ command. That’s it, less than 30 lines of actual code in the Vagrantfile to create as many VMs as you want. For this post’s example, we will not need to add any VirtualBox port mappings. However, that can also be done from the JSON configuration file (see the README.md for more directions).

 

Vagrant Provisioning the VMs

If you have not used the CentOS Vagrant Box before, it will take a few minutes the first time for Vagrant to download it to the local Vagrant Box repository.

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Builds single Foreman server and
# multiple Puppet Agent Nodes using JSON config file
# Gary A. Stafford - 01/15/2015

# read vm configurations from JSON file
nodes_config = (JSON.parse(File.read("nodes.json")))['nodes']

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "chef/centos-6.5"

  nodes_config.each do |node|
    node_name   = node[0] # name of node
    node_values = node[1] # content of node

    config.vm.define node_name do |config|
      # configures all forwarding ports in JSON array
      ports = node_values['ports']
      ports.each do |port|
        config.vm.network :forwarded_port,
          host:  port[':host'],
          guest: port[':guest'],
          id:    port[':id']
      end

      config.vm.hostname = node_name
      config.vm.network :private_network, ip: node_values[':ip']

      config.vm.provider :virtualbox do |vb|
        vb.customize ["modifyvm", :id, "--memory", node_values[':memory']]
        vb.customize ["modifyvm", :id, "--name", node_name]
      end

      config.vm.provision :shell, :path => node_values[':bootstrap']
    end
  end
end

Once provisioned, the three VMs, also called ‘Machines’ by Vagrant, should appear in Oracle VM VirtualBox Manager.

Oracle VM VirtualBox Manager View

Oracle VM VirtualBox Manager View

The name of the VMs, referenced in Vagrant commands, is the parent node name in the JSON configuration file (node_name), such as, ‘vagrant ssh theforeman.example.com‘.

Vagrant Status

Bootstrapping Foreman

As part of the Vagrant provisioning process (the ‘vagrant up‘ command), a bootstrap script is executed on the VMs (shown below). This script does almost all of the installation and configuration work. Below is the script for bootstrapping the Foreman VM.

#!/bin/sh

# Run on VM to bootstrap Foreman server
# Gary A. Stafford - 01/15/2015

if ps aux | grep "/usr/share/foreman" | grep -v grep 2> /dev/null
then
    echo "Foreman appears to all already be installed. Exiting..."
else
    # Configure /etc/hosts file
    echo "" | sudo tee --append /etc/hosts 2> /dev/null && \
    echo "192.168.35.5    theforeman.example.com   theforeman" | sudo tee --append /etc/hosts 2> /dev/null

    # Update system first
    sudo yum update -y

    # Install Foreman for CentOS 6
    sudo rpm -ivh http://yum.puppetlabs.com/puppetlabs-release-el-6.noarch.rpm && \
    sudo yum -y install epel-release http://yum.theforeman.org/releases/1.7/el6/x86_64/foreman-release.rpm && \
    sudo yum -y install foreman-installer && \
    sudo foreman-installer

    # First run the Puppet agent on the Foreman host which will send the first Puppet report to Foreman,
    # automatically creating the host in Foreman's database
    sudo puppet agent --test --waitforcert=60

    # Install some optional puppet modules on Foreman server to get started...
    sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-ntp
    sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-git
    sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-docker
fi

Bootstrapping Puppet Agent Nodes

Below is the script for bootstrapping the Puppet Agent nodes. The agent node bootstrap script was executed as part of the Vagrant provisioning process.

#!/bin/sh

# Run on VM to bootstrap Puppet Agent nodes
# Gary A. Stafford - 01/15/2015

if ps aux | grep "puppet agent" | grep -v grep 2> /dev/null
then
    echo "Puppet Agent is already installed. Moving on..."
else
    # Update system first
    sudo yum update -y

    # Install Puppet for CentOS 6
    sudo rpm -ivh http://yum.puppetlabs.com/puppetlabs-release-el-6.noarch.rpm && \
    sudo yum -y install puppet

    # Configure /etc/hosts file
    echo "" | sudo tee --append /etc/hosts 2> /dev/null && \
    echo "192.168.35.5    theforeman.example.com   theforeman" | sudo tee --append /etc/hosts 2> /dev/null

    # Add agent section to /etc/puppet/puppet.conf (sets run interval to 120 seconds)
    echo "" | sudo tee --append /etc/puppet/puppet.conf 2> /dev/null && \
    echo "    server = theforeman.example.com" | sudo tee --append /etc/puppet/puppet.conf 2> /dev/null && \
    echo "    runinterval = 120" | sudo tee --append /etc/puppet/puppet.conf 2> /dev/null

    sudo service puppet stop
    sudo service puppet start

    sudo puppet resource service puppet ensure=running enable=true
    sudo puppet agent --enable
fi

Now that Foreman is running, use the command, ‘vagrant ssh agent01.example.com‘, to ssh into the first Puppet Agent node. Then, run the command below.

sudo puppet agent --test --waitforcert=60

The command above manually starts Puppet’s Certificate Signing Request (CSR) process, generating the certificates and security credentials (private and public keys) issued by Puppet’s built-in certificate authority (CA). Each Puppet Agent node must have its certificate signed by the Foreman server, first. According to Puppet’s website, “Before puppet agent nodes can retrieve their configuration catalogs, they need a signed certificate from the local Puppet certificate authority (CA). When using Puppet’s built-in CA (that is, not using an external CA), agents will submit a certificate signing request (CSR) to the CA Puppet Master (Foreman) and will retrieve a signed certificate once one is available.”

Waiting for Certificate to be Signed by Foreman

Open the Foreman browser-based interface, running at https://theforeman.example.com. Proceed to the ‘Infrastructure’ -> ‘Smart Proxies’ tab. Sign the certificate(s) from the agent nodes (shown below). The agent node will wait for the Foreman to sign the certificate, before continuing with the initial configuration.

Certificate Waiting to be Signed in Foreman
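
If you prefer not to use the Foreman web interface for this step, the certificates can also be signed from the command line on the Foreman server, since the Puppet CA runs there; the sketch below uses the same cert commands shown in the Puppet Master post later in this archive.

# on the Foreman server, list pending certificate requests, then sign them
sudo puppet cert list
sudo puppet cert sign --all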

Once the certificate signing process is complete, each host retrieves its client configuration from Foreman and applies it.

Foreman Puppet Configuration Applied to Agent Node

That’s it, you should now have one host running Foreman and two puppet agent nodes.

Testing Foreman

To test Foreman, import the classes from the Puppet modules installed with the Foreman bootstrap script.

Foreman – Puppet Classes

Next, apply the ntp, git, and Docker classes to both agent nodes (aka, Foreman ‘hosts’), as well as to the Foreman node itself.

Foreman – Agents Puppet Classes

Every two minutes, the two agent node hosts should fetch their latest configuration from Foreman and apply it. In a few minutes, check the times reported in the ‘Last report’ column on the ‘All Hosts’ tab. If the times are two minutes or less, Foreman and Puppet Agent are working. Note we set the runinterval to 120 seconds in the bootstrap script to speed up the Puppet Agent updates for the sake of the demo. The normal default interval is 30 minutes. I recommend changing the agent nodes’ runinterval back to 30 minutes (’30m’) on the hosts once everything is working, to avoid unnecessary use of resources.
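
A quick way to revert the interval, sketched below, is to edit /etc/puppet/puppet.conf on each agent node and restart the Puppet service; the sed expression is an assumption based on the line added by the bootstrap script.

# on each agent node, restore the default 30-minute run interval (sed pattern assumed)
sudo sed -i 's/runinterval = 120/runinterval = 30m/' /etc/puppet/puppet.conf
sudo service puppet restart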

Foreman – Hosts Reporting Back

Finally, to verify that the configuration was successfully applied to the hosts, check if ntp, git, and Docker are now running on the hosts.
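
A few quick commands on each agent node, sketched below, confirm the packages and services the classes should have installed; the exact service and binary names may vary.

# verify ntp, git, and Docker were installed by the Puppet classes (names may vary)
sudo service ntpd status
git --version
sudo docker --version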

Agent Node with ntp and git Now Installed

Helpful Links

All the source code for this project is on GitHub.

Foreman:
http://theforeman.org

Atlas – Discover Vagrant Boxes:
https://atlas.hashicorp.com/boxes/search

Learning Puppet – Basic Agent/Master Puppet
https://docs.puppetlabs.com/learning/agent_master_basic.html

Puppet Glossary (of terms):
https://docs.puppetlabs.com/references/glossary.html



Installing Puppet Master and Agents on Multiple VM Using Vagrant and VirtualBox

 Automatically provision multiple VMs with Vagrant and VirtualBox. Automatically install, configure, and test Puppet Master and Puppet Agents on those VMs.

Puppet Master Agent Vagrant (3)

Introduction

Note this post and the accompanying source code were updated on 12/16/2014 to v0.2.1. The release contains several changes to improve and simplify the install process.

Puppet Labs’ Open Source Puppet Agent/Master architecture is an effective solution to manage infrastructure and system configuration. However, for the average System Engineer or Software Developer, installing and configuring Puppet Master and Puppet Agent can be challenging. If the installation doesn’t work properly, the engineer is stuck troubleshooting, or trying to remove and reinstall Puppet.

A better solution is to automate the installation of Puppet Master and Puppet Agent on Virtual Machines (VMs). Automating the installation process guarantees accuracy and consistency. Installing Puppet on VMs means the VMs can be snapshotted, cloned, or simply destroyed and recreated, if needed.

In this post, we will use Vagrant and VirtualBox to create three VMs. The VMs will be built from an Ubuntu 14.04.1 LTS (Trusty Tahr) Vagrant Box, previously on Vagrant Cloud, now on Atlas. We will use a single JSON-format configuration file to build all three VMs, automatically. As part of the Vagrant provisioning process, we will run a bootstrap shell script to install Puppet Master on the first VM (the Puppet Master server) and Puppet Agent on the two remaining VMs (the agent nodes).

Lastly, to test our Puppet installations, we will use Puppet to install some basic Puppet modules, including ntp and git on the server, and ntp, git, Docker, and Fig on the agent nodes.

All the source code for this project is on GitHub.

Vagrant

To begin the process, we will use the JSON-format configuration file to create the three VMs, using Vagrant and VirtualBox.

{
  "nodes": {
    "puppet.example.com": {
      ":ip": "192.168.32.5",
      "ports": [],
      ":memory": 1024,
      ":bootstrap": "bootstrap-master.sh"
    },
    "node01.example.com": {
      ":ip": "192.168.32.10",
      "ports": [],
      ":memory": 1024,
      ":bootstrap": "bootstrap-node.sh"
    },
    "node02.example.com": {
      ":ip": "192.168.32.20",
      "ports": [],
      ":memory": 1024,
      ":bootstrap": "bootstrap-node.sh"
    }
  }
}

The Vagrantfile uses the JSON-format configuration file to provision the three VMs, using a single ‘vagrant up‘ command. That’s it, less than 30 lines of actual code in the Vagrantfile to create as many VMs as we need. For this post’s example, we will not need to add any port mappings, which can be done from the JSON configuration file (see the README.md for more directions). The Vagrant Box we are using already has the correct ports opened.

If you have not previously used the Ubuntu Vagrant Box, it will take a few minutes the first time for Vagrant to download it to the local Vagrant Box repository.

# vi: set ft=ruby :

# Builds Puppet Master and multiple Puppet Agent Nodes using JSON config file
# Author: Gary A. Stafford

# read vm configurations from JSON file
nodes_config = (JSON.parse(File.read("nodes.json")))['nodes']

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"

  nodes_config.each do |node|
    node_name   = node[0] # name of node
    node_values = node[1] # content of node

    config.vm.define node_name do |config|
      # configures all forwarding ports in JSON array
      ports = node_values['ports']
      ports.each do |port|
        config.vm.network :forwarded_port,
          host:  port[':host'],
          guest: port[':guest'],
          id:    port[':id']
      end

      config.vm.hostname = node_name
      config.vm.network :private_network, ip: node_values[':ip']

      config.vm.provider :virtualbox do |vb|
        vb.customize ["modifyvm", :id, "--memory", node_values[':memory']]
        vb.customize ["modifyvm", :id, "--name", node_name]
      end

      config.vm.provision :shell, :path => node_values[':bootstrap']
    end
  end
end

Once provisioned, the three VMs, also referred to as ‘Machines’ by Vagrant, should appear, as shown below, in Oracle VM VirtualBox Manager.

Vagrant Machines in VM VirtualBox Manager

The name of the VMs, referenced in Vagrant commands, is the parent node name in the JSON configuration file (node_name), such as, ‘vagrant ssh puppet.example.com‘.

Vagrant Machine Names

Bootstrapping Puppet Master Server

As part of the Vagrant provisioning process, a bootstrap script is executed on each of the VMs (script shown below). This script will do 98% of the required work for us. There is one for the Puppet Master server VM, and one for each agent node.

#!/bin/sh

# Run on VM to bootstrap Puppet Master server

if ps aux | grep "puppet master" | grep -v grep 2> /dev/null
then
    echo "Puppet Master is already installed. Exiting..."
else
    # Install Puppet Master
    wget https://apt.puppetlabs.com/puppetlabs-release-trusty.deb && \
    sudo dpkg -i puppetlabs-release-trusty.deb && \
    sudo apt-get update -yq && sudo apt-get upgrade -yq && \
    sudo apt-get install -yq puppetmaster

    # Configure /etc/hosts file
    echo "" | sudo tee --append /etc/hosts 2> /dev/null && \
    echo "# Host config for Puppet Master and Agent Nodes" | sudo tee --append /etc/hosts 2> /dev/null && \
    echo "192.168.32.5    puppet.example.com  puppet" | sudo tee --append /etc/hosts 2> /dev/null && \
    echo "192.168.32.10   node01.example.com  node01" | sudo tee --append /etc/hosts 2> /dev/null && \
    echo "192.168.32.20   node02.example.com  node02" | sudo tee --append /etc/hosts 2> /dev/null

    # Add optional alternate DNS names to /etc/puppet/puppet.conf
    sudo sed -i 's/.*\[main\].*/&\ndns_alt_names = puppet,puppet.example.com/' /etc/puppet/puppet.conf

    # Install some initial puppet modules on Puppet Master server
    sudo puppet module install puppetlabs-ntp
    sudo puppet module install garethr-docker
    sudo puppet module install puppetlabs-git
    sudo puppet module install puppetlabs-vcsrepo
    sudo puppet module install garystafford-fig

    # symlink manifest from Vagrant synced folder location
    ln -s /vagrant/site.pp /etc/puppet/manifests/site.pp
fi

There are a few last commands we need to run ourselves, from within the VMs. Once the provisioning process is complete, use ‘vagrant ssh puppet.example.com‘ to connect to the newly provisioned Puppet Master server. Below are the commands we need to run within the ‘puppet.example.com‘ VM.

sudo service puppetmaster status # test that puppet master was installed
sudo service puppetmaster stop
sudo puppet master --verbose --no-daemonize
# Ctrl+C to kill puppet master
sudo service puppetmaster start
sudo puppet cert list --all # check for 'puppet' cert

According to Puppet’s website, ‘these steps will create the CA certificate and the puppet master certificate, with the appropriate DNS names included.’

Bootstrapping Puppet Agent Nodes

Now that the Puppet Master server is running, open a second terminal tab (‘Shift+Ctrl+T‘). Use the command, ‘vagrant ssh node01.example.com‘, to ssh into the new Puppet Agent node. The agent node bootstrap script should have already executed as part of the Vagrant provisioning process.

#!/bin/sh

# Run on VM to bootstrap Puppet Agent nodes
# http://blog.kloudless.com/2013/07/01/automating-development-environments-with-vagrant-and-puppet/

if ps aux | grep "puppet agent" | grep -v grep 2> /dev/null
then
    echo "Puppet Agent is already installed. Moving on..."
else
    sudo apt-get install -yq puppet
fi

if cat /etc/crontab | grep puppet 2> /dev/null
then
    echo "Puppet Agent is already configured. Exiting..."
else
    sudo apt-get update -yq && sudo apt-get upgrade -yq

    sudo puppet resource cron puppet-agent ensure=present user=root minute=30 \
        command='/usr/bin/puppet agent --onetime --no-daemonize --splay'

    sudo puppet resource service puppet ensure=running enable=true

    # Configure /etc/hosts file
    echo "" | sudo tee --append /etc/hosts 2> /dev/null && \
    echo "# Host config for Puppet Master and Agent Nodes" | sudo tee --append /etc/hosts 2> /dev/null && \
    echo "192.168.32.5    puppet.example.com  puppet" | sudo tee --append /etc/hosts 2> /dev/null && \
    echo "192.168.32.10   node01.example.com  node01" | sudo tee --append /etc/hosts 2> /dev/null && \
    echo "192.168.32.20   node02.example.com  node02" | sudo tee --append /etc/hosts 2> /dev/null

    # Add agent section to /etc/puppet/puppet.conf
    echo "" && echo "[agent]\nserver=puppet" | sudo tee --append /etc/puppet/puppet.conf 2> /dev/null

    sudo puppet agent --enable
fi

Run the two commands below within both the ‘node01.example.com‘ and ‘node02.example.com‘ agent nodes.

sudo service puppet status # test that agent was installed
sudo puppet agent --test --waitforcert=60 # initiate certificate signing request (CSR)

The second command above will manually start Puppet’s Certificate Signing Request (CSR) process, generating the certificates and security credentials (private and public keys) issued by Puppet’s built-in certificate authority (CA). Each Puppet Agent node must have its certificate signed by the Puppet Master, first. According to Puppet’s website, “Before puppet agent nodes can retrieve their configuration catalogs, they need a signed certificate from the local Puppet certificate authority (CA). When using Puppet’s built-in CA (that is, not using an external CA), agents will submit a certificate signing request (CSR) to the CA Puppet Master and will retrieve a signed certificate once one is available.”

Agent Node Starting Puppet’s Certificate Signing Request (CSR) Process

Back on the Puppet Master Server, run the following commands to sign the certificate(s) from the agent node(s). You may sign each node’s certificate individually, or wait and sign them all at once. Note the agent node(s) will wait for the Puppet Master to sign the certificate, before continuing with the Puppet Agent configuration run.

sudo puppet cert list # should see 'node01.example.com' cert waiting for signature
sudo puppet cert sign --all # sign the agent node certs
sudo puppet cert list --all # check for signed certs

Puppet Master Completing Puppet’s Certificate Signing Request (CSR) Process

Once the certificate signing process is complete, the Puppet Agent retrieves the client configuration from the Puppet Master and applies it to the local agent node. The Puppet Agent will execute all applicable steps in the site.pp manifest on the Puppet Master server, designated for that specific Puppet Agent node (i.e. ‘node node02.example.com {...}‘).

Configuration Run Completed on Puppet Agent Node

Below is the main site.pp manifest on the Puppet Master server, applied by Puppet Agent on the agent nodes.

node default {
# Test message
  notify { "Debug output on ${hostname} node.": }

  include ntp, git
}

node 'node01.example.com', 'node02.example.com' {
# Test message
  notify { "Debug output on ${hostname} node.": }

  include ntp, git, docker, fig
}

That’s it! You should now have one server VM running Puppet Master, and two agent node VMs running Puppet Agent. Both agent nodes should have successfully been registered with Puppet Master, and configured themselves based on the Puppet Master’s main manifest. Agent node configuration includes installing ntp, git, Fig, and Docker.

Helpful Links

All the source code for this project is on GitHub.

Puppet Glossary (of terms):
https://docs.puppetlabs.com/references/glossary.html

Puppet Labs Open Source Automation Tools:
http://puppetlabs.com/misc/download-options

Puppet Master Overview:
http://ci.openstack.org/puppet.html

Install Puppet on Ubuntu:
https://docs.puppetlabs.com/guides/install_puppet/install_debian_ubuntu.html

Installing Puppet Master:
http://andyhan.linuxdict.com/index.php/sys-adm/item/273-puppet-371-on-centos-65-quick-start-i

Regenerating Node Certificates:
https://docs.puppetlabs.com/puppet/latest/reference/ssl_regenerate_certificates.html

Automating Development Environments with Vagrant and Puppet:
http://blog.kloudless.com/2013/07/01/automating-development-environments-with-vagrant-and-puppet

