Posts Tagged DevOps

The Evolving Role of DevOps in Emerging Technologies: Part 1/2

31970399_m

Growth of DevOps

The adoption of DevOps practices by global organizations has become mainstream, according to many recent industry studies. For instance, a late 2016 study, conducted by IDG Research for Unisys Corporation of global enterprise organizations, found 38 percent of respondents had already adopted DevOps, while another 29 percent were in the planning phase, and 17 percent in the evaluation stage. Adoption rates were even higher, 49 percent versus 38 percent, for larger organizations with 500 or more developers.

Another recent 2017 study by Red Gate Software, The State of Database DevOps, based on 1,000 global organizations, found 47 percent of the respondents had already adopted DevOps practices, with another 33 percent planning on adopting DevOps practices within the next 24 months. Similar to the Unisys study, prior adoption rates were considerably higher, 59 percent versus 47 percent, for larger organizations with over 1,000 employees.

Emerging Technologies

Although DevOps originated to meet the needs of Agile software development to release more frequently, DevOps is no longer just continuous integration and continuous delivery. As more organizations undergo a digital transformation and adopt disruptive technologies to drive business success, the role of DevOps continues to evolve and expand.

Emerging technology trends, such as Machine Learning, Artificial Intelligence (AI), and Internet of Things (IoT/IIoT), serve to both influence DevOps practices, as well as create the need for the application of DevOps practices to these emerging technologies. Let’s examine the impact of some of these emerging technology trends on DevOps in this brief, two-part post.

Mobile

Although mobile application development is certainly not new, DevOps practices around mobile continue to evolve as mobile becomes the primary application platform for many organizations. Mobile applications have unique development and operational requirements. Take for example UI functional testing. Whereas web application developers often test against a relatively small matrix of popular web browsers and operating systems (Desktop Browser Market Share – Net Application.com), mobile developers must test against a continuous outpouring of new mobile devices, both tablets and phones (Test on the right mobile devices – BroswerStack). The complexity of automating the testing of such a large number mobile devices has resulted in the growth of specialized cloud-based testing platforms, such as BrowserStack and SauceLabs.

Cloud

Similar to Mobile, the Cloud is certainly not new. However, as more firms move their IT operations to the Cloud, DevOps practices have had to adapt rapidly. The need to adjust is no more apparent than with Amazon Web Services. Currently, AWS lists no less than 18 categories of cloud offerings on their website, with each category containing several products and services. Categories include compute, storage, databases, networking, security, messaging, mobile, AI, IoT, and analytics.

In addition to products like compute, storage, and database, AWS now offers development, DevOps, and management tools, such as AWS OpsWorks and AWS CloudFormation. These products offer alternatives to traditional non-cloud CI/CD/RM workflows for deploying and managing complex application platforms on AWS. Learning the nuances of a growing list of AWS specific products and workflows, while simultaneously adapting your organization’s DevOps practices to them, has resulted in a whole new category of DevOps engineering specialization centered around AWS. Cloud-centric DevOps engineering specialization is also seen with other large cloud providers, such as Microsoft Azure and Google Cloud Platform.

Security

Call it DevSecOps, SecDevOps, SecOps, or Rugged DevOps, the intersection of DevOps and Security is bustling these days. As the complexity of modern application platforms grows, as well as the sophistication of threats from hackers and the requirements of government and industry compliance, security is no longer an afterthought or a process run in seeming isolation from software development and DevOps. In my recent experience, it is not uncommon to see IT security specialists actively participating on Agile development teams and embedded on DevOps and Platform teams.

Modern application platforms must be designed from day one to be bug-free, performant, compliant, and secure.

Security practices are now commonly part of the entire software development lifecycle, including enterprise architecture, software development, data governance, continuous testing, and infrastructure as code. Modern application platforms must be designed from day one to be bug-free, performant, compliant, and secure.

Take for example penetration (PEN) testing. Once a mostly manual process, done close to release time, evolving DevOps practices now allow testing for security vulnerabilities to applications and software-defined infrastructure to be done early and often in the software development lifecycle. Easily automatable and configurable cloud- and non-cloud-based tools like SonarQube, Veracode, QualysOWASP ZAP, and Chef Compliance, amongst others, are frequently incorporated into continuous integration workflows by development and DevOps teams. There is no longer an excuse for security vulnerabilities to be discovered just before release, or worse, in Production.

Modern Platforms

Along with the Cloud, modern application development trends, like the rise of the platform, microservices (or service-based architectures), containerization, NoSQL databases, and container orchestration, have likely provided the majority of fuel for the recent explosive growth of DevOps. Although innovative IT organizations have fostered these technologies for the past few years, their growth and relative maturity have risen sharply in the last 12 to 18 months.

No longer the stuff of Unicorns, platforms based on Evolutionary Architectures are being built and deployed by an increasing number of everyday organizations.

No longer the stuff of Unicorns, such as Amazon, Etsy, and Netflix, platforms based on Evolutionary Architectures are being built and deployed by an increasing number of everyday organizations. Although complexity continues to rise, the barrier to entry has been greatly reduced with technologies found across the SDLC, including  Node, Spring Boot, Docker, Consul, Terraform, and Kubernetes, amongst others.

As modern platforms become more commonplace, the DevOps practices around them continue to mature and become specialized. Imagine, with potentially hundreds of moving parts, building, testing, deploying, and actively managing a large-scale microservice-based application on a container orchestration platform requires highly-specialized knowledge. The ability to ‘do DevOps at scale’ is critical.

Legacy Systems

Legacy systems as an emerging technology trend in DevOps? As the race to build the ‘next generation’ of application platforms accelerates to meet the demands of the business and their customers, there is a growing need to support ‘last generation’ systems. Many IT organizations support multiple legacy systems, ranging in age from as short as five years old to more than 25 years old. These monolithic legacy systems, which often contain a company’s secret sauce, such as complex business algorithms and decision engines, are built on out-moded technology stacks, often lack vendor support, and require separate processes to build, test, deploy, and manage. Worse, the knowledge to maintain these systems is frequently only known to a shrinking group of IT resources. Who wants to work on the old system with so many bright and shiny toys being built?

As a cost-effective means to maintain these legacy systems, organizations are turning to modern DevOps practices. Although not possible to the same degree, depending on the legacy technology, practices include the use source control, various types of automated testing, automated provisioning, deployment and configuration of system components, and infrastructure automation (DevOps for legacy systems – Infosys white paper).

Not specifically a DevOps practice, organizations are also implementing content collaboration systems, like Atlassian Confluence and Microsoft SharePoint, to document legacy system architectures and manual processes, before the resources and their knowledge is lost.

Part Two

In the second half of this post, we will continue to look at emerging technologies and their impact on DevOps, including:

  • Big Data
  • Internet of Things (IoT/IIoT)
  • Artificial Intelligence (AI)
  • Machine Learning
  • COTS/SaaS

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

Illustration Copyright: Andreus / 123RF Stock Photo

, , , , , , ,

Leave a comment

Eventual Consistency: Decoupling Microservices with Spring AMQP and RabbitMQ

RabbitMQEnventCons.png

Introduction

In a recent post, Decoupling Microservices using Message-based RPC IPC, with Spring, RabbitMQ, and AMPQ, we moved away from synchronous REST HTTP for inter-process communications (IPC) toward message-based IPC. Moving to asynchronous message-based communications allows us to decouple services from one another. It makes it easier to build, test, and release our individual services. In that post, we did not achieve fully asynchronous communications. Although, we did achieve a higher level of service decoupling using message-based Remote Procedure Call (RPC) IPC.

In this post, we will fully decouple our services using the distributed computing model of eventual consistency. More specifically, we will use a message-based, event-driven, loosely-coupled, eventually consistent architectural approach for communications between services.

What is eventual consistency? One of the best definitions of eventual consistency I have read was posted on microservices.io. To paraphrase, ‘using an event-driven, eventually consistent approach, each service publishes an event whenever it updates its data. Other services subscribe to events. When an event is received, a service updates its data.

Example of Eventual Consistency

Imagine Service A, the Customer service, inserts a new customer record into its database. Based on that ‘customer created’ event, Service A publishes a message containing the new customer object, serialized to JSON, to a lightweight persistent message queue.

Service B, the new customer Onboarding service, a subscriber to that queue, consumes Service A’s message. Service B then executes that same CRUD operation, inserting the same new customer record into its database.

In the above example, it can be said that the customer records in Service B’s database are eventually consistent with the customer records in Service A’s database. Service A makes a change and publishes a message in response to the event. Service B consumes the message and makes the same change. Eventually (likely milliseconds) Service B’s customer records are consistent with Service A’s customer records.

Why Eventual Consistency?

So what does this apparent added complexity and duplication of data buy us? Consider the advantages. Service B, the Onboarding service, requires no knowledge of, or a dependency on, Service A, the Customer service. Still, Service B has a current record of all the customers that Service A maintains. Instead of making repeated and potentially costly REST HTTP call or RPC message-based call to or from Service A to Service B for new customers, Service B queries its database for a list of customers.

The value of eventual consistency increases factorially as you scale a distributed system. Imagine dozens of distinct microservices, many requiring data from other microservices. Further, imagine multiple instances of each of those services all running in parallel. Decoupling services from one another, through asynchronous forms of IPC, messaging, and event-driven eventual consistency greatly simplifies the software development lifecycle and operations.

Demonstration

In this post, we could use a few different architectural patterns to demonstrate message passing with RabbitMQ and Spring AMQP. They including Work Queues, Publish/Subscribe, Routing, or Topics. To keep things as simple as possible, we will have a single Producer, publish messages to a single durable and persistent message queue. We will have a single Subscriber, a Consumer, consume the messages from that queue. We focus on a single type of event message.

Sample Code

To demonstrate Spring AMQP-based messaging with RabbitMQ, we will use a reference set of three Spring Boot microservices. The Election ServiceCandidate Service, and Voter Service are all backed by MongoDB. The services and MongoDB, along with RabbitMQ and Voter API Gateway, are all part of the Voter API.

The Voter API Gateway, based on HAProxy, serves as a common entry point to all three services, as well as serving as a reverse proxy and load balancer. The API Gateway provides round-robin load-balanced access to multiple instances of each service.

Voter_API_Architecture

All the source code found this post’s example is available on GitHub, within a few different project repositories. The Voter Service repository contains the Voter service source code, along with the scripts and Docker Compose files required to deploy the project. The Election Service repository, Candidate Service repository, and Voter API Gateway repository are also available on GitHub. There is also a new AngularJS/Node.js Web Client, to demonstrate how to use the Voter API.

For this post, you only need to clone the Voter Service repository.

Deploying Voter API

All components, including the Spring Boot services, MongoDB, RabbitMQ, API Gateway, and the Web Client, are individually deployed using Docker. Each component is publicly available as a Docker Image, on Docker Hub. The Voter Service repository contains scripts to deploy the entire set of Dockerized components, locally. The repository also contains optional scripts to provision a Docker Swarm, using Docker’s newer swarm mode, and deploy the components. We will only deploy the services locally for this post.

To clone and deploy the components locally, including the Spring Boot services, MongoDB, RabbitMQ, and the API Gateway, execute the following commands. If this is your first time running the commands, it may take a few minutes for your system to download all the required Docker Images from Docker Hub.

If everything was deployed successfully, you should observe six running Docker containers, similar to the output, below.

Using Voter API

The Voter Service, Election Service, and Candidate Service GitHub repositories each contain README files, which detail all the API endpoints each service exposes, and how to call them.

In addition to casting votes for candidates, the Voter service can simulate election results. Calling the /simulation endpoint, and indicating the desired election, the Voter service will randomly generate a number of votes for each candidate in that election. This will save us the burden of casting votes for this demonstration. However, the Voter service has no knowledge of elections or candidates. The Voter service depends on the Candidate service to obtain a list of candidates.

The Candidate service manages electoral candidates, their political affiliation, and the election in which they are running. Like the Voter service, the Candidate service also has a /simulation endpoint. The service will create a list of candidates based on the 2012 and 2016 US Presidential Elections. The simulation capability of the service saves us the burden of inputting candidates for this demonstration.

The Election service manages elections, their polling dates, and the type of election (federal, state, or local). Like the other services, the Election service also has a /simulation endpoint, which will create a list of sample elections. The Election service will not be discussed in this post’s demonstration. We will examine communications between the Candidate and Voter services, only.

REST HTTP Endpoint

As you recall from our previous post, Decoupling Microservices using Message-based RPC IPC, with Spring, RabbitMQ, and AMPQ, the Voter service exposes multiple, almost identical endpoints. Each endpoint uses a different means of IPC to retrieve candidates and generates random votes.

Calling the /voter/simulation/election/{election} endpoint and providing a specific election, prompts the Voter service to request a list of candidates from the Candidate service, based on the election parameter you input. This request is done using synchronous REST HTTP. The Voter service uses the HTTP GET method to request the data from the Candidate service. The Voter service then waits for a response.

The Candidate service receives the HTTP request. The Candidate service responds to the Voter service with a list of candidates in JSON format. The Voter service receives the response payload containing the list of candidates. The Voter service then proceeds to generate a random number of votes for each candidate in the list. Finally, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters  database.

Message-based RPC Endpoint

Similarly, calling the /voter/simulation/rpc/election/{election} endpoint and providing a specific election, prompts the Voter service to request the same list of candidates. However, this time, the Voter service (the client) produces a request message and places in RabbitMQ’s voter.rpc.requests queue. The Voter service then waits for a response. The Voter service has no direct dependency on the Candidate service; it only depends on a response to its request message. In this way, it is still a synchronous form of IPC, but the Voter service is now decoupled from the Candidate service.

The request message is consumed by the Candidate service (the server), who is listening to that queue. In response, the Candidate service produces a message containing the list of candidates serialized to JSON. The Candidate service (the server) sends a response back to the Voter service (the client) through RabbitMQ. This is done using the Direct reply-to feature of RabbitMQ or using a unique response queue, specified in the reply-to header of the request message, sent by the Voter Service.

The Voter service receives the message containing the list of candidates. The Voter service deserializes the JSON payload to candidate objects. The Voter service then proceeds to generate a random number of votes for each candidate in the list. Finally, identical to the previous example, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters database.

New Endpoint

Calling the new /voter/simulation/db/election/{election} endpoint and providing a specific election, prompts the Voter service to query its own MongoDB database for a list of candidates.

But wait, where did the candidates come from? The Voter service didn’t call the Candidate service? The answer is message-based eventual consistency. Whenever a new candidate is created, using a REST HTTP POST request to the Candidate service’s /candidate/candidates endpoint, a Spring Data Rest Repository Event Handler responds. Responding to the candidate created event, the event handler publishes a message, containing a serialized JSON representation of the new candidate object, to a durable and persistent RabbitMQ queue.

The Voter service is listening to that queue. The Voter service consumes messages off the queue, deserializes the candidate object, and saves it to its own voters database, to the candidate collection. For this example, we are saving the incoming candidate object as is, with no transformations. The candidate object model for both services is identical.

When /voter/simulation/db/election/{election} endpoint is called, the Voter service queries its voters database for a list of candidates. They Voter service then proceeds to generate a random number of votes for each candidate in the list. Finally, identical to the previous two examples, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters  database.

Message_Queue_Diagram_Final3B

Exploring the Code

We will not review the REST HTTP or RPC IPC code in this post. It was covered in detail, in the previous post. Instead, we will explore the new code required for eventual consistency.

Spring Dependencies

To use AMQP with RabbitMQ, we need to add a project dependency on org.springframework.boot.spring-boot-starter-amqp. Below is a snippet from the Candidate service’s build.gradle file, showing project dependencies. The Voter service’s dependencies are identical.

AMQP Configuration

Next, we need to add a small amount of RabbitMQ AMQP configuration to both services. We accomplish this by using Spring’s @Configuration annotation on our configuration classes. Below is the abridged configuration class for the Voter service.

And here, the abridged configuration class for the Candidate service.

Event Handler

With our dependencies and configuration in place, we will define the CandidateEventHandler class. This class is annotated with the Spring Data Rest @RepositoryEventHandler and Spring’s @Component. The @Component annotation ensures the event handler is registered.

The class contains the handleCandidateSave method, which is annotated with the Spring Data Rest @HandleAfterCreate. The event handler acts on the Candidate object, which is the first parameter in the method signature.

Responding to the candidate created event, the event handler publishes a message, containing a serialized JSON representation of the new candidate object, to the candidates.queue queue. This was the queue we configured earlier.

Consuming Messages

Next, we let’s switch to the Voter service’s CandidateListService class. Below is an abridged version of the class with two new methods. First, the getCandidateMessage method listens to the candidates.queue queue. This was the queue we configured earlier. The method is annotated with theSpring AMQP Rabbit @RabbitListener annotation.

The getCandidateMessage retrieves the new candidate object from the message, deserializes the message’s JSON payload, maps it to the candidate object model and saves it to the Voter service’s database.

The second method, getCandidatesQueueDb, retrieves the candidates from the Voter service’s database. The method makes use of the Spring Data MongoDB Aggregation package to return a list of candidates from MongoDB.

RabbitMQ Management Console

The easiest way to observe what is happening with the messages is using the RabbitMQ Management Console. To access the console, point your web browser to localhost, on port 15672. The default login credentials for the console are guest/guest. As you successfully produce and consume messages with RabbitMQ, you should see activity on the Overview tab.

RabbitMQ_EC_Durable3.png

Recall we said the queue, in this example, was durable. That means messages will survive the RabbitMQ broker stopping and starting. In the below view of the RabbitMQ Management Console, note the six messages persisted in memory. The Candidate service produced the messages in response to six new candidates being created. However, the Voter service was not running, and therefore, could not consume the messages. In addition, the RabbitMQ server was restarted, after receiving the candidate messages. The messages were persisted and still present in the queue after the successful reboot of RabbitMQ.

RabbitMQ_EC_Durable

Once RabbitMQ and the Voter service instance were back online, the Voter service successfully consumed the six waiting messages from the queue.

RabbitMQ_EC_Durable2.png

Service Logs

In addition to using the RabbitMQ Management Console, we may obverse communications between the two services by looking at the Voter and Candidate service’s logs. I have grabbed a snippet of both service’s logs and added a few comments to show where different processes are being executed.

First the Candidate service logs. We observe a REST HTTP POST request containing a new candidate. We then observe the creation of the new candidate object in the Candidate service’s database, followed by the event handler publishing a message on the queue. Finally, we observe the response is returned in reply to the initial REST HTTP POST request.

Now the Voter service logs. At the exact same second as the message and the response sent by the Candidate service, the Voter service consumes the message off the queue. The Voter service then deserializes the new candidate object and inserts it into its database.

MongoDB

Using the mongo Shell, we can observe six new 2016 Presidential Election candidates in the Candidate service’s database.

Now, looking at the Voter service’s database, we should find the same six 2016 Presidential Election candidates. Note the Object IDs are the same between the two service’s document sets, as are the rest of the fields (first name, last name, political party, and election). However, the class field is different between the two service’s records.

Production Considerations

The post demonstrated a simple example of message-based, event-driven eventual consistency. In an actual Production environment, there are a few things that must be considered.

  • We only addressed a ‘candidate created’ event. We would also have to code for other types of events, such as a ‘candidate deleted’ event and an ‘candidate updated’ event.
  • If a candidate is added, deleted, then re-added, are the events published and consumed in the right order? What about with multiple instances of the Voter service running? Does this pattern guarantee event ordering?
  • How should the Candidate service react on startup if RabbitMQ is not available
  • What if RabbitMQ fails after the Candidate services have started?
  • How should the Candidate service react if a new candidate record is added to the database, but a ‘candidate created’ event message cannot be published to RabbitMQ? The two actions are not wrapped in a single transaction.
  • In all of the above scenarios, what response should be returned to the API end user?

Conclusion

In this post, using eventual consistency, we successfully decoupled our two microservices and achieved asynchronous inter-process communications. Adopting a message-based, event-driven, loosely-coupled architecture, wherever possible, in combination with REST HTTP when it makes sense, will improve the overall manageability and scalability of a microservices-based platform.

References

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , , , , , ,

Leave a comment

Preparing for Your Organization’s DevOps Journey

19672001 - man looking at pencil with eraser erases maze

Copyright: peshkova / 123RF Stock Photo

Introduction

Recently, I was asked two questions regarding DevOps. The first, ‘How do you get started implementing DevOps in an organization?’ A question I get asked, and answer, fairly frequently. The second was a bit more challenging to answer, ‘How do you prepare your organization to implement DevOps?

Getting Started

The first question, ‘How do you get started implementing DevOps in an organization?’, is a popular question many companies ask. The answer varies depending on who you ask, but the process is fairly well practiced and documented by a number of well-known and respected industry pundits. A successful DevOps implementation is a combination of strategic planning and effective execution.

A successful DevOps implementation is a combination of strategic planning and effective execution.

Most commonly, an organization starts with some form of a DevOps maturity assessment. The concept of a DevOps maturity model was introduced by Jez Humble and David Farley, in their ground-breaking book, Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series), circa 2011.

Humble and Farley presented their ‘Maturity Model for Configuration and Release Management’ (page 419). This model, which encompassed much more than just CM and RM, was created as a means of evaluating and improving an organization’s DevOps practices.

Although there are several variations, maturity models ordinarily all provide some means of ranking the relative maturity of an organization’s DevOps practices. Less sophisticated models focus primarily on tooling and processes. More holistic models, such as Accenture’s DevOps Maturity Assessment, focus on tooling, processes, people and culture.

Following the analysis, most industry experts recommend a strategic plan, followed an implementation plan. The plans set milestones for reaching higher levels of maturity, according to the model. Experts will identify key performance indicators, such as release frequency, defect rates, production downtime, and mean time to recovery from failures, which are often used to measure DevOps success.

Preparing for the Journey

As I said, the second question, ‘How do you prepare your organization to implement DevOps?’, is a bit more challenging to answer. And, as any good consultant would respond, it depends.

The exact answer depends on many factors. How engaged is management in wanting to transform their organization? How mature is the organization’s current IT practices? Are the other parts of the organization, such as sales, marketing, training, product documentation, and customer support, aligned with IT? Is IT aligned with them?

Even the basics matter, such as the organization’s size, both physical and financial, as well as the age of the organization? The industry? Are they in a highly regulated industry? Are they a global organization with distributed IT resources? Have they tried DevOps before and failed? Why did they fail?

As overwhelming as those questions might seem, I managed to break down my answer to the question, “How do you prepare your organization to implement DevOps?”, into five key areas. In my experience, each of these is critical for any DevOps transformation to succeed. Before the journey starts, these are five areas an organization needs to consider:

  1. Have an Agile Mindset
  2. Breakdown Silos
  3. Know Your Business
  4. Take the Long View
  5. Be Introspective

Have an Agile Mindset

It is commonly accepted that DevOps was born from the need of Agile software development to increase the frequency of releases. More releases required faster feedback loops, better quality control methods, and the increased use of automation, amongst other necessities. DevOps practices evolved to meet those challenges.

If an organization is considering DevOps, it should have already successfully embraced Agile, or be well along in their Agile transformation. An outgrowth of Agile software development, DevOps follow many Agile practices. Such Agile practices as cross-team collaboration, continuous and rapid feedback loops, continuous improvement, test-driven development, continuous integration, scheduling work in sprints, and breaking down business requirements into epics, stories, and tasks, are usually all part of a successful DevOps implementation.

If your organization cannot adopt Agile, it will likely fail to successfully embrace DevOps. Imagine a typical scenario in which DevOps enables an organization to release more frequently — monthly instead of quarterly, weekly instead of monthly. However, if the rest of the organization — sales, marketing, training, product documentation, and customer support, is still working in a non-Agile manner, they will not be able to match the improved cycle time DevOps would provide.

Breakdown Silos

Closely associated with an Agile mindset, is breaking down departmental silos. If your organization has already made an Agile transformation, then one should assume those ‘silos’, the physical or more often process-induced ‘walls’ between departments, have been torn down. Having embraced Agile, we assume that Development and Testing are working side-by-side as part of an Agile software development team.

Implementing DevOps requires closing the often wide gap between Development and Operations. If your organization cannot tear down the typically shorter wall between Development and Testing, then tearing down the larger walls between Development and Operations will be impossible.

Know Your Business

Before starting your DevOps journey, an organization needs to know thyself. Most organizations establish business metrics, such as sales quotas, profit targets, employee retention objectives, and client acquisition goals. However, many organizations have not formalized their IT-related Key Performance Indicators (KPIs) or Service Level Agreements (SLAs).

DevOps is all about measurements — application response time, incident volume, severity, and impact, defect density, Mean Time To Recovery (MTTR), downtime, uptime, and so forth. Established meaningful and measurable metrics is one of the best ways to evaluate the continuous improvements achieved by a maturing DevOps practice.

To successfully implement DevOps, an organization should first identify its business critical performance metrics and service level expectations. Additionally, an organization must accurately and honestly measure itself against those metrics, before beginning the DevOps journey.

Take the Long View

Rome was not built in a day, organizations don’t transform overnight, and DevOps is a journey, not a time-boxed task in a team’s backlog. Before an organization sets out on their journey, they must be willing to take the long view on DevOps. There is a reason DevOps maturity models exist. Like most engineering practices, cultural and organizational transformation, and skill-building exercise, DevOps takes the time to become successfully entrenched in a company.

Rome was not built in a day, organizations don’t transform overnight, and DevOps is a journey, not a time-boxed task in a team’s backlog.

Organizations need to value quick, small wins, followed by more small wins. They should not expect a big bang with DevOps. Achieving high levels DevOps performance is similar to the Agile practice of delivering small pieces of valuable functionality, in an incremental fashion.

Getting the ‘Hello World’ application successfully through a simple continuous integration pipeline might seem small, but think of all the barriers that were overcome to achieve that task — source control, continuous integration server, unit testing, artifact repository, and so on. Your next win, deploy that ‘Hello World’ application to your Test environment, automatically, through a continuous deployment pipeline…

This practice reminds me of an adage. Would you prefer a dollar, every day for the next week, or seven dollars at the end of the week? Most people prefer the immediacy of a dollar each day (small wins), as well as the satisfaction of seeing the value build consistently, day after day. Exercise the same philosophy with DevOps.

Be Introspective

As stated earlier, generally, the first step in creating a strategic plan for implementing DevOps is analyzing your organization’s current level of IT maturity. Individual departments must be willing to be open, honest, and objective when assessing their current state.

The inability of organizations to be transparent about their practices, challenges, and performance, is a sign of an unhealthy corporate culture. Not only is an accurate perspective critical for a maturity analysis and strategic planning, but the existence of an unhealthy culture can also be fatal to most DevOps transformation. DevOps only thrives in an open, collaborative, and supportive culture.

Conclusion

As Alexander Graham Bell once famously said, ‘before anything else, preparation is the key to success.’ Although not a guarantee, properly preparing for a DevOps transformation by addressing these five key areas, should greatly improve an organization’s chances of success.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , ,

Leave a comment

Decoupling Microservices using Message-based RPC IPC, with Spring, RabbitMQ, and AMPQ

RabbitMQ_Screen_3

Introduction

There has been a considerable growth in modern, highly scalable, distributed application platforms, built around fine-grained RESTful microservices. Microservices generally use lightweight protocols to communicate with each other, such as HTTP, TCP, UDP, WebSockets, MQTT, and AMQP. Microservices commonly communicate with each other directly using REST-based HTTP, or indirectly, using messaging brokers.

There are several well-known, production-tested messaging queues, such as Apache Kafka, Apache ActiveMQAmazon Simple Queue Service (SQS), and Pivotal’s RabbitMQ. According to Pivotal, of these messaging brokers, RabbitMQ is the most widely deployed open source message broker.

RabbitMQ supports multiple messaging protocols. RabbitMQ’s primary protocol, the Advanced Message Queuing Protocol (AMQP), is an open standard wire-level protocol and semantic framework for high-performance enterprise messaging. According to Spring, ‘AMQP has exchanges, routes, and queues. Messages are first published to exchanges. Routes define on which queue(s) to pipe the message. Consumers subscribing to that queue then receive a copy of the message.

Pivotal’s Spring AMQP project applies core Spring concepts to the development of AMQP-based messaging solutions. The project’s libraries facilitate management of AMQP resources while promoting the use of dependency injection and declarative configuration. The project provides a ‘template’ (RabbitTemplate) as a high-level abstraction for sending and receiving messages.

In this post, we will explore how to start moving Spring Boot Java services away from using synchronous REST HTTP for inter-process communications (IPC), and toward message-based IPC. Moving from synchronous IPC to messaging queues and asynchronous IPC decouples services from one another, allowing us to more easily build, test, and release individual microservices.

Message-Based RPC IPC

Decoupling services using asynchronous IPC is considered optimal by many enterprise software architects when developing modern distributed platforms. However, sometimes it is not easy or possible to get away from synchronous communications. Rightly or wrongly, often times services are architected, such that one service needs to retrieve data from another service or services, in order to process its own requests. It can be said, that service has a direct dependency on the other services. Many would argue, services, especially RESTful microservices, should not be coupled in this way.

There are several ways to break direct service-to-service dependencies using asynchronous IPC. We might implement request/async response REST HTTP-based IPC. We could also use publish/subscribe or publish/async response messaging queue-based IPC. These are all described by NGINX, in their article, Building Microservices: Inter-Process Communication in a Microservices Architecture; a must-read for anyone working with microservices. We might also implement an architecture which supports eventual consistency, eliminating the need for one service to obtain data from another service.

So what if we cannot implement asynchronous methods to break direct service dependencies, but we want to move toward message-based IPC? One answer is message-based Remote Procedure Call (RPC) IPC. I realize the mention of RPC might send cold shivers down the spine of many seasoned architected. Traditional RPC has several challenges, many which have been overcome with more modern architectural patterns.

According to Wikipedia, ‘in distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure (subroutine) to execute in another address space (commonly on another computer on a shared network), which is coded as if it were a normal (local) procedure call, without the programmer explicitly coding the details for the remote interaction.

Although still a form of RPC and not asynchronous, it is possible to replace REST HTTP IPC with message-based RPC IPC. Using message-based RPC, services have no direct dependencies on other services. A service only depends on a response to a message request it makes to that queue. The services are now decoupled from one another. The requestor service (the client) has no direct knowledge of the respondent service (the server).

RPC with RabbitMQ and AMQP

RabbitMQ has an excellent set of six tutorials, which cover the basics of creating messaging applications, applying different architectural patterns, using RabbitMQ, in several different programming languages. The sixth and final tutorial covers using RabbitMQ for RPC-based IPC, with the request/reply architectural pattern.

Pivotal recently added Spring AMPQ implementations to each RabbitMQ tutorial, based on their Spring AMQP project. If you recall, the Spring AMQP project applies core Spring concepts to the development of AMQP-based messaging solutions.

This post’s RPC IPC example is closely based on the architectural pattern found in the Spring AMQP RabbitMQ tutorial.

Sample Code

To demonstrate Spring AMQP-based RPC IPC messaging with RabbitMQ, we will use a pair of simple Spring Boot microservices. These services, the Voter and Candidate services, have been used in several previous posts, and for training and testing DevOps engineers. Both services are backed by MongoDB. The services and MongoDB, along with RabbitMQ, are all part of the Voter API project. The Voter API project also contains an HAProxy-based API Gateway, which provides indirect, load-balanced access to the two services.

All code necessary to build this post’s example is available on GitHub, within three projects. The Voter Service project repository contains the Voter service source code, along with the scripts and Docker Compose files required to deploy the project. The Candidate Service project repository and the Voter API Gateway project repository are also available on GitHub. For this post, you need only clone the Voter Service project repository.

Deploying Voter API

All components, including the two Spring services, MongoDB, RabbitMQ, and the API Gateway, are individually deployed using Docker. Each component is publicly available as a Docker Image, on Docker Hub.

The Voter Service repository contains scripts to deploy the entire set of Dockerized components, locally. The repository also contains optional scripts to provision a Docker Swarm, using Docker’s newer swarm mode, and deploy the components. We will only deploy the services locally for this post.

To clone and deploy the components locally, including the two Spring services, MongoDB, RabbitMQ, and the API Gateway, execute the following commands. If this is your first time running the commands, it may take a few minutes for your system to download all the required Docker Images from Docker Hub.

If everything was deployed successfully, you should see the following output. You should observe five running Docker containers.

Using Voter API

The Voter Service and Candidate Service GitHub repositories both contain README files, which detail all the API endpoints each service exposes, and how to call them.

In addition to casting votes for candidates, the Voter service has the ability to simulate election results. By calling a /simulation endpoint, and indicating the desired election, the Voter service will randomly generate a number of votes for each candidate in that election. This will save us the burden of casting votes for this demonstration. However, the Voter service has no knowledge of elections or candidates. To obtain a list of candidates, the Voter service depends on the Candidate service.

The Candidate service manages electoral candidates, their political affiliation, and the election in which they are running. Like the Voter service, the Candidate service also has a /simulation endpoint. The service will create a list of candidates based on the 2012 and 2016 US Presidential Elections. The simulation capability of the service saves us the burden of inputting candidates for this demonstration.

REST HTTP Endpoint

The Voter service exposes two almost identical endpoints. Both endpoints generate random votes. However, below the covers, the two endpoints are very different. Calling the /voter/simulation/election/{election} endpoint, prompts the Voter service to request a list of candidates from the Candidate service, based on the election parameter you input. This request is done using synchronous REST HTTP. The Voter service uses the HTTP GET method to request the data from the Candidate service. The Voter service then waits for a response.

The HTTP request is received by the Candidate service. The Candidate service responds to the Voter service with a list of candidates, in JSON format. The Voter service receives the response containing the list of candidates. The Voter service then proceeds to generate a random number of votes for each candidate. Finally, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters  database.

Message Queue Diagram 1D

Message-based RPC Endpoint

Similarly, calling the /voter/simulation/rpc/election/{election} endpoint with a specific election prompts the Voter service to request the same list of candidates. However, this time, the Voter service (the client), produces a request message and places in RabbitMQ’s voter.rpc.requests queue. The Voter service then waits for a response. The Voter service has no direct dependency on the Candidate service. It only depends on a response to its message request. In this way, it is still a synchronous form of IPC, but the Voter service is now decoupled from the Candidate service.

The request message is consumed by the Candidate service (the server), who is listening to that queue. In response, the Candidate service produces a message containing the list of candidates, serialized to JSON. The Candidate service (the server) sends a response back to the Voter service (the client), through RabbitMQ. This is done using the Direct reply-to feature of RabbitMQ, or using a unique response queue, specified in the reply-to header of the request message, sent by the Voter Service.

According to RabbitMQ, ‘the direct reply-to feature allows RPC clients to receive replies directly from their RPC server, without going through a reply queue. (“Directly” here still means going through AMQP and the RabbitMQ server; there is no separate network connection between RPC client and RPC server.)

According to Spring, ‘starting with version 3.4.0, the RabbitMQ server now supports Direct reply-to; this eliminates the main reason for a fixed reply queue (to avoid the need to create a temporary queue for each request). Starting with Spring AMQP version 1.4.1 Direct reply-to will be used by default (if supported by the server) instead of creating temporary reply queues. When no replyQueue is provided (or it is set with the name amq.rabbitmq.reply-to), the RabbitTemplate will automatically detect whether Direct reply-to is supported and either uses it or fall back to using a temporary reply queue. When using Direct reply-to, a reply-listener is not required and should not be configured.’ We are using the latest versions of both RabbitMQ and Spring AMQP, which should support Direct reply-to.

The Voter service receives the message containing the list of candidates. The Voter service deserializes the JSON payload to Candidate objects and proceeds to generate a random number of votes for each candidate in the list. Finally, each new vote object (MongoDB document) is written back to the vote collection in the Voter service’s voters  database.

Message Queue Diagram 2D

Exploring the RPC Code

We will not examine the REST HTTP IPC code in this post. Instead, we will explore the RPC code. You are welcome to download the source code and explore the REST HTTP code pattern; it uses some advanced features of Spring Boot and Spring Data.

Spring Dependencies

In order to use RabbitMQ, we need to add a project dependency on org.springframework.boot.spring-boot-starter-amqp. Below is a snippet from the Candidate service’s build.gradle file, showing project dependencies. The Voter service’s dependencies are identical.

AMQP Configuration

Next, we need to add a small amount of RabbitMQ AMQP configuration to both services. We accomplish this by using Spring’s @Configuration annotation on our configuration classes. Below is the configuration class for the Voter service.

And here, the configuration class for the Candidate service.

Candidate Service Code

With the dependencies and configuration in place, we define the method in the Voter service, which will request the candidates from the Candidate service, using RabbitMQ. Below is an abridged version of the Voter service’s CandidateListService class, containing the getCandidatesMessageRpc method. This method calls the rabbitTemplate.convertSendAndReceive method (see line 5, below).

Voter Service Code

Next, we define a method in the Candidate service, which will process the Voter service’s request. Below is an abridged version of the CandidateController class, containing the getCandidatesMessageRpc method. This method is decorated with Spring’s @RabbitListener annotation (see line 1, below). This annotation marks c to be the target of a Rabbit message listener on the voter.rpc.requests queue.

Also shown, are the getCandidatesMessageRpc method’s two helper methods, getByElection and serializeToJson. These methods query MongoDB for the list of candidates and serialize the list to JSON.

Demonstration

To demonstrate both the synchronous REST HTTP IPC code and the Spring AMQP-based RPC IPC code, we will make a few REST HTTP calls to the Voter API Gateway. For convenience, I have provided a shell script, demostrate_ipc.sh, which executes all the API calls necessary. I have added sleep commands to slow the output to the terminal down a bit, for easier analysis. The script requires HTTPie, a great time saver when working with RESTful services.

The demostrate_ipc.sh script does three things. First, it calls the Candidate service to generate a group of sample candidates. Next, the script calls the Voter service to simulate votes, using synchronous REST HTTP. Lastly, the script repeats the voter simulation, this time using message-based RPC IPC. All API calls are done through the Voter API Gateway on port 8080. To understand the API calls, examine the script, below.

Below is the list of candidates for the 2016 Presidential Election, generated by the Candidate service. The JSON payload was retrieved using the Voter service’s /voter/candidates/rpc/election/{election} endpoint. This endpoint uses the same RPC IPC method as the Voter service’s /voter/simulation/rpc/election/{election} endpoint.

Based on the list of candidates, below are the simulated election results. This JSON payload was retrieved using the Voter service’s /voter/results endpoint.

RabbitMQ Management Console

The easiest way to observe what is happening with our messages is using the RabbitMQ Management Console. To access the console, point your web-browser to localhost, on port 15672. The default login credentials for the console are guest/guest.

As you successfully send and receive messages between the services through RabbitMQ, you should see activity on the Overview tab. In addition, you should see a number of Connections, Channels, Exchanges, Queues, and Consumers.

RabbitMQ_Screen_3

In the Queues tab, you should find a single queue, the voter.rpc.requests queue. This queue was configured in the Candidate service’s configuration class, shown previously.

RabbitMQ_Screen_2

In the Exchanges tab, you should see one exchange, voter.rpc, which we configured in both the Voter and the Candidate service’s configuration classes (aka DirectExchange). Also, visible in the Exchanges tab, should be the routing key rpc, which we configured in the Candidate service’s configuration class (aka Binding).

The route binds the exchange to the voter.rpc.requests queue. If you recall Spring’s description, AMQP has exchanges (DirectExchange), routes (Binding), and queues (Queue). Messages are first published to exchanges. Routes define on which queue(s) to pipe the message. Consumers subscribing to that queue then receive a copy of the message.

RabbitMQ_Screen_1

In the Channels tab, you should note two connections, the single instances of the Voter and Candidate services. Likewise, there are two channels, one for each service. You can differentiate the channels by the presence of the consumer tag. The consumer tag, in this example, amq.ctag-Anv7GXs7ZWVoznO64euyjQ, uniquely identifies the consumer. In this example, the Voter service is the consumer. For a more complete explanation of the consumer tag, check out RabbitMQ’s AMQP documentation.

RabbitMQ_Screen_4.png

Message Structure

Messages cannot be viewed directly in the RabbitMQ Management Console. One way I have found to view messages is using your IDE’s debugger. Below, I have added a breakpoint on the Candidate service’ getCandidatesMessageRpc method, using IntelliJ IDEA. You can view the Voter service’s request message, as it is received by the Candidate service.

Debug_RPC_Message.png

Note the message payload, the requested election. Note the twelve message header elements. The headers include the AMQP exchange, queue, and binding. The message headers also include the consumer tag. The message also uniquely identifies the reply-to queue to use, if the server does not support Direct reply-to (see earlier explanation).

Service Logs

In addition to the RabbitMQ Management Console, we may obverse communications between the two services, by looking at the Voter and Candidate service’s logs. I have grabbed a snippet of both service’s logs and added a few comments to show where different processes are being executed. First the Voter service logs.

Next, the Candidate service logs.

Performance

What about the performance of Spring AMQP RPC IPC versus REST HTTP IPC? RabbitMQ has proven to be very performant, having been clocked at one million messages per second on GCE. I performed a series of fairly ‘unscientific’ performance tests, completing 250, 500, and then 1,000 requests. The tests were performed on a six-node Docker Swarm cluster with three instances of each service in a round-robin load-balanced configuration, and a single instance of RabbitMQ. The scripts to create the swarm cluster can be found in the Voter service GitHub project.

Based on consistent test results, the speed of the two methods was almost identical. Both methods performed between 3.1 to 3.2 responses per second. For example, the Spring AMQP RPC IPC method successfully completed 1,000 requests in 5 minutes and 11 seconds, while the REST HTTP IPC method successfully completed 1,000 requests in 5 minutes and 18 seconds, 7 seconds slower than the RPC method.

RabbitMQ on Docker Swarm

There are many variables to consider, which could dramatically impact IPC performance. For example, RabbitMQ was not clustered. Also, we did not use any type of caching, such as Varnish, Memcached, or Redis. Both these could dramatically increase IPC performance.

There are also several notable differences between the two methods from a code perspective. The REST HTTP method relies on Spring Data Projection combined with Spring Data MongoDB Repository, to obtain the candidate list from MongoDB. Somewhat differently, the RPC method makes use of Spring Data MongoDB Aggregation to return a list of candidates. Therefore, the test results should be taken with a grain of salt.

Production Considerations

The post demonstrated a simple example of RPC communications between two services using Spring AMQP. In an actual Production environment, there are a few things that must be considered, as Pivotal points out:

  • How should either service react on startup if RabbitMQ is not available? What if RabbitMQ fails after the services have started?
  • How should the Voter server (the client) react if there are no Candidate service instances (the server) running?
  • Should the Voter service have a timeout for the RPC response to return? What should happen if the request times out?
  • If the Candidate service malfunctions and raises an exception, should it be forwarded to the Voter service?
  • How does the Voter service protect against invalid incoming messages (eg checking bounds of the candidate list) before processing?
  • In all of the above scenarios, what, if any, response is returned to the API end user?

Conclusion

Although in this post we did not achieve asynchronous inter-process communications, we did achieve a higher level of service decoupling, using message-based RPC IPC. Adopting a message-based, loosely-coupled architecture, whether asynchronous or synchronous, wherever possible, will improve the overall functionality and deliverability of a microservices-based platform.

References

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , , , ,

Leave a comment

Provision and Deploy a Consul Cluster on AWS, using Terraform, Docker, and Jenkins

Cover2

Introduction

Modern DevOps tools, such as HashiCorp’s Packer and Terraform, make it easier to provision and manage complex cloud architecture. Utilizing a CI/CD server, such as Jenkins, to securely automate the use of these DevOps tools, ensures quick and consistent results.

In a recent post, Distributed Service Configuration with Consul, Spring Cloud, and Docker, we built a Consul cluster using Docker swarm mode, to host distributed configurations for a Spring Boot application. The cluster was built locally with VirtualBox. This architecture is fine for development and testing, but not for use in Production.

In this post, we will deploy a highly available three-node Consul cluster to AWS. We will use Terraform to provision a set of EC2 instances and accompanying infrastructure. The instances will be built from a hybrid AMIs containing the new Docker Community Edition (CE). In a recent post, Baking AWS AMI with new Docker CE Using Packer, we provisioned an Ubuntu AMI with Docker CE, using Packer. We will deploy Docker containers to each EC2 host, containing an instance of Consul server.

All source code can be found on GitHub.

Jenkins

I have chosen Jenkins to automate all of the post’s build, provisioning, and deployment tasks. However, none of the code is written specific to Jenkins; you may run all of it from the command line.

For this post, I have built four projects in Jenkins, as follows:

  1. Provision Docker CE AMI: Builds Ubuntu AMI with Docker CE, using Packer
  2. Provision Consul Infra AWS: Provisions Consul infrastructure on AWS, using Terraform
  3. Deploy Consul Cluster AWS: Deploys Consul to AWS, using Docker
  4. Destroy Consul Infra AWS: Destroys Consul infrastructure on AWS, using Terraform

Jenkins UI

We will primarily be using the ‘Provision Consul Infra AWS’, ‘Deploy Consul Cluster AWS’, and ‘Destroy Consul Infra AWS’ Jenkins projects in this post. The fourth Jenkins project, ‘Provision Docker CE AMI’, automates the steps found in the recent post, Baking AWS AMI with new Docker CE Using Packer, to build the AMI used to provision the EC2 instances in this post.

Consul AWS Diagram 2

Terraform

Using Terraform, we will provision EC2 instances in three different Availability Zones within the US East 1 (N. Virginia) Region. Using Terraform’s Amazon Web Services (AWS) provider, we will create the following AWS resources:

  • (1) Virtual Private Cloud (VPC)
  • (1) Internet Gateway
  • (1) Key Pair
  • (3) Elastic Cloud Compute (EC2) Instances
  • (2) Security Groups
  • (3) Subnets
  • (1) Route
  • (3) Route Tables
  • (3) Route Table Associations

The final AWS architecture should resemble the following:

Consul AWS Diagram

Production Ready AWS

Although we have provisioned a fairly complete VPC for this post, it is far from being ready for Production. I have created two security groups, limiting the ingress and egress to the cluster. However, to further productionize the environment would require additional security hardening. At a minimum, you should consider adding public/private subnets, NAT gateways, network access control list rules (network ACLs), and the use of HTTPS for secure communications.

In production, applications would communicate with Consul through local Consul clients. Consul clients would take part in the LAN gossip pool from different subnets, Availability Zones, Regions, or VPCs using VPC peering. Communications would be tightly controlled by IAM, VPC, subnet, IP address, and port.

Also, you would not have direct access to the Consul UI through a publicly exposed IP or DNS address. Access to the UI would be removed altogether or locked down to specific IP addresses, and accessed restricted to secure communication channels.

Consul

We will achieve high availability (HA) by clustering three Consul server nodes across the three Elastic Cloud Compute (EC2) instances. In this minimally sized, three-node cluster of Consul servers, we are protected from the loss of one Consul server node, one EC2 instance, or one Availability Zone(AZ). The cluster will still maintain a quorum of two nodes. An additional level of HA that Consul supports, multiple datacenters (multiple AWS Regions), is not demonstrated in this post.

Docker

Having Docker CE already installed on each EC2 instance allows us to execute remote Docker commands over SSH from Jenkins. These commands will deploy and configure a Consul server node, within a Docker container, on each EC2 instance. The containers are built from HashiCorp’s latest Consul Docker image pulled from Docker Hub.

Getting Started

Preliminary Steps

If you have built infrastructure on AWS with Terraform, these steps should be familiar to you:

  1. First, you will need an AMI with Docker. I suggest reading Baking AWS AMI with new Docker CE Using Packer.
  2. You will need an AWS IAM User with the proper access to create the required infrastructure. For this post, I created a separate Jenkins IAM User with PowerUser level access.
  3. You will need to have an RSA public-private key pair, which can be used to SSH into the EC2 instances and install Consul.
  4. Ensure you have your AWS credentials set. I usually source mine from a .env file, as environment variables. Jenkins can securely manage credentials, using secret text or files.
  5. Fork and/or clone the Consul cluster project from  GitHub.
  6. Change the aws_key_name and public_key_path variable values to your own RSA key, in the variables.tf file
  7. Change the aws_amis_base variable values to your own AMI ID (see step 1)
  8. If you are do not want to use the US East 1 Region and its AZs, modify the variables.tf, network.tf, and instances.tf files.
  9. Disable Terraform’s remote state or modify the resource to match your remote state configuration, in the main.tf file. I am using an Amazon S3 bucket to store my Terraform remote state.

Building an AMI with Docker

If you have not built an Amazon Machine Image (AMI) for use in this post already, you can do so using the scripts provided in the previous post’s GitHub repository. To automate the AMI build task, I built the ‘Provision Docker CE AMI’ Jenkins project. Identical to the other three Jenkins projects in this post, this project has three main tasks, which include: 1) SCM: clone the Packer AMI GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run Packer.

The SCM and Bindings tasks are identical to the other projects (see below for details), except for the use of a different GitHub repository. The project’s Build step, which runs the packer_build_ami.sh script looks as follows:

jenkins_13

The resulting AMI ID will need to be manually placed in Terraform’s variables.tf file, before provisioning the AWS infrastructure with Terraform. The new AMI ID will be displayed in Jenkin’s build output.

jenkins_14

Provisioning with Terraform

Based on the modifications you made in the Preliminary Steps, execute the terraform validate command to confirm your changes. Then, run the terraform plan command to review the plan. Assuming are were no errors, finally, run the terraform apply command to provision the AWS infrastructure components.

In Jenkins, I have created the ‘Provision Consul Infra AWS’ project. This project has three tasks, which include: 1) SCM: clone the GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run Terraform. Those tasks look as follows:

Jenkins_08.png

You will obviously need to use your modified GitHub project, incorporating the configuration changes detailed above, as the SCM source for Jenkins.

Jenkins Credentials

You will also need to configure your AWS credentials.

Jenkins_03.png

The provision_infra.sh script provisions the AWS infrastructure using Terraform. The script also updates Terraform’s remote state. Remember to update the remote state configuration in the script to match your personal settings.

The Jenkins build output should look similar to the following:

jenkins_12.png

Although the build only takes about 90 seconds to complete, the EC2 instances could take a few extra minutes to complete their Status Checks and be completely ready. The final results in the AWS EC2 Management Console should look as follows:

EC2 Management Console

Note each EC2 instance is running in a different US East 1 Availability Zone.

Installing Consul

Once the AWS infrastructure is running and the EC2 instances have completed their Status Checks successfully, we are ready to deploy Consul. In Jenkins, I have created the ‘Deploy Consul Cluster AWS’ project. This project has three tasks, which include: 1) SCM: clone the GitHub project, 2) Bindings: set up the AWS credentials, and 3) Build: run an SSH remote Docker command on each EC2 instance to deploy Consul. The SCM and Bindings tasks are identical to the project above. The project’s Build step looks as follows:

Jenkins_04.png

First, the delete_containers.sh script deletes any previous instances of Consul containers. This is helpful if you need to re-deploy Consul. Next, the deploy_consul.sh script executes a series of SSH remote Docker commands to install and configure Consul on each EC2 instance.

The entire Jenkins build process only takes about 30 seconds. Afterward, the output from a successful Jenkins build should show that all three Consul server instances are running, have formed a quorum, and have elected a Leader.

Jenkins_05.png

Persisting State

The Consul Docker image exposes VOLUME /consul/data, which is a path were Consul will place its persisted state. Using Terraform’s remote-exec provisioner, we create a directory on each EC2 instance, at /home/ubuntu/consul/config. The docker run command bind-mounts the container’s /consul/data path to the EC2 host’s /home/ubuntu/consul/config directory.

According to Consul, the Consul server container instance will ‘store the client information plus snapshots and data related to the consensus algorithm and other state, like Consul’s key/value store and catalog’ in the /consul/data directory. That container directory is now bind-mounted to the EC2 host, as demonstrated below.

jenkins_15

Accessing Consul

Following a successful deployment, you should be able to use the public URL, displayed in the build output of the ‘Deploy Consul Cluster AWS’ project, to access the Consul UI. Clicking on the Nodes tab in the UI, you should see all three Consul server instances, one per EC2 instance, running and healthy.

Consul UI

Destroying Infrastructure

When you are finished with the post, you may want to remove the running infrastructure, so you don’t continue to get billed by Amazon. The ‘Destroy Consul Infra AWS’ project destroys all the AWS infrastructure, provisioned as part of this post, in about 60 seconds. The project’s SCM and Bindings tasks are identical to the both previous projects. The Build step calls the destroy_infra.sh script, which is included in the GitHub project. The script executes the terraform destroy -force command. It will delete all running infrastructure components associated with the post and update Terraform’s remote state.

Jenkins_09

Conclusion

This post has demonstrated how modern DevOps tooling, such as HashiCorp’s Packer and Terraform, make it easy to build, provision and manage complex cloud architecture. Using a CI/CD server, such as Jenkins, to securely automate the use of these tools, ensures quick and consistent results.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , , , , ,

Leave a comment

Baking AWS AMI with new Docker CE Using Packer

AWS for Docker

Introduction

On March 2 (less than a week ago as of this post), Docker announced the release of Docker Enterprise Edition (EE), a new version of the Docker platform optimized for business-critical deployments. As part of the release, Docker also renamed the free Docker products to Docker Community Edition (CE). Both products are adopting a new time-based versioning scheme for both Docker EE and CE. The initial release of Docker CE and EE, the 17.03 release, is the first to use the new scheme.

Along with the release, Docker delivered excellent documentation on installing, configuring, and troubleshooting the new Docker EE and CE. In this post, I will demonstrate how to partially bake an existing Amazon Machine Image (Amazon AMI) with the new Docker CE, preparing it as a base for the creation of Amazon Elastic Compute Cloud (Amazon EC2) compute instances.

Adding Docker and similar tooling to an AMI is referred to as partially baking an AMI, often referred to as a hybrid AMI. According to AWS, ‘hybrid AMIs provide a subset of the software needed to produce a fully functional instance, falling in between the fully baked and JeOS (just enough operating system) options on the AMI design spectrum.

Installing Docker CE on an AWS AMI should not be confused with Docker’s also recently announced Docker Community Edition (CE) for AWS. Docker for AWS offers multiple CloudFormation templates for Docker EE and CE. According to Docker, Docker for AWS ‘provides a Docker-native solution that avoids operational complexity and adding unneeded additional APIs to the Docker stack.

Base AMI

Docker provides detailed directions for installing Docker CE and EE onto several major Linux distributions. For this post, we will choose a widely used Linux distro, Ubuntu. According to Docker, currently Docker CE and EE can be installed on three popular Ubuntu releases:

  • Yakkety 16.10
  • Xenial 16.04 (LTS)
  • Trusty 14.04 (LTS)

To provision a small EC2 instance in Amazon’s US East (N. Virginia) Region, I will choose Ubuntu 16.04.2 LTS Xenial Xerus . According to Canonical’s Amazon EC2 AMI Locator website, a Xenial 16.04 LTS AMI is available, ami-09b3691f, for US East 1, as a t2.micro EC2 instance type.

Packer

HashiCorp Packer will be used to partially bake the base Ubuntu Xenial 16.04 AMI with Docker CE 17.03. HashiCorp describes Packer as ‘a tool for creating machine and container images for multiple platforms from a single source configuration.’ The JSON-format Packer file is as follows:

{
  "variables": {
    "aws_access_key": "{{env `AWS_ACCESS_KEY_ID`}}",
    "aws_secret_key": "{{env `AWS_SECRET_ACCESS_KEY`}}",
    "us_east_1_ami": "ami-09b3691f",
    "name": "aws-docker-ce-base",
    "us_east_1_name": "ubuntu-xenial-docker-ce-base",
    "ssh_username": "ubuntu"
  },
  "builders": [
    {
      "name": "{{user `us_east_1_name`}}",
      "type": "amazon-ebs",
      "access_key": "{{user `aws_access_key`}}",
      "secret_key": "{{user `aws_secret_key`}}",
      "region": "us-east-1",
      "vpc_id": "",
      "subnet_id": "",
      "source_ami": "{{user `us_east_1_ami`}}",
      "instance_type": "t2.micro",
      "ssh_username": "{{user `ssh_username`}}",
      "ssh_timeout": "10m",
      "ami_name": "{{user `us_east_1_name`}} {{timestamp}}",
      "ami_description": "{{user `us_east_1_name`}} AMI",
      "run_tags": {
        "ami-create": "{{user `us_east_1_name`}}"
      },
      "tags": {
        "ami": "{{user `us_east_1_name`}}"
      },
      "ssh_private_ip": false,
      "associate_public_ip_address": true
    }
  ],
  "provisioners": [
    {
      "type": "file",
      "source": "bootstrap_docker_ce.sh",
      "destination": "/tmp/bootstrap_docker_ce.sh"
    },
    {
          "type": "file",
          "source": "cleanup.sh",
          "destination": "/tmp/cleanup.sh"
    },
    {
      "type": "shell",
      "execute_command": "echo 'packer' | sudo -S sh -c '{{ .Vars }} {{ .Path }}'",
      "inline": [
        "whoami",
        "cd /tmp",
        "chmod +x bootstrap_docker_ce.sh",
        "chmod +x cleanup.sh",
        "ls -alh /tmp",
        "./bootstrap_docker_ce.sh",
        "sleep 10",
        "./cleanup.sh"
      ]
    }
  ]
}

The Packer file uses Packer’s amazon-ebs builder type. This builder is used to create Amazon AMIs backed by Amazon Elastic Block Store (EBS) volumes, for use in EC2.

Bootstrap Script

To install Docker CE on the AMI, the Packer file executes a bootstrap shell script. The bootstrap script and subsequent cleanup script are executed using  Packer’s remote shell provisioner. The bootstrap is like the following:

#!/bin/sh

sudo apt-get remove docker docker-engine

sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88

sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get install -y docker-ce

sudo groupadd docker
sudo usermod -aG docker ubuntu

sudo systemctl enable docker

This script closely follows directions provided by Docker, for installing Docker CE on Ubuntu. After removing any previous copies of Docker, the script installs Docker CE. To ensure sudo is not required to execute Docker commands on any EC2 instance provisioned from resulting AMI, the script adds the ubuntu user to the docker group.

The bootstrap script also uses systemd to start the Docker daemon. Starting with Ubuntu 15.04, Systemd System and Service Manager is used by default instead of the previous init system, Upstart. Systemd ensures Docker will start on boot.

Cleaning Up

It is best good practice to clean up your activities after baking an AMI. I have included a basic clean up script. The cleanup script is as follows:

#!/bin/sh

set -e

echo 'Cleaning up after bootstrapping...'
sudo apt-get -y autoremove
sudo apt-get -y clean
sudo rm -rf /tmp/*
cat /dev/null > ~/.bash_history
history -c
exit

Partially Baking

Before running Packer to build the Docker CE AMI, I set both my AWS access key and AWS secret access key. The Packer file expects the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

Running the packer build ubuntu_docker_ce_ami.json command builds the AMI. The abridged output should look similar to the following:

$ packer build docker_ami.json
ubuntu-xenial-docker-ce-base output will be in this color.

==> ubuntu-xenial-docker-ce-base: Prevalidating AMI Name...
    ubuntu-xenial-docker-ce-base: Found Image ID: ami-09b3691f
==> ubuntu-xenial-docker-ce-base: Creating temporary keypair: packer_58bc7a49-9e66-7f76-ce8e-391a67d94987
==> ubuntu-xenial-docker-ce-base: Creating temporary security group for this instance...
==> ubuntu-xenial-docker-ce-base: Authorizing access to port 22 the temporary security group...
==> ubuntu-xenial-docker-ce-base: Launching a source AWS instance...
    ubuntu-xenial-docker-ce-base: Instance ID: i-0ca883ecba0c28baf
==> ubuntu-xenial-docker-ce-base: Waiting for instance (i-0ca883ecba0c28baf) to become ready...
==> ubuntu-xenial-docker-ce-base: Adding tags to source instance
==> ubuntu-xenial-docker-ce-base: Waiting for SSH to become available...
==> ubuntu-xenial-docker-ce-base: Connected to SSH!
==> ubuntu-xenial-docker-ce-base: Uploading bootstrap_docker_ce.sh => /tmp/bootstrap_docker_ce.sh
==> ubuntu-xenial-docker-ce-base: Uploading cleanup.sh => /tmp/cleanup.sh
==> ubuntu-xenial-docker-ce-base: Provisioning with shell script: /var/folders/kf/637b0qns7xb0wh9p8c4q0r_40000gn/T/packer-shell189662158
    ...
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: E: Unable to locate package docker-engine
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: ca-certificates is already the newest version (20160104ubuntu1).
    ubuntu-xenial-docker-ce-base: apt-transport-https is already the newest version (1.2.19).
    ubuntu-xenial-docker-ce-base: curl is already the newest version (7.47.0-1ubuntu2.2).
    ubuntu-xenial-docker-ce-base: software-properties-common is already the newest version (0.96.20.5).
    ubuntu-xenial-docker-ce-base: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    ubuntu-xenial-docker-ce-base: OK
    ubuntu-xenial-docker-ce-base: pub   4096R/0EBFCD88 2017-02-22
    ubuntu-xenial-docker-ce-base: Key fingerprint = 9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
    ubuntu-xenial-docker-ce-base: uid                  Docker Release (CE deb) <docker@docker.com>
    ubuntu-xenial-docker-ce-base: sub   4096R/F273FCD8 2017-02-22
    ubuntu-xenial-docker-ce-base:
    ubuntu-xenial-docker-ce-base: Hit:1 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial InRelease
    ubuntu-xenial-docker-ce-base: Get:2 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
    ...
    ubuntu-xenial-docker-ce-base: Get:27 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [89.5 kB]
    ubuntu-xenial-docker-ce-base: Fetched 10.6 MB in 2s (4,065 kB/s)
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: Calculating upgrade...
    ubuntu-xenial-docker-ce-base: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: The following additional packages will be installed:
    ubuntu-xenial-docker-ce-base: aufs-tools cgroupfs-mount libltdl7
    ubuntu-xenial-docker-ce-base: Suggested packages:
    ubuntu-xenial-docker-ce-base: mountall
    ubuntu-xenial-docker-ce-base: The following NEW packages will be installed:
    ubuntu-xenial-docker-ce-base: aufs-tools cgroupfs-mount docker-ce libltdl7
    ubuntu-xenial-docker-ce-base: 0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
    ubuntu-xenial-docker-ce-base: Need to get 19.4 MB of archives.
    ubuntu-xenial-docker-ce-base: After this operation, 89.4 MB of additional disk space will be used.
    ubuntu-xenial-docker-ce-base: Get:1 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial/universe amd64 aufs-tools amd64 1:3.2+20130722-1.1ubuntu1 [92.9 kB]
    ...
    ubuntu-xenial-docker-ce-base: Get:4 https://download.docker.com/linux/ubuntu xenial/stable amd64 docker-ce amd64 17.03.0~ce-0~ubuntu-xenial [19.3 MB]
    ubuntu-xenial-docker-ce-base: debconf: unable to initialize frontend: Dialog
    ubuntu-xenial-docker-ce-base: debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
    ubuntu-xenial-docker-ce-base: debconf: falling back to frontend: Readline
    ubuntu-xenial-docker-ce-base: debconf: unable to initialize frontend: Readline
    ubuntu-xenial-docker-ce-base: debconf: (This frontend requires a controlling tty.)
    ubuntu-xenial-docker-ce-base: debconf: falling back to frontend: Teletype
    ubuntu-xenial-docker-ce-base: dpkg-preconfigure: unable to re-open stdin:
    ubuntu-xenial-docker-ce-base: Fetched 19.4 MB in 1s (17.8 MB/s)
    ubuntu-xenial-docker-ce-base: Selecting previously unselected package aufs-tools.
    ubuntu-xenial-docker-ce-base: (Reading database ... 53844 files and directories currently installed.)
    ubuntu-xenial-docker-ce-base: Preparing to unpack .../aufs-tools_1%3a3.2+20130722-1.1ubuntu1_amd64.deb ...
    ubuntu-xenial-docker-ce-base: Unpacking aufs-tools (1:3.2+20130722-1.1ubuntu1) ...
    ...
    ubuntu-xenial-docker-ce-base: Setting up docker-ce (17.03.0~ce-0~ubuntu-xenial) ...
    ubuntu-xenial-docker-ce-base: Processing triggers for libc-bin (2.23-0ubuntu5) ...
    ubuntu-xenial-docker-ce-base: Processing triggers for systemd (229-4ubuntu16) ...
    ubuntu-xenial-docker-ce-base: Processing triggers for ureadahead (0.100.0-19) ...
    ubuntu-xenial-docker-ce-base: groupadd: group 'docker' already exists
    ubuntu-xenial-docker-ce-base: Synchronizing state of docker.service with SysV init with /lib/systemd/systemd-sysv-install...
    ubuntu-xenial-docker-ce-base: Executing /lib/systemd/systemd-sysv-install enable docker
    ubuntu-xenial-docker-ce-base: Cleanup...
    ubuntu-xenial-docker-ce-base: Reading package lists...
    ubuntu-xenial-docker-ce-base: Building dependency tree...
    ubuntu-xenial-docker-ce-base: Reading state information...
    ubuntu-xenial-docker-ce-base: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
==> ubuntu-xenial-docker-ce-base: Stopping the source instance...
==> ubuntu-xenial-docker-ce-base: Waiting for the instance to stop...
==> ubuntu-xenial-docker-ce-base: Creating the AMI: ubuntu-xenial-docker-ce-base 1288227081
    ubuntu-xenial-docker-ce-base: AMI: ami-e9ca6eff
==> ubuntu-xenial-docker-ce-base: Waiting for AMI to become ready...
==> ubuntu-xenial-docker-ce-base: Modifying attributes on AMI (ami-e9ca6eff)...
    ubuntu-xenial-docker-ce-base: Modifying: description
==> ubuntu-xenial-docker-ce-base: Modifying attributes on snapshot (snap-058a26c0250ee3217)...
==> ubuntu-xenial-docker-ce-base: Adding tags to AMI (ami-e9ca6eff)...
==> ubuntu-xenial-docker-ce-base: Tagging snapshot: snap-043a16c0154ee3217
==> ubuntu-xenial-docker-ce-base: Creating AMI tags
==> ubuntu-xenial-docker-ce-base: Creating snapshot tags
==> ubuntu-xenial-docker-ce-base: Terminating the source AWS instance...
==> ubuntu-xenial-docker-ce-base: Cleaning up any extra volumes...
==> ubuntu-xenial-docker-ce-base: No volumes to clean up, skipping
==> ubuntu-xenial-docker-ce-base: Deleting temporary security group...
==> ubuntu-xenial-docker-ce-base: Deleting temporary keypair...
Build 'ubuntu-xenial-docker-ce-base' finished.

==> Builds finished. The artifacts of successful builds are:
--> ubuntu-xenial-docker-ce-base: AMIs were created:

us-east-1: ami-e9ca6eff

Results

The result is an Ubuntu 16.04 AMI in US East 1 with Docker CE 17.03 installed. To confirm the new AMI is now available, I will use the AWS CLI to examine the resulting AMI:

aws ec2 describe-images \
  --filters Name=tag-key,Values=ami Name=tag-value,Values=ubuntu-xenial-docker-ce-base \
  --query 'Images[*].{ID:ImageId}'

Resulting output:

{
    "Images": [
        {
            "VirtualizationType": "hvm",
            "Name": "ubuntu-xenial-docker-ce-base 1488747081",
            "Tags": [
                {
                    "Value": "ubuntu-xenial-docker-ce-base",
                    "Key": "ami"
                }
            ],
            "Hypervisor": "xen",
            "SriovNetSupport": "simple",
            "ImageId": "ami-e9ca6eff",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "SnapshotId": "snap-048a16c0250ee3227",
                        "VolumeSize": 8,
                        "VolumeType": "gp2",
                        "Encrypted": false
                    }
                },
                {
                    "DeviceName": "/dev/sdb",
                    "VirtualName": "ephemeral0"
                },
                {
                    "DeviceName": "/dev/sdc",
                    "VirtualName": "ephemeral1"
                }
            ],
            "Architecture": "x86_64",
            "ImageLocation": "931066906971/ubuntu-xenial-docker-ce-base 1488747081",
            "RootDeviceType": "ebs",
            "OwnerId": "931066906971",
            "RootDeviceName": "/dev/sda1",
            "CreationDate": "2017-03-05T20:53:41.000Z",
            "Public": false,
            "ImageType": "machine",
            "Description": "ubuntu-xenial-docker-ce-base AMI"
        }
    ]
}

Finally, here is the new AMI as seen in the AWS EC2 Management Console:

EC2 Management Console - AMI

Terraform

To confirm Docker CE is installed and running, I can provision a new EC2 instance, using HashiCorp Terraform. This post is too short to detail all the Terraform code required to stand up a complete environment. I’ve included the complete code in the GitHub repo for this post. Not, the Terraform code is only used to testing. No security, including the use of a properly configured security groups, public/private subnets, and a NAT server, is configured.

Below is a greatly abridged version of the Terraform code I used to provision a new EC2 instance, using Terraform’s aws_instance resource. The resulting EC2 instance should have Docker CE available.

# test-docker-ce instance
resource "aws_instance" "test-docker-ce" {
  connection {
    user        = "ubuntu"
    private_key = "${file("~/.ssh/test-docker-ce")}"
    timeout     = "${connection_timeout}"
  }

  ami               = "ami-e9ca6eff"
  instance_type     = "t2.nano"
  availability_zone = "us-east-1a"
  count             = "1"

  key_name               = "${aws_key_pair.auth.id}"
  vpc_security_group_ids = ["${aws_security_group.test-docker-ce.id}"]
  subnet_id              = "${aws_subnet.test-docker-ce.id}"

  tags {
    Owner       = "Gary A. Stafford"
    Terraform   = true
    Environment = "test-docker-ce"
    Name        = "tf-instance-test-docker-ce"
  }
}

By using the AWS CLI, once again, we can confirm the new EC2 instance was built using the correct AMI:

aws ec2 describe-instances \
  --filters Name='tag:Name,Values=tf-instance-test-docker-ce' \
  --output text --query 'Reservations[*].Instances[*].ImageId'

Resulting output looks good:

ami-e9ca6eff

Finally, here is the new EC2 as seen in the AWS EC2 Management Console:

EC2 Management Console - EC2

SSHing into the new EC2 instance, I should observe that the operating system is Ubuntu 16.04.2 LTS and that Docker version 17.03.0-ce is installed and running:

Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-64-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

0 packages can be updated.
0 updates are security updates.

Last login: Sun Mar  5 22:06:01 2017 from <my_ip_address>

ubuntu@ip-<ec2_local_ip>:~$ docker --version
Docker version 17.03.0-ce, build 3a232c8

Conclusion

Docker EE and CE represent a significant step forward in expanding Docker’s enterprise-grade toolkit. Replacing or installing Docker EE or CE on your AWS AMIs is easy, using Docker’s guide along with HashiCorp Packer.

All source code for this post can be found on GitHub.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , , , , , , ,

1 Comment

Effective DevOps Technical Analysis

mindmap_analysis

Introduction

Building a successful DevOps team is a significant challenge. Building a successful Agile DevOps team can be an enormous challenge. Most junior Software Developers have experience working on Agile teams, often starting in college. On the contrary, most Operations Engineers, System Administrators, and Security Specialists have not participated on an Agile team. They frequently cut their teeth in highly reactive, support ticket-driven environments. Agile methodologiesceremonies, and coding practices, often seem strange and awkward to traditional operation resources.

Adding to the challenge, most DevOps teams are expected to execute on a broad range of requirements. DevOps user stories span the software development, release, and support. DevOps team backlogs often include a diverse assortment of epics, including logging, monitoring, alerting, infrastructure provisioning, networking, security, continuous integration and deployment, and support.

Worst yet, many DevOps teams don’t apply the same rigor to developing high-quality user stories and acceptance criteria for DevOps-related requirements, as they do for software application requirements. On teams where DevOps Engineers are integrated with Software Developers, I find the lack of rigor is often due to a lack of understanding of typical DevOps requirements. On stand-alone DevOps teams, I find engineers often believe their stories don’t need the same attention to details, as their software development peers.

Successful Teams

I’ve been lucky enough to be part of several successful DevOps teams, both integrated and stand-alone. The teams were a mix of engineers from both software development, system administration, security, support, and operations backgrounds. One of the many reasons these teams were effective, was the presence of a Technical Analyst. The Technical Analyst role falls somewhere in between an Agile Business Analyst and a traditional Systems Analyst, probably closer to the latter on a DevOps team. The Technical Analyst ensures that user stories are Ready for Development, by meeting the Definition of Ready.

There is a common misperception that the Technical Analyst on a DevOps team must be the senior technologist. DevOps is a very broad and quickly evolving discipline. While a good understanding of Agile practices, CI/CD, and modern software development is beneficial, I’ve found organizational and investigative skills to be most advantageous to the Technical Analyst.

In this post, we will explore how to optimize the effectiveness of a Technical Analyst, through the use of a consistent approach to functional requirement analysis and story development. Uniformity improves the overall quality and consistency of user stories and acceptance criteria, maximizing business value and minimizing resource utilization.

Inadequate Requirements

Let’s start with a typical greenfield requirement, frequently assigned to DevOps teams, centralized logging.

Title
Centralized Application Logging

User Story
As a Software Developer,
I need to view application logs,
So that I can monitor and troubleshoot running applications.

Acceptance Criterion 1
I can view the latest logs from multiple applications.

Acceptance Criterion 2
I can view application logs without having to log directly into each host on which the applications are running.

Acceptance Criterion 3
I can view the log entries in exact chronological order, based on timestamps.

Technical Background Information
Currently, Software Developers log directly into host machines, using elevated credentials, to review running application logs. This is a high-risk practice, which often results in an increase in high-severity support tickets. This is also a huge security concern. Sensitive data could potentially be accessed from the host, using the elevated credentials.

I know some DevOps Engineers that would willingly pick up this user story, make a handful of assumptions based on previous experiences, and manage to hack out a centralized application logging system. The ability to meet the end user’s expectations would clearly not be based on the vague and incomplete user story, acceptance criteria, and technical background details. Any centralized logging solution would be mostly based on the personal knowledge (and luck!) of the engineer.

Multi-Faceted Analysis

Consistently delivering well-defined user stories and comprehensive acceptance criteria, which provide maximum business value with minimal resource utilization, requires a systematic analytical approach.

Based on my experience, I have found certain aspects of the software development, delivery, and support lifecycles, affect or are affected by DevOps-related functional requirements. Every requirement should be consistently analyzed against these facets to improve the overall quality of user stories and acceptance criteria.

For simplicity, I prefer to organize these aspects into five broad categories: Technology, Infrastructure, Delivery, Support, and Organizational. In no particular order, these facets, include:

  • Technology
    • Technology Preferences
    • Architectural Approval
    • Technical Debt
    • Configuration and Feature Management
  • Infrastructure
    • Infrastructure
    • Life Expectancy
    • Backup and Recovery
    • Data Management
    • Computer Networking
    • High Availability and Fault Tolerance
  • Delivery
    • Environment Management
    • Continuous Integration and Delivery
    • Release Management
  • Support
    • Service Level Agreements (SLAs)
    • Monitoring
    • Alerting
    • Logging
    • Support
    • Maintenance
    • Service Level Agreements (SLAs)
    • Training and Documentation
  • Organizational
    • Ownership
    • Cost
    • Licensing and Legal
    • Security and Compliance

Requirement Impacts

Let’s take a closer look at each aspect, and how they could impact DevOps requirements.

Logging

Do the requirements introduce changes, which would or should produce logs? How should those logs be handled? Do those changes impact existing logging processes and systems?

Monitoring

Do the requirements introduce changes, which would or should be monitored? Do those changes impact existing monitoring processes and systems?

Alerting

Do the requirements introduce changes, which would or should produce alerts? Do the requirements introduce changes, which impact existing alerting processes and systems?

Ownership

Do the requirements introduce changes, which require an owner? Do those changes impact existing ownership models? An example of this might be regarding who owns the support and maintenance of new infrastructure.

Support

Do the requirements introduce changes, which would or should require internal, external, or vendor support? Do those changes impact existing support models? For example, adding new service categories to a support ticketing system, like ServiceNow.

Maintenance

Do the requirements introduce changes, which would or should require regular and on-going maintenance? Do those changes impact existing maintenance processes and schedules? An example of maintenance might be regarding how upgrading and patching would be completed for a new monitoring system.

Backup and Recovery

Do the requirements introduce changes, which would or should require backup and recovery? Do the requirements introduce changes, which impact existing backup and recovery processes and systems?

Security and Compliance

Do the requirements introduce changes, which have security and compliance implications? Do those changes impact existing security and compliance requirements, processes, and systems? Compliance in IT frequently includes Regulatory Compliance with HIPAA, Sarbanes-Oxley, and PCI-DSS. Security and Compliance is closely associated with Licensing and Legal.

Continuous Integration and Delivery

Do the requirements introduce changes, which require continuous integration and delivery of application or infrastructure code? Do those changes impact existing continuous integration and delivery processes and systems?

Service Level Agreements (SLAs)

Do the requirements introduce changes, which are or should be covered by Service Level Agreements (SLA)? Do the requirements introduce changes, which impact existing SLAs Often, these requirements are system performance- or support-related, centered around Mean Time Between Failures (MTBF), Mean Time to Repair, and Mean Time to Recovery (MTTR).

Configuration Management

Do the requirements introduce changes, which requires the management of new configuration or multiple configurations? Do those changes affect existing configuration or configuration management processes and systems? Configuration management also includes feature management (aka feature switches). Will the changes turned on or turned off for testing and release? Configuration Management is closely associated with Environment Managment and Release Management.

Environment Management

Do the requirements introduce changes, which affect existing software environments or require new software environments? Do those changes affect existing environments? Software environments typically include Development, Test, Performance, Staging, and Production.

High Availability and Fault Tolerance

Do the requirements introduce changes, which require high availability and fault tolerant configurations? Do those changes affect existing highly available and fault-tolerant systems?

Release Management

Do the requirements introduce changes, which require releases to the Production or other client-facing environments? Do those changes impact existing release management processes and systems?

Life Expectancy

Do the requirements introduce changes, which have a defined lifespan? For example, will the changes be part a temporary fix, a POC, an MVP, pilot, or a permanent solution? Do those changes impact the lifecycles of existing processes and systems? For example, do the changes replace an existing system?

Architectural Approval

Do the requirements introduce changes, which require the approval of the Client, Architectural Review Board (ARB), Change Approval Board (CAB), or other governing bodies? Do the requirements introduce changes, which impact other existing or pending technology choices? Architectural Approval is closely related to Technology Preferences, below. Often an ARB or similar governing body will regulate technology choices.

Technology Preferences

Do the requirements introduce changes, which might be impacted by existing preferred vendor relationships or previously approved technologies? Are the requirements impacted by blocked technology, including countries or companies blacklisted by the Government, such as the Specially Designated Nationals List (SDN) or Consolidated Sanctions List (CSL)? Does the client approve of the use of Open Source technologies? What is the impact of those technology choices on other facets, such as Licensing and Legal, Cost, and Support? Do those changes impact other existing or pending technology choices?

Infrastructure

Do the requirements introduce changes, which would or should impact infrastructure? Is the infrastructure new or existing, is it physical, is it virtualized, is it in the cloud? If the infrastructure is virtualized, are there coding requirements about how it is provisioned? If the infrastructure is physical, how will it impact other aspects, such as Technology Preferences, Cost, Support, Monitoring, and so forth? Do the requirements introduce changes, which impact existing infrastructure?

Data Management

Do the requirements introduce changes, which produce data or require the preservation of state? Do those changes impact existing data management processes and systems? Is the data ephemeral or transient? How is the data persisted? Is the data stored in a SQL database, NoSQL database, distributed key-value store, or flat-file? Consider how other aspects, such as Security and Compliance, Technology Preferences, and Architectural Approval, impact or are impacted by the various aspects of Data?

Training and Documentation

Do the requirements introduce changes, which require new training and documentation? This also includes architectural diagrams and other visuals. Do the requirements introduce changes, which impact training and documentation processes and practices?

Computer Networking

Do the requirements introduce changes, which require networking additions or modifications? Do those changes impact existing networking processes and systems? I have separated out Computer Networking from Infrastructure since networking is itself, so broad, involving such things as DNS, SDN, VPN, SSL, certificates, load balancing, proxies, firewalls, and so forth. Again, how do networking choices impact other aspects, like Security and Compliance?

Licensing and Legal

Do the requirements introduce changes, which requires licensing? Should or would the requirements require additional legal review? Are there contractual agreements that need authorization? Do the requirements introduce changes, which impact licensing or other legal agreements or contracts?

Cost

Do the requirements introduce changes, which requires the purchase of equipment, contracts, outside resources, or other goods which incur a cost? How and where will the funding come from to purchase those items? Are there licensing or other legal agreements and contracts required as part of the purchase? Who approves these purchases? Are they within acceptable budgetary expectations? Do those changes impact existing costs or budgets?

Technical Debt

Do the requirements introduce changes, which incur technical debt? What is the cost of that debt? How will impact other requirements and timelines? Do the requirements eliminate existing technical debt? Do those changes impact existing or future technical debt caused by other processes and systems?

Other

Depending on your organization, industry, and DevOps maturity, you may prefer different aspects critical to your requirements.

Closer Analysis

Let’s take a second look at our centralized logging story, applying what we’ve learned. For each facet, we will ask the following three questions.

Is there any aspect of _______ that impacts the centralized logging requirement(s)?
Will the centralized logging requirement(s) impact any aspect of _______?
If yes, will the impacts be captured as part of this user story, an additional user story, or elsewhere?

For the sake of brevity, we will only look at the half the aspects.

Logging

Although the requirement is about centralized logging, there is still analysis that should occur around Logging. For example, we should determine if there is a requirement that the centralized logging application should consume its own logs for easier supportability. If so, we should add an acceptance criterion to the original user story around the logging application’s logs being captured.

Monitoring

If we are building a centralized application logging system, we will assume there will be a centralized logging application, possibly a database, and infrastructure on which the application is running. Therefore, it is reasonable to also assume that the logging application, database, and infrastructure need to be monitored. This would require interfacing with an existing monitoring system, or by creating a new monitoring system. If and how the system will be monitored needs to be clarified.

Interfacing with a current monitoring system will minimally require additional acceptance criteria. Developing a new monitoring system would undoubtedly require additional user story or stories, with their own acceptance criteria.

Alerting

If we are monitoring the system, then we could assume that there should be a requirement to alert someone if monitoring finds the centralized application logging system becomes impaired. Similar to monitoring, this would require interfacing with an existing alerting system or creating a new alerting system. A bigger question, who do the alerts go to? These details should be clarified first.

Interfacing with an existing alerting system will minimally require additional acceptance criteria. Developing a new alerting system would require additional user story or stories, with their own acceptance criteria.

Ownership

Who will own the new centralized application logging system? Ownership impacts many aspects, such as Support, Maintenance, and Alerting. Ownership should be clearly defined, before the requirements stories are played. Don’t build something that does not have an owner, it will fail as soon as it is launched.

No additional user stories or acceptance criteria should be required.

Support

We will assume that there are no Support requirements, other than what is already covered in Monitoring, Ownership, and Training and Documentation.

Maintenance

Similarly, we will assume that there are no Maintenance requirements, other than what is already covered in Ownership and Training and Documentation. Requirements regarding automating system maintenance in any way should be an additional user story or stories, with their own acceptance criteria.

Backup and Recovery

Similar to both Support and Maintenance, Backup and Recovery should have a clear owner. Regardless of the owner, integrating backup into the centralized application logging system should be addressed as part of the requirement. To create a critical support system such as logging, with no means of backup or recovery of the system or the data, is unwise.

Backup and Recovery should be an additional user story or stories, with their own acceptance criteria.

Security and Compliance

Since we are logging application activity, we should assume there is the potential of personally identifiable information (PII) and other sensitive business data being captured. Due to the security and compliance risks associated with improper handling of PII, there should be clear details in the logging requirement regarding such things as system access.

At a minimum, there are usually requirements for a centralized application logging system to implement some form of user, role, application, and environment-level authorization for viewing logs. Security and Compliance should be an additional user story or stories, with their own acceptance criteria. Any requirements around the obfuscation of sensitive data in the log entries would also be an entirely separate set of business requirements.

Continuous Integration and Delivery

We will assume that there are no Continuous Integration and Delivery requirements around logging. Any requirements to push the logs of existing Continuous Integration and Delivery systems to the new centralized application logging system would be a separate use story.

Service Level Agreements (SLAs)

A critical aspect of most DevOps user stories, which is commonly overlooked, Service Level Agreements. Most requirements should at least a minimal set of performance-related metrics associated with them. Defining SLAs also help to refine the scope of a requirement. For example, what is the anticipated average volume of log entries to be aggregated and over what time period? From how many sources will logs be aggregated? What is the anticipated average size of each log entry, 2 KB or 2 MB? What is the anticipated average volume of concurrent of system users?

Given these base assumptions, how quickly are new log entries expected to be available in the new centralized application logging system? What is the expected overall response time of the system under average load?

Directly applicable SLAs should be documented as acceptance criteria in the original user story, and the remaining SLAs assigned to the other new user stories.

Configuration Management

We will also assume that there no Configuration Management requirements around the new centralized application logging system. However, if the requirements prescribed standing up separately logging implementations, possibly to support multiple environments or applications, then Configuration Management would certainly be in scope.

Environment Management

Based on further clarification, we will assume we are only creating a single centralized application logging system instance. However, we still need to learn which application environments need to be captured by that system. Is this application only intended to support the Development and Test environments? Are the higher environments of Staging and Production precluded due to PII?

We should add an explicit acceptance criterion to the original user story around which application environments should be included and which should not.

High Availability and Fault Tolerance

The logging requirement should state whether or not this system is considered business critical. Usually, if a system is considered critical to the business, then we should assume the system requires monitoring, alerting, support, and backup. Additionally, the system should be highly available and fault tolerant.

Making the system highly available would surely have an impact on facets like Infrastructure, Monitoring, and Alerting. High Availability and Fault Tolerance could be a separate user story or stories, with its own acceptance criteria, as long as the original requirement doesn’t dictate those features.

Other Aspects

We skipped half the facets. They would certainly expand our story list and acceptance criteria. We didn’t address how the end user would view the logs. Should we assume a web browser? Which web browsers? We didn’t determine if the system should be accessible outside the internal network. We didn’t determine how the system would be configured and deployed. We didn’t investigate if there are any vendor or licensing restrictions, or if there are any required training and documentation needs. Possibly, we will need Architectural Approval before we start to develop.

Analysis Results

Based on our brief, partial analysis, we uncovered some further needs.

Further Clarification

We need additional information and clarification regarding Logging, Monitoring, Alerting, and Security and Compliance. We should also determine ownership of the resulting centralized application logging system, before it is built, to ensure Alerting, Maintenance, and Support, are covered.

Additional Acceptance Criteria

We need to add additional acceptance criteria to the original user story around SLAs, Security and Compliance, Logging, Monitoring, and Environment Management.I prefer to write acceptance criteria in the

I prefer to write acceptance criteria in the Given-When-Then format. I find the ‘Given-When-Then’ format tends to drive a greater level of story refinement and detail than the ‘I’ format. For example, compare:

I expect to see the most current log entries with no more than a two-minute time delay.

With:

Given the centralized application logging system is operating normally and under average expected load,
When I view log entries for running applications, from my web browser, 
Then I should see the most current log entries with no more than a two-minute time delay.

New User Stories

Following Agile best practices, illustrated by the INVEST mnemonic, we need to add new, quality user stories, with their own acceptance criteria, around Monitoring, Alerting, Backup and Recovery, Security and Compliance, and High Availability and Fault Tolerance.

Conclusion

In this post, we have shown how uniformly analyzing the impacts of the above aspects on the requirements will improve the overall quality and consistency of user stories and acceptance criteria, maximizing business value and minimizing resource utilization.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.

, , , , , ,

Leave a comment