Posts Tagged proxy

Deploying and Configuring Istio on Google Kubernetes Engine (GKE)

GKE_021B

Introduction

Unquestionably, Kubernetes has quickly become the leading Container-as-a-Service (CaaS) platform. In late September 2017, Rancher Labs announced the release of Rancher 2.0, based on Kubernetes. In mid-October, at DockerCon Europe 2017, Docker announced they were integrating Kubernetes into the Docker platform. In late October, Microsoft released the public preview of Managed Kubernetes for Azure Container Service (AKS). In November, Google officially renamed its Google Container Engine to Google Kubernetes Engine. Most recently, at AWS re:Invent 2017, Amazon announced its own manged version of Kubernetes, Amazon Elastic Container Service for Kubernetes (Amazon EKS).

The recent abundance of Kuberentes-based CaaS offerings makes deploying, scaling, and managing modern distributed applications increasingly easier. However, as Craig McLuckie, CEO of Heptio, recently stated, “…it doesn’t matter who is delivering Kubernetes, what matters is how it runs.” Making Kubernetes run better is the goal of a new generation of tools, such as Istio, EnvoyProject Calico, Helm, and Ambassador.

What is Istio?

One of those new tools and the subject of this post is Istio. Released in Alpha by Google, IBM and Lyft, in May 2017, Istio is an open platform to connect, manage, and secure microservices. Istio describes itself as, “…an easy way to create a network of deployed services with load balancing, service-to-service authentication, monitoring, and more, without requiring any changes in service code. You add Istio support to services by deploying a special sidecar proxy throughout your environment that intercepts all network communication between microservices, configured and managed using Istio’s control plane functionality.

Istio contains several components, split between the data plane and a control plane. The data plane includes the Istio Proxy (an extended version of Envoy proxy). The control plane includes the Istio Mixer, Istio Pilot, and Istio-Auth. The Istio components work together to provide behavioral insights and operational control over a microservice-based service mesh. Istio describes a service mesh as a “transparently injected layer of infrastructure between a service and the network that gives operators the controls they need while freeing developers from having to bake solutions to distributed system problems into their code.

In this post, we will deploy the latest version of Istio, v0.4.0, on Google Cloud Platform, using the latest version of Google Kubernetes Engine (GKE), 1.8.4-gke.1. Both versions were just released in mid-December, as this post is being written. Google, as you probably know, was the creator of Kubernetes, now an open-source CNCF project. Google was the first Cloud Service Provider (CSP) to offer managed Kubernetes in the Cloud, starting in 2014, with Google Container Engine (GKE), which used Kubernetes. This post will outline the installation of Istio on GKE, as well as the deployment of a sample application, integrated with Istio, to demonstrate Istio’s observability features.

Getting Started

All code from this post is available on GitHub. You will need to change some variables within the code, to meet your own project’s needs. This post uses Gists to display source code; they are best viewed using a desktop browser.

The scripts used in this post are as follows, in order of execution.

Creating GKE Cluster

First, we create the Google Kubernetes Engine (GKE) cluster. The GKE cluster creation is highly-configurable from either the GCP Cloud Console or from the command line, using the Google Cloud Platform gcloud CLI. The CLI will be used throughout the post. I have chosen to create a highly-available, 3-node cluster (1 node/zone) in GCP’s South Carolina us-east1 region.

Once built, we need to retrieve the cluster’s credentials.

Having chosen to use Kubernetes’ Alpha Clusters feature, the following warning is displayed, warning the Alpha cluster will be deleted in 30 days.

The resulting GKE cluster will have the following characteristics.

Installing Istio

With the GKE cluster created, we can now deploy Istio. There are at least two options for deploying Istio on GCP. You may choose to manually install and configure Istio in a GKE cluster, as I will do in this post, following these instructions. Alternatively, you may choose to use the Istio GKE Deployment Manager. This all-in-one GCP service will create your GKE cluster, and install and configure Istio and the Istio add-ons, including their Book Info sample application.

G002_DeployCluster

There were a few reasons I chose not to use the Istio GKE Deployment Manager option. First, until very recently, you could not install the latest versions of Istio with this option (as of 12/21 you can now deploy v0.3.0 and v0.4.0). Secondly, currently, you only have the choice of GKE version 1.7.8-gke.0. I wanted to test the latest v1.8.4 release with a stable GA version of RBAC. Thirdly, at least three out of four of my initial attempts to use the Istio GKE Deployment Manager failed during provisioning for unknown reasons. Lastly, you will learn more about GKE, Kubernetes, and Istio by doing it yourself, at least the first time.

Istio Code Changes

Before installing Istio, I had to make several minor code changes to my existing Kubernetes resource files. The requirements are detailed in Istio’s Pod Spec Requirements. These changes are minor, but if missed, cause errors during deployment, which can be hard to identify and resolve.

First, you need to name your Service ports in your Service resource files. More specifically, you need to name your service ports, http, as shown in the Candidate microservice’s Service resource file, below (note line 10).

Second, an app label is required for Istio. I added an app label to each Deployment and Service resource file, as shown below in the Candidate microservice’s Deployment resource files (note lines 5 and 6).

The next set of code changes were to my existing Ingress resource file. The requirements for an Ingress resource using Istio are explained here. The first change, Istio ignores all annotations other than kubernetes.io/ingress.class: istio (note line 7, below). The next change, if using HTTPS, the secret containing your TLS/SSL certificate and private key must be called istio-ingress-certs; all other names will be ignored (note line 10, below). Related and critically important, that secret must be deployed to the istio-system namespace, not the application’s namespace. The last change, for my particular my prefix match routing rules, I needed to change the rules from /{service_name} to /{service_name}/.*. The /.* is a special Istio notation that is used to indicate a prefix match (note lines 14, 18, and 22, below).

Installing Istio

To install Istio, you first will need to download and uncompress the correct distribution of Istio for your OS. Istio provides instructions for installation on various platforms.

My install-istio.sh script contains a variable, ISTIO_HOME, which should point to the root of your local Istio directory. We will also deploy all the current Istio add-ons, including Prometheus, Grafana, ZipkinService Graph, and Zipkin-to-Stackdriver.

Once installed, from the GCP Cloud Console, an alternative to the native Kubernetes Dashboard, we should see the following Istio resources deployed and running. Below, note the three nodes are distributed across three zones within the GCP us-east-1 region, the correct version of GKE is employed, Stackdriver logging and monitoring are enabled, and the Alpha Clusters features are also enabled.

GKE_001

And here, we see the nodes that comprise the GKE cluster.

GKE_001_1

GKE_001_2.PNG

Below, note the four components that comprise Istio: istio-ca, istio-ingress, istio-mixer, and istio-pilot. Additionally, note the five components that comprise the Istio add-ons.

GKE_002

Below, observe the Istio Ingress has automatically been assigned a public IP address by GCP, accessible on ports 80 and 443. This IP address is how we will communicate with applications running on our GKE cluster, behind the Istio Ingress Load Balancer. Later, we will see how the Istio Ingress Load Balancer knows how to route incoming traffic to those application endpoints, using the Voter API’s Ingress configuration.

GKE_003.PNG

Istio makes ample use of Kubernetes Config Maps and Secrets, to store configuration, and to store certificates for mutual TLS.

GKE_004

Creation of the GKE cluster and deployed Istio to the cluster is complete. Following, I will demonstrate the deployment of the Voter API to the cluster. This will be used to demonstrate the capabilities of Istio on GKE.

Kubernetes Dashboard

In addition to the GCP Cloud Console, the native Kubernetes Dashboard is also available. To open, use the kubectl proxy command and connect to the Kubernetes Dashboard at https://127.0.0.1:8001/ui. You should now be able to view and edit all resources, from within the Kubernetes Dashboard.

GKE_005_5

Sample Application

To demonstrate the functionality of Istio and GKE, I will deploy the Voter API. I have used variations of the sample Voter API application in several previous posts, including Architecting Cloud-Optimized Apps with AKS (Azure’s Managed Kubernetes), Azure Service Bus, and Cosmos DB and Eventual Consistency: Decoupling Microservices with Spring AMQP and RabbitMQ. I suggest reading these two post to better understand the Voter API’s design.

AKS

For this post, I have reconfigured the Voter API to use MongoDB’s Atlas Database-as-a-Service (DBaaS) as the NoSQL data-source for each microservice. The Voter API is connected to a MongoDB Atlas 3-node M10 instance cluster in GCP’s us-east1 (South Carolina) region. With Atlas, you have the choice of deploying clusters to GCP or AWS.

GKE_014

The Voter API will use CloudAMQP’s RabbitMQ-as-a-Service for its decoupled, eventually consistent, message-based architecture. For this post, the Voter API is configured to use a RabbitMQ cluster in GCP’s us-east1 (South Carolina) region; I chose a minimally-configured free version of RabbitMQ. CloudAMQP allows you to provide a much more robust multi-node clusters for Production, on GCP or AWS.

GKE_015_1.PNG

CloudAMQP provides access to their own Management UI, in addition to access to RabbitMQ’s Management UI.

GKE_015B

With the Voter API running and taking traffic, we can see each Voter API microservice instance, nine replicas in total, connected to RabbitMQ. They are each publishing and consuming messages off the two queues.

GKE_016

The GKE, MongoDB Atlas, and RabbitMQ clusters are all running in the same GCP Region. Optimizing the Voter API cloud architecture on GCP, within a single Region, greatly reduces network latency, increases API performance, and improves end-to-end application and infrastructure observability and traceability.

Installing the Voter API

For simplicity, I have divided the Voter API deployment into three parts. First, we create the new voter-api Kubernetes Namespace, followed by creating a series of Voter API Kuberentes Secrets.

There are a total of five secrets, one secret for each of the three microservice’s MongoDB databases, one secret for the RabbitMQ connection string (shown below), and one secret containing a Let’s Encrypt SSL/TLS certificate chain and private key for the Voter API’s domain, api.voter-demo.com (shown below).

GKE_011

GKE_006.PNG

GKE_007.PNG

Next, we create the microservice pods, using the Kubernetes Deployment files, create three ClusterIP-type Kubernetes Services, and a Kubernetes Ingress. The Ingress contains the service endpoint configuration, which Istio Ingress will use to correctly route incoming external API traffic.

Three Kubernetes Pods for each of the three microservice should be created, for a total of nine pods. In the GCP Cloud UI’s Workloads (Kubernetes Deployments), you should see the following three resources. Note each Workload has three pods, each containing one replica of the microservice.

GKE_010

In the GCP Cloud UI’s Discovery and Load Balancing tab, you should observe the following four resources. Note the Voter API Ingress endpoints for the three microservices, which are used by the Istio Proxy, discussed below.

GKE_009.PNG

Istio Proxy

Examining the Voter API deployment more closely, you will observe that each of the nine Voter API microservice pods have two containers running within them.

Along with the microservice container, there is an Istio Proxy container, commonly referred to as a sidecar container. Istio Proxy is an extended version of the Envoy proxy, Lyfts well-known, highly performant edge and service proxy. The proxy sidecar container is injected automatically when the Voter API pods are created. This is possible because we deployed the Istio Initializer (istio-initializer.yaml). The Istio Initializer guarantees that Istio Proxy will be automatically injected into every microservice Pod. This is referred to as automatic sidecar injection. Below we see an example of one of three Candidate pods running the istio-proxy sidecar.

GKE_012

In the example above, all traffic to and from the Candidate microservice now passes through the Istio Proxy sidecar. With Istio Proxy, we gain several enterprise-grade features, including enhanced observability, service discovery and load balancing, credential injection, and connection management.

Manual Sidecar Injection

What if we have application components we do not want automatically managed with Istio Proxy. In that case, manual sidecar injection might be preferable to automatic sidecar injection with Istio Initializer. For manual sidecar injection, we execute a istioctl kube-inject command for each of the Kubernetes Deployments. The command manually injects the Istio Proxy container configuration into the Deployment resource file, alongside each Voter API microservice container. On Mac and Linux, this command is similar to the following. Proxy injection is discussed in detail, here.

External Service Egress

Whether you choose automatic or manual sidecar injection of the Istio Proxy, Istio’s egress rules currently only support HTTP and HTTPS requests. The Voter API makes external calls to its backend services, using two alternate protocols, MongoDB Wire Protocol (mongodb://) and RabbitMQ AMQP (amqps://). Since we cannot use an Istio egress rule for either protocol, we will use the includeIPRanges option with the istioctl kube-inject command to open egress to the two backend services. This will completely bypass Istio for a specific IP range. You can read more about calling external services directly, on Istio’s website.

You will need to modify the includeIPRanges argument within the create-voter-api-part3.sh script, adding your own GKE cluster’s IP ranges to the IP_RANGES variable. The two IP ranges can be found using the following GCP CLI command.

The create-voter-api-part3.sh script also contains a modified version the istioctl kube-inject command for each Voter API Deployment. Using the modified command, the original Deployment files are not altered, instead, a temporary copy of the Deployment file is created into which Istio injects the required modifications. The temporary Deployment file is then used for the deployment, and then immediately deleted.

Some would argue not having the actual deployed version of the file checked into in source code control is an anti-pattern; in this case, I would disagree. If I need to redeploy, I would just run the istioctl kube-inject command again. You can always view, edit, and import the deployed YAML file, from the GCP CLI or GKE Management UI.

The amount of Istio configuration injected into each microservice Pod’s Deployment resource file is considerable. The Candidate Deployment file swelled from 68 lines to 276 lines of code! This hints at the power, as well as the complexity of Istio. Shown below is a snippet of the Candidate Deployment YAML, after Istio injection.

GKE_025

Confirming Voter API

Installation of the Voter API is now complete. We can validate the Voter API is working, and that traffic is being routed through Istio, using Postman. Below, we see a list of candidates successfully returned from the Voter microservice, through the Voter API. This means, not only us the API running, but that messages have been successfully passed between the services, using RabbitMQ, and saved to the microservice’s corresponding MongoDB databases.

GKE_030

Below, note the server and x-envoy-upstream-service-time response headers. They both confirm the Voter API HTTPS traffic is being managed by Istio.

GKE_031.PNG

Observability

Observability is certainly one of the primary advantages of implementing Istio. For anyone like myself, who has spent many long and often frustrating hours installing, configuring, and managing monitoring systems for distributed platforms, Istio’s observability features are most welcome. Istio provides Prometheus, Grafana, ZipkinService Graph, and Zipkin-to-Stackdriver add-ons. Combined with the monitoring capabilities of Backend-as-a-Service providers, such as MongoDB Altas and CloudAMQP RabvbitMQ, you get considerable visibility into your application, out-of-the-box.

Prometheus
First, we will look at Prometheus, a leading open-source monitoring solution. The easiest way to access the Prometheus UI, or any of the other add-ons, including Prometheus, is using port-forwarding. For example with Prometheus, we use the following command.

Alternatively, you could securely expose any of the Istio add-ons through the Istio Ingress, similar to how the Voter API microservice endpoints are exposed.

Prometheus collects time series metrics from both the Istio and Voter API components. Below we see two examples of typical metrics being collected; they include the 201 responses generated by the Candidate microservice, and the outflow of bytes by the Election microservice, over a given period of time.

GKE_022

GKE_022_1

Grafana
Although Prometheus is an excellent monitoring solution, Grafana, the leading open source software for time series analytics, provides a much easier way to visualize the metrics collected by Prometheus. Conveniently, Istio provides a dynamically-configured Grafana Dashboard, which will automatically display metrics for components deployed to GKE.

GKE_020B.PNG

Below, note the metrics collected for the Candidate and Election microservice replicas. Out-of-the-box, Grafana displays common HTTP KPIs, such as request rate, success rate, response codes, response time, and response size. Based on the version label included in the Deployment resource files, we can delineate metrics collected by the version of the Voter API microservices, in this case, v1 of the Candidate and Election microservices.

GKE_021B

Zipkin
Next, we have Zipkin, a leading distributed tracing system.

GKE_018

Since the Voter API application uses RabbitMQ to decouple communications between services, versus direct HTTP-based IPC, we won’t see any complex multi-segment traces. We will only see traces representing traffic to and from the microservices, which passes through the Istio Ingress.

GKE_019

Service Graph
Similar to Zipkin, Service Graph is not as valuable with the Voter API application as it could be with more complex applications. Below is a Service Graph view of the Voter API showing microservice version and requests/second to each microservice.

GKE_024

Stackdriver

One last tool we have to monitor our GKE cluster is Stackdriver. Stackdriver provides fine-grain monitoring, logging, and diagnostics. If you recall, we enabled Stackdriver logging and monitoring when we first provisioned the GKE cluster. Stackdrive allows us to examine the performance of the GKE cluster’s resources, review logs, and set alerts.

GKE_028

GKE_029

GKE_027

Zipkin-to-Stackdriver

When we installed Istio, we also installed the Zipkin-to-Stackdriver add-on. The Stackdriver Trace Zipkin Collector is a drop-in replacement for the standard Zipkin HTTP collector that writes to Google’s free Stackdriver Trace distributed tracing service. To use Stackdriver for traces originating from Zipkin, there is additional configuration required, which is commented out of the current version of the zipkin-to-stackdriver.yaml file.

Instructions to configure the Zipkin-to-Stackdriver feature can be found here. Below is an example of how you might add the necessary configuration using a Kubernetes ConfigMap to inject the required user credentials JSON file (zipkin-to-stackdriver-creds.json) into the zipkin-to-stackdriver container. The new configuration can be seen on lines 27-44.

Conclusion

Istio provides a significant amount of fine-grained management control to Kubernetes. Managed Kubernetes CaaS offerings like GKE, coupled with tools like Istio, will soon make running reliable and secure containerized applications in Production, commonplace.

References

All opinions in this post are my own, and not necessarily the views of my current or past employers or their clients.

, , , , , , , , , , ,

Leave a comment

Automate the Provisioning and Configuration of HAProxy and an Apache Web Server Cluster Using Foreman

Use Vagrant, Foreman, and Puppet to provision and configure HAProxy as a reverse proxy, load-balancer for a cluster of Apache web servers.

Simple Load Balanced 2

Introduction

In this post, we will use several technologies, including VagrantForeman, and Puppet, to provision and configure a basic load-balanced web server environment. In this environment, a single node with HAProxy will act as a reverse proxy and load-balancer for two identical Apache web server nodes. All three nodes will be provisioned and bootstrapped using Vagrant, from a Linux CentOS 6.5 Vagrant Box. Afterwards, Foreman, with Puppet, will then be used to install and configure the nodes with HAProxy and Apache, using a series of Puppet modules.

For this post, I will assume you already have running instances of Vagrant with the vagrant-hostmanager plugin, VirtualBox, and Foreman. If you are unfamiliar with Vagrant, the vagrant-hostmanager plugin, VirtualBox, Foreman, or Puppet, review my recent post, Installing Foreman and Puppet Agent on Multiple VMs Using Vagrant and VirtualBox. This post demonstrates how to install and configure Foreman. In addition, the post also demonstrates how to provision and bootstrap virtual machines using Vagrant and VirtualBox. Basically, we will be repeating many of this same steps in this post, with the addition of HAProxy, Apache, and some custom configuration Puppet modules.

All code for this post is available on GitHub. However, it been updated as of 8/23/2015. Changes were required to fix compatibility issues with the latest versions of Puppet 4.x and Foreman. Additionally, the version of CentOS on all VMs was updated from 6.6 to 7.1 and the version of Foreman was updated from 1.7 to 1.9.

Steps

Here is a high-level overview of our steps in this post:

  1. Provision and configure the three CentOS-based virtual machines (‘nodes’) using Vagrant and VirtualBox
  2. Install the HAProxy and Apache Puppet modules, from Puppet Forge, onto the Foreman server
  3. Install the custom HAProxy and Apache Puppet configuration modules, from GitHub, onto the Foreman server
  4. Import the four new module’s classes to Foreman’s Puppet class library
  5. Add the three new virtual machines (‘hosts’) to Foreman
  6. Configure the new hosts in Foreman, assigning the appropriate Puppet classes
  7. Apply the Foreman Puppet configurations to the new hosts
  8. Test HAProxy is working as a reverse and proxy load-balancer for the two Apache web server nodes

In this post, I will use the terms ‘virtual machine’, ‘machine’, ‘node’, ‘agent node’, and ‘host’, interchangeable, based on each software’s own nomenclature.

Provisioning

First, using the process described in the previous post, provision and bootstrap the three new virtual machines. The new machine’s Vagrant configuration is shown below. This should be added to the JSON configuration file. All code for the earlier post is available on GitHub.

{
  "nodes": {
    "haproxy.example.com": {
      ":ip": "192.168.35.101",
      "ports": [],
      ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    },
    "node01.example.com": {
      ":ip": "192.168.35.121",
      "ports": [],
      ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    },
    "node02.example.com": {
      ":ip": "192.168.35.122",
      "ports": [],
      ":memory": 512,
      ":bootstrap": "bootstrap-node.sh"
    }
  }
}

After provisioning and bootstrapping, observe the three machines running in Oracle’s VM VirtualBox Manager.

Oracle VM VirtualBox Manager View of New Nodes

Oracle VM VirtualBox Manager View of New Nodes

Installing Puppet Forge Modules

The next task is to install the HAProxy and Apache Puppet modules on the Foreman server. This allows Foreman to have access to them. I chose the puppetlabs-haproxy HAProxy module and the puppetlabs-apache Apache modules. Both modules were authored by Puppet Labs, and are available on Puppet Forge.

The exact commands to install the modules onto your Foreman server will depend on your Foreman environment configuration. In my case, I used the following two commands to install the two Puppet Forge modules into my ‘Production’ environment’s module directory.

sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-haproxy
sudo puppet module install -i /etc/puppet/environments/production/modules puppetlabs-apache

# confirm module installation
puppet module list --modulepath /etc/puppet/environments/production/modules

Installing Configuration Modules

Next, install the HAProxy and Apache configuration Puppet modules on the Foreman server. Both modules are hosted on my GitHub repository. Both modules can be downloaded directly from GitHub and installed on the Foreman server, from the command line. Again, the exact commands to install the modules onto your Foreman server will depend on your Foreman environment configuration. In my case, I used the following two commands to install the two Puppet Forge modules into my ‘Production’ environment’s module directory. Also, notice I am currently downloading version 0.1.0 of both modules at the time of writing this post. Make sure to double-check for the latest versions of both modules before running the commands. Modify the commands if necessary.

# apache config module
wget -N https://github.com/garystafford/garystafford-apache_example_config/archive/v0.1.0.tar.gz && \
sudo puppet module install -i /etc/puppet/environments/production/modules ~/v0.1.0.tar.gz --force

# haproxy config module
wget -N https://github.com/garystafford/garystafford-haproxy_node_config/archive/v0.1.0.tar.gz && \
sudo puppet module install -i /etc/puppet/environments/production/modules ~/v0.1.0.tar.gz --force

# confirm module installation
puppet module list --modulepath /etc/puppet/environments/production/modules
GitHub Repository for Apache Config Example

GitHub Repository for Apache Config Example

HAProxy Configuration
The HAProxy configuration module configures HAProxy’s /etc/haproxy/haproxy.cfg file. The single class in the module’s init.pp manifest is as follows:

class haproxy_node_config () inherits haproxy {
  haproxy::listen { 'puppet00':
    collect_exported => false,
    ipaddress        => '*',
    ports            => '80',
    mode             => 'http',
    options          => {
      'option'  => ['httplog'],
      'balance' => 'roundrobin',
    },
  }

  Haproxy::Balancermember <<| listening_service == 'puppet00' |>>

  haproxy::balancermember { 'haproxy':
    listening_service => 'puppet00',
    server_names      => ['node01.example.com', 'node02.example.com'],
    ipaddresses       => ['192.168.35.121', '192.168.35.122'],
    ports             => '80',
    options           => 'check',
  }
}

The resulting /etc/haproxy/haproxy.cfg file will have the following configuration added. It defines the two Apache web server node’s hostname, ip addresses, and http port. The configuration also defines the load-balancing method, ‘round-robin‘ in our example. In this example, we are using layer 7 load-balancing (application layer – http), as opposed to layer 4 load-balancing (transport layer – tcp). Either method will work for this example. The Puppet Labs’ HAProxy module’s documentation on Puppet Forge and HAProxy’s own documentation are both excellent starting points to understand how to configure HAProxy. We are barely scraping the surface of HAProxy’s capabilities in this brief example.

listen puppet00
  bind *:80
  mode  http
  balance  roundrobin
  option  httplog
  server node01.example.com 192.168.35.121:80 check
  server node02.example.com 192.168.35.122:80 check

Apache Configuration
The Apache configuration module creates default web page in Apache’s docroot directory, /var/www/html/index.html. The single class in the module’s init.pp manifest is as follows:
ApacheConfigClass
The resulting /var/www/html/index.html file will look like the following. Observe that the facter variables shown in the module manifest above have been replaced by the individual node’s hostname and ip address during application of the configuration by Puppet (ie. ${fqdn} became node01.example.com).

ApacheConfigClass

Both of these Puppet modules were created specifically to configure HAProxy and Apache for this post. Unlike published modules on Puppet Forge, these two modules are very simple, and don’t necessarily represent the best practices and patterns for authoring Puppet Forge modules.

Importing into Foreman

After installing the new modules onto the Foreman server, we need to import them into Foreman. This is accomplished from the ‘Puppet classes’ tab, using the ‘Import from theforeman.example.com’ button. Once imported, the module classes are available to assign to host machines.

Importing Puppet Classes into Foreman

Importing Puppet Classes into Foreman

Add Host to Foreman

Next, add the three new hosts to Foreman. If you have questions on how to add the nodes to Foreman, start Puppet’s Certificate Signing Request (CSR) process on the hosts, signing the certificates, or other first time tasks, refer to the previous post. That post explains this process in detail.

Foreman Hosts Tab Showing New Nodes

Foreman Hosts Tab Showing New Nodes

Configure the Hosts

Next, configure the HAProxy and Apache nodes with the necessary Puppet classes. In addition to the base module classes and configuration classes, I recommend adding git and ntp modules to each of the new nodes. These modules were explained in the previous post. Refer to the screen-grabs below for correct module classes to add, specific to HAProxy and Apache.

HAProxy Node Puppet Classes Tab

HAProxy Node Puppet Classes Tab

Apache Nodes Puppet Classes Tab

Apache Nodes Puppet Classes Tab

Agent Configuration and Testing the System

Once configurations are retrieved and applied by Puppet Agent on each node, we can test our reverse proxy load-balanced environment. To start, open a browser and load haproxy.paychex.com. You should see one of the two pages below. Refresh the page a few times. You should observe HAProxy re-directing you to one Apache web server node, and then the other, using HAProxy’s round-robin algorithm. You can differentiate the Apache web servers by the hostname and ip address displayed on the web page.

Load Balancer Directing Traffic to Node01

Load Balancer Directing Traffic to Node01

Load Balancer Directing Traffic to Node02

Load Balancer Directing Traffic to Node02

After hitting HAProxy’s URL several times successfully, view HAProxy’s built-in Statistics Report page at http://haproxy.example.com/haproxy?stats. Note below, each of the two Apache node has been hit 44 times each from HAProxy. This demonstrates the effectiveness of the reverse proxy and load-balancing features of HAProxy.

Statistics Report for HAProxy

Statistics Report for HAProxy

Accessing Apache Directly
If you are testing HAProxy from the same machine on which you created the virtual machines (VirtualBox host), you will likely be able to directly access either of the Apache web servers (ei. node02.example.com). The VirtualBox host file contains the ip addresses and hostnames of all three hosts. This DNS configuration was done automatically by the vagrant-hostmanager plugin. However, in an actual Production environment, only the HAProxy server’s hostname and ip address would be publicly accessible to a user. The two Apache nodes would sit behind a firewall, accessible only by the HAProxy server. HAProxy acts as a façade to public side of the network.

Testing Apache Host Failure
The main reason you would likely use a load-balancer is high-availability. With HAProxy acting as a load-balancer, we should be able to impair one of the two Apache nodes, without noticeable disruption. HAProxy will continue to serve content from the remaining Apache web server node.

Log into node01.example.com, using the following command, vagrant ssh node01.example.com. To simulate an impairment on ‘node01’, run the following command to stop Apache, sudo service httpd stop. Now, refresh the haproxy.example.com URL in your web browser. You should notice HAProxy is now redirecting all traffic to node02.example.com.

Troubleshooting

While troubleshooting HAProxy configuration issues for this demonstration, I discovered logging is not configured by default on CentOS. No worries, I recommend HAProxy: Give me some logs on CentOS 6.5!, by Stephane Combaudon, to get logging running. Once logging is active, you can more easily troubleshoot HAProxy and Apache configuration issues. Here are some example commands you might find useful:

# haproxy
sudo more -f /var/log/haproxy.log
sudo haproxy -f /etc/haproxy/haproxy.cfg -c # check/validate config file

# apache
sudo ls -1 /etc/httpd/logs/
sudo tail -50 /etc/httpd/logs/error_log
sudo less /etc/httpd/logs/access_log

Redundant Proxies

In this simple example, the system’s weakest point is obviously the single HAProxy instance. It represents a single-point-of-failure (SPOF) in our environment. In an actual production environment, you would likely have more than one instance of HAProxy. They may both be in a load-balanced pool, or one active and on standby as a failover, should one instance become impaired. There are several techniques for building in proxy redundancy, often with the use of Virtual IP and Keepalived. Below is a list of articles that might help you take this post’s example to the next level.

, , , , , , , , , , , , ,

Leave a comment

Configure Chef Client on Windows for a Proxy Server

Configure Chef Client on Windows to work with a proxy server, by modifying Chef Knife’s configuration file.

Introduction

In my last two post, Configure Git for Windows and Vagrant on a Corporate Network and Easy Configuration of Git for Windows on a Corporate Network, I demonstrated how to configure Git for Windows and Vagrant to work properly on a corporate network with a proxy server. Modifying the .bashrc file and adding a few proxy-related environment variables worked fine for Git and Vagrant.

However, even though Chef Client also uses the Git Bash interactive shell to execute commands on Windows using Knife, Chef depends on Knife’s configuration file (knife.rb) for proxy settings.  In the following example, Git and Vagrant connect to the proxy server and authenticate using the proxy-related environment variables created by the ‘proxy_on’ function (described in my last post). However, Chef’s Knife command line tool fails to return the status of the online Hosted Chef server account, because the default knife.rb file contains no proxy server settings.

Knife Status Failed Due to Knife.rb Proxy Settings

Knife Status Failed Due to Knife’s Proxy Settings

For Chef to work correctly behind a proxy server, you must modify the knife.rb file, adding the necessary proxy-related settings. The good news, we can leverage the same proxy-related environment variables we already created for Git and Vagrant.

Configuring Chef Client

First, make sure you have your knife.rb file in the .chef folder, within your home directory (C:\Users\username\.chef\knife.rb’). This allows Chef to use the knife.rb file’s settings for all Chef repos on your local machine.

Chef's knife.rb Location in Home Directory

Chef’s knife.rb Location in Home Directory

Next, make sure you have the following environment variables set up on your computer: USERNAME, USERDNSDOMAIN, PASSWORD, PROXY_SERVER, and PROXY_PORT. The USERNAME and USERDNSDOMAIN should already present in the system wide environment variables on Windows. If you haven’t created the PASSWORD, PROXY_SERVER, PROXY_PORT environment variables already, based on my last post, I suggest adding them to the current user environment ( Environment Variables -> User variables, shown below) as opposed to the system wide environment (Environment Variables -> System variables). You can add the User variables manually, using Windows+Pause Keys -> Advanced system settings ->Environment Variables… -> New…

Windows 7 Environment Variables - Current User vs. System

Windows 7 Environment Variables – Current User vs. System

Alternately, you can use the ‘SETX‘ command. See commands below. When using ‘SETX’, do not use the ‘/m’ parameter, especially when setting the PASSWORD variable. According to SETX help (‘SETX /?’),  the ‘/m’ parameter specifies that the variable is set in the system wide (HKEY_LOCAL_MACHINE) environment. The default is to set the variable under the HKEY_CURRENT_USER environment (no ‘/m’). If you set your PASSWORD in the system wide environment, all user accounts on your machine could get your PASSWORD.

To see your changes with SETX, close and re-open your current command prompt window. Then, use a ‘env | grep -e PASSWORD -e PROXY’ command to view the three new environment variables.

[gist https://gist.github.com/garystafford/8233123 /]

Lastly, modify your existing knife.rb file, adding the required proxy-related settings, shown below. Notice, we use the ‘HTTP_PROXY’ and ‘HTTPS_PROXY’ environment variables set by ‘proxy_on’; no need to redefine them. Since my particular network environment requires proxy authentication, I have also included the ‘http_proxy_user’, ‘http_proxy_pass’, ‘https_proxy_user’, and ‘https_proxy_pass’ settings.

[gist https://gist.github.com/garystafford/8222755 /]

If your environment requires authentication and you fail to set these variables, you will see an error similar to the one shown below. Note the first line of the error. In this example, Chef cannot authenticate against the https proxy server. There is a ‘https_proxy’ setting, but no ‘https_proxy_user’ and ‘https_proxy_pass’ settings in the Knife configuration file.

HTTPS Proxy Authentication Credential Settings Missing from knife.rb

HTTPS Proxy Authentication Credential Settings Missing from knife.rb

Using the Code

Adding the proxy settings to the knife.rb file, Knife is able connect to the proxy server, authenticate, and complete its status check successfully. Now, Git, Vagrant, and Chef all have Internet connectivity through the proxy server, as shown below.

Knife Status Succeeds with Knife.rb Proxy Settings Added

Knife Status Succeeds with Knife.rb Proxy Settings Added

Why Include Authentication Settings?

Even with the domain, username and password, all included in the HTTP_PROXY and HTTPS_PROXY URIs, Chef still insists on using the ‘http_proxy_user’ and ‘http_proxy_pass’ or ‘https_proxy_user’ and ‘https_proxy_pass’ credential settings for proxy authentication. In my tests, if these settings are missing from Knife’s configuration file, Chef fails to authenticate with the proxy server.

, , , , , , , , , , , , , , , ,

Leave a comment

Configure Git for Windows and Vagrant on a Corporate Network

Modified bashrc configuration for Git for Windows to work with both Git and Vagrant.

Basic Network

Introduction

In my last post, Easy Configuration of Git for Windows on a Corporate Network, I demonstrated how to configure Git for Windows to work when switching between working on-site, working off-site through a VPN, and working totally off the corporate network. Dealing with a proxy server was the main concern. The solution worked fine for Git. However, after further testing with Vagrant using the Git Bash interactive shell, I ran into a snag. Unlike Git, Vagrant did not seem to like the standard URI, which contained ‘domain\username’:

http(s)://domain\username:password@proxy_server:proxy_port

In a corporate environment with LDAP, qualifying the username with a domain is normal, like ‘domain\username’. But, when trying to install a Vagrant plug-in with a command such as ‘vagrant plugin install vagrant-omnibus’, I received an error similar to the following (proxy details obscured):

$ vagrant plugin install vagrant-omnibus
Installing the 'vagrant-omnibus' plugin. This can take a few minutes...
c:/HashiCorp/Vagrant/embedded/lib/ruby/2.0.0/uri/common.rb:176: in `split':
bad URI(is not URI?): http://domain\username:password@proxy:port
(URI::InvalidURIError)...

Solution

After some research, it seems Vagrant’s ‘common.rb’ URI function does not like the ‘domain\username’ format of the original URI. To fix this problem, I modified the original ‘proxy_on’ function, removing the DOMAIN environment variable. I now suggest using the fully qualified domain name (FQDN) of the proxy server. So, instead of ‘my_proxy’, it would be ‘my_proxy.domain.tld’. The acronym ‘tld’ stands for the top-level domain (tld). Although .com is the most common one, there are over 300 top-level domains, so I don’t want assume yours is ‘.com’. The new proxy URI is as follows:

http(s)://username:password@proxy_server.domain.tld:proxy_port

Although all environments have different characteristics, I have found this change to work, with both Git and Vagrant, in my own environment. After making this change, I was able to install plug-ins and do other similar functions with Vagrant, using the Git Bash interactive shell.

$ vagrant plugin install vagrant-omnibus
Installing the 'vagrant-omnibus' plugin. This can take a few minutes...
Installed the plugin 'vagrant-omnibus (1.2.1)'!

Change to Environment Variables

One change you will notice compared to my last post, and unrelated to the Vagrant domain issue, is a change to PASSWORD, PROXY_SERVER, and PROXY_PORT environment variables. In the last post, I created and exported the PASSWORD, PROXY_SERVER, and PROXY_PORT environment variables within the ‘proxy_on’ function. After further consideration, I permanently moved them to Environment Variables -> User variables. I felt this was a better solution, especially for my password. Instead of my user’s account password residing in the .bashrc file, in plain text, it’s now in my user’s environment variables. Although still not ideal, I felt my password was slightly more secure. Also, since my proxy server address rarely change when I am at work or on the VPN, I felt moving these was easier and cleaner than placing them into the .bashrc file.

The New Code

Verbose version:

Compact version:

, , , , , , , , , , , , , ,

Leave a comment

Easy Configuration of Git for Windows on a Corporate Network

Configure Git for Windows to work when switching between working on-site, working off-site through a VPN, and working totally off the corporate network.

Basic Network

Introduction

Configuring Git to work on your corporate network can be challenging. A typical large corporate network may require Git to work behind proxy servers and firewalls, use LDAP authentication on a corporate domain, handle password expiration, deal with self-signed and internal CA certificates, and so forth. Telecommuters have the added burden of constantly switching device configurations between working on-site, working off-site through a VPN, and working totally off the corporate network at home or the local coffee shop.

There are dozens of posts on the Internet from users trying to configure Git for Windows to work on their corporate network. Many posts are oriented toward Git on Unix-based systems. Many responses only offer partial solutions without any explanation. Some responses incorrectly mix configurations for Unix-based systems with those for Windows.

Most solutions involve one of two approaches to handle proxy servers, authentication, and so forth. They are, modify Git’s .gitconfig file or set equivalent environment variables that Git will look for automatically. In my particular development situation, I spend equal amounts of time on and off a corporate network, on a Windows-based laptop. If I were always on-site, I would modify the .gitconfig file. However, since I am constantly moving on and off the network with a laptop, I chose a solution to create and destroy the environment variables, as I move on and off the corporate network.

Git for Windows

Whether you download Git from the Git website or the msysGit website, you will get the msysGit version of Git for Windows. As explained on the msysGit Wiki, msysGit is the build environment for Git for Windows. MSYS (thus the name, msysGit), is a Bourne Shell command line interpreter system, used by MinGW and originally forked from Cygwin. MinGW is a minimalist development environment for native Microsoft Windows applications.

Why do you care? By installing Git for Windows, you actually get a fairly functional Unix system running on Windows. Many of the commands you use on Unix-based systems also work on Windows, within msysGit’s Git Bash.

Setting Up Code

There are two identical versions of the post’s code, a well-commented version and a compact version.  Add either version’s contents to the .bashrc file in home directory. If you’ve worked with Linux, you are probably familiar with the .bashrc file and it’s functionality. On Unix-based systems, your home directory is ‘~/’ (/home/username), while on Windows, the equivalent directory path is ‘C:\Users\username\’.

On Windows, the .bashrc file is not created by default by Git for Windows. If you do not have a .bashrc file already, the easiest way to implement the post’s code is to download either Gist, shown below, from GitHub, rename it to .bashrc, and place it in your home directory.

After adding the code, change the PASSWORD, PROXY_SERVER, and PROXY_PORT environment variable values to match your network. Security note, this solution requires you to store you Windows user account password in plain text on your local system. This presents a certain level of security risk, as would storing it in your .gitconfig file.

The script assumes the same proxy server address for all protocols – HTTP, HTTPS, FTP, and SOCKS. If any of the proxy servers or ports are different, simply change the script’s variables.  You may also choose to add other variables and protocols, or remove them, based on your network requirements. Remember, environment variables on Windows are UPPERCASE. Even when using the interactive Git Bash shell, environment variables need to be UPPERCASED.

Lastly, as with most shells, you must exit any current interactive Git Bash shells and re-open a new interactive shell for the new functions in the .bashrc file to be available.

Verbose version:
[gist https://gist.github.com/garystafford/8128922 /]

Compact version:
[gist https://gist.github.com/garystafford/8135027 /]

Using the Code

When on-site and connected to your corporate network, or off-site and connected through a VPN, execute the ‘proxy_on’ function. When off your corporate network, execute the ‘proxy_off’ function.

Below, are a few examples of using Git to clone the popular angular.js repo from github.com (git clone https://github.com/angular/angular.js). The first example shows what happens on the corporate network when Git for Windows is not configured to work with the proxy server.

Failing to Clone with Proxy Settings Off

Failing to Clone GitHub Repo with Proxy Settings Off

The next example demonstrate successfully cloning the angular.js repo from github.com, while on the corporate network. The environment variables are set with the ‘proxy_on’ function. I have obscured the variable’s values and most of the verbose output from Git to hide confidential network-related details.

Successful Git Clone with Proxy Settings On

Successful Git Clone with Proxy Settings On

What’s My Proxy Server Address?

To setup the ‘proxy_on’ function, you need to know your proxy server’s address. One way to find this, is Control Panels -> Internet Options -> Connections -> LAN Settings. If your network requires a proxy server, it should be configured here.

LAN Settings - Proxy Server

LAN Settings – Proxy Server

However, on many corporate networks, Windows devices are configured to use a proxy auto-config (PAC) file. According to Wikipedia, a PAC file defines how web browsers and other user agents can automatically choose a network’s appropriate proxy server. The downside of a PAC file is that you cannot easily figure out what proxy server you are connected to.

LAN Settings - Using PAC file

LAN Settings – Using PAC file

To discover your proxy server with a PAC file, open a Windows command prompt and execute the following command. Use the command’s output to populate the script’s PROXY_SERVER and PORT variables.

reg query “HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings” | find /i “proxyserver”

Checking Your Proxy from the Command Prompt

Checking Your Proxy from the Command Prompt

Resources

Arch Linux Wiki – Proxy Settings

Tips on Git

git(1) Manual Page

Customizing Git – Git Configuration

msysGit Wiki – Git on Windows

UNIX: Set Environment Variable

, , , , , , , , , ,

2 Comments