Posts Tagged VPC
Architecting a Successful SaaS: Interacting with your Customer’s Cloud Accounts
Posted by Gary A. Stafford in AWS, Cloud, Technology Consulting on June 2, 2020
Originally published on the AWS APN Blog.
Introduction
You’re a software vendor selling a SaaS product hosted on the cloud. As part of your solution, the SaaS product needs to interact with your customers’ cloud-based resources—their data, their applications, or perhaps their compute resources. As a vendor, you must architect your SaaS product, and its capability to interact with your customers’ cloud-based resources, to be scalable, reliable, performant, cost-optimized, and, most importantly, secure. In this post, we will explore several common AWS services and architectural patterns used by SaaS vendors to interact with their customers’ AWS Accounts. Although multi-cloud and hybrid-cloud SaaS interactions are also prevalent in many SaaS solutions, for brevity, we will limit ourselves to interactions across accounts within AWS.
Reasons for Interaction
There are many reasons SaaS products need to interact with a customer’s AWS Accounts. Examples of SaaS products that require some level of account interaction often fall into the categories of logging and monitoring, security, compliance, data analytics, DevOps, workflow management, and resource optimization. SaaS products in these categories regularly interact with resources in the subscribing customer’s AWS Account. Examples of interactions include the following.
- Data Analytics: A product that connects to private data sources within their customers’ cloud accounts;
- Logging and Monitoring: A product that continuously aggregates, analyzes, and visualizes logs from their customers’ cloud accounts;
- Security: A product that uses ML and AI to continuously scan their customers’ cloud accounts for suspicious activities;
- Workflow Management: A product that remotely manages resources in their customers’ cloud accounts to achieve cost optimization;
- Security: A product that intercepts and inspects traffic, in real-time, to and from their customers’ cloud-based applications, detecting fraud and other malicious activities;
- DevOps: A product that creates resources and deploys assets to their customers’ cloud accounts;
- Compliance: A product that continuously audits their customers’ cloud account activities for regulatory compliance and corporate governance violations.
Scaling Interactions
As individuals and businesses, we often interact and share resources between our own AWS Accounts. We can delegate access across our accounts using AWS Single Sign-On (SSO) or IAM Roles, an arrangement referred to as cross-account access. A typical example is allowing a Developer, an IAM User in the Development account, limited access to the Production account to troubleshoot a customer issue. We can implement cross-account VPC Sharing within our AWS Organization, or VPC Peering, to route traffic privately between two or more accounts. These methods are used when applications deployed to different AWS Accounts, such as accounting and shipping, need to communicate with each other.
Below, we see a typical example of an Enterprise AWS account structure, modeled around the Software Development Life Cycle (SDLC) and using good DevOps practices.
Cloud services and architectural patterns that work for an individual customer of the cloud may not scale for a SaaS vendor. Implementing a scalable and reliable architecture to interact with tens, hundreds, or thousands of customers’ AWS Accounts can be vastly more complex and challenging than doing so across a single organization’s accounts, even a large organization’s. Similarly, losing or exposing data can be devastating to an organization. Losing or exposing hundreds or thousands of SaaS customers’ data, and potentially their own customers’ data, is likely fatal to a SaaS vendor’s reputation and their business.
Below, we see a typical example of a SaaS vendor’s AWS account structure, modeled around a multi-tier pricing and support strategy. Free Trial and Business customers are pooled together using a multi-tenant architecture, while enterprise customers are isolated using a single-tenant architecture.
Tenancy Model
The tenant isolation strategy of a SaaS vendor can have a direct impact on the methods the vendor implements to integrate with their customers’ accounts. For those unfamiliar with the terms Tenancy Model or Tenant Isolation, think housing. A tenant (a SaaS vendor’s customer) might reside in a single-family house, referred to as a single-tenant architecture. The tenant is isolated, or siloed, from their neighbors (other SaaS customers). Alternatively, a tenant might reside in an apartment building, living adjacent to other tenants, referred to as a multi-tenant architecture. Tenants (customers) share the same structure, with limited isolation compared to a single-tenant architecture, and multiple customers’ resources are pooled. To learn more about tenancy, I recommend the presentation AWS re:Invent 2019: SaaS tenant isolation patterns, available on YouTube, by Tod Golding of AWS.
An application’s tenancy model is seldom binary. The degree to which SaaS customers’ resources are isolated or, conversely, pooled varies based on a vendor’s architecture and is influenced by the underlying technologies the vendor uses. Different tiers of the application—front-end, back-end, data—may differ in their degree and means of tenant isolation. For example, on AWS, isolation may be achieved at different levels of the network. A SaaS vendor may assign each customer to their own AWS Account, providing a very high level of isolation. Alternatively, a vendor might choose to use a single account and separate customers, each within their own VPC. Customers may even be housed within the same VPC, but separated by compute instance or other software constructs.
Additionally, many SaaS vendors have more than one tenancy model. Frequently, a SaaS vendor will offer a multi-tier pricing structure. For example, a Business Tier, which is based on a shared, multi-tenant architecture, and an Enterprise Tier, which is based on an isolated, single-tenant architecture. Vendors may also offer a free tier or trial offer, which is almost always multi-tenant.
The cloud services and architectural patterns used by vendors to integrate with customer accounts each have their own advantages and disadvantages. Some AWS services and architectural patterns may be inherently better at scaling to meet a SaaS vendor’s large, growing customer base; they are well-suited to a multi-tenant architecture. Other services and architectural patterns may provide a high level of interaction but lack scalability; they may be better suited to a single-tenant architecture.
AWS Services
There are several AWS services capable of enabling integrations between a vendor’s SaaS product and their customers’ resources, within their AWS Accounts. Some AWS services were even designed with SaaS in mind, including AWS PrivateLink and Amazon EventBridge. Services are often combined in common architectural patterns. Here are eight examples of such services.
Cross-Account IAM Roles
According to AWS, customers can use an IAM Role to delegate access to resources that are in different AWS accounts, often referred to as a cross-account role. With IAM Roles, you can establish trust relationships between your trusting account and other AWS trusted accounts. By setting up cross-account access in this way, you do not need to create individual IAM Users in each account.
External IDs
Using External IDs is a good way to improve the security of cross-account role handling in a SaaS solution, and they should be used by vendors implementing a SaaS product that uses cross-account roles. Requiring an external ID restricts access to the role to the AWS CLI, Tools for Windows PowerShell, or the AWS API.
Using cross-account roles with external IDs, a SaaS customer allows their SaaS vendor programmatic access to their AWS account. The cross-account role’s policies tightly control the permissions of the vendor (trusted account) and limit access within the customer’s account (trusting account). Vendors use cross-account access for many activities, including resource optimization, DevOps functions, patching and upgrades, and managing their agents.
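To make this pattern concrete, below is a minimal sketch, using the AWS SDK for Python (boto3), of how a vendor might assume a customer’s cross-account role. The role ARN, external ID, and session name are illustrative placeholders, not values from a real integration.

```python
import boto3

# Illustrative placeholders: the customer supplies the role ARN;
# the vendor generates a unique external ID per customer
CUSTOMER_ROLE_ARN = "arn:aws:iam::444455556666:role/saas-vendor-access-role"
EXTERNAL_ID = "c7a9f2e4-unique-per-customer"

sts = boto3.client("sts")

# Request temporary credentials for the customer's cross-account role;
# if the role's trust policy requires an external ID, the call succeeds
# only when the correct value is supplied
credentials = sts.assume_role(
    RoleArn=CUSTOMER_ROLE_ARN,
    RoleSessionName="saas-vendor-session",
    ExternalId=EXTERNAL_ID,
    DurationSeconds=900,
)["Credentials"]

# Act within the customer's account, limited to whatever the
# cross-account role's policies permit
customer_s3 = boto3.client(
    "s3",
    aws_access_key_id=credentials["AccessKeyId"],
    aws_secret_access_key=credentials["SecretAccessKey"],
    aws_session_token=credentials["SessionToken"],
)
print(customer_s3.list_buckets()["Buckets"])
```

Because the credentials are temporary and expire automatically, the vendor never holds long-lived keys for the customer’s account.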
AWS Transfer for SFTP
According to AWS, AWS Transfer for SFTP is a fully managed service that enables the transfer of files directly into and out of Amazon S3 using the Secure File Transfer Protocol (SFTP)—also known as Secure Shell (SSH) File Transfer Protocol. Announced in November 2018, AWS SFTP is fully compatible with the SFTP standard, connects to your Active Directory, LDAP, and other identity systems, and works with Route 53 DNS routing.
Using S3 to transfer objects (files) between a SaaS vendor’s account and their customer’s account is very common. There are multiple AWS services, including AWS Transfer for SFTP, that can be used to interact with a customer’s account using S3. A vendor may choose to pull from or push to the customer’s S3 bucket using one of these services. Alternatively, a customer might push to or pull from a vendor’s S3 bucket, as shown below.
To use AWS Transfer for SFTP, you set up users and assign each of them an IAM Role. This role determines the level of access they have to your S3 bucket. When you create an SFTP user, you make several decisions about user access. These include which Amazon S3 buckets the user can access, what portions of each S3 bucket are accessible, and what privileges the user has, for example, `PUT` or `GET`.
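As a sketch of those decisions in code, the boto3 example below creates an SFTP user scoped to a single customer’s S3 prefix. The server ID, role ARN, bucket, and user names are hypothetical placeholders.

```python
import boto3

# Hypothetical identifiers for an existing AWS SFTP server and one customer
SERVER_ID = "s-0123456789abcdef0"
CUSTOMER_ROLE = "arn:aws:iam::111122223333:role/sftp-acme-corp-access"

transfer = boto3.client("transfer")

# Create an SFTP user whose IAM role limits access to one customer's prefix
transfer.create_user(
    ServerId=SERVER_ID,
    UserName="acme-corp",
    Role=CUSTOMER_ROLE,  # e.g., allows s3:GetObject/PutObject on the prefix only
    HomeDirectory="/saas-vendor-transfer/acme-corp",  # bucket/prefix seen as the user's root
    SshPublicKeyBody="ssh-rsa AAAAB3Nza...",  # public key supplied by the customer
)
```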
Below, we see an example of the SaaS vendor exposing a secure FTP site, backed by S3. The customers use SFTP to exchange data between their account and their SaaS application. File transfer may be managed by the customer or by a vendor agent, deployed into the customer’s account.
In late April 2020, AWS announced the AWS Transfer Family, which is the aggregated name for AWS Transfer for SFTP, FTPS, and FTP. In addition to Secure File Transfer Protocol (SFTP), AWS announced the expansion of the service to add support for File Transfer Protocol over SSL (FTPS) and File Transfer Protocol (FTP).
Amazon S3 Access Points
According to AWS, Amazon S3 Access Points, a feature of S3, simplifies managing data access at scale for applications using shared data sets on S3. Announced in December 2019, Access points are unique hostnames that customers create to enforce distinct permissions and network controls for any request made through the access point. Amazon S3 Access Points can easily scale access for hundreds of applications by creating individualized access points with names and permissions customized for each application.
Access Points support AWS Identity and Access Management (IAM) resource policies that allow you to control the use of the access point by resource, user, or other conditions. For an application or user to be able to access objects through an access point, both the access point and the underlying bucket must permit the request. Using an Access Point Policy, you can enable another account to access objects within an S3 bucket. The access point policy gives you fine-grained control over which users are allowed which permissions, such as to `GET` and `PUT` objects.
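A rough sketch of this pattern, using boto3’s `s3control` client, is shown below. The account IDs, bucket, and access point name are hypothetical.

```python
import json
import boto3

# Illustrative account IDs, region, and names, not values from a real deployment
VENDOR_ACCOUNT = "111122223333"
CUSTOMER_ACCOUNT = "444455556666"
REGION = "us-east-1"

s3control = boto3.client("s3control", region_name=REGION)

# Create an access point on the vendor's shared bucket, dedicated to one customer
s3control.create_access_point(
    AccountId=VENDOR_ACCOUNT,
    Name="acme-exchange",
    Bucket="saas-vendor-data-exchange",
)

# Grant the customer's account GET/PUT on objects, only through this access point
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{CUSTOMER_ACCOUNT}:root"},
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": f"arn:aws:s3:{REGION}:{VENDOR_ACCOUNT}:accesspoint/acme-exchange/object/*",
    }],
}

s3control.put_access_point_policy(
    AccountId=VENDOR_ACCOUNT, Name="acme-exchange", Policy=json.dumps(policy)
)
```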
Below, we see an example of the SaaS vendor exposing an S3 Access Point. The customers use the supplied access points to exchange data between their account and their SaaS application. The interchange of data may be managed by the customer or by a vendor agent, deployed into the customer’s account.
Amazon API Gateway
According to AWS, Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the front door for applications to access data, business logic, or functionality from backend services. Using API Gateway, you can create RESTful APIs and WebSocket APIs that enable real-time two-way communication applications. API Gateway supports containerized and serverless workloads, as well as web applications.
Announced in July 2015, Amazon API Gateway and the accompanying APIs serve as scalable, reliable, performant, and secure connections to backend resources in either the customer’s or the SaaS vendor’s account. API Gateway backend resources include Lambda functions, HTTP endpoints, VPC Links, and other AWS services, including Amazon SQS and Kinesis. These services are often used to exchange data between a SaaS vendor’s account and their customers’ accounts. Eric Johnson of AWS gave a great presentation on API Gateway, available on YouTube, AWS re:Invent 2019: I didn’t know Amazon API Gateway did that.
Several security features can be used to secure communications between the vendor and the customer accounts when using the API Gateway. Security features include encrypting transmitted data using SSL/TLS Certificates (HTTPS), AWS WAF (Web Application Firewall) integration, Client Certificates to ensure HTTP requests to your back-end services are originating from the API Gateway, Authorization using Amazon Cognito User Pools or a Lambda function, API Keys, and VPC Links.
Below, we see an example of the SaaS vendor exposing an API, backed by an SQS queue. The customers use the API to programmatically exchange data between their account and their SaaS application. The interchange of data may be managed by the customer or by a vendor agent, deployed into the customer’s account.
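From the customer’s side, the exchange can be as simple as an authenticated HTTPS request. The sketch below, using Python’s third-party `requests` library, posts a record to a hypothetical vendor API fronting an SQS queue; the endpoint and API key are placeholders, and in practice, an API key would be combined with IAM or Amazon Cognito authorization.

```python
import requests  # third-party HTTP library

# Hypothetical endpoint and per-customer API key issued by the SaaS vendor
API_ENDPOINT = "https://api.example-saas.com/v1/events"
API_KEY = "customer-specific-api-key"

# Post a record to the vendor's API; API Gateway forwards it to the SQS backend
response = requests.post(
    API_ENDPOINT,
    headers={"x-api-key": API_KEY},
    json={"customer_id": "acme-corp", "metric": "cpu_utilization", "value": 87.5},
    timeout=10,
)
response.raise_for_status()
print(response.status_code)
```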
VPC Peering
According to AWS, Amazon Virtual Private Cloud (Amazon VPC) enables customers to launch AWS resources into a virtual network that they have defined. A VPC peering connection is a networking connection between two VPCs that allows you to route traffic between them using private IPv4 addresses or IPv6 addresses. You can create a VPC peering connection between your VPCs, or with a VPC in another AWS account. As announced in November 2018, VPCs can be in different regions (known as inter-region VPC peering).
Below, we see an example of the SaaS vendor and the customer using VPC peering to enable direct interaction between a customer’s application, within their account, and their SaaS application, in the vendor’s account.
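For illustration, the boto3 sketch below requests a peering connection from the customer’s account to the vendor’s VPC and adds a route. All IDs and CIDR blocks are hypothetical, and the vendor must separately accept the request from their own account.

```python
import boto3

# Illustrative IDs: the requester is the customer, the accepter is the vendor
REQUESTER_VPC_ID = "vpc-0aaa1111bbbb2222c"  # customer VPC (10.0.0.0/16)
PEER_VPC_ID = "vpc-0ddd3333eeee4444f"       # vendor VPC (10.1.0.0/16), must not overlap
PEER_ACCOUNT_ID = "111122223333"            # vendor account

ec2 = boto3.client("ec2")

# Request a peering connection from the customer's account to the vendor's VPC
pcx = ec2.create_vpc_peering_connection(
    VpcId=REQUESTER_VPC_ID,
    PeerVpcId=PEER_VPC_ID,
    PeerOwnerId=PEER_ACCOUNT_ID,
)["VpcPeeringConnection"]

# The vendor accepts the request from their own account:
# ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx["VpcPeeringConnectionId"])

# Each side then adds a route to the other VPC's CIDR via the peering connection
ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",  # customer route table (illustrative)
    DestinationCidrBlock="10.1.0.0/16",
    VpcPeeringConnectionId=pcx["VpcPeeringConnectionId"],
)
```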
One of the limitations of VPC peering, which might impact its ability to scale for SaaS vendors using multi-tenant architectures, is overlapping CIDR blocks. According to AWS, you cannot create a VPC peering connection between VPCs with matching or overlapping IPv4 CIDR blocks. Given tens or hundreds of customers, VPC peering presents challenges in managing individual CIDR blocks when using a multi-tenant SaaS architectural approach.
VPC Endpoints
According to AWS, a VPC endpoint enables you to privately connect your VPC to supported AWS services and VPC endpoint services powered by AWS PrivateLink without requiring an Internet Gateway, NAT device, VPN connection, or AWS Direct Connect connection. Instances in your VPC do not require public IP addresses to communicate with resources in the service. Traffic between your VPC and the other service does not leave the Amazon network.
There are two types of VPC endpoints. An interface endpoint, powered by AWS PrivateLink, is an elastic network interface (ENI) with a private IP address from the IP address range of your subnet that serves as an entry point for traffic destined to a supported service. Interface endpoints support a large and growing list of AWS services. The second type of VPC endpoint, a gateway endpoint, is a gateway that you specify as a target for a route in your route table for traffic destined to a supported AWS service. Both Amazon S3 and Amazon DynamoDB are currently supported by gateway endpoints.
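The boto3 sketch below creates one endpoint of each type: a gateway endpoint for S3 and an interface endpoint for Kinesis Data Firehose. The VPC, subnet, security group, and route table IDs are hypothetical.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Hypothetical network identifiers for the consuming VPC
VPC_ID = "vpc-0aaa1111bbbb2222c"

# Gateway endpoint: adds routes for S3 to the specified route tables
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId=VPC_ID,
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)

# Interface endpoint: places an ENI in each subnet for Kinesis Data Firehose
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId=VPC_ID,
    ServiceName="com.amazonaws.us-east-1.kinesis-firehose",
    SubnetIds=["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
    SecurityGroupIds=["sg-0123456789abcdef0"],  # must allow HTTPS from clients
    PrivateDnsEnabled=True,  # resolve the service's default DNS name to the ENIs
)
```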
AWS PrivateLink
According to AWS, AWS PrivateLink simplifies the security of data shared with cloud-based applications by eliminating the exposure of data to the public Internet. AWS PrivateLink provides private connectivity between VPCs, AWS services, and on-premises applications, securely on the AWS network. AWS PrivateLink makes it easy to connect services across different accounts and VPCs to simplify the network architecture significantly. James Devine of AWS delivered a great presentation on PrivateLink, available on YouTube, AWS re:Invent 2018: Best Practices for AWS PrivateLink.
AWS Service Endpoints
AWS Service Endpoints direct traffic to AWS services over AWS PrivateLink, keeping the network traffic within the AWS network. AWS PrivateLink enables SaaS providers to offer services that look and feel like they are hosted directly on a private network. These services are securely accessible both from the cloud and from on-premises environments, via AWS Direct Connect and AWS VPN, in a highly available and scalable manner.
Below, we see an example of routing API calls from the customer’s account to the SaaS vendor’s account, using an interface VPC endpoint (interface endpoint) powered by AWS PrivateLink. The customer’s requests are sent through ENIs in each private subnet to a Network Load Balancer (NLB) within the vendor’s account, using PrivateLink. There is no need for an Internet Gateway to route traffic over the public Internet.
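On the vendor’s side, this pattern requires publishing an endpoint service backed by the NLB and allowing each customer account to connect. A boto3 sketch, with hypothetical ARNs and account IDs, might look like this:

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical ARN of the vendor's Network Load Balancer
NLB_ARN = ("arn:aws:elasticloadbalancing:us-east-1:111122223333:"
           "loadbalancer/net/saas-api/0123456789abcdef")

# Publish the vendor's service as a PrivateLink endpoint service
service = ec2.create_vpc_endpoint_service_configuration(
    NetworkLoadBalancerArns=[NLB_ARN],
    AcceptanceRequired=True,  # vendor approves each customer's endpoint request
)["ServiceConfiguration"]

# Allow a specific customer account to create interface endpoints to the service
ec2.modify_vpc_endpoint_service_permissions(
    ServiceId=service["ServiceId"],
    AddAllowedPrincipals=["arn:aws:iam::444455556666:root"],
)

# The customer creates an interface endpoint using this service name
print(service["ServiceName"])  # e.g., com.amazonaws.vpce.us-east-1.vpce-svc-...
```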
Below, we see an example of near real-time routing messages from an application within the customer’s account to an Amazon Kinesis Data Firehose delivery stream in the SaaS vendor’s account, using an interface endpoint powered by AWS PrivateLink. Again, there is no need for an Internet Gateway to route traffic over the public Internet. The interchange of data may be managed by the customer or by a vendor agent, deployed into the customer’s account.
Amazon EventBridge
According to AWS, Amazon EventBridge is a serverless event bus that makes it easy to connect applications using data from your applications, integrated SaaS applications, and AWS services. EventBridge delivers a stream of real-time data from event sources and routes that data to targets like AWS Lambda. You can set up routing rules to determine where to send your data to build application architectures that react in real-time to all of your data sources. EventBridge makes it easy to build event-driven applications because it takes care of event ingestion and delivery, security, authorization, and error handling for you. Mike Deck of AWS gave an excellent presentation on EventBridge, available on YouTube, AWS re:Invent 2019: Building event-driven architectures w/ Amazon EventBridge.
Amazon EventBridge SaaS Integrations
Amazon EventBridge ingests data from supported SaaS partners and routes it to AWS service targets. With EventBridge, you can use data from your SaaS applications to trigger workflows for customer support, business operations, and more. Currently supported SaaS partners include Auth0, Datadog, Epsagon, MongoDB, New Relic, and PagerDuty, amongst others.
Below, we see an example of routing custom events from the SaaS vendor’s account to multiple targets within the customer’s account using Amazon EventBridge. We can also use interface endpoints powered by AWS PrivateLink within both accounts to eliminate the need to send data over the public Internet. Traffic to targets of EventBridge also remains on the AWS global infrastructure.
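As a sketch, the vendor’s application might publish a custom event directly onto an event bus in the customer’s account, assuming the customer’s bus policy grants the vendor’s account `events:PutEvents`. The bus ARN, source name, and payload below are hypothetical.

```python
import json
import boto3

# Hypothetical event bus created in the customer's account; its resource policy
# must allow PutEvents from the vendor's account
CUSTOMER_BUS_ARN = "arn:aws:events:us-east-1:444455556666:event-bus/saas-vendor-events"

events = boto3.client("events")

# Publish a custom event from the vendor's account onto the customer's event bus
events.put_events(
    Entries=[{
        "Source": "com.example-saas.monitoring",
        "DetailType": "AnomalyDetected",
        "Detail": json.dumps({"severity": "high", "resource": "i-0123456789abcdef0"}),
        "EventBusName": CUSTOMER_BUS_ARN,  # cross-account target bus
    }]
)
```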
JDBC and ODBC Database Connections
Lastly, for data sources on AWS, interactions between vendor and customer AWS Accounts can also be made using JDBC (Java Database Connectivity) and ODBC (Open Database Connectivity) database connections. Services such as Amazon Redshift, Amazon Athena, Amazon RDS, and Amazon Aurora can all be accessed using one or both of these methods. There are multiple ways to enhance the security of these connections, including SSL/TLS connections, data encryption at rest and in flight, AWS IAM service-linked roles, VPCs, and VPC peering.
Below, we see an example of the application in the SaaS vendor’s account communicating over JDBC with a private Amazon RDS instance in the customer’s account. We can communicate between private subnets within two different accounts using VPC peering. Like PrivateLink, traffic stays on the global AWS backbone and never traverses the public Internet.
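As a Python analog of that JDBC connection, the sketch below uses the third-party `psycopg2` driver to query a private RDS PostgreSQL endpoint over the peered connection. The hostname, database, and credentials are hypothetical.

```python
import psycopg2  # third-party PostgreSQL driver; a Python analog of a JDBC connection

# Illustrative values: the private endpoint of an RDS PostgreSQL instance in the
# customer's VPC, reachable from the vendor's VPC over the peering connection
conn = psycopg2.connect(
    host="customer-db.abcdefghijkl.us-east-1.rds.amazonaws.com",
    port=5432,
    dbname="orders",
    user="saas_vendor_readonly",
    password="example-password",  # in practice, retrieved from a secrets manager
    sslmode="require",            # encrypt traffic in flight
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM orders;")
    print(cur.fetchone()[0])
```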
Conclusion
In this post, we examined several AWS services that help a SaaS vendor interact with their customers’ accounts. In future posts, we will take a more in-depth look at the advantages and disadvantages of each service with respect to scalability and a vendor’s tenant isolation strategy.
This blog represents my own viewpoints and not those of my employer, Amazon Web Services.
Docker Enterprise Edition: Multi-Environment, Single Control Plane Architecture for AWS
Posted by Gary A. Stafford in AWS, Cloud, Continuous Delivery, DevOps, Enterprise Software Development, Software Development on September 6, 2017
Designing a successful, cloud-based containerized application platform requires a balance of performance and security with cost, reliability, and manageability. Ensuring that a platform meets all functional and non-functional requirements, while remaining within budget and staying easily maintainable, can be challenging.
As Cloud Architect and DevOps Team Lead, I recently participated in the development of two architecturally similar, lightweight, cloud-based containerized application platforms. From the start, both platforms were architected to maximize security and performance, while minimizing cost and operational complexity. The latter platform was built on AWS with Docker Enterprise Edition.
Docker Enterprise Edition
Released in March of this year, Docker Enterprise Edition (Docker EE) is a secure, full-featured container-based management platform. There are currently eight versions of Docker EE, available for Windows Server, Azure, AWS, and multiple Linux distros, including RHEL, CentOS, Ubuntu, SUSE, and Oracle.
Docker EE is one of several production-grade container orchestration Platforms as a Service (PaaS). Some of the other container platforms in this category include:
- Google Container Engine (GKE, based on Google’s Kubernetes)
- AWS EC2 Container Service (ECS)
- Microsoft Azure Container Service (ACS)
- Mesosphere Enterprise DC/OS (based on Apache Mesos)
- Red Hat OpenShift (based on Kubernetes)
- Rancher Labs (based on Kubernetes, Cattle, Mesos, or Swarm)
Docker Community Edition (CE), Kubernetes, and Apache Mesos are free and open-source. Some providers, such as Rancher Labs, offer enterprise support for an additional fee. Cloud-based services, such as Red Hat OpenShift Online, AWS, GKE, and ACS, charge a typical monthly usage fee. Docker EE, similar to Mesosphere Enterprise DC/OS and Red Hat OpenShift, is priced on a per-node, per-year annual subscription model.
Docker EE is currently offered in three subscription tiers: Basic, Standard, and Advanced. Additionally, Docker offers Business Day and Business Critical support. Docker EE’s Advanced Tier adds several significant features, including secure multi-tenancy with node-based isolation, as well as image security scanning and continuous vulnerability scanning as part of Docker EE’s Docker Trusted Registry.
Architecting for Affordability and Maintainability
Building an enterprise-scale application platform using public cloud infrastructure, such as AWS, and a licensed Containers-as-a-Service (CaaS) platform, such as Docker EE, can quickly become complex and costly to build and maintain. Based on current list pricing, the cost of a single Linux node ranges from USD 75 per month for basic support up to USD 300 per month for Docker Enterprise Edition Advanced with Business Critical support. Although cost is relative to the value generated by the application platform, architects should nonetheless always strive to avoid unnecessary complexity and cost.
Recurring operational costs, such as licensed software subscriptions, support contracts, and monthly cloud-infrastructure charges, are often overlooked by project teams during the build phase. Accurately forecasting the recurring costs of a fully functional Production platform, under expected normal load, is essential. Teams often overlook how quickly Docker image registries, databases, data lakes, and data warehouses swell, inflating the monthly cloud-infrastructure charges required to maintain the platform. The need to control cloud costs has led to the growth of third-party cloud management solutions, such as the CloudCheckr Cloud Management Platform (CMP).
Shared Docker Environment Model
Most software development projects require multiple environments in which to continuously develop, test, demonstrate, stage, and release code. Creating separate environments, replete with their own Docker EE Universal Control Plane (aka Control Plane or UCP), Docker Trusted Registry (DTR), AWS infrastructure, and third-party components, would guarantee a high level of isolation and performance. However, replicating all elements in each environment would add considerable build and run costs, as well as unnecessary complexity.
On both recent projects, we chose to create a single AWS Virtual Private Cloud (VPC), which contained all of the non-production environments required by our project teams. In parallel, we built an entirely separate Production VPC for the Production environment. I’ve seen this same pattern repeated with Red Hat OpenStack and Microsoft Azure.
Production
Isolating Production from the lower environments is essential to ensure security and to prevent non-production traffic from impacting the performance of Production. Corporate compliance and regulatory policies often dictate complete Production isolation. Separate infrastructure, security appliances, role-based access controls (RBAC), configuration and secret management, and encryption keys and SSL certificates are all required.
For complete separation of Production, different AWS accounts are frequently used. Separate AWS accounts provide separate billing, usage reporting, and AWS Identity and Access Management (IAM), amongst other advantages.
Performance and Staging
Unlike Production, there are few reasons to completely isolate lower-environments from one another. The exception I’ve encountered is Performance and Staging. These two environments are frequently separated from other environments to ensure the accuracy of performance testing and release staging activities. Performance testing, in particular, can generate enormous load on systems, which if not isolated, will impair adjacent environments, applications, and monitoring systems.
On a few recent projects, to reduce cost and complexity, we repurposed the UAT environment for performance testing, once user-acceptance testing was complete. Performance testing was conducted during off-peak development and testing periods, with access to adjacent environments blocked.
The multi-purpose UAT environment further served as a Staging environment. Applications were deployed and released to the UAT and Performance environments following a nearly identical process to the one used for Production. Hotfixes to Production were also tested in this environment.
Example of Shared Environments
To demonstrate how to architect a shared non-production Docker EE environment, which minimizes cost and complexity, let’s examine the example shown below. In the example, built on AWS with Docker EE, there are four typical non-production environments, CI/CD, Development, Test, and UAT, and one Production environment.
In the example, there are two separate VPCs, the Production VPC, and the Non-Production VPC. There is no reason to configure VPC Peering between the two VPCs, as there is no need for direct communication between the two. Within the Non-Production VPC, to the left in the diagram, there is a cluster of three Docker EE UCP Manager EC2 nodes, a cluster of three DTR Worker EC2 nodes, and the four environments, consisting of varying numbers of EC2 Worker nodes. Production, to the right of the diagram, has its own cluster of three UCP Manager EC2 nodes and a cluster of six EC2 Worker nodes.
Single Non-Production UCP
As a primary means of reducing cost and complexity, in the example, a single minimally-sized Docker EE UCP cluster of three Manager nodes orchestrates activities across all four non-production environments. Alternatively, you would have to create a UCP cluster for each environment, meaning nine more UCP Manager nodes to configure and maintain.
The UCP users, teams, organizations, access controls, Docker Secrets, overlay networks, and other UCP features, for all non-production environments, are managed through the single Control Plane. All deployments to all the non-production environments, from the CI/CD server, are performed through the single Control Plane. Each UCP Manager node is deployed to a different AWS Availability Zone (AZ) to ensure high-availability.
Shared DTR
As another means of reducing cost and complexity, in the example, a Docker EE DTR cluster of three Worker nodes contains all Docker image repositories. Both the non-production and Production environments use this DTR as a secure source of all Docker images. Not having to replicate image repositories, access controls, and infrastructure, or figure out how to migrate images between two separate DTR clusters, is a significant time, cost, and complexity savings. Each DTR Worker node is also deployed to a different AZ to ensure high-availability.
Whether to use a shared DTR between non-production and Production is an important security decision your project team needs to make. A single DTR, shared between non-production and Production, comes with inherent availability and security risks, which should be understood in advance.
Separate Non-Production Worker Nodes
In the shared non-production environments example, each environment has dedicated AWS EC2 instances configured as Docker EE Worker nodes. The number of Worker nodes is determined by the requirements for each environment, as dictated by the project’s Development, Testing, Security, and DevOps teams. Like the UCP and DTR clusters, each Worker node, within an individual environment, is deployed to a different AZ to ensure high-availability and mimic the Production architecture.
Minimizing the number of Worker nodes in each environment, as well as the type and size of each EC2 node, offers a significant potential cost and administrative savings.
Separate Environment Ingress
In the example, the UCP, DTR, and each of the four environments are accessed through separate URLs, using AWS Hosted Zone CNAME records (subdomains). Encrypted HTTPS traffic is routed through a series of security appliances, depending on traffic type, to individual private AWS Elastic Load Balancers (ELB): one each for the UCPs, the DTR, and each of the environments. Each ELB load-balances traffic to the Docker EE nodes associated with that specific traffic. All firewalls, ELBs, and the UCP and DTR are secured with a high-grade wildcard SSL certificate.
Separate Data Sources
In the shared non-production environments example, there is one Amazon Relational Database Service (RDS) instance in non-production and one in Production. Both RDS instances are replicated across multiple Availability Zones. Within the single shared non-production RDS instance, there are four separate databases, one per non-production environment. This architecture sacrifices the potential database performance of separate RDS instances to avoid additional cost and complexity.
Maintaining Environment Separation
Node Labels
To obtain sufficient environment separation while using a single UCP, each Docker EE Worker node is tagged with an `environment` node label. The node label indicates which environment the Worker node is associated with. For example, in the screenshot below, a Worker node is assigned to the Development environment by tagging it with the key of `environment` and the value of `dev`.
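The label itself can be applied with the `docker node update --label-add` command, or programmatically. Below is a minimal sketch using the Docker SDK for Python’s low-level API; the node ID is a hypothetical placeholder.

```python
import docker

# Hypothetical node ID, as listed by `docker node ls`
NODE_ID = "x1lfbz8zgvcekjq9r2p7m4n6t"

client = docker.APIClient(base_url="unix://var/run/docker.sock")

# Read the node's current spec and version index (required for optimistic locking)
node = client.inspect_node(NODE_ID)
spec = node["Spec"]

# Tag the Worker node with the environment label used by placement constraints
spec.setdefault("Labels", {})["environment"] = "dev"

client.update_node(NODE_ID, node["Version"]["Index"], node_spec=spec)
```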
* The Docker EE screens shown here are from UCP 2.1.5, not the recently released 2.2.x, which has an updated UI appearance.

Each service’s Docker Compose file uses deployment placement constraints, which indicate where Docker should or should not deploy services. In the hello-world Docker Compose file example below, the `node.labels.environment` constraint is set to the `ENVIRONMENT` variable, which is set during container deployment by the CI/CD server. This constraint directs Docker to only deploy the hello-world service to nodes that contain the placement constraint of `node.labels.environment`, whose value matches the `ENVIRONMENT` variable value.
```yaml
# Hello World Service Stack
# DTR_URL: Docker Trusted Registry URL
# IMAGE: Docker Image to deploy
# ENVIRONMENT: Environment to deploy into

version: '3.2'

services:
  hello-world:
    image: ${DTR_URL}/${IMAGE}
    deploy:
      placement:
        constraints:
          - node.role == worker
          - node.labels.environment == ${ENVIRONMENT}
      replicas: 4
      update_config:
        parallelism: 4
        delay: 10s
      restart_policy:
        condition: any
        max_attempts: 3
        delay: 10s
    logging:
      driver: fluentd
      options:
        tag: docker.{{.Name}}
        env: SERVICE_NAME,ENVIRONMENT
    environment:
      SERVICE_NAME: hello-world
      ENVIRONMENT: ${ENVIRONMENT}
    command: "java \
      -Dspring.profiles.active=${ENVIRONMENT} \
      -Djava.security.egd=file:/dev/./urandom \
      -jar hello-world.jar"
```
Deploying from CI/CD Server
The `ENVIRONMENT` value is set as an environment variable, which is then used by the CI/CD server, running a `docker stack deploy` or a `docker service update` command, within a deployment pipeline. Below is an example of how to use the environment variable as part of a Jenkins Pipeline-as-Code Jenkinsfile.
```groovy
#!/usr/bin/env groovy

// Deploy Hello World Service Stack

node('java') {
  properties([parameters([
    choice(choices: ["ci", "dev", "test", "uat"].join("\n"),
      description: 'Environment', name: 'ENVIRONMENT')
  ])])

  stage('Git Checkout') {
    dir('service') {
      git branch: 'master',
        credentialsId: 'jenkins_github_credentials',
        url: 'ssh://git@garystafford/hello-world.git'
    }
    dir('credentials') {
      git branch: 'master',
        credentialsId: 'jenkins_github_credentials',
        url: 'ssh://git@garystafford/ucp-bundle-jenkins.git'
    }
  }

  dir('service') {
    stage('Build and Unit Test') {
      sh './gradlew clean cleanTest build'
    }

    withEnv(["IMAGE=hello-world:${BUILD_NUMBER}"]) {
      stage('Docker Build Image') {
        withCredentials([[$class: 'StringBinding',
                          credentialsId: 'docker_password',
                          variable: 'DOCKER_PASSWORD'],
                         [$class: 'StringBinding',
                          credentialsId: 'docker_username',
                          variable: 'DOCKER_USERNAME']]) {
          sh "docker login -u ${DOCKER_USERNAME} -p ${DOCKER_PASSWORD} ${DTR_URL}"
        }
        sh "docker build --no-cache -t ${DTR_URL}/${IMAGE} ."
      }

      stage('Docker Push Image') {
        sh "docker push ${DTR_URL}/${IMAGE}"
      }

      withEnv(['DOCKER_TLS_VERIFY=1',
               "DOCKER_CERT_PATH=${WORKSPACE}/credentials/",
               "DOCKER_HOST=${DOCKER_HOST}"]) {
        stage('Docker Stack Deploy') {
          try {
            sh "docker service rm ${params.ENVIRONMENT}_hello-world"
            sh 'sleep 30s' // wait for service to be completely removed if it exists
          } catch (err) {
            echo "Error: ${err}" // catch and move on if it doesn't already exist
          }
          sh "docker stack deploy \
            --compose-file=docker-compose.yml ${params.ENVIRONMENT}"
        }
      }
    }
  }
}
```
Centralized Logging and Metrics Collection
Centralized logging and metrics collection systems are used for application and infrastructure dashboards, monitoring, and alerting. In the shared non-production environment examples, the centralized logging and metrics collection systems are internal to each VPC, but reside on separate EC2 instances and are not registered with the Control Plane. In this way, the logging and metrics collection systems should not impact the reliability, performance, and security of the applications running within Docker EE. In the example, Worker nodes run a containerized copy of fluentd, which collects and pushes logs to ELK’s Elasticsearch.
Logging and metrics collection systems could also be supplied by external cloud-based SaaS providers, such as Loggly, Sysdig, and Datadog, or by the platform’s cloud provider, such as Amazon CloudWatch.
With four environments running multiple containerized copies of each service, figuring out which log entry came from which service instance requires multiple data points. As shown in the example Kibana UI below, the environment value, along with the service name and container ID, as well as the git commit hash and branch, are added to each log entry for easier troubleshooting. To include the environment, the value of the `ENVIRONMENT` variable is passed to Docker’s fluentd log driver as an `env` option. This same labeling method is used to tag metrics.
Separate Docker Service Stacks
For further environment separation within the single Control Plane, related services are deployed as part of the same Docker service stack. Each service stack contains all of the services that comprise an application running within a single environment. Multiple stacks may be required to support multiple, distinct applications within the same environment.
For example, in the screenshot below, a hello-world service container, built with a Docker image tagged with build 59 of the Jenkins continuous integration pipeline, is deployed as part of both the Development (dev) and Test service stacks. The CI/CD and UAT service stacks each contain different versions of the hello-world service.
Separate Docker Overlay Networks
For additional environment separation within the single non-production UCP, all Docker service stacks associated with an environment, reside on the same Docker overlay network. Overlay networks manage communications among the Docker Worker nodes, enabling service-to-service communication for all services on the same overlay network while isolating services running on one network from services running on another network.
In the example screenshot below, the hello-world service, a member of the test service stack, is running on the `test_default` overlay network.
Cleaning Up
Having distinct environment-centric Docker service stacks and overlay networks makes it easy to clean up an environment, without impacting adjacent environments. Both service stacks and overlay networks can be removed to clear an environment’s contents.
Separate Performance Environment
In the alternative example below, a Performance environment has been added to the Non-Production VPC. To ensure a higher level of isolation, the Performance environment has its own UCP, RDS, and ELBs. The Performance environment shares the DTR, as well as the security, logging, and monitoring components, with the rest of the non-production environments.
Below, the Performance environment has half the number of Worker nodes as Production. Performance results can be scaled to estimate expected Production performance, given more nodes. Alternatively, the number of nodes can be scaled up temporarily to match Production, then scaled back down to a minimum after testing is complete.
Shared DevOps Tooling
All environments leverage shared Development and DevOps resources, deployed to a separate VPC. Resources include Agile Application Lifecycle Management (ALM), such as JIRA or CA Agile Central, source control repository management (SCM), such as GitLab or Bitbucket, binary repository management, such as Artifactory or Nexus, and a CI/CD solution, such as Jenkins, TeamCity, or Bamboo.
From the DevOps VPC, Docker images are pushed to and pulled from the DTR in the Non-Production VPC. Deployments of container-based applications are executed from the DevOps VPC CI/CD server to the non-production, Performance, and Production UCPs. Separate DevOps CI/CD pipelines and access controls are essential in maintaining the separation of the non-production and Production environments.
Complete Platform
Several common components found in a Docker EE cloud-based AWS platform were discussed in this post. However, a complete AWS application platform has many more moving parts. Below is a comprehensive list of components, including DevOps tooling, organized into two categories: 1) common components that can potentially be shared across the non-production environments to save cost and complexity, and 2) components that should be replicated in each non-production environment for security and performance.
Shared Non-Production Components:
- AWS
  - Virtual Private Cloud (VPC), Region, Availability Zones
  - Route Tables, Network ACLs, Internet Gateways
  - Subnets
  - Some Security Groups
  - IAM Groups, Users, Roles, Policies (RBAC)
  - Relational Database Service (RDS)
  - ElastiCache
  - API Gateway, Lambdas
  - S3 Buckets
  - Bastion Servers, NAT Gateways
  - Route 53 Hosted Zone (Registered Domain)
  - EC2 Key Pairs
  - Hardened Linux AMI
- Docker EE
  - UCP and EC2 Manager Nodes
  - DTR and EC2 Worker Nodes
  - UCP and DTR Users, Teams, Organizations
  - DTR Image Repositories
  - Secret Management
- Third-Party Components/Products
  - SSL Certificates
  - Security Components: Firewalls, Virus Scanning, VPN Servers
  - Container Security
  - End-User IAM
  - Directory Service
  - Log Aggregation
  - Metric Collection
  - Monitoring, Alerting
  - Configuration and Secret Management
- DevOps
  - CI/CD Pipelines as Code
  - Infrastructure as Code
  - Source Code Repositories
  - Binary Artifact Repositories
Isolated Non-Production Components:
- AWS
  - Route 53 Hosted Zones and Associated Records
  - Elastic Load Balancers (ELB)
  - Elastic Compute Cloud (EC2) Worker Nodes
  - Elastic IPs
  - ELB and EC2 Security Groups
  - RDS Databases (Single RDS Instance with Separate Databases)
All opinions in this post are my own and not necessarily the views of my current employer or their clients.