Posts Tagged Postgres
Getting Started with PostgreSQL using Amazon RDS, CloudFormation, pgAdmin, and Python
Posted by Gary A. Stafford in AWS, Cloud, Python, Software Development on August 9, 2019
Introduction
In the following post, we will explore how to get started with Amazon Relational Database Service (RDS) for PostgreSQL. CloudFormation will be used to build a PostgreSQL master database instance and a single read replica in a new VPC. AWS Systems Manager Parameter Store will be used to store our CloudFormation configuration values. Amazon RDS Event Notifications will send text messages to our mobile device to let us know when the RDS instances are ready for use. Once running, we will examine a variety of methods to interact with our database instances, including pgAdmin, Adminer, and Python.
Technologies
The primary technologies used in this post include the following.
PostgreSQL
According to its website, PostgreSQL, commonly known as Postgres, is the world’s most advanced Open Source relational database. Originating at UC Berkeley in 1986, PostgreSQL has more than 30 years of active core development. PostgreSQL has earned a strong reputation for its proven architecture, reliability, data integrity, robust feature set, and extensibility. PostgreSQL runs on all major operating systems and has been ACID-compliant since 2001.
Amazon RDS for PostgreSQL
According to Amazon, Amazon Relational Database Service (RDS) provides six familiar database engines to choose from, including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server. RDS is available on several database instance types - optimized for memory, performance, or I/O.
Amazon RDS for PostgreSQL makes it easy to set up, operate, and scale PostgreSQL deployments in the cloud. Amazon RDS supports the latest PostgreSQL version 11, which includes several enhancements to performance, robustness, transaction management, query parallelism, and more.
AWS CloudFormation
According to Amazon, CloudFormation provides a common language to describe and provision all the infrastructure resources within AWS-based cloud environments. CloudFormation allows you to use a JSON- or YAML-based template to model and provision all the resources needed for your applications across all AWS regions and accounts, in an automated and secure manner.
Demonstration
Architecture
Below, we see an architectural representation of what will be built in the demonstration. This is not a typical three-tier AWS architecture, wherein the RDS instances would be placed in private subnets (data tier) and accessible only by the application tier, running on AWS. The architecture for the demonstration is designed for interacting with RDS through external database clients such as pgAdmin, and applications like our local Python scripts, detailed later in the post.
Source Code
All source code for this post is available on GitHub in a single public repository, postgres-rds-demo.
.
├── LICENSE.md
├── README.md
├── cfn-templates
│   ├── event.template
│   └── rds.template
├── parameter_store_values.sh
├── python-scripts
│   ├── create_pagila_data.py
│   ├── database.ini
│   ├── db_config.py
│   ├── query_postgres.py
│   ├── requirements.txt
│   └── unit_tests.py
├── sql-scripts
│   ├── pagila-insert-data.sql
│   └── pagila-schema.sql
└── stack.yml
To clone the GitHub repository, execute the following command.
git clone --branch master --single-branch --depth 1 --no-tags \
  https://github.com/garystafford/aws-rds-postgres.git
Prerequisites
For this demonstration, I will assume you already have an AWS account, and that you have the latest copies of the AWS CLI and Python 3 installed on your development machine. Optionally, for pgAdmin and Adminer, you will also need to have Docker installed.
Steps
In this demonstration, we will perform the following steps.
- Put CloudFormation configuration values in Parameter Store;
- Execute CloudFormation templates to create AWS resources;
- Execute SQL scripts using Python to populate the new database with sample data;
- Configure pgAdmin and Python connections to RDS PostgreSQL instances;
AWS Systems Manager Parameter Store
With AWS, it is typical to use services like AWS Systems Manager Parameter Store and AWS Secrets Manager to store overt, sensitive, and secret configuration values. These values are utilized by your code, or from AWS services like CloudFormation. Parameter Store allows us to follow the proper twelve-factor, cloud-native practice of separating configuration from code.
To demonstrate the use of Parameter Store, we will place a few of our CloudFormation configuration items into Parameter Store. The demonstration’s GitHub repository includes a shell script, parameter_store_values.sh, which will put the necessary parameters into Parameter Store.
Below, we see several of the demo’s configuration values, which have been put into Parameter Store.
SecureString
Whereas our other parameters are stored in Parameter Store as String datatypes, the database’s master user password is stored as a SecureString data-type. Parameter Store uses an AWS Key Management Service (KMS) customer master key (CMK) to encrypt the SecureString parameter value.
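To illustrate what the shell script does for the encrypted password, below is a hedged sketch of the equivalent boto3 call. The helper function and its name are mine for illustration, not part of the demo’s code, and the actual put_parameter call is commented out because it requires valid AWS credentials.

```python
# Sketch: building the arguments for an encrypted SecureString parameter.
# The helper name is hypothetical; only the argument shape follows the AWS API.

def secure_param_kwargs(name, value, description=""):
    """Build put_parameter arguments for a KMS-encrypted SecureString."""
    return {
        "Name": name,
        "Value": value,
        "Type": "SecureString",  # encrypted with the account's KMS CMK
        "Description": description,
        "Overwrite": True,
    }

kwargs = secure_param_kwargs("/rds_demo/master_password", "5up3r53cr3tPa55w0rd")

# import boto3
# ssm = boto3.client("ssm")
# ssm.put_parameter(**kwargs)
# Reading the value back requires explicit decryption:
# ssm.get_parameter(Name="/rds_demo/master_password", WithDecryption=True)
```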
SMS Text Alert Option
Before running the Parameter Store script, you will need to change the /rds_demo/alert_phone parameter value in the script (shown below) to your mobile device number, including country code, such as ‘+12038675309’. Amazon SNS will use it to send SMS messages, using Amazon RDS Event Notification. If you don’t want to use this messaging feature, simply ignore this parameter and do not execute the event.template CloudFormation template in the subsequent step.
aws ssm put-parameter \
  --name /rds_demo/alert_phone \
  --type String \
  --value "your_phone_number_here" \
  --description "RDS alert SMS phone number" \
  --overwrite
Run the following command to execute the shell script, parameter_store_values.sh, which will put the necessary parameters into Parameter Store.
sh ./parameter_store_values.sh
CloudFormation Templates
The GitHub repository includes two CloudFormation templates, cfn-templates/event.template and cfn-templates/rds.template. The event template contains two resources, an AWS SNS Topic and an AWS RDS Event Subscription. The RDS template includes several resources, including a VPC, Internet Gateway, VPC security group, two public subnets, one RDS master database instance, and an AWS RDS Read Replica database instance.
The resources are split into two CloudFormation templates so we can create the notification resources, first, independently of creating or deleting the RDS instances. This will ensure we get all our SMS alerts about both the creation and deletion of the databases.
Template Parameters
The two CloudFormation templates contain a total of approximately fifteen parameters. For most, you can use the default values I have set or choose to override them. Four of the parameters will be fulfilled from Parameter Store. Of these, the master database password is treated slightly differently because it is secure (encrypted in Parameter Store). Below is a snippet of the template showing both types of parameters. The last two are fulfilled from Parameter Store.
DBInstanceClass:
  Type: String
  Default: "db.t3.small"
DBStorageType:
  Type: String
  Default: "gp2"
DBUser:
  Type: String
  Default: "{{resolve:ssm:/rds_demo/master_username:1}}"
DBPassword:
  Type: String
  Default: "{{resolve:ssm-secure:/rds_demo/master_password:1}}"
  NoEcho: True
Choosing the default CloudFormation parameter values will result in two minimally-configured RDS instances running the PostgreSQL 11.4 database engine on a db.t3.small instance with 10 GiB of General Purpose (SSD) storage. The db.t3 DB instance is part of the latest generation burstable performance instance class. The master instance is not configured for Multi-AZ high availability. However, the master and read replica each run in a different Availability Zone (AZ) within the same AWS Region.
Parameter Versioning
When placing parameters into Parameter Store, subsequent updates to a parameter result in the version number of that parameter being incremented. Note in the examples above, the version of the parameter is required by CloudFormation, here, ‘1’. If you choose to update a value in Parameter Store, thus incrementing the parameter’s version, you will also need to update the corresponding version number in the CloudFormation template’s parameter.
{
    "Parameter": {
        "Name": "/rds_demo/rds_username",
        "Type": "String",
        "Value": "masteruser",
        "Version": 1,
        "LastModifiedDate": 1564962073.774,
        "ARN": "arn:aws:ssm:us-east-1:1234567890:parameter/rds_demo/rds_username"
    }
}
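If you template many of these dynamic references, a small helper can assemble the reference string, including the version number. This helper is hypothetical, not part of the demo; only the {{resolve:…}} syntax itself comes from CloudFormation.

```python
def ssm_reference(name, version, secure=False):
    """Build a CloudFormation dynamic reference to a Parameter Store value.

    Plain parameters use the 'ssm' service; SecureStrings use 'ssm-secure'.
    """
    service = "ssm-secure" if secure else "ssm"
    return f"{{{{resolve:{service}:{name}:{version}}}}}"

# Matches the Default values in the template snippet above:
user_ref = ssm_reference("/rds_demo/master_username", 1)
pass_ref = ssm_reference("/rds_demo/master_password", 1, secure=True)
```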
Validating Templates
Although I have tested both templates, I suggest validating the templates yourself, as you usually would for any CloudFormation template you are creating. You can use the AWS CLI cloudformation validate-template command to validate a template. Alternately, or ideally in addition, you can use the CloudFormation Linter, cfn-lint.
aws cloudformation validate-template \
  --template-body file://cfn-templates/rds.template

cfn-lint -t cfn-templates/rds.template
Create the Stacks
To execute the first CloudFormation template and create a CloudFormation Stack containing the two event notification resources, run the following create-stack CLI command.
aws cloudformation create-stack \
  --template-body file://cfn-templates/event.template \
  --stack-name RDSEventDemoStack
The first stack takes less than one minute to create. Using the AWS CloudFormation Console, make sure the first stack completes successfully before creating the second stack with the command below.
aws cloudformation create-stack \
  --template-body file://cfn-templates/rds.template \
  --stack-name RDSDemoStack
Wait for my Text
In my tests, the CloudFormation RDS stack takes an average of 25–30 minutes to create and 15–20 minutes to delete, which can seem like an eternity. You could use the AWS CloudFormation console (shown below) or continue to use the CLI to follow the progress of the RDS stack creation.
However, if you recall, the CloudFormation event template creates an AWS RDS Event Subscription. This resource will notify us when the databases are ready by sending text messages to our mobile device.
In the CloudFormation events template, the RDS Event Subscription is configured to generate Amazon Simple Notification Service (SNS) notifications for several specific event types, including RDS instance creation and deletion.
MyEventSubscription:
  Properties:
    Enabled: true
    EventCategories:
      - availability
      - configuration change
      - creation
      - deletion
      - failover
      - failure
      - recovery
    SnsTopicArn:
      Ref: MyDBSNSTopic
    SourceType: db-instance
  Type: AWS::RDS::EventSubscription
Amazon SNS will send SMS messages to the mobile number you placed into Parameter Store. Below, we see messages generated during the creation of the two instances, displayed on an Apple iPhone.
Amazon RDS Dashboard
Once the RDS CloudFormation stack has successfully been built, the easiest way to view the results is using the Amazon RDS Dashboard, as shown below. Here we see both the master and read replica instances have been created and are available for our use.
The RDS dashboard offers CloudWatch monitoring of each RDS instance.
The RDS dashboard also provides detailed configuration information about each RDS instance.
The RDS dashboard’s Connectivity & security tab is where we can obtain connection information about our RDS instances, including each instance’s endpoint. Endpoint information will be required in the next part of the demonstration.
Sample Data
Now that we have our PostgreSQL database instance and read replica successfully provisioned and configured on AWS, with an empty database, we need some test data. There are several sources of sample PostgreSQL databases available on the internet to explore. We will use the Pagila sample movie rental database by pgFoundry. Although the database is several years old, it provides a relatively complex relational schema (table relationships shown below) and plenty of sample data to query, about 100 database objects and 46K rows of data.
In the GitHub repository, I have included the two Pagila database SQL scripts required to install the sample database’s data structures (DDL), sql-scripts/pagila-schema.sql, and the data itself (DML), sql-scripts/pagila-insert-data.sql.
To execute the Pagila SQL scripts and install the sample data, I have included a Python script. If you do not want to use Python, you can skip to the Adminer section of this post. Adminer also has the capability to import SQL scripts.
Before running any of the included Python scripts, you will need to install the required Python packages and configure the database.ini file.
Python Packages
To install the required Python packages using the supplied python-scripts/requirements.txt file, run the below commands.

cd python-scripts
pip3 install --upgrade -r requirements.txt
We are using two packages, psycopg2 and configparser, for the scripts. Psycopg is a PostgreSQL database adapter for Python. According to their website, Psycopg is the most popular PostgreSQL database adapter for the Python programming language. The configparser module allows us to read configuration from files similar to Microsoft Windows INI files. The unittest package is required for a set of unit tests included with the project, but it is not discussed as part of the demo.
Database Configuration
The python-scripts/database.ini file, read by configparser, provides the required connection information to our RDS master and read replica instances’ databases. Use the input parameters and output values from the CloudFormation RDS template, or the Amazon RDS Dashboard, to obtain the required connection information, as shown in the example below. Your host values will be unique to your master and read replica. The host values are the instances’ endpoints, listed in the RDS Dashboard’s Connectivity & security tab.
[docker]
host=localhost
port=5432
database=pagila
user=masteruser
password=5up3r53cr3tPa55w0rd

[master]
host=demo-instance.dkfvbjrazxmd.us-east-1.rds.amazonaws.com
port=5432
database=pagila
user=masteruser
password=5up3r53cr3tPa55w0rd

[replica]
host=demo-replica.dkfvbjrazxmd.us-east-1.rds.amazonaws.com
port=5432
database=pagila
user=masteruser
password=5up3r53cr3tPa55w0rd
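The project’s db_config.py reads this file with configparser. Here is a minimal, self-contained sketch of that approach; the in-line INI string and the helper name are mine for illustration, and the resulting dictionary happens to match the keyword arguments psycopg2.connect() accepts.

```python
from configparser import ConfigParser

# A miniature stand-in for database.ini (the real file holds your endpoints).
ini = """
[master]
host=demo-instance.dkfvbjrazxmd.us-east-1.rds.amazonaws.com
port=5432
database=pagila
user=masteruser
password=5up3r53cr3tPa55w0rd
"""

def read_db_config(parser, section):
    """Return a section's key/value pairs as a dict of connection params."""
    if not parser.has_section(section):
        raise ValueError(f"Section '{section}' not found")
    return dict(parser.items(section))

parser = ConfigParser()
parser.read_string(ini)
params = read_db_config(parser, "master")
# psycopg2.connect(**params) would accept these keyword arguments
```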
With the INI file configured, run the following command, which executes a supplied Python script, python-scripts/create_pagila_data.py, to create the data structures and insert sample data into the master RDS instance’s Pagila database. The database will be automatically replicated to the RDS read replica instance. From my local laptop, I found the Python script takes approximately 40 seconds to create all 100 database objects and insert 46K rows of movie rental data. That is compared to about 13 seconds locally, using a Docker-based instance of PostgreSQL.
python3 ./create_pagila_data.py --instance master
The Python script’s primary function, create_pagila_db(), reads and executes the two external SQL scripts.
def create_pagila_db():
    """ Creates Pagila database by running DDL and DML scripts """
    try:
        global conn
        with conn:
            with conn.cursor() as curs:
                curs.execute(open("../sql-scripts/pagila-schema.sql", "r").read())
                curs.execute(open("../sql-scripts/pagila-insert-data.sql", "r").read())
                conn.commit()
                print('Pagila SQL scripts executed')
    except (psycopg2.OperationalError, psycopg2.DatabaseError, FileNotFoundError) as err:
        print(create_pagila_db.__name__, err)
        close_conn()
        exit(1)
If the Python script executes correctly, you should see output indicating there are now 28 tables in our master RDS instance’s database.
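One way to confirm the count yourself is a simple catalog query against information_schema. A hedged sketch; the psycopg2 execution is commented out because it needs a live connection.

```python
# Counts ordinary tables in the public schema, which is where the Pagila
# scripts create their objects.
COUNT_TABLES_SQL = (
    "SELECT COUNT(*) FROM information_schema.tables "
    "WHERE table_schema = 'public' AND table_type = 'BASE TABLE';"
)

# with conn.cursor() as curs:           # conn: an open psycopg2 connection
#     curs.execute(COUNT_TABLES_SQL)
#     table_count = curs.fetchone()[0]
```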
pgAdmin
pgAdmin is a favorite tool for interacting with and managing PostgreSQL databases. According to its website, pgAdmin is the most popular and feature-rich Open Source administration and development platform for PostgreSQL.
The project includes an optional Docker Swarm stack.yml file. The stack will create a set of three Docker containers, including a local copy of PostgreSQL 11.4, Adminer, and pgAdmin 4. Having a local copy of PostgreSQL, using the official Docker image, is helpful for development and troubleshooting RDS issues.
Use the following commands to deploy the Swarm stack.
# create stack
docker swarm init
docker stack deploy -c stack.yml postgres

# get status of new containers
docker stack ps postgres --no-trunc
docker container ls
If you do not want to spin up the whole Docker Swarm stack, you could use the docker run command to create just a single pgAdmin Docker container. The pgAdmin 4 Docker image being used is the image recommended by pgAdmin.
docker pull dpage/pgadmin4

docker run -p 81:80 \
  -e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
  -e "PGADMIN_DEFAULT_PASSWORD=SuperSecret" \
  -d dpage/pgadmin4

docker container ls | grep pgadmin4
Database Server Configuration
Once pgAdmin is up and running, we can configure the master and read replica database servers (RDS instances) using the connection string information from your database.ini file or from the Amazon RDS Dashboard. Below, I am configuring the master RDS instance (server).
With that task complete, below, we see the master RDS instance and the read replica, as well as my local Docker instance configured in pgAdmin (left side of screengrab). Note how the Pagila database has been replicated automatically, from the RDS master to the read replica instance.
Building SQL Queries
Switching to the Query tab, we can run regular SQL queries against any of the database instances. Below, I have run a simple SELECT query against the master RDS instance’s Pagila database, returning the complete list of movie titles, along with their genre and release date.
The pgAdmin Query tool even includes an Explain tab to view a graphical representation of the same query, very useful for optimization. Here we see the same query, showing an analysis of the execution order. A popup window displays information about the selected object.
Query the Read Replica
To demonstrate the use of the read replica, below I’ve run the same query against the RDS read replica’s copy of the Pagila database. Any schema and data changes against the master instance are replicated to the read replica(s).
Adminer
Adminer is another good general-purpose database management tool, similar to pgAdmin, but with a few different capabilities. According to its website, with Adminer, you get a tidy user interface, rich support for multiple databases, performance, and security, all from a single PHP file. Adminer is my preferred tool for database admin tasks. Amazingly, Adminer works with MySQL, MariaDB, PostgreSQL, SQLite, MS SQL, Oracle, SimpleDB, Elasticsearch, and MongoDB.
Below, we see the Pagila database’s tables and views displayed in Adminer, along with some useful statistical information about each database object.
Similar to pgAdmin, we can also run queries, along with other common development and management tasks, from within the Adminer interface.
Import Pagila with Adminer
Another great feature of Adminer is the ability to easily import and export data. As an alternative to Python, you could import the Pagila data using Adminer’s SQL file import function. Below, you see an example of importing the Pagila database objects into the Pagila database, using the file upload function.
IDE
For writing my AWS infrastructure as code files and Python scripts, I prefer JetBrains PyCharm Professional Edition (v19.2). PyCharm, like all the JetBrains IDEs, has the ability to connect to and manage PostgreSQL databases. You can write and run SQL queries, including the Pagila SQL import scripts. Microsoft Visual Studio Code is another excellent, free choice, available on multiple platforms.
Python and RDS
Although our IDE, pgAdmin, and Adminer are useful to build and test our queries, ultimately, we still need to connect to the Amazon RDS PostgreSQL instances and perform data manipulation from our application code. The GitHub repository includes a sample Python script, python-scripts/query_postgres.py. This script uses the same Python packages and connection functions as the Pagila data creation script we ran earlier. This time we will perform the same SELECT query using Python as we did previously with pgAdmin and Adminer.
cd python-scripts
python3 ./query_postgres.py --instance master
With a successful database connection established, the script’s primary function, get_movies(return_count), performs the SELECT query. The function accepts an integer representing the desired number of movies to return from the SELECT query. A separate function within the script handles closing the database connection when the query is finished.
def get_movies(return_count=100):
    """ Queries for all films, by genre and year """
    try:
        global conn
        with conn:
            with conn.cursor() as curs:
                curs.execute("""
                    SELECT title AS title, name AS genre, release_year AS released
                    FROM film f
                        JOIN film_category fc ON f.film_id = fc.film_id
                        JOIN category c ON fc.category_id = c.category_id
                    ORDER BY title
                    LIMIT %s;
                """, (return_count,))
                movies = []
                row = curs.fetchone()
                while row is not None:
                    movies.append(row)
                    row = curs.fetchone()
                return movies
    except (psycopg2.OperationalError, psycopg2.DatabaseError) as err:
        print(get_movies.__name__, err)
    finally:
        close_conn()


def main():
    set_connection('docker')
    for movie in get_movies(10):
        print('Movie: {0}, Genre: {1}, Released: {2}'
              .format(movie[0], movie[1], movie[2]))
Below, we see an example of the Python script’s formatted output, limited to only the first ten movies.
Using the Read Replica
For better application performance, it may be optimal to redirect some or all of the database reads to the read replica, while leaving writes, updates, and deletes to hit the master instance. The script can be easily modified to execute the same query against the read replica rather than the master RDS instance by merely passing the desired section, ‘replica’ versus ‘master’, in the call to the set_connection(section) function. The section parameter refers to one of the sections in the database.ini file. The configparser module will handle retrieving the correct connection information.
set_connection('replica')
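Taking the idea one step further, you could choose the section automatically based on whether a statement reads or writes. A hypothetical helper, not part of the demo scripts, sketched under the assumption that statements beginning with SELECT, SHOW, or EXPLAIN are safe to send to the replica:

```python
# Statements starting with these keywords are treated as read-only.
READ_PREFIXES = ("select", "show", "explain")

def choose_section(sql):
    """Return the database.ini section a statement should run against."""
    first_word = sql.lstrip().split(None, 1)[0].lower()
    return "replica" if first_word in READ_PREFIXES else "master"

# e.g. set_connection(choose_section("SELECT title FROM film LIMIT 10"))
```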
Cleaning Up
When you are finished with the demonstration, the easiest way to clean up all the AWS resources and stop getting billed is to delete the two CloudFormation stacks using the AWS CLI, in the following order.
aws cloudformation delete-stack \
  --stack-name RDSDemoStack

# wait until the above resources are completely deleted
aws cloudformation delete-stack \
  --stack-name RDSEventDemoStack
You should receive the following SMS notifications as the first CloudFormation stack is being deleted.
You can delete the running Docker stack using the following command. Note, you will lose all your pgAdmin server connection information, along with your local Pagila database.
docker stack rm postgres
Conclusion
In this brief post, we only scratched the surface of the many benefits and capabilities of Amazon RDS for PostgreSQL. The best way to learn PostgreSQL and the benefits of Amazon RDS is to set up your own RDS instance, insert some sample data, and start writing queries in your favorite database client or programming language.
All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.
Developing Cloud-Native Data-Centric Spring Boot Applications for Pivotal Cloud Foundry
Posted by Gary A. Stafford in AWS, Cloud, Continuous Delivery, DevOps, Java Development, PCF, Software Development on March 23, 2018
In this post, we will explore the development of a cloud-native, data-centric Spring Boot 2.0 application, and its deployment to Pivotal Software’s hosted Pivotal Cloud Foundry service, Pivotal Web Services. We will add a few additional features, such as Spring Data, Lombok, and Swagger, to enhance our application.
According to Pivotal, Spring Boot makes it easy to create stand-alone, production-grade Spring-based Applications. Spring Boot takes an opinionated view of the Spring platform and third-party libraries. Spring Boot 2.0 just went GA on March 1, 2018. This is the first major revision of Spring Boot since 1.0 was released almost 4 years ago. It is also the first GA version of Spring Boot that provides support for Spring Framework 5.0.
Pivotal Web Services’ tagline is ‘The Agile Platform for the Agile Team Powered by Cloud Foundry’. According to Pivotal, Pivotal Web Services (PWS) is a hosted environment of Pivotal Cloud Foundry (PCF). PWS is hosted on AWS in the US-East region. PWS utilizes two availability zones for redundancy. PWS provides developers a Spring-centric PaaS alternative to AWS Elastic Beanstalk, Elastic Container Service (Amazon ECS), and OpsWorks. With PWS, you get the reliability and security of AWS, combined with the rich-functionality and ease-of-use of PCF.
To demonstrate the feature-rich capabilities of the Spring ecosystem, the Spring Boot application shown in this post incorporates the following complementary technologies:
- Spring Boot Actuator: Sub-project of Spring Boot, adds several production grade services to Spring Boot applications with little developer effort
- Spring Data JPA: Sub-project of Spring Data, easily implement JPA based repositories and data access layers
- Spring Data REST: Sub-project of Spring Data, easily build hypermedia-driven REST web services on top of Spring Data repositories
- Spring HATEOAS: Create REST representations that follow the HATEOAS principle from Spring-based applications
- Springfox Swagger 2: We are using the Springfox implementation of the Swagger 2 specification, an automated JSON API documentation for API’s built with Spring
- Lombok: The @Data annotation generates boilerplate code that is typically associated with simple POJOs (Plain Old Java Objects) and beans: @ToString, @EqualsAndHashCode, @Getter, @Setter, and @RequiredArgsConstructor
Source Code
All source code for this post can be found on GitHub. To get started quickly, use one of the two following commands (gist).
# clone the official v2.1.1 release for this post
git clone --depth 1 --branch v2.1.1 \
  https://github.com/garystafford/spring-postgresql-demo.git \
  && cd spring-postgresql-demo \
  && git checkout -b v2.1.1

# clone the latest version of code (newer than article)
git clone --depth 1 --branch master \
  https://github.com/garystafford/spring-postgresql-demo.git \
  && cd spring-postgresql-demo
For this post, I have used JetBrains IntelliJ IDEA and Git Bash on Windows for development. However, all code should be compatible with most popular IDEs and development platforms. The project assumes you have Docker and the Cloud Foundry Command Line Interface (cf CLI) installed locally.
Code samples in this post are displayed as Gists, which may not display correctly on some mobile and social media browsers. Links to gists are also provided.
Demo Application
The Spring Boot application demonstrated in this post is a simple election-themed RESTful API. The app allows API consumers to create, read, update, and delete candidates, elections, and votes, via its exposed RESTful HTTP-based resources.
The Spring Boot application consists of (7) JPA Entities that mirror the tables and views in the database, (7) corresponding Spring Data Repositories, (2) Spring REST Controllers, (4) Liquibase change sets, and associated Spring, Liquibase, Swagger, and PCF configuration files. I have intentionally chosen to avoid the complexities of using Data Transfer Objects (DTOs) for brevity, albeit a security concern, and directly expose the entities as resources.
Controller Resources
This application is a simple CRUD application. The application contains a few simple HTTP GET resources in each of the two controller classes, as an introduction to Spring REST controllers. For example, the CandidateController contains the /candidates/summary and /candidates/summary/{election} resources (shown below in Postman). Typically, you would expose your data to the end-user as controller resources, as opposed to exposing the entities directly. The ease of defining controller resources is one of the many powers of Spring Boot.
Paging and Sorting
As an introduction to Spring Data’s paging and sorting features, both the VoteRepository and VotesElectionsViewRepository Repository Interfaces extend Spring Data’s PagingAndSortingRepository&lt;T,ID&gt; interface, instead of the default CrudRepository&lt;T,ID&gt; interface. With paging and sorting enabled, you may both sort and limit the amount of data returned in the response payload. For example, to reduce the size of your response payload, you might choose to page through the votes in blocks of 25 votes at a time. In that case, as shown below in Postman, if you needed to return just votes 26-50, you would append the /votes resource with ?page=1&size=25. Since paging starts on page 0 (zero), votes 26-50 will be on page 1.
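The page arithmetic can be expressed in a few lines. A small illustrative helper, not part of the project, that converts a 1-based row range into Spring Data’s zero-based page parameters:

```python
def page_for_rows(first_row, last_row):
    """Translate a 1-based row range into 0-based page/size query parameters.

    The range must align to a page boundary, e.g. rows 26-50 -> page=1, size=25.
    """
    size = last_row - first_row + 1
    page, remainder = divmod(first_row - 1, size)
    if remainder:
        raise ValueError("row range does not start on a page boundary")
    return {"page": page, "size": size}

# page_for_rows(26, 50) yields the ?page=1&size=25 request shown above
```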
Swagger
This project also includes the Springfox implementation of the Swagger 2 specification. Along with the Swagger 2 dependency, the project takes a dependency on io.springfox:springfox-swagger-ui. The Springfox Swagger UI dependency allows us to view and interactively test our controller resources through Swagger’s browser-based UI, as shown below.
All Swagger configuration can be found in the project’s SwaggerConfig Spring Configuration class.
Gradle
This post’s Spring Boot application is built with Gradle, although it could easily be converted to Maven if desired. According to Gradle, Gradle is the modern tool used to build, automate and deliver everything from mobile apps to microservices.
Data
In real life, most applications interact with one or more data sources. The Spring Boot application demonstrated in this post interacts with data from a PostgreSQL database. PostgreSQL, or simply Postgres, is the powerful, open-source object-relational database system, which has supported production-grade applications for 15 years. The application’s single elections database consists of (6) tables, (3) views, and (2) functions, both of which are used to generate random votes for this demonstration.
Spring Data makes interacting with PostgreSQL easy. In addition to the features of Spring Data, we will use Liquibase. Liquibase is known as the source control for your database. With Liquibase, your database development lifecycle can mirror your Spring development lifecycle. Both DDL (Data Definition Language) and DML (Data Manipulation Language) changes are versioned controlled, alongside the Spring Boot application source code.
Locally, we will take advantage of Docker to host our development PostgreSQL database, using the official PostgreSQL Docker image. With Docker, there is no messy database installation and configuration of your local development environment. Recreating and deleting your PostgreSQL database is simple.
To support the data-tier in our hosted PWS environment, we will implement ElephantSQL, an offering from the Pivotal Services Marketplace. ElephantSQL is a hosted version of PostgreSQL, running on AWS. ElephantSQL is referred to as PostgreSQL as a Service, or more generally, a Database as a Service or DBaaS. As a Pivotal Marketplace service, we will get easy integration with our PWS-hosted Spring Boot application, with near-zero configuration.
Docker
First, set up your local PostgreSQL database using Docker and the official PostgreSQL Docker image. Since this is only a local database instance, we will not worry about securing our database credentials (gist).
# create container
docker run --name postgres \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres1234 \
  -e POSTGRES_DB=elections \
  -p 5432:5432 \
  -d postgres

# view container
docker container ls

# tail container logs
docker logs postgres --follow
Your running PostgreSQL container should resemble the output shown below.
Data Source
Most IDEs allow you to create and save data sources. Although this is not a requirement, it makes it easier to view the database’s resources and table data. Below, I have created a data source in IntelliJ from the running PostgreSQL container instance. The port, username, password, and database name were all taken from the above Docker command.
Liquibase
There are multiple strategies when it comes to managing changes to your database. With Liquibase, each set of changes is handled as a change set. Liquibase describes a change set as an atomic change that you want to apply to your database. Liquibase offers multiple formats for change set files, including XML, JSON, YAML, and SQL. For this post, I have chosen SQL, specifically the PostgreSQL SQL dialect, which can be designated in the IntelliJ IDE. Below is an example of the first change set, which creates four tables and two indexes.
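The actual change set appears in a screengrab; as a rough sketch only, a Liquibase-formatted SQL change set for a project like this might resemble the following. The table and column definitions here are illustrative assumptions, not the post's actual schema.

```sql
--liquibase formatted sql

--changeset gstafford:1
CREATE TABLE candidate (
  candidate_id     BIGSERIAL PRIMARY KEY,
  first_name       VARCHAR(50) NOT NULL,
  last_name        VARCHAR(50) NOT NULL,
  political_party  VARCHAR(50)
);

CREATE TABLE election (
  election_id  BIGSERIAL PRIMARY KEY,
  title        VARCHAR(100) NOT NULL
);

-- indexes to speed up common lookups
CREATE INDEX idx_candidate_last_name ON candidate (last_name);
CREATE INDEX idx_election_title ON election (title);

--rollback DROP TABLE candidate; DROP TABLE election;
```

The `--changeset author:id` comment header is what makes the file a Liquibase-formatted SQL change set, and the optional `--rollback` comment tells Liquibase how to undo it.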
As shown below, change sets are stored in the db/changelog/changes sub-directory, as configured in the master change log file (db.changelog-master.yaml). Change set files follow an incremental naming convention.
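For reference, a minimal db.changelog-master.yaml that pulls in SQL change sets from that sub-directory might look like the following sketch. The individual file names are assumptions based on the incremental naming convention mentioned above.

```yaml
databaseChangeLog:
  # change sets are applied in the order listed
  - include:
      file: db/changelog/changes/changeset_01.sql
  - include:
      file: db/changelog/changes/changeset_02.sql
  - include:
      file: db/changelog/changes/changeset_03.sql
  - include:
      file: db/changelog/changes/changeset_04.sql
```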
The empty PostgreSQL database, before any Liquibase changes, should resemble the screengrab shown below.
To automatically run Liquibase database migrations on startup, the org.liquibase:liquibase-core dependency must be added to the project's build.gradle file. To apply the change sets to your local, empty PostgreSQL database, simply start the service locally with the gradle bootRun command. Once the application is compiled and starts, any new Liquibase change sets will be applied.
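As a sketch, the relevant build.gradle entry looks like this (no version is needed here, since Spring Boot's dependency management supplies one):

```groovy
dependencies {
    // enables automatic Liquibase migrations on application startup
    implementation 'org.liquibase:liquibase-core'
}
```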
You might ask how Liquibase knows which change sets are new. During the initial startup of the Spring Boot application, in addition to applying any initial change sets, Liquibase creates two database tables to track changes: the databasechangelog and databasechangeloglock tables. Shown below are the two tables, along with the results of the four change sets included in the project and applied by Liquibase to the local PostgreSQL elections database.
Below we see the contents of the databasechangelog table, indicating that all four change sets were successfully applied to the database. Liquibase checks this table before applying change sets.
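If you want to inspect that bookkeeping yourself, a simple query against the standard Liquibase tracking columns shows which change sets have been applied, by whom, and in what order:

```sql
SELECT id, author, filename, dateexecuted, orderexecuted
FROM databasechangelog
ORDER BY orderexecuted;
```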
ElephantSQL
Before we can deploy our Spring Boot application to PWS, we need an accessible PostgreSQL instance in the Cloud; I have chosen ElephantSQL. Available through the Pivotal Services Marketplace, ElephantSQL currently offers one free and three paid service plans for their PostgreSQL as a Service. I purchased the Panda service plan as opposed to the free Turtle service plan. I found the free service plan was too limited in the maximum number of database connections for multiple service instances.
Previewing and purchasing an ElephantSQL service plan from the Pivotal Services Marketplace, assuming you have an existing PWS account, literally takes a single command (gist).
# view elephantsql service plans
cf marketplace -s elephantsql

# purchase elephantsql service plan
cf create-service elephantsql panda elections

# display details of running service
cf service elections
The output of the command should resemble the screengrab below. Note the total concurrent connections and total storage for each plan.
To get details about the running ElephantSQL service, use the cf service elections command.
From the ElephantSQL Console, we can obtain the connection information required to access our PostgreSQL elections database. You will need the default database name, username, password, and URL.
Service Binding
Once you have created the PostgreSQL database service, you need to bind the database service to the application. We will bind our application and the database using the PCF deployment manifest file (manifest.yml), found in the project's root directory. Binding is done using the services section (shown below).
The key/value pairs in the env section of the deployment manifest will become environment variables, local to the deployed Spring Boot service. These key/value pairs in the manifest will also override any configuration set in Spring's external application properties file (application.yml), located in the resources sub-directory. Note the SPRING_PROFILES_ACTIVE: test environment variable in the manifest.yml file. This variable designates which Spring Profile will be active from the multiple profiles defined in the application.yml file.
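As a rough sketch of how the services and env sections fit together, a manifest.yml for this kind of deployment might resemble the following. The application name, memory setting, and artifact path are assumptions, not the project's actual values; the elections service name and SPRING_PROFILES_ACTIVE variable come from the post.

```yaml
applications:
  - name: candidate-service           # assumed application name
    memory: 1G                        # assumed memory allocation
    instances: 1
    path: build/libs/candidate-service.jar   # assumed artifact path
    services:
      - elections                     # binds the ElephantSQL service instance
    env:
      SPRING_PROFILES_ACTIVE: test    # selects the 'test' profile in application.yml
```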
Deployment to PWS
Next, we run gradle build followed by cf push to deploy one instance of the Spring Boot service to PWS and associate it with our ElephantSQL database instance. Below is the expected output from the cf push command.
Note the route highlighted below. This is the URL where your Spring Boot service will be available.
To confirm your ElephantSQL database was populated by Liquibase when PWS started the deployed Spring application instance, we can check the ElephantSQL Console's Stats tab. Note the database tables and rows in each table, signifying Liquibase ran successfully. Alternatively, you could create another data source in your IDE, connected to ElephantSQL; this can be helpful for troubleshooting.
To access the running service and confirm that data is being returned, point your browser (or Postman) to the URL returned in the cf push command output (see second screengrab above) and hit the /candidates resource. Obviously, your URL, referred to as a route by PWS, will be different and unique. In the response payload, you should observe a JSON array of eight candidate objects. Each candidate was inserted into the Candidate table by Liquibase when it executed the second of the four change sets on start-up.
With Spring Boot Actuator and Spring Data REST, our simple Spring Boot application has dozens of resources exposed automatically, without extensive coding of resource controllers. Actuator exposes resources to help manage and troubleshoot the application, such as info, health, mappings (shown below), metrics, env, and configprops, among others. All Actuator resources are exposed explicitly; thus, they can easily be disabled for Production deployments. With Spring Boot 2.0, all Actuator resources are now prefixed with /actuator/.
According to Pivotal, Spring Data REST builds on top of Spring Data repositories, analyzes an application's domain model, and exposes hypermedia-driven HTTP resources for aggregates contained in the model, such as our /candidates resource. A partial list of the application's exposed resources is included in the GitHub project's README file.
In Spring's approach to building RESTful web services, HTTP requests are handled by a controller. Spring Data REST automatically exposes CRUD resources for our entities. With Spring Data JPA, POJOs like our Candidate class are annotated with @Entity, indicating it is a JPA entity. Lacking a @Table annotation, it is assumed that this entity will be mapped to a table named Candidate.
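A minimal sketch of such an entity follows. The fields shown are illustrative assumptions, not the project's actual class; the key point is that @Entity alone, with no @Table annotation, maps the class to a table named Candidate.

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity // maps to a table named 'Candidate' by default (no @Table annotation)
public class Candidate {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String firstName;   // illustrative fields, not the actual schema
    private String lastName;

    protected Candidate() {
        // no-arg constructor required by JPA
    }

    public Candidate(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public Long getId() { return id; }
    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
}
```

Paired with a Spring Data repository interface, this single class is all Spring Data REST needs to expose the /candidates resource.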
With Spring Data REST's RESTful HTTP-based API, traditional database Create, Read, Update, and Delete commands for each PostgreSQL database table are automatically mapped to equivalent HTTP methods, including POST, GET, PUT, PATCH, and DELETE.
Below is an example, using Postman, to create a new Candidate using an HTTP POST method.
Below is an example, using Postman, to update an existing Candidate using an HTTP PUT method.
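If you prefer the command line to Postman, equivalent requests can be sketched with curl. The route, resource id, and JSON fields below are assumptions for illustration; substitute your own PWS route.

```shell
# create a new candidate (POST)
curl -X POST https://your-app-route.cfapps.io/candidates \
  -H 'Content-Type: application/json' \
  -d '{"firstName": "Jane", "lastName": "Doe"}'

# update an existing candidate (PUT), replacing the resource at an assumed id of 9
curl -X PUT https://your-app-route.cfapps.io/candidates/9 \
  -H 'Content-Type: application/json' \
  -d '{"firstName": "Jane", "lastName": "Smith"}'
```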
With Spring Data REST, we can even retrieve data from read-only database Views, as shown below. This particular JSON response payload was returned from the candidates_by_elections database View, using the /election-candidates resource.
Scaling Up
Once your application is deployed and you have tested its functionality, you can easily scale out or scale in the number of instances, memory, and disk with the cf scale command (gist).
# scale up to 2 instances
cf scale cf-spring -i 2

# review status of both instances
cf app pcf-postgresql-demo
Below is sample output from scaling up the Spring Boot application to two instances.
Optionally, you can activate auto-scaling, which will allow the application to scale based on load.
Following the PCF architectural model, auto-scaling is actually another service from the Pivotal Services Marketplace, PCF App Autoscaler, as seen below, running alongside our ElephantSQL service.
With PCF App Autoscaler, you set auto-scaling minimum and maximum instance limits and define scaling rules. Below, I have configured auto-scaling to scale out the number of application instances when the average CPU Utilization of all instances hits 80%. Conversely, the application will scale in when the average CPU Utilization recedes below 40%. In addition to CPU Utilization, PCF App Autoscaler also allows you to set scaling rules based on HTTP Throughput, HTTP Latency, RabbitMQ Depth (queue depth), and Memory Utilization.
Furthermore, I set the auto-scaling minimum number of instances to two and the maximum number of instances to four. No matter how much load is placed on the application, PWS will not scale above four instances. Conversely, PWS will maintain a minimum of two running instances at all times.
Conclusion
This brief post demonstrates both the power and simplicity of Spring Boot for quickly developing highly functional, data-centric RESTful API applications with minimal coding. Further, when coupled with Pivotal Cloud Foundry, Spring developers have a highly scalable, resilient, cloud-native application hosting platform.