Archive for category Java Development

Utilizing In-memory Data Caching to Enhance the Performance of Data Lake-based Applications

Significantly improve the performance and reduce the cost of data lake-based analytics applications using Amazon ElastiCache for Redis

Introduction

The recent post, Developing Spring Boot Applications for Querying Data Lakes on AWS using Amazon Athena, demonstrated how to develop a Cloud-native analytics application using Spring Boot. The application queried data in an Amazon S3-based data lake via an AWS Glue Data Catalog utilizing the Amazon Athena API.

Securely exposing data in a data lake using RESTful APIs can fulfill many data-consumer needs. However, access to that data can be significantly slower than access from a database or data warehouse. For example, in the previous post, we imported the OpenAPI v3 specification from the Spring Boot service into Postman. The API specification contained approximately 17 endpoints.

Running a suite of integration tests against the Spring Boot service using Postman

From my local development laptop, the Postman API test run times for all service endpoints took an average of 32.4 seconds. The Spring Boot service was running three Kubernetes pod replicas on Amazon EKS in the AWS US East (N. Virginia) Region.

Sample of test results ran from service (no Redis cache)

Compare the data lake query result times to equivalent queries made against a minimally-sized Amazon RDS for PostgreSQL database instance containing the same data. The average run times for all PostgreSQL queries averaged 10.8 seconds from a similar Spring Boot service. Although not a precise benchmark, we can clearly see that access to the data in the Amazon S3-based data lake is substantially slower, approximately 3x slower, than that of the PostgreSQL database. Tuning the database would easily create an even greater disparity.

Sample of test results for queries ran from service against Athena vs. PostgreSQL (quicker is better)

Caching for Data Lakes

According to AWS, the speed and throughput of a database can be the most impactful factor for overall application performance. Consequently, in-memory data caching can be one of the most effective strategies to improve overall application performance and reduce database costs. This same caching strategy can be applied to analytics applications built atop data lakes, as this post will demonstrate.

High-level AWS architecture demonstrated in this post

In-memory Caching

According to Hazelcast, memory caching (aka in-memory caching), often simply referred to as caching, is a technique in which computer applications temporarily store data in a computer’s main memory (e.g., RAM) to enable fast retrievals of that data. The RAM used for the temporary storage is known as the cache. As an application tries to read data, typically from a data storage system like a database, it checks to see if the desired record already exists in the cache. If it does, the application will read the data from the cache, thus eliminating the slower access to the database. If the desired record is not in the cache, then the application reads the record from the source. When it retrieves that data, it writes it to the cache so that when the application needs that same data in the future, it can quickly retrieve it from the cache.

Further, according to Hazelcast, as an application tries to read data, typically from a data storage system like a database, it checks to see if the desired record already exists in the cache. If it does, the application will read the data from the cache, thus eliminating the slower access to the database. If the desired record is not in the cache, then the application reads the record from the source. When it retrieves that data, it writes it to the cache so that when the application needs that same data in the future, it can quickly retrieve it from the cache.

Redis In-memory Data Store

According to their website, Redis is the open-source, in-memory data store used by millions of developers as a database, cache, streaming engine, and message broker. Redis provides data structures such as stringshasheslistssetssorted sets with range queries, bitmapshyperloglogsgeospatial indexes, and streams. In addition, Redis has built-in replicationLua scriptingLRU evictiontransactions, and different levels of on-disk persistence and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.

Amazon ElastiCache for Redis

According to AWS, Amazon ElastiCache for Redis, the fully-managed version of Redis, according to AWS, is a blazing fast in-memory data store that provides sub-millisecond latency to power internet-scale real-time applications. Redis applications can work seamlessly with ElastiCache for Redis without any code changes. ElastiCache for Redis combines the speed, simplicity, and versatility of open-source Redis with manageability, security, and scalability from AWS. As a result, Redis is an excellent choice for implementing a highly available in-memory cache to decrease data access latency, increase throughput, and ease the load on relational and NoSQL databases.

ElastiCache Performance Results

In the following post, we will add in-memory caching to the Spring Boot service introduced in the previous post. In preliminary tests with Amazon ElastiCache for Redis, the Spring Boot service delivered a 34x improvement in average response times. For example, test runs with the best-case scenario of a Redis cache hit ratio of 100% averaged 0.95 seconds compared to 32.4 seconds without Redis.

Sample of uncached versus cached test results (quicker is better)

Source Code

All the source code and Docker and Kubernetes resources are open-source and available on GitHub.

git clone --depth 1 -b redis \
https://github.com/garystafford/athena-spring-app.git

In addition, a Docker image for the Redis-base Spring Boot service is available on Docker Hub. For this post, use the latest tag with the .redis suffix.

Spring Boot service images are available on Docker Hub

Code Changes

The following code changes are required to the Spring Boot service to implement Spring Boot Cache with Redis.

Gradle Build

The gradle.build file now implements two additional dependencies, Spring Boot’s spring-boot-starter-cache and spring-boot-starter-data-redis (lines 45–46).

plugins {
id 'org.springframework.boot' version '2.7.1'
id 'io.spring.dependency-management' version '1.0.11.RELEASE'
id 'java'
id 'io.freefair.lombok' version '6.5.0-rc1'
}
group = 'aws.example'
version = '2.0.0'
sourceCompatibility = '17'
def awsSdkVersion = '2.17.225'
def springBootVersion = '2.7.1'
def restAssuredVersion = '5.1.1'
repositories {
mavenCentral()
}
dependencies {
// aws sdk
runtimeOnly "software.amazon.awssdk:bom:${awsSdkVersion}"
implementation "software.amazon.awssdk:athena:${awsSdkVersion}"
// spring
annotationProcessor "org.springframework.boot:spring-boot-configuration-processor:${springBootVersion}"
implementation "org.springframework.boot:spring-boot-starter-web:${springBootVersion}"
implementation "org.springframework.boot:spring-boot-starter-web:${springBootVersion}"
implementation "org.springframework.boot:spring-boot-starter-actuator:${springBootVersion}"
developmentOnly "org.springframework.boot:spring-boot-devtools:${springBootVersion}"
implementation 'org.springdoc:springdoc-openapi-ui:1.6.9'
implementation 'org.springframework:spring-context:5.3.20'
// testings
testImplementation "org.springframework.boot:spring-boot-starter-test:${springBootVersion}"
testImplementation "io.rest-assured:rest-assured:${restAssuredVersion}"
testImplementation "io.rest-assured:json-path:${restAssuredVersion}"
testImplementation "io.rest-assured:xml-path:${restAssuredVersion}"
testImplementation "io.rest-assured:json-schema-validator:${restAssuredVersion}"
// monitoring
implementation 'io.micrometer:micrometer-registry-prometheus:1.9.1'
// spring caching with redis
implementation "org.springframework.boot:spring-boot-starter-cache:${springBootVersion}"
implementation "org.springframework.boot:spring-boot-starter-data-redis:${springBootVersion}"
}
tasks.named('test') {
useJUnitPlatform()
}
view raw build.gradle hosted with ❤ by GitHub

Application Properties

The application properties file, application.yml, has been modified for both the dev and prod Spring Profiles. The dev Spring Profile expects Redis to be running on localhost. Correspondingly, the project’s docker-compose.yml file now includes a Redis container for local development. The time-to-live (TTL) for all Redis caches is arbitrarily set to one minute for dev and five minutes for prod. To increase application performance and reduce the cost of querying the data lake using Athena, increase Redis’s TTL. Note that increasing the TTL will reduce data freshness.

spring:
profiles:
active: dev
server:
port: 8080
servlet:
contextPath: /v1
athena:
region: us-east-1
workgroup: primary
catalog: AwsDataCatalog
database: tickit_demo
limit: 25
client-execution-timeout: 100000
retry-sleep: 1000
results-bucket: ${RESULTS_BUCKET}
named-query-id: ${NAMED_QUERY_ID}
spring:
config:
activate:
on-profile: dev
redis:
host: localhost
port: 6379
cache:
redis:
time-to-live: 60000 # 1000 ms * 60 * 1 = 1 min
enable-statistics: true
server:
port: 8080
logging:
level:
root: DEBUG
management:
endpoints:
web:
exposure:
include: '*'
jmx:
exposure:
include: '*'
spring:
config:
activate:
on-profile: prod
redis:
host: ${REDIS_HOST}
port: 6379
cache:
redis:
time-to-live: 300000 # 1000 ms * 60 * 5 = 5 min
enable-statistics: true
logging:
level:
root: INFO
management:
endpoints:
web:
exposure:
include: health, prometheus
jmx:
exposure:
include: health
view raw application.yml hosted with ❤ by GitHub

Athena Application Class

The AthenaApplication class declaration is now decorated with Spring Framework’s EnableCaching annotation (line 22). Additionally, two new Beans have been added (lines 58–68). Spring Redis provides an implementation for the Spring cache abstraction through the org.springframework.data.redis.cache package. The RedisCacheManager cache manager creates caches by default upon the first write. The RedisCacheConfiguration cache configuration helps to customize RedisCache behavior such as caching null values, cache key prefixes, and binary serialization.

package com.example.athena;
import com.example.athena.common.PreparedStatement;
import com.example.athena.common.View;
import com.example.athena.config.ConfigProperties;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
import java.time.Duration;
@SpringBootApplication
@EnableConfigurationProperties(ConfigProperties.class)
@EnableCaching
public class AthenaApplication {
private final PreparedStatement preparedStatement;
private final View view;
@Autowired
public AthenaApplication(PreparedStatement preparedStatement, View view) {
this.preparedStatement = preparedStatement;
this.view = view;
}
public static void main(String[] args) {
SpringApplication.run(AthenaApplication.class, args);
}
@Bean
void CreatePreparedStatement() {
preparedStatement.CreatePreparedStatement();
}
@Bean
void createView() {
view.CreateView();
}
@Bean
public WebMvcConfigurer corsConfigurer() {
return new WebMvcConfigurer() {
@Override
public void addCorsMappings(CorsRegistry registry) {
registry.addMapping("/**").allowedOrigins("*");
}
};
}
@Bean
public RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
return RedisCacheManager.create(connectionFactory);
}
@Bean
public RedisCacheConfiguration cacheConfiguration() {
return RedisCacheConfiguration.defaultCacheConfig()
.entryTtl(Duration.ofMinutes(5))
.disableCachingNullValues();
}
}

POJO Data Model Classes

Spring Boot Redis caching uses Java serialization and deserialization. Therefore, all the POJO data model classes must implement Serializable (line 14).

package com.example.athena.tickit.model.ecomm;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import java.io.Serializable;
import java.math.BigDecimal;
import java.time.LocalDateTime;
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Listing implements Serializable {
private int id;
private int sellerId;
private int eventId;
private int dateId;
private int numTickets;
private BigDecimal pricePerTicket;
private LocalDateTime listTime;
}
view raw Listing.java hosted with ❤ by GitHub

Service Classes

Each public method in the Service classes is now decorated with Spring Framework’s Cachable annotation (lines 42 and 66). For example, the findById(int id) method in the CategoryServiceImp class is annotated with @Cacheable(value = "categories", key = "#id"). The method’s key parameter uses Spring Expression Language (SpEL) expression for computing the key dynamically. Default is null, meaning all method parameters are considered as a key unless a custom keyGenerator has been configured. If no value is found in the Redis cache for the computed key, the target method will be invoked, and the returned value will be stored in the associated cache.

package com.example.athena.tickit.service;
import com.example.athena.common.AthenaClientFactory;
import com.example.athena.common.AthenaCommon;
import com.example.athena.config.ConfigProperties;
import com.example.athena.tickit.model.ecomm.Listing;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;
import software.amazon.awssdk.services.athena.AthenaClient;
import software.amazon.awssdk.services.athena.model.*;
import software.amazon.awssdk.services.athena.paginators.GetQueryResultsIterable;
import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import static java.lang.Integer.parseInt;
@Service
public class ListingServiceImp implements ListingService {
private static final Logger logger = LoggerFactory.getLogger(ListingServiceImp.class);
private final ConfigProperties configProperties;
private final AthenaClientFactory athenaClientFactory;
private final AthenaCommon athenaCommon;
@Autowired
public ListingServiceImp(ConfigProperties configProperties, AthenaClientFactory athenaClientFactory, AthenaCommon athenaCommon) {
this.configProperties = configProperties;
this.athenaClientFactory = athenaClientFactory;
this.athenaCommon = athenaCommon;
}
@Cacheable(value = "listings")
public List<Listing> findAll(Integer limit, Integer offset) {
if (limit == null || limit < 1 || limit > configProperties.getLimit()) {
limit = configProperties.getLimit();
}
if (offset == null || offset < 1) {
offset = 0;
}
String whereClause = "WHERE listid IS NOT NULL";
String query = String.format("""
SELECT *
FROM refined_tickit_public_listing
%s
ORDER BY listid
OFFSET %s
LIMIT %s;""", whereClause, offset, limit);
return startQuery(query);
}
@Cacheable(value = "listing", key = "#id")
public Listing findById(int id) {
String query = String.format("""
SELECT DISTINCT *
FROM refined_tickit_public_listing
WHERE listid=%s""", id);
Listing listing;
try {
listing = startQuery(query).get(0);
} catch (java.lang.IndexOutOfBoundsException e) {
logger.error(e.getMessage());
return null;
}
return listing;
}
private List<Listing> startQuery(String query) {
logger.debug(String.format("Query: %s", query.replace("\n", " ")));
AthenaClient athenaClient = athenaClientFactory.createClient();
String queryExecutionId = athenaCommon.submitAthenaQuery(athenaClient, query);
athenaCommon.waitForQueryToComplete(athenaClient, queryExecutionId);
List<Listing> listings = processResultRows(athenaClient, queryExecutionId);
athenaClient.close();
return listings;
}
private List<Listing> processResultRows(AthenaClient athenaClient, String queryExecutionId) {
List<Listing> listings = new ArrayList<>();
try {
GetQueryResultsRequest getQueryResultsRequest = GetQueryResultsRequest.builder()
.queryExecutionId(queryExecutionId).build();
GetQueryResultsIterable getQueryResultsResults = athenaClient.getQueryResultsPaginator(getQueryResultsRequest);
List<Row> rows;
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S");
for (GetQueryResultsResponse result : getQueryResultsResults) {
rows = result.resultSet().rows();
for (Row myRow : rows.subList(1, rows.size())) { // skip first row – column names
List<Datum> allData = myRow.data();
Listing listing = new Listing();
listing.setId(parseInt(allData.get(0).varCharValue()));
listing.setSellerId(parseInt(allData.get(1).varCharValue()));
listing.setEventId(parseInt(allData.get(2).varCharValue()));
listing.setDateId(parseInt(allData.get(3).varCharValue()));
listing.setNumTickets(Integer.parseInt(allData.get(4).varCharValue()));
listing.setPricePerTicket(new BigDecimal(allData.get(5).varCharValue()));
listing.setListTime(LocalDateTime.parse(allData.get(6).varCharValue(), formatter));
listings.add(listing);
}
}
} catch (AthenaException e) {
logger.error(e.getMessage());
}
return listings;
}
}

Controller Classes

There are no changes required to the Controller classes.

Amazon ElastiCache for Redis

Multiple options are available for creating an Amazon ElastiCache for Redis cluster, including cluster mode, multi-AZ option, auto-failover option, node type, number of replicas, number of shards, replicas per shard, Availability Zone placements, and encryption at rest and encryption in transit options. The results in this post are based on a minimally-configured Redis version 6.2.6 cluster, with one shard, two cache.r6g.large nodes, and cluster mode, multi-AZ option, and auto-failover all disabled. In addition, encryption at rest and encryption in transit were also disabled. This cluster configuration is sufficient for development and testing, but not Production.

Amazon ElastiCache for Redis cluster used for this post
Amazon ElastiCache for Redis cluster monitoring console showing caching activity

Testing the Cache

To test Amazon ElastiCache for Redis, we will use Postman again with the imported OpenAPI v3 specification. With all data evicted from existing Redis caches, the first time the Postman tests run, they cause the service’s target methods to be invoked and the returned data stored in the associated caches.

Running a suite of integration tests against the Spring Boot service using Postman

To confirm this caching behavior, use the Redis CLI’s --scan option. To access the redis-cli, I deployed a single Redis pod to Amazon EKS. The first time the --scan command runs, we should get back an empty list of keys. After the first Postman test run, the same --scan command should return a list of cached keys.

List of cached keys in Redis

Use the Redis CLI’s MONITOR option to further confirm data is being cached, as indicated by the set command.

Athena query results being cached in Redis

After the initial caching of data, use the Redis CLI’s MONITOR option, again, to confirm the cache is being hit instead of calling the target methods, which would then invoke the Athena API. Rerunning the Postman tests, we should see get commands as opposed to set commands.

Monitoring cache hits in Redis

Lastly, to confirm the Spring Boot service is effectively using Redis to cache data, we can also check Amazon Athena’s Recent queries tab in the AWS Management Console. After repeated sequential test runs within the TTL window, we should only see one Athena query per endpoint.

Amazon Athena Recent queries tab in the AWS Management Console

Conclusion

In this brief follow-up to the recent post, Developing Spring Boot Applications for Querying Data Lakes on AWS using Amazon Athena, we saw how to substantially increase data lake application performance using Amazon ElastiCache for Redis. Although this caching technique is often associated with databases, it can also be effectively applied to data lake-based applications, as demonstrated in the post.


This blog represents my viewpoints and not of my employer, Amazon Web Services (AWS). All product names, logos, and brands are the property of their respective owners. All diagrams and illustrations are property of the author unless otherwise noted.

, , , , ,

Leave a comment

Developing Spring Boot Applications for Querying Data Lakes on AWS using Amazon Athena

Learn how to develop Cloud-native, RESTful Java services that query data in an AWS-based data lake using Amazon Athena’s API

Introduction

AWS provides a collection of fully-managed services that makes building and managing secure data lakes faster and easier, including AWS Lake Formation, AWS Glue, and Amazon S3. Additional analytics services such as Amazon EMR, AWS Glue Studio, and Amazon Redshift allow Data Scientists and Analysts to run high-performance queries on large volumes of semi-structured and structured data quickly and economically.

What is not always as obvious is how teams develop internal and external customer-facing analytics applications built on top of data lakes. For example, imagine sellers on an eCommerce platform, the scenario used in this post, want to make better marketing decisions regarding their products by analyzing sales trends and buyer preferences. Further, suppose the data required for the analysis must be aggregated from multiple systems and data sources; the ideal use case for a data lake.

Example of a personalized sales report generated from the Spring Boot service’s salesbyseller endpoint

In this post, we will explore an example Java Spring Boot RESTful Web Service that allows end-users to query data stored in a data lake on AWS. The RESTful Web Service will access data stored as Apache Parquet in Amazon S3 through an AWS Glue Data Catalog using Amazon Athena. The service will use Spring Boot and the AWS SDK for Java to expose a secure, RESTful Application Programming Interface (API).

High-level AWS architecture demonstrated in this post

Amazon Athena is a serverless, interactive query service based on Presto, used to query data and analyze big data in Amazon S3 using standard SQL. Using Athena functionality exposed by the AWS SDK for Java and Athena API, the Spring Boot service will demonstrate how to access tablesviewsprepared statements, and saved queries (aka named queries).

Amazon Athena Query Editor

TL;DR

Do you want to explore the source code for this post’s Spring Boot service or deploy it to Kubernetes before reading the full article? All the source code, Docker, and Kubernetes resources are open-source and available on GitHub.

git clone --depth 1 -b main \
https://github.com/garystafford/athena-spring-app.git

A Docker image for the Spring Boot service is also available on Docker Hub.

Spring Boot service image available on Docker Hub

Data Lake Data Source

There are endless data sources to build a demonstration data lake on AWS. This post uses the TICKIT sample database provided by AWS and designed for Amazon Redshift, AWS’s cloud data warehousing service. The database consists of seven tables. Two previous posts and associated videos, Building a Data Lake on AWS with Apache Airflow and Building a Data Lake on AWS, detail the setup of the data lake used in this post using AWS Glue and optionally Apache Airflow with Amazon MWAA.

Those two posts use the data lake pattern of segmenting data as bronze (aka raw), silver (aka refined), and gold (aka aggregated), popularized by Databricks. The data lake simulates a typical scenario where data originates from multiple sources, including an e-commerce platform, a CRM system, and a SaaS provider must be aggregated and analyzed.

High-level data lake architecture demonstrated in the previous post

Spring Projects with IntelliJ IDE

Although not a requirement, I used JetBrains IntelliJ IDEA 2022 (Ultimate Edition) to develop and test the post’s Spring Boot service. Bootstrapping Spring projects with IntelliJ is easy. Developers can quickly create a Spring project using the Spring Initializr plugin bundled with the IntelliJ.

JetBrains IntelliJ IDEA plugin support for Spring projects

The Spring Initializr plugin’s new project creation wizard is based on start.spring.io. The plugin allows you to quickly select the Spring dependencies you want to incorporate into your project.

Adding dependencies to a new Spring project in IntelliJ

Visual Studio Code

There are also several Spring extensions for the popular Visual Studio Code IDE, including Microsoft’s Spring Initializr Java Support extension.

Spring Initializr Java Support extension for Visual Studio Code by Microsoft

Gradle

This post uses Gradle instead of Maven to develop, test, build, package, and deploy the Spring service. Based on the packages selected in the new project setup shown above, the Spring Initializr plugin’s new project creation wizard creates a build.gradle file. Additional packages, such as LombakMicrometer, and Rest Assured, were added separately.

plugins {
id 'org.springframework.boot' version '2.7.1'
id 'io.spring.dependency-management' version '1.0.11.RELEASE'
id 'java'
id 'io.freefair.lombok' version '6.5.0-rc1'
}
group = 'aws.example'
version = '1.0.0'
sourceCompatibility = '17'
def awsSdkVersion = '2.17.225'
def springBootVersion = '2.7.1'
def restAssuredVersion = '5.1.1'
repositories {
mavenCentral()
}
dependencies {
// aws sdk
runtimeOnly "software.amazon.awssdk:bom:${awsSdkVersion}"
implementation "software.amazon.awssdk:athena:${awsSdkVersion}"
// spring
annotationProcessor "org.springframework.boot:spring-boot-configuration-processor:${springBootVersion}"
implementation "org.springframework.boot:spring-boot-starter-web:${springBootVersion}"
implementation "org.springframework.boot:spring-boot-starter-web:${springBootVersion}"
implementation "org.springframework.boot:spring-boot-starter-actuator:${springBootVersion}"
developmentOnly "org.springframework.boot:spring-boot-devtools:${springBootVersion}"
implementation 'org.springdoc:springdoc-openapi-ui:1.6.9'
implementation 'org.springframework:spring-context:5.3.20'
// testings
testImplementation "org.springframework.boot:spring-boot-starter-test:${springBootVersion}"
testImplementation "io.rest-assured:rest-assured:${restAssuredVersion}"
testImplementation "io.rest-assured:json-path:${restAssuredVersion}"
testImplementation "io.rest-assured:xml-path:${restAssuredVersion}"
testImplementation "io.rest-assured:json-schema-validator:${restAssuredVersion}"
// monitoring
implementation 'io.micrometer:micrometer-registry-prometheus:1.9.1'
}
tasks.named('test') {
useJUnitPlatform()
}
view raw build.gradle hosted with ❤ by GitHub

Amazon Corretto

The Spring boot service is developed for and compiled with the most recent version of Amazon Corretto 17. According to AWS, Amazon Corretto is a no-cost, multiplatform, production-ready distribution of the Open Java Development Kit (OpenJDK). Corretto comes with long-term support that includes performance enhancements and security fixes. Corretto is certified as compatible with the Java SE standard and is used internally at Amazon for many production services.

Source Code

Each API endpoint in the Spring Boot RESTful Web Service has a corresponding POJO data model class, service interface and service implementation class, and controller class. In addition, there are also common classes such as configuration, a client factory, and Athena-specific request/response methods. Lastly, there are additional class dependencies for views and prepared statements.

Java class relationships related to querying the Amazon Athena refined_tickit_public_category table

The project’s source code is arranged in a logical hierarchy by package and class type.

.
└── com
└── example
└── athena
├── AthenaApplication.java
├── common
│   ├── AthenaClientFactory.java
│   ├── AthenaClientFactoryImp.java
│   ├── AthenaCommon.java
│   ├── NamedQuery.java
│   ├── PreparedStatement.java
│   └── View.java
├── config
│   └── ConfigProperties.java
└── tickit
├── controller
│   ├── BuyerLikesByCategoryController.java
│   ├── CategoryController.java
│   ├── DateDetailController.java
│   ├── EventController.java
│   ├── ListingController.java
│   ├── SaleBySellerController.java
│   ├── SaleController.java
│   ├── SalesByCategoryController.java
│   ├── UserController.java
│   └── VenueController.java
├── model
│   ├── crm
│   │   └── User.java
│   ├── ecomm
│   │   ├── DateDetail.java
│   │   ├── Listing.java
│   │   └── Sale.java
│   ├── resultsets
│   │   ├── BuyerLikesByCategory.java
│   │   ├── SaleBySeller.java
│   │   └── SalesByCategory.java
│   └── saas
│   ├── Category.java
│   ├── Event.java
│   └── Venue.java
└── service
├── BuyerLikesByCategoryServiceImp.java
├── BuyersLikesByCategoryService.java
├── CategoryService.java
├── CategoryServiceImp.java
├── DateDetailService.java
├── DateDetailServiceImp.java
├── EventService.java
├── EventServiceImp.java
├── ListingService.java
├── ListingServiceImp.java
├── SaleBySellerService.java
├── SaleBySellerServiceImp.java
├── SaleService.java
├── SaleServiceImp.java
├── SalesByCategoryService.java
├── SalesByCategoryServiceImp.java
├── UserService.java
├── UserServiceImp.java
├── VenueService.java
└── VenueServiceImp.java

Amazon Athena Access

There are three standard methods for accessing Amazon Athena with the AWS SDK for Java: 1) the AthenaClient service client, 2) the AthenaAsyncClient service client for accessing Athena asynchronously, and 3) using the JDBC driver with the AWS SDK. The AthenaClient and AthenaAsyncClient service clients are both parts of the software.amazon.awssdk.services.athena package. For simplicity, this post’s Spring Boot service uses the AthenaClient service client instead of Java’s asynchronously programming model. AWS supplies basic code samples as part of their documentation as a starting point for writing Athena applications using the SDK. The code samples also use the AthenaClient service client.

POJO-based Data Model Class

For each API endpoint in the Spring Boot RESTful Web Service, there is a corresponding Plain Old Java Object (POJO). According to Wikipedia, a POGO is an ordinary Java object, not bound by any particular restriction. The POJO class is similar to a JPA Entity, representing persistent data stored in a relational database. In this case, the POJO uses Lombok’s @Data annotation. According to the documentation, this annotation generates getters for all fields, a useful toString method, and hashCode and equals implementations that check all non-transient fields. It also generates setters for all non-final fields and a constructor.

package com.example.athena.tickit.model.saas;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import java.time.LocalDateTime;
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Event {
private int id;
private int venueId;
private int catId;
private int dateId;
private String name;
private LocalDateTime startTime;
}
view raw Event.java hosted with ❤ by GitHub

Each POJO corresponds directly to a ‘silver’ table in the AWS Glue Data Catalog. For example, the Event POJO corresponds to the refined_tickit_public_event table in the tickit_demo Data Catalog database. The POJO defines the Spring Boot service’s data model for data read from the corresponding AWS Glue Data Catalog table.

Glue Data Catalog refined_tickit_public_event table

The Glue Data Catalog table is the interface between the Athena query and the underlying data stored in Amazon S3 object storage. The Athena query targets the table, which returns the underlying data from S3.

Tickit Category data stored as Apache Parquet files in Amazon S3

Service Class

Retrieving data from the data lake via AWS Glue, using Athena, is handled by a service class. For each API endpoint in the Spring Boot RESTful Web Service, there is a corresponding Service Interface and implementation class. The service implementation class uses Spring Framework’s @Service annotation. According to the documentation, it indicates that an annotated class is a “Service,” initially defined by Domain-Driven Design (Evans, 2003) as “an operation offered as an interface that stands alone in the model, with no encapsulated state.” Most importantly for the Spring Boot service, this annotation serves as a specialization of @Component, allowing for implementation classes to be autodetected through classpath scanning.

package com.example.athena.tickit.service;
import com.example.athena.common.AthenaClientFactory;
import com.example.athena.common.AthenaCommon;
import com.example.athena.config.ConfigProperties;
import com.example.athena.tickit.model.saas.Event;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import software.amazon.awssdk.services.athena.AthenaClient;
import software.amazon.awssdk.services.athena.model.*;
import software.amazon.awssdk.services.athena.paginators.GetQueryResultsIterable;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import static java.lang.Integer.parseInt;
@Service
public class EventServiceImp implements EventService {
private static final Logger logger = LoggerFactory.getLogger(EventServiceImp.class);
private final ConfigProperties configProperties;
private final AthenaClientFactory athenaClientFactory;
private final AthenaCommon athenaCommon;
@Autowired
public EventServiceImp(ConfigProperties configProperties, AthenaClientFactory athenaClientFactory, AthenaCommon athenaCommon) {
this.configProperties = configProperties;
this.athenaClientFactory = athenaClientFactory;
this.athenaCommon = athenaCommon;
}
public List<Event> findAll(Integer limit, Integer offset) {
if (limit == null || limit < 1 || limit > configProperties.getLimit()) {
limit = configProperties.getLimit();
}
if (offset == null || offset < 1) {
offset = 0;
}
String whereClause = "WHERE eventid IS NOT NULL";
String query = String.format("""
SELECT *
FROM refined_tickit_public_event
%s
ORDER BY eventid
OFFSET %s
LIMIT %s;""", whereClause, offset, limit);
return startQuery(query);
}
public Event findById(int id) {
String query = String.format("""
SELECT DISTINCT *
FROM refined_tickit_public_event
WHERE eventid=%s""", id);
Event event;
try {
event = startQuery(query).get(0);
} catch (IndexOutOfBoundsException e) {
logger.error(e.getMessage());
return null;
}
return event;
}
private List<Event> startQuery(String query) {
logger.debug(String.format("Query: %s", query.replace("\n", " ")));
AthenaClient athenaClient = athenaClientFactory.createClient();
String queryExecutionId = athenaCommon.submitAthenaQuery(athenaClient, query);
athenaCommon.waitForQueryToComplete(athenaClient, queryExecutionId);
List<Event> events = processResultRows(athenaClient, queryExecutionId);
athenaClient.close();
return events;
}
private List<Event> processResultRows(AthenaClient athenaClient, String queryExecutionId) {
List<Event> events = new ArrayList<>();
try {
// Max Results can be set but if it's not set,
// it will choose the maximum page size
GetQueryResultsRequest getQueryResultsRequest = GetQueryResultsRequest.builder()
.queryExecutionId(queryExecutionId).build();
GetQueryResultsIterable getQueryResultsResults = athenaClient.getQueryResultsPaginator(getQueryResultsRequest);
List<Row> rows;
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S");
for (GetQueryResultsResponse result : getQueryResultsResults) {
rows = result.resultSet().rows();
for (Row myRow : rows.subList(1, rows.size())) { // skip first row – column names
List<Datum> allData = myRow.data();
Event event = new Event();
event.setId(parseInt(allData.get(0).varCharValue()));
event.setVenueId(parseInt(allData.get(1).varCharValue()));
event.setCatId(parseInt(allData.get(2).varCharValue()));
event.setDateId(parseInt(allData.get(3).varCharValue()));
event.setName(allData.get(4).varCharValue());
event.setStartTime(LocalDateTime.parse(allData.get(5).varCharValue(), formatter));
events.add(event);
}
}
} catch (AthenaException e) {
logger.error(e.getMessage());
}
return events;
}
}

Using Spring’s common constructor-based Dependency Injection (DI) method (aka constructor injection), the service auto-wires an instance of the AthenaClientFactory interface. Note that we are auto-wiring the service interface, not the service implementation, allowing us to wire in a different implementation if desired, such as for testing.

The service calls the AthenaClientFactoryclass’s createClient() method, which returns a connection to Amazon Athena using one of several available authentication methods. The authentication scheme will depend on where the service is deployed and how you want to securely connect to AWS. Some options include environment variables, local AWS profile, EC2 instance profile, or token from the web identity provider.

return AthenaClient.builder()
.credentialsProvider(EnvironmentVariableCredentialsProvider.create())
.build();

The service class transforms the payload returned by an instance of GetQueryResultsResponse into an ordered collection (also known as a sequence), List<E>, where E represents a POJO. For example, with the data lake’srefined_tickit_public_event table, the service returns a List<Event>. This pattern repeats itself for tables, views, prepared statements, and named queries. Column data types can be transformed and formatted on the fly, new columns added, and existing columns skipped.

List<Row> rows;
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S");
for (GetQueryResultsResponse result : getQueryResultsResults) {
rows = result.resultSet().rows();
for (Row myRow : rows.subList(1, rows.size())) { // skip first row – column names
List<Datum> allData = myRow.data();
Event event = new Event();
event.setId(parseInt(allData.get(0).varCharValue()));
event.setVenueId(parseInt(allData.get(1).varCharValue()));
event.setCatId(parseInt(allData.get(2).varCharValue()));
event.setDateId(parseInt(allData.get(3).varCharValue()));
event.setName(allData.get(4).varCharValue());
event.setStartTime(LocalDateTime.parse(allData.get(5).varCharValue(), formatter));
events.add(event);
}
}

For each endpoint defined in the Controller class, for example, get()findAll(), and FindById(), there is a corresponding method in the Service class. Below, we see an example of the findAll() method in the SalesByCategoryServiceImp service class. This method corresponds to the identically named method in the SalesByCategoryController controller class. Each of these service methods follows a similar pattern of constructing a dynamic Athena SQL query based on input parameters, which is passed to Athena through the AthenaClient service client using an instance of GetQueryResultsRequest.

public List<SalesByCategory> findAll(String calendarDate, Integer limit, Integer offset) {
if (limit == null || limit < 1 || limit > configProperties.getLimit()) {
limit = configProperties.getLimit();
}
if (offset == null || offset < 1) {
offset = 0;
}
String whereClause = "WHERE caldate IS NOT NULL";
if (calendarDate != null) {
whereClause = whereClause + " AND caldate=date('" + calendarDate + "')";
}
String query = String.format("""
SELECT *
FROM tickit_sales_by_day_and_category
%s
OFFSET %s
LIMIT %s;""", whereClause, offset, limit);
return startQuery(query);
}

Controller Class

Lastly, there is a corresponding Controller class for each API endpoint in the Spring Boot RESTful Web Service. The controller class uses Spring Framework’s @RestController annotation. According to the documentation, this annotation is a convenience annotation that is itself annotated with @Controller and @ResponseBody. Types that carry this annotation are treated as controllers where @RequestMapping methods assume @ResponseBody semantics by default.

The controller class takes a dependency on the corresponding service class application component using constructor-based Dependency Injection (DI). Like the service example above, we are auto-wiring the service interface, not the service implementation.

package com.example.athena.tickit.controller;
import com.example.athena.tickit.model.saas.Event;
import com.example.athena.tickit.service.EventService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import java.util.List;
@RestController
@RequestMapping(value = "/events")
public class EventController {
private final EventService service;
@Autowired
public EventController(EventService service) {
this.service = service;
}
@RequestMapping(method = RequestMethod.GET)
public ResponseEntity<List<Event>> findAll(
@RequestParam(required = false) Integer limit,
@RequestParam(required = false) Integer offset
) {
List<Event> events = service.findAll(limit, offset);
if (events.size() == 0) {
return ResponseEntity.status(HttpStatus.NOT_FOUND).body(null);
}
return ResponseEntity.status(HttpStatus.OK).body(events);
}
@RequestMapping(value = "/{id}", method = RequestMethod.GET)
public ResponseEntity<Event> findById(@PathVariable("id") int id) {
Event event = service.findById(id);
if (event == null) {
return ResponseEntity.status(HttpStatus.NOT_FOUND).body(null);
}
return ResponseEntity.status(HttpStatus.OK).body(event);
}
}

The controller is responsible for serializing the ordered collection of POJOs into JSON and returning that JSON payload in the body of the HTTP response to the initial HTTP request.

Querying Views

In addition to querying AWS Glue Data Catalog tables (aka Athena tables), we also query views. According to the documentation, a view in Amazon Athena is a logical table, not a physical table. Therefore, the query that defines a view runs each time the view is referenced in a query.

For convenience, each time the Spring Boot service starts, the main AthenaApplication class calls the View.java class’s CreateView() method to check for the existence of the view, view_tickit_sales_by_day_and_category. If the view does not exist, it is created and becomes accessible to all application end-users. The view is queried through the service’s /salesbycategory endpoint.

Java class relationships related to querying the Amazon Athena view

This confirm-or-create pattern is repeated for the prepared statement in the main AthenaApplication class (detailed in the next section).

package com.example.athena;
import com.example.athena.common.PreparedStatement;
import com.example.athena.common.View;
import com.example.athena.config.ConfigProperties;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
@SpringBootApplication
@EnableConfigurationProperties(ConfigProperties.class)
public class AthenaApplication {
private final PreparedStatement preparedStatement;
private final View view;
@Autowired
public AthenaApplication(PreparedStatement preparedStatement, View view) {
this.preparedStatement = preparedStatement;
this.view = view;
}
public static void main(String[] args) {
SpringApplication.run(AthenaApplication.class, args);
}
@Bean
void CreatePreparedStatement() {
preparedStatement.CreatePreparedStatement();
}
@Bean
void createView() {
view.CreateView();
}
@Bean
public WebMvcConfigurer corsConfigurer() {
return new WebMvcConfigurer() {
@Override
public void addCorsMappings(CorsRegistry registry) {
registry.addMapping("/**").allowedOrigins("*");
}
};
}
}

Below, we see the View class called by the service at startup.

package com.example.athena.common;
import com.example.athena.config.ConfigProperties;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import software.amazon.awssdk.services.athena.AthenaClient;
import software.amazon.awssdk.services.athena.model.GetTableMetadataRequest;
import software.amazon.awssdk.services.athena.model.GetTableMetadataResponse;
import software.amazon.awssdk.services.athena.model.MetadataException;
@Component
public class View {
private static final Logger logger = LoggerFactory.getLogger(View.class);
private final AthenaClientFactory athenaClientFactoryImp;
private final ConfigProperties configProperties;
private final AthenaCommon athenaCommon;
@Autowired
public View(AthenaClientFactory athenaClientFactoryImp,
ConfigProperties configProperties,
AthenaCommon athenaCommon) {
this.athenaClientFactoryImp = athenaClientFactoryImp;
this.configProperties = configProperties;
this.athenaCommon = athenaCommon;
}
public void CreateView() {
String viewName = "view_tickit_sales_by_day_and_category";
String createViewSqlStatement = String.format("""
CREATE VIEW %s AS
SELECT cast(d.caldate AS DATE) AS caldate,
c.catgroup,
c.catname,
sum(round(cast(s.pricepaid AS DECIMAL(8,2)) * s.qtysold, 2)) AS saleamount,
sum(cast(s.commission AS DECIMAL(8,2))) AS commission
FROM refined_tickit_public_sales AS s
LEFT JOIN refined_tickit_public_event AS e ON e.eventid = s.eventid
LEFT JOIN refined_tickit_public_date AS d ON d.dateid = s.dateid
LEFT JOIN refined_tickit_public_category AS c ON c.catid = e.catid
GROUP BY caldate,
catgroup,
catname
ORDER BY caldate,
catgroup,
catname;""", viewName);
try (AthenaClient athenaClient = athenaClientFactoryImp.createClient()) {
try {
GetTableMetadataResponse getPreparedStatementRequest = getGetTableMetadataResponse(viewName, athenaClient);
logger.debug(String.format("View already exists: %s", getPreparedStatementRequest.tableMetadata().name()));
} catch (MetadataException e) { // View does not exist
String queryExecutionId = athenaCommon.submitAthenaQuery(athenaClient, createViewSqlStatement);
athenaCommon.waitForQueryToComplete(athenaClient, queryExecutionId);
// Confirm View was created
GetTableMetadataResponse getPreparedStatementRequest = getGetTableMetadataResponse(viewName, athenaClient);
logger.debug(String.format("View created successfully: %s", getPreparedStatementRequest.tableMetadata().name()));
}
}
}
private GetTableMetadataResponse getGetTableMetadataResponse(String viewName, AthenaClient athenaClient) {
GetTableMetadataRequest getTableMetadataRequest = GetTableMetadataRequest.builder()
.catalogName(configProperties.getCatalog())
.databaseName(configProperties.getDatabase())
.tableName(viewName)
.build();
return athenaClient.getTableMetadata(getTableMetadataRequest);
}
}
view raw View.java hosted with ❤ by GitHub

Aside from the fact the /salesbycategory endpoint queries a view, everything else is identical to querying a table. This endpoint uses the same model-service-controller pattern.

Executing Prepared Statements

According to the documentation, you can use the Athena parameterized query feature to prepare statements for repeated execution of the same query with different query parameters. The prepared statement used by the service, tickit_sales_by_seller, accepts a single parameter, the ID of the seller (sellerid). The prepared statement is executed using the /salesbyseller endpoint. This scenario simulates an end-user of the analytics application who wants to retrieve enriched sales information about their sales.

package com.example.athena.common;
import com.example.athena.config.ConfigProperties;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import software.amazon.awssdk.services.athena.AthenaClient;
import software.amazon.awssdk.services.athena.model.CreatePreparedStatementRequest;
import software.amazon.awssdk.services.athena.model.GetPreparedStatementRequest;
import software.amazon.awssdk.services.athena.model.GetPreparedStatementResponse;
import software.amazon.awssdk.services.athena.model.ResourceNotFoundException;
@Component
public class PreparedStatement {
private static final Logger logger = LoggerFactory.getLogger(PreparedStatement.class);
private final AthenaClientFactory athenaClientFactoryImp;
private final ConfigProperties configProperties;
@Autowired
public PreparedStatement(AthenaClientFactory athenaClientFactoryImp, ConfigProperties configProperties) {
this.athenaClientFactoryImp = athenaClientFactoryImp;
this.configProperties = configProperties;
}
public void CreatePreparedStatement() {
String preparedStatementName = "tickit_sales_by_seller";
String preparedStatementSql = """
SELECT cast(d.caldate AS DATE) AS caldate,
s.pricepaid,
s.qtysold,
round(cast(s.pricepaid AS DECIMAL(8,2)) * s.qtysold, 2) AS saleamount,
cast(s.commission AS DECIMAL(8,2)) AS commission,
round((cast(s.commission AS DECIMAL(8,2)) / (cast(s.pricepaid AS DECIMAL(8,2)) * s.qtysold)) * 100, 2) AS commissionprcnt,
e.eventname,
concat(u1.firstname, ' ', u1.lastname) AS seller,
concat(u2.firstname, ' ', u2.lastname) AS buyer,
c.catgroup,
c.catname
FROM refined_tickit_public_sales AS s
LEFT JOIN refined_tickit_public_listing AS l ON l.listid = s.listid
LEFT JOIN refined_tickit_public_users AS u1 ON u1.userid = s.sellerid
LEFT JOIN refined_tickit_public_users AS u2 ON u2.userid = s.buyerid
LEFT JOIN refined_tickit_public_event AS e ON e.eventid = s.eventid
LEFT JOIN refined_tickit_public_date AS d ON d.dateid = s.dateid
LEFT JOIN refined_tickit_public_category AS c ON c.catid = e.catid
WHERE s.sellerid = ?
ORDER BY caldate,
eventname;""";
try (AthenaClient athenaClient = athenaClientFactoryImp.createClient()) {
try {
GetPreparedStatementResponse getPreparedStatementResponse = getGetPreparedStatementResponse(preparedStatementName, athenaClient);
logger.debug(String.format("Prepared statement already exists: %s", getPreparedStatementResponse.preparedStatement().statementName()));
} catch (ResourceNotFoundException e) { // PreparedStatement does not exist
CreatePreparedStatementRequest createPreparedStatementRequest = CreatePreparedStatementRequest.builder()
.statementName(preparedStatementName)
.description("Returns all sales by seller based on the seller's userid")
.workGroup(configProperties.getWorkGroup())
.queryStatement(preparedStatementSql).build();
athenaClient.createPreparedStatement(createPreparedStatementRequest);
// Confirm PreparedStatement was created
GetPreparedStatementResponse getPreparedStatementResponse = getGetPreparedStatementResponse(preparedStatementName, athenaClient);
logger.debug(String.format("Prepared statement created successfully: %s", getPreparedStatementResponse.preparedStatement().statementName()));
}
}
}
private GetPreparedStatementResponse getGetPreparedStatementResponse(String preparedStatementName, AthenaClient athenaClient) {
GetPreparedStatementRequest getPreparedStatementRequest = GetPreparedStatementRequest.builder()
.statementName(preparedStatementName)
.workGroup(configProperties.getWorkGroup()).build();
return athenaClient.getPreparedStatement(getPreparedStatementRequest);
}
}

The pattern of querying data is similar to tables and views, except instead of using the common SELECT...FROM...WHERE SQL query pattern, we use the EXECUTE...USING pattern.

package com.example.athena.tickit.service;
import com.example.athena.common.AthenaClientFactory;
import com.example.athena.common.AthenaCommon;
import com.example.athena.tickit.model.resultsets.SaleBySeller;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import software.amazon.awssdk.services.athena.AthenaClient;
import software.amazon.awssdk.services.athena.model.*;
import software.amazon.awssdk.services.athena.paginators.GetQueryResultsIterable;
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
@Service
public class SaleBySellerServiceImp implements SaleBySellerService {
private static final Logger logger = LoggerFactory.getLogger(SaleBySellerServiceImp.class);
private final AthenaClientFactory athenaClientFactory;
private final AthenaCommon athenaCommon;
@Autowired
public SaleBySellerServiceImp(AthenaClientFactory athenaClientFactory, AthenaCommon athenaCommon) {
this.athenaClientFactory = athenaClientFactory;
this.athenaCommon = athenaCommon;
}
public List<SaleBySeller> find(int id) {
String query = String.format("""
EXECUTE tickit_sales_by_seller USING %s;""", id);
return startQuery(query);
}
private List<SaleBySeller> startQuery(String query) {
logger.debug(String.format("Query: %s", query.replace("\n", " ")));
AthenaClient athenaClient = athenaClientFactory.createClient();
String queryExecutionId = athenaCommon.submitAthenaQuery(athenaClient, query);
athenaCommon.waitForQueryToComplete(athenaClient, queryExecutionId);
List<SaleBySeller> saleBySellers = processResultRows(athenaClient, queryExecutionId);
athenaClient.close();
return saleBySellers;
}
private List<SaleBySeller> processResultRows(AthenaClient athenaClient, String queryExecutionId) {
List<SaleBySeller> saleBySellers = new ArrayList<>();
try {
GetQueryResultsRequest getQueryResultsRequest = GetQueryResultsRequest.builder()
.queryExecutionId(queryExecutionId).build();
GetQueryResultsIterable getQueryResultsResults = athenaClient.getQueryResultsPaginator(getQueryResultsRequest);
List<Row> rows;
for (GetQueryResultsResponse result : getQueryResultsResults) {
rows = result.resultSet().rows();
for (Row myRow : rows.subList(1, rows.size())) { // skip first row – column names
List<Datum> allData = myRow.data();
SaleBySeller saleBySeller = new SaleBySeller();
saleBySeller.setCalDate(LocalDate.parse(allData.get(0).varCharValue()));
saleBySeller.setPricePaid(new BigDecimal(allData.get(1).varCharValue()));
saleBySeller.setQtySold(Integer.parseInt(allData.get(2).varCharValue()));
saleBySeller.setSaleAmount(new BigDecimal(allData.get(3).varCharValue()));
saleBySeller.setCommission(new BigDecimal(allData.get(4).varCharValue()));
saleBySeller.setCommissionPrcnt(Double.valueOf(allData.get(5).varCharValue()));
saleBySeller.setEventName(allData.get(6).varCharValue());
saleBySeller.setSeller(allData.get(7).varCharValue());
saleBySeller.setBuyer(allData.get(8).varCharValue());
saleBySeller.setCatGroup(allData.get(9).varCharValue());
saleBySeller.setCatName(allData.get(10).varCharValue());
saleBySellers.add(saleBySeller);
}
}
} catch (AthenaException e) {
logger.error(e.getMessage());
}
return saleBySellers;
}
}

For example, to execute the prepared statement for a seller with an ID of 3, we would use EXECUTE tickit_sales_by_seller USING 3;. We pass the seller’s ID of 3 as a path parameter similar to other endpoints exposed by the service: /v1/salesbyseller/3.

Sales by seller query results from Athena using the seller’s ID as a parameter for the prepared statement

Again, aside from the fact the /salesbyseller endpoint executes a prepared statement and passes a parameter; everything else is identical to querying a table or a view, using the same model-service-controller pattern.

Working with Named Queries

In addition to tables, views, and prepared statements, Athena has the concept of saved queries, referred to as named queries in the Athena API and when using AWS CloudFormation. You can use the Athena console or API to save, edit, run, rename, and delete queries. The queries are persisted using a NamedQueryId, a unique identifier (UUID) of the query. You must reference the NamedQueryId when working with existing named queries.

Example of saved query (named query) used in this post

There are multiple ways to use and reuse existing named queries programmatically. For this demonstration, I created the named query, buyer_likes_by_category, in advance and then stored the resulting NamedQueryId as an application property, injected at runtime or kubernetes deployment time through a local environment variable.

package com.example.athena.tickit.service;
import com.example.athena.common.AthenaClientFactory;
import com.example.athena.common.AthenaCommon;
import com.example.athena.config.ConfigProperties;
import com.example.athena.tickit.model.resultsets.BuyerLikesByCategory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import software.amazon.awssdk.services.athena.AthenaClient;
import software.amazon.awssdk.services.athena.model.*;
import software.amazon.awssdk.services.athena.paginators.GetQueryResultsIterable;
import java.util.ArrayList;
import java.util.List;
import static java.lang.Integer.parseInt;
@Service
public class BuyerLikesByCategoryServiceImp implements BuyersLikesByCategoryService {
private static final Logger logger = LoggerFactory.getLogger(BuyerLikesByCategoryServiceImp.class);
private final ConfigProperties configProperties;
private final AthenaClientFactory athenaClientFactory;
private final AthenaCommon athenaCommon;
@Autowired
public BuyerLikesByCategoryServiceImp(ConfigProperties configProperties, AthenaClientFactory athenaClientFactory, AthenaCommon athenaCommon) {
this.configProperties = configProperties;
this.athenaClientFactory = athenaClientFactory;
this.athenaCommon = athenaCommon;
}
public List<BuyerLikesByCategory> get() {
return getNamedQueryResults(configProperties.getNamedQueryId());
}
private List<BuyerLikesByCategory> getNamedQueryResults(String queryId) {
logger.debug(String.format("NamedQueryId: %s", queryId));
AthenaClient athenaClient = athenaClientFactory.createClient();
GetNamedQueryRequest getNamedQueryRequest = GetNamedQueryRequest.builder()
.namedQueryId(queryId)
.build();
GetNamedQueryResponse getNamedQueryResponse = athenaClient.getNamedQuery(getNamedQueryRequest);
String queryExecutionId = athenaCommon.submitAthenaQuery(athenaClient, getNamedQueryResponse.namedQuery().queryString());
athenaCommon.waitForQueryToComplete(athenaClient, queryExecutionId);
List<BuyerLikesByCategory> buyerLikesByCategories = processResultRows(athenaClient, queryExecutionId);
athenaClient.close();
return buyerLikesByCategories;
}
private List<BuyerLikesByCategory> processResultRows(AthenaClient athenaClient, String queryExecutionId) {
List<BuyerLikesByCategory> buyerLikesByCategories = new ArrayList<>();
try {
// Max Results can be set but if it's not set,
// it will choose the maximum page size
GetQueryResultsRequest getQueryResultsRequest = GetQueryResultsRequest.builder()
.queryExecutionId(queryExecutionId).build();
GetQueryResultsIterable getQueryResultsResults = athenaClient.getQueryResultsPaginator(getQueryResultsRequest);
List<Row> rows;
for (GetQueryResultsResponse result : getQueryResultsResults) {
rows = result.resultSet().rows();
for (Row myRow : rows.subList(1, rows.size())) { // skip first row – column names
List<Datum> allData = myRow.data();
BuyerLikesByCategory buyerLikesByCategory = new BuyerLikesByCategory();
buyerLikesByCategory.setSports(parseInt(allData.get(0).varCharValue()));
buyerLikesByCategory.setTheatre(parseInt(allData.get(1).varCharValue()));
buyerLikesByCategory.setConcerts(parseInt(allData.get(2).varCharValue()));
buyerLikesByCategory.setJazz(parseInt(allData.get(3).varCharValue()));
buyerLikesByCategory.setClassical(parseInt(allData.get(4).varCharValue()));
buyerLikesByCategory.setOpera(parseInt(allData.get(5).varCharValue()));
buyerLikesByCategory.setRock(parseInt(allData.get(6).varCharValue()));
buyerLikesByCategory.setVegas(parseInt(allData.get(7).varCharValue()));
buyerLikesByCategory.setBroadway(parseInt(allData.get(8).varCharValue()));
buyerLikesByCategory.setMusicals(parseInt(allData.get(9).varCharValue()));
buyerLikesByCategories.add(buyerLikesByCategory);
}
}
} catch (AthenaException e) {
logger.error(e.getMessage());
}
return buyerLikesByCategories;
}
}

Alternately, you might iterate through a list of named queries to find one that matches the name at startup. However, this method would undoubtedly impact service performance, startup time, and cost. Lastly, you could use a method like NamedQuery() included in the unused NamedQuery class at startup, similar to the view and prepared statement. That named query’s unique NamedQueryId would be persisted as a system property, referencable by the service class. The downside is that you would create a duplicate of the named query each time you start the service. Therefore, this method is also not recommended.

Configuration

Two components responsible for persisting configuration for the Spring Boot service are the application.yml properties file and ConfigProperties class. The class uses Spring Framework’s @ConfigurationProperties annotation. According to the documentation, this annotation is used for externalized configuration. Add this to a class definition or a @Bean method in a @Configuration class if you want to bind and validate some external Properties (e.g., from a .properties or .yml file). Binding is performed by calling setters on the annotated class or, if @ConstructorBinding in use, by binding to the constructor parameters.

The @ConfigurationProperties annotation includes the prefix of athena. This value corresponds to the athena prefix in the the application.yml properties file. The fields in the ConfigProperties class are bound to the properties in the the application.yml. For example, the property, namedQueryId, is bound to the property, athena.named.query.id. Further, that property is bound to an external environment variable, NAMED_QUERY_ID. These values could be supplied from an external configuration system, a Kubernetes secret, or external secrets management system.

spring:
profiles:
active: dev
server:
port: 8080
servlet:
contextPath: /v1
athena:
region: us-east-1
workgroup: primary
catalog: AwsDataCatalog
database: tickit_demo
limit: 25
client-execution-timeout: 100000
retry-sleep: 1000
results-bucket: ${RESULTS_BUCKET}
named-query-id: ${NAMED_QUERY_ID}
spring:
config:
activate:
on-profile: dev
logging:
level:
root: DEBUG
management:
endpoints:
web:
exposure:
include: '*'
jmx:
exposure:
include: '*'
spring:
config:
activate:
on-profile: prod
logging:
level:
root: INFO
management:
endpoints:
web:
exposure:
include: health, prometheus
jmx:
exposure:
include: health
view raw application.yml hosted with ❤ by GitHub

AWS IAM: Authentication and Authorization

For the Spring Boot service to interact with Amazon Athena, AWS Glue, and Amazon S3, you need to establish an AWS IAM Role, which the service assumes once authenticated. The Role must be associated with an attached IAM Policy containing the requisite Athena, Glue, and S3 permissions. For development, the service uses a policy similar to the one shown below. Please note this policy is broader than recommended for Production; it does not represent the security best practice of least privilege. In particular, the use of the overly-broad * for Resources should be strictly avoided when creating policies.

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"athena:StartQueryExecution",
"athena:CreatePreparedStatement",
"athena:ListPreparedStatements",
"glue:CreateTable",
"athena:CreateNamedQuery",
"athena:ListNamedQueries",
"athena:GetTableMetadata",
"athena:GetPreparedStatement",
"athena:GetQueryResults",
"athena:GetQueryExecution",
"athena:GetNamedQuery"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"glue:BatchGetPartition",
"glue:GetTable"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:ListMultipartUploadParts",
"s3:AbortMultipartUpload",
"s3:CreateBucket",
"s3:PutObject",
"s3:PutBucketPublicAccessBlock"
],
"Resource": [
"arn:aws:s3:::aws-athena-query-results-*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::date-lake-demo-*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:ListAllMyBuckets"
],
"Resource": [
"*"
]
}
]
}

In addition to the authorization granted by the IAM Policy, AWS Lake Formation can be used with Amazon S3, AWS Glue, and Amazon Athena to grant fine-grained database-, table-, column-, and row-level access to datasets.

Swagger UI and the OpenAPI Specification

The easiest way to view and experiment with all the endpoints available through the controller classes is using the Swagger UI, included in the example Spring Boot service, by way of the springdoc-openapi Java library. The Swagger UI is accessed at /v1/swagger-ui/index.html.

Swagger UI showing endpoints exposed by the service’s controller classes

The OpenAPI Specification (formerly Swagger Specification) is an API description format for REST APIs. The /v1/v3/api-docs endpoint allows you to generate an OpenAPI v3 specification file. The OpenAPI file describes the entire API.

Spring Boot service’s OpenAPI v3 specification

The OpenAPI v3 specification can be saved as a file and imported into applications like Postman, the API platform for building and using APIs.

Calling the service’s /users API endpoint using Postman
Running a suite of integration tests against the Spring Boot service using Postman

Integration Tests

Included in the Spring Boot service’s source code is a limited number of example integration tests, not to be confused with unit tests. Each test class uses Spring Framework’s @SpringBootTest annotation. According to the documentation, this annotation can be specified on a test class that runs Spring Boot-based tests. It provides several features over and above the regular Spring TestContext Framework.

package com.example.athena;
import io.restassured.http.ContentType;
import org.hamcrest.Matchers;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
import static io.restassured.RestAssured.get;
import static io.restassured.RestAssured.given;
import static io.restassured.http.ContentType.JSON;
@SpringBootTest
class CategoriesResourceTests {
private static final String ResourcePath = "/v1/categories";
private static final int resultsetLimit = 25;
@Test
void findAll() {
given()
.contentType(ContentType.JSON)
.accept(ContentType.JSON)
.when()
.get(ResourcePath)
.then()
.assertThat()
.contentType(JSON)
.statusCode(200)
.body("$.size()", Matchers.greaterThanOrEqualTo(1))
.body("$.size()", Matchers.lessThanOrEqualTo(resultsetLimit));
}
@Test
void findAllWithLimit() {
given()
.contentType(ContentType.JSON)
.accept(ContentType.JSON)
.queryParam("limit", 3)
.when()
.get(ResourcePath)
.then()
.assertThat()
.contentType(JSON)
.statusCode(200)
.body("$.size()", Matchers.equalTo(3));
}
@Test
void findAllWithOffset() {
given()
.contentType(ContentType.JSON)
.accept(ContentType.JSON)
.queryParam("offset", 2)
.when()
.get(ResourcePath)
.then()
.assertThat()
.contentType(JSON)
.statusCode(200)
.body("$.size()", Matchers.greaterThanOrEqualTo(1))
.body("$.size()", Matchers.lessThanOrEqualTo(resultsetLimit));
}
@Test
void findAllWithLimitAndOffset() {
given()
.contentType(ContentType.JSON)
.accept(ContentType.JSON)
.queryParam("limit", 3)
.queryParam("offset", 2)
.when()
.get(ResourcePath)
.then()
.assertThat()
.contentType(JSON)
.statusCode(200)
.body("$.size()", Matchers.equalTo(3));
}
@Test
void findById() {
// Get the first 'id' available
int id = get(ResourcePath + "?limit=1")
.then()
.extract()
.path("[0].id");
given()
.contentType(ContentType.JSON)
.accept(ContentType.JSON)
.when()
.get(ResourcePath + "/{id}", id)
.then()
.assertThat()
.contentType(JSON)
.statusCode(200)
.body("id", Matchers.equalTo(id));
}
}

The integration tests use Rest Assured’s given-when-then pattern of testing, made popular as part of Behavior-Driven Development (BDD). In addition, each test uses the JUnit’s @Test annotation. According to the documentation, this annotation signals that the annotated method is a test method. Therefore, methods using this annotation must not be private or static and must not return a value.

@Test
void findById() {
// Get the first 'id' available
int id = get(ResourcePath + "?limit=1")
.then()
.extract()
.path("[0].id");
given()
.contentType(ContentType.JSON)
.accept(ContentType.JSON)
.when()
.get(ResourcePath + "/{id}", id)
.then()
.assertThat()
.contentType(JSON)
.statusCode(200)
.body("id", Matchers.equalTo(id));
}

Run the integration tests using Gradle from the project’s root: ./gradlew clean build test. A detailed ‘Test Summary’ is produced in the project’s build directory as HTML for easy review.

Test Details
Test Details

Load Testing the Service

In Production, the Spring Boot service will need to handle multiple concurrent users executing queries against Amazon Athena.

Athena’s Recent Queries console shows multi concurrent queries being queued and executed

We could use various load testing tools to evaluate the service’s ability to handle multiple concurrent users. One of the simplest is my favorite go-based utility, hey, which sends load to a URL using a provided number of requests in the provided concurrency level and prints stats. It also supports HTTP2 endpoints. So, for example, we could execute 500 HTTP requests with a concurrency level of 25 against the Spring Boot service’s /users endpoint using hey. The post’s integration tests were run against three Kubernetes replica pods of the service deployed to Amazon EKS.

hey -n 500 -c 25 -T "application/json;charset=UTF-8" \
-h2 https://athena.example-api.com/v1/users

From Athena’s Recent Queries console, we see many simultaneous queries being queued and executed by a hey through the Spring Boot service’s endpoint.

Athena’s Recent Queries console shows simultaneous queries being queued and executed

Metrics

The Spring Boot service implements the micrometer-registry-prometheus extension. The Micrometer metrics library exposes runtime and application metrics. Micrometer defines a core library, providing a registration mechanism for metrics and core metric types. These metrics are exposed by the service’s /v1/actuator/prometheus endpoint.

Metrics exposed using the Prometheus endpoint

Using the Micrometer extension, metrics exposed by the /v1/actuator/prometheus endpoint can be scraped and visualized by tools such as Prometheus. Conveniently, AWS offers the fully-managed Amazon Managed Service for Prometheus (AMP), which easily integrates with Amazon EKS.

Graph of HTTP server requests scraped by Prometheus from the Spring Boot service

Using Prometheus as a datasource, we can build dashboards in Grafana to observe the service’s metrics. Like AMP, AWS also offers the fully-managed Amazon Managed Grafana (AMG).

Grafana dashboard showing metrics from Prometheus for Spring Boot service deployed to Amazon EKS
Grafana dashboard showing JVM metrics from Prometheus for Spring Boot service deployed to Amazon EKS

Conclusion

This post taught us how to create a Spring Boot RESTful Web Service, allowing end-user applications to securely query data stored in a data lake on AWS. The service used AWS SDK for Java to access data stored in Amazon S3 through an AWS Glue Data Catalog using Amazon Athena.


This blog represents my own viewpoints and not of my employer, Amazon Web Services (AWS). All product names, logos, and brands are the property of their respective owners. All diagrams and illustrations are property of the author unless otherwise noted.

, , , , , ,

Leave a comment

Building and Deploying Cloud-Native Quarkus-based Java Applications to Kubernetes

Developing, testing, building, and deploying Native Quarkus-based Java microservices to Kubernetes on AWS, using GitOps

Introduction

Although it may no longer be the undisputed programming language leader, according to many developer surveys, Java still ranks right up there with Go, Python, C/C++, and JavaScript. Given Java’s continued popularity, especially amongst enterprises, and the simultaneous rise of cloud-native software development, vendors have focused on creating purpose-built, modern JVM-based frameworks, tooling, and standards for developing applications — specifically, microservices.

Leading JVM-based microservice application frameworks typically provide features such as native support for a Reactive programming modelMicroProfileGraalVM Native ImageOpenAPI and Swagger definition generation, GraphQLCORS (Cross-Origin Resource Sharing), gRPC (gRPC Remote Procedure Calls), CDI (Contexts and Dependency Injection), service discovery, and distributed tracing.

Leading JVM-based Microservices Frameworks

Review lists of the most popular cloud-native microservices framework for Java, and you are sure to find Spring Boot with Spring CloudMicronautHelidon, and Quarkus at or near the top.

Spring Boot with Spring Cloud

According to their website, Spring makes programming Java quicker, easier, and safer for everybody. Spring’s focus on speed, simplicity, and productivity has made it the world’s most popular Java framework. Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that you can just run. Spring Boot’s many purpose-built features make it easy to build and run your microservices in production at scale. However, the distributed nature of microservices brings challenges. Spring Cloud can help with service discovery, load-balancing, circuit-breaking, distributed tracing, and monitoring with several ready-to-run cloud patterns. It can even act as an API gateway.

Helidon

Oracle’s Helidon is a cloud-native, open‑source set of Java libraries for writing microservices that run on a fast web core powered by Netty. Helidon supports MicroProfile, a reactive programming model, and, similar to Micronaut, Spring, and Quarkus, it supports GraalVM Native Image.

Micronaut

According to their website, the Micronaut framework is a modern, open-source, JVM-based, full-stack toolkit for building modular, easily testable microservice and serverless applications. Micronaut supports a polyglot programming model, discovery services, distributed tracing, and aspect-oriented programming (AOP). In addition, Micronaut offers quick startup time, blazing-fast throughput, and a minimal memory footprint.

Quarkus

Quarkus, developed and sponsored by RedHat, is self-described as the ‘Supersonic Subatomic Java.’ Quarkus is a cloud-native, Kubernetes-native, [Linux] container first, microservices first framework for writing Java applications. Quarkus is a Kubernetes Native Java stack tailored for OpenJDK HotSpot and GraalVM, crafted from over fifty best-of-breed Java libraries and standards.

Developing Native Quarkus Microservices

In the following post, we will develop, build, test, deploy, and monitor a native Quarkus microservice application to Kubernetes. The RESTful service will expose a rich Application Programming Interface (API) and interacts with a PostgreSQL database on the backend.

High-level AWS architecture diagram of Quarkus application’s Production environment

Some of the features of the Quarkus application in this post include:

TL;DR

Do you want to explore the source code for this post’s Quarkus microservice application or deploy it to Kubernetes before reading the full article? All the source code and Kubernetes resources are open-source and available on GitHub:

git clone --depth 1 -b main \
https://github.com/garystafford/tickit-srv.git

The latest Docker Image is available on docker.io:

docker pull garystafford/tickit-srv:<latest-tag>

Quarkus Projects with IntelliJ IDE

Although not a requirement, I used JetBrains IntelliJ IDEA 2022 (Ultimate Edition) to develop and test the post’s Quarkus application. Bootstrapping Quarkus projects with IntelliJ is easy. Using the Quarkus plugin bundled with the Ultimate edition, developers can quickly create a Quarkus project.

JetBrains IntelliJ IDEA native support for Quarkus projects

The Quarkus plugin’s project creation wizard is based on code.quarkus.io. If you have bootstrapped a Spring Initializr project, code.quarkus.io works very similar to start.spring.io.

Adding extensions for a new Quarkus project in IntelliJ

Visual Studio Code

RedHat also provides a Quarkus extension for the popular Visual Studio Code IDE.

Visual Studio Code IDE with Quarkus extensions installed

Gradle

This post uses Gradle instead of Maven to develop, test, build, package, and deploy the Quarkus application to Kubernetes. Based on the packages selected in the new project setup shown above, the Quarkus plugin’s project creation wizard creates the following build.gradle file (Lombak added separately).

plugins {
id 'java'
id 'io.quarkus'
id 'io.freefair.lombok' version '6.4.3'
}
repositories {
mavenCentral()
mavenLocal()
}
dependencies {
implementation enforcedPlatform("${quarkusPlatformGroupId}:${quarkusPlatformArtifactId}:${quarkusPlatformVersion}")
implementation("io.quarkus:quarkus-container-image-docker")
implementation("io.quarkus:quarkus-kubernetes")
implementation("io.quarkus:quarkus-kubernetes-config")
implementation("io.quarkus:quarkus-resteasy-reactive")
implementation("io.quarkus:quarkus-resteasy-reactive-jackson")
implementation("io.quarkus:quarkus-hibernate-reactive-panache")
implementation("io.quarkus:quarkus-reactive-pg-client")
implementation("io.quarkus:quarkus-smallrye-health")
implementation("io.quarkus:quarkus-smallrye-openapi")
implementation("io.quarkus:quarkus-micrometer-registry-prometheus")
testImplementation("io.quarkus:quarkus-junit5")
testImplementation("io.rest-assured:rest-assured")
}
group 'com.tickit'
version '1.0.0'
java {
sourceCompatibility = JavaVersion.VERSION_17
targetCompatibility = JavaVersion.VERSION_17
}
compileJava {
options.encoding = 'UTF-8'
options.compilerArgs << '-parameters'
}
compileTestJava {
options.encoding = 'UTF-8'
}
view raw build.gradle hosted with ❤ by GitHub

The wizard also created the following gradle.properties file, which has been updated to the latest release of Quarkus available at the time of this post, 2.9.2.

quarkusPlatformArtifactId=quarkus-bom
quarkusPlatformGroupId=io.quarkus.platform
quarkusPlatformVersion=2.9.2.Final
quarkusPluginId=io.quarkus
quarkusPluginVersion=2.9.2.Final

Gradle and Quarkus

You can use the Quarkus CLI or the Quarkus Maven plugin to scaffold a Gradle project. Taking a dependency on the Quarkus plugin adds several additional Quarkus tasks to Gradle. We will use Gradle to develop, test, build, containerize, and deploy the Quarkus microservice application to Kubernetes. The quarkusDevquarkusTest, and quarkusBuild tasks will be particularly useful in this post.

Addition Quarkus Gradle tasks as seen in IntelliJ

Java Compilation

The Quarkus application in this post is compiled as a native image with the most recent Java 17 version of Mandrela downstream distribution of the GraalVM community edition.

GraalVM and Native Image

According to the documentation, GraalVM is a high-performance JDK distribution. It is designed to accelerate the execution of applications written in Java and other JVM languages while also providing runtimes for JavaScript, Ruby, Python, and other popular languages.

Further, according to GraalVM, Native Image is a technology to ahead-of-time compile Java code to a stand-alone executable, called a native image. This executable includes the application classes, classes from its dependencies, runtime library classes, and statically linked native code from the JDK. The Native Image builder (native-image) is a utility that processes all classes of an application and their dependencies, including those from the JDK. It statically analyzes data to determine which classes and methods are reachable during the application execution.

Mandrel

Mandrel is a downstream distribution of the GraalVM community edition. Mandrel’s main goal is to provide a native-image release specifically to support Quarkus. The aim is to align the native-image capabilities from GraalVM with OpenJDK and Red Hat Enterprise Linux libraries to improve maintainability for native Quarkus applications. Mandrel can best be described as a distribution of a regular OpenJDK with a specially packaged GraalVM Native Image builder (native-image).

Docker Image

Once complied, the native Quarkus executable will run within the quarkus-micro-image:1.0 base runtime image deployed to Kubernetes. Quarkus provides this base image to ease the containerization of native executables. It has a minimal footprint (10.9 compressed/29.5 MB uncompressed) compared to other images. For example, the latest UBI (Universal Base Image) Quarkus Mandrel image (ubi-quarkus-mandrel:22.1.0.0-Final-java17) is 714 MB uncompressed, while the OpenJDK 17 image (openjdk:17-jdk) is 471 MB uncompressed. Even RedHat’s Universal Base Image Minimal image (ubi-minimal:8.6) is 93.4 MB uncompressed.

Uncompressed Quarkus-related Docker images for a size comparison

An even smaller option from Quarkus is a distroless base image (quarkus-distroless-image:1.0) is only 9.2 MB compressed / 22.7 MB uncompressed. Quarkus is careful to note that distroless image support is experimental and should not be used in production without rigorous testing.

PostgreSQL Database

For the backend data persistence tier of the Quarkus application, we will use PostgreSQL. All DDL (Data Definition Language) and DML (Data Manipulation Language) statements used in the post were tested with the most current version of PostgreSQL 14.

There are many PostgreSQL-compatible sample databases available that could be used for this post. I am using the TICKIT sample database provided by AWS and designed for Amazon Redshift, AWS’s cloud data warehousing service. The database consists of seven tables — two fact tables and five dimensions tables — in a traditional data warehouse star schema.

For this post, I have remodeled the TICKIT database’s star schema into a normalized relational data model optimized for the Quarkus application. The most significant change to the database is splitting the original Users dimension table into two separate tables — buyer and seller. This change will allow for better separation of concerns (SoC), scalability, and increased protection of Personal Identifiable Information (PII).

TICKIT database relational data model used in post

Source Code

Each of the six tables in the PostgreSQL TICKIT database is represented by an Entity, Repository, and Resource Java class.

View of Quarkus application’s source code

Entity Class

Java Persistence is the API for managing persistence and object/relational mapping. The Java Persistence API (JPA) provides Java developers with an object/relational mapping facility for managing relational data in Java applications. Each table in the PostgreSQL TICKIT database is represented by a Java Persistence Entity, as indicated by the Entity annotation on the class declaration. The annotation specifies that the class is an entity.

JPA entity-relationship, mirroring the database’s data model

Each entity class extends the PanacheEntityBase class, part of the io.quarkus.hibernate.orm.panache package. According to the Quarkus documentation, You can specify your own custom ID strategy, which is done in this post’s example, by extending PanacheEntityBase instead of PanacheEntity.

If you do not want to bother defining getters/setters for your entities, which we did not in the post’s example, extending PanacheEntityBase, Quarkus will generate them for you. Alternately, extend PanacheEntity and take advantage of the default ID it provides if you are not using a custom ID strategy.

The example SaleEntity class shown below is typical of the Quarkus application’s entities. The entity class contains several additional JPA annotations in addition to Entity, including TableNamedQueriesIdSequenceGeneratorGeneratedValue, and Column. The entity class also leverages Project Lombok annotations. Lombok generates two boilerplate constructors, one that takes no arguments (NoArgsConstructor) and one that takes one argument for every field (AllArgsConstructor).

The SaleEntity class also defines two many-to-one relationships, with the ListingEntity and BuyerEntity entity classes. This relationship mirrors the database’s data model, as reflected in the schema diagram above. The relationships are defined using the ManyToOne and JoinColumn JPA annotations.

package com.tickit.sale;
import com.tickit.buyer.BuyerEntity;
import com.tickit.listing.ListingEntity;
import io.quarkus.hibernate.reactive.panache.PanacheEntityBase;
import lombok.AllArgsConstructor;
import lombok.NoArgsConstructor;
import javax.persistence.*;
import java.math.BigDecimal;
import java.time.LocalDateTime;
@Entity
@NoArgsConstructor
@AllArgsConstructor
@Table(name = "sale", schema = "public", catalog = "tickit")
@NamedQueries({
@NamedQuery(name = "SaleEntity.getBySellerId", query = """
select sale, listing, seller
from SaleEntity as sale
join sale.listing as listing
join listing.seller as seller
where seller.id = ?1"""
),
@NamedQuery(name = "SaleEntity.getByEventId", query = """
select sale, listing, event
from SaleEntity as sale
join sale.listing as listing
join listing.event as event
where event.id = ?1"""
)})
public class SaleEntity extends PanacheEntityBase {
@Id
@SequenceGenerator(
name = "saleSeq",
sequenceName = "sale_sale_id_seq",
schema = "public",
initialValue = 175000,
allocationSize = 1)
@GeneratedValue(
strategy = GenerationType.SEQUENCE,
generator = "saleSeq")
@Column(name = "saleid", nullable = false)
public int id;
@Column(name = "qtysold", nullable = false)
public short quantitySold;
@Column(name = "pricepaid", nullable = false, precision = 2)
public BigDecimal pricePaid;
@Column(name = "commission", nullable = false, precision = 2)
public BigDecimal commission;
@Column(name = "saletime", nullable = false)
public LocalDateTime saleTime;
@ManyToOne(optional = false)
@JoinColumn(name = "listid", referencedColumnName = "listid", nullable = false)
public ListingEntity listing;
@ManyToOne(optional = false)
@JoinColumn(name = "buyerid", referencedColumnName = "buyerid", nullable = false)
public BuyerEntity buyer;
}
view raw SaleEntity.java hosted with ❤ by GitHub

Given the relationships between the entities, a saleEntity object, represented as a nested JSON object, would look as follows:

{
"id": 27,
"quantitySold": 1,
"pricePaid": 111,
"commission": 16.65,
"saleTime": "2008-10-13T01:09:47",
"listing": {
"id": 28,
"numTickets": 1,
"pricePerTicket": 111,
"totalPrice": 111,
"listTime": "2008-10-08T03:56:33",
"seller": {
"id": 32241,
"username": "VRV70PKM",
"firstName": "Olga",
"lastName": "Sharpe",
"city": "Yuma",
"state": "DC",
"email": "Aliquam.adipiscing@urnanecluctus.org",
"phone": "(377) 405-5662",
"taxIdNumber": "265116930"
},
"event": {
"id": 1820,
"name": "The Farnsworth Invention",
"startTime": "2008-11-03T20:00:00",
"venue": {
"id": 220,
"name": "Lunt-Fontanne Theatre",
"city": "New York City",
"state": "NY",
"seats": 1500
},
"category": {
"id": 7,
"group": "Shows",
"name": "Plays",
"description": "All non-musical theatre"
}
}
},
"buyer": {
"id": 4695,
"username": "DRU13CBT",
"firstName": "Tamekah",
"lastName": "Frye",
"city": "Washington",
"state": "NB",
"email": "tempus.risus@vulputate.edu",
"phone": "(969) 804-4123",
"likeSports": false,
"likeTheatre": true,
"likeConcerts": true,
"likeJazz": false,
"likeClassical": true,
"likeOpera": false,
"likeRock": false,
"likeVegas": false,
"likeBroadway": true,
"likeMusicals": false
}
}
view raw sales.json hosted with ❤ by GitHub

Repository Class

Each table in the PostgreSQL TICKIT database also has a corresponding repository class, often referred to as the ‘repository pattern.’ The repository class implements the PanacheRepositoryBase interface, part of the io.quarkus.hibernate.orm.panache package. The PanacheRepositoryBase Java interface represents a Repository for a specific type of Entity. According to the documentation, if you are using repositories and have a custom ID strategy, then you will want to extend PanacheRepositoryBase instead of PanacheRepository and specify your ID type as an extra type parameter. Implementing the PanacheRepositoryBase will give you the same methods on the PanacheEntityBase.

A partial list of methods exposed by the PanacheRepositoryBase

The repository class allows us to leverage the methods already available through PanacheEntityBase and add additional custom methods. For example, the repository class contains a custom method listWithPaging. This method retrieves (GET) a list of SaleEntity objects with the added benefit of being able to indicate the page number, page size, sort by field, and sort direction.

Since there is a many-to-one relationship between the SaleEntity class and the ListingEntity and BuyerEntity entity classes, we also have two custom methods that retrieve all SaleEntity objects by either the BuyerEntity ID or the EventEntity ID. These two methods call the SQL queries in the SaleEntity, annotated with the JPA NamedQueries/NamedQuery annotations on the class declaration.

SmallRye Mutiny

Each method defined in the repository class returns a SmallRye Mutiny Uni<T>. According to the website, Mutiny is an intuitive, event-driven Reactive programming library for Java. Mutiny provides a simple but powerful asynchronous development model that lets you build reactive applications. Mutiny can be used in any Java application exhibiting asynchrony, including reactive microservices, data streaming, event processing, API gateways, and network utilities.

Uni

Again, according to Mutiny’s documentation, a Uni represents a stream that can only emit either an item or a failure event. A Uni<T> is a specialized stream that emits only an item or a failure. Typically, Uni<T> are great for representing asynchronous actions such as a remote procedure call, an HTTP request, or an operation producing a single result. A Uni represents a lazy asynchronous action. It follows the subscription pattern, meaning that the action is only triggered once a UniSubscriber subscribes to the Uni.

package com.tickit.sale;
import io.quarkus.hibernate.reactive.panache.PanacheRepositoryBase;
import io.quarkus.panache.common.Sort;
import io.smallrye.mutiny.Uni;
import javax.enterprise.context.ApplicationScoped;
import java.util.List;
import java.util.Objects;
@ApplicationScoped
public class SaleRepository implements PanacheRepositoryBase<SaleEntity, Integer> {
public Uni<List<SaleEntity>> listWithPaging(String sortBy, String orderBy, Integer page, Integer size) {
if (page < 1) page = 1;
if (size < 1) size = 5;
page = page1; // zero-based
if (sortBy == null) sortBy = "id";
Sort.Direction direction = Sort.Direction.Ascending;
if (Objects.equals(orderBy, "desc")) direction = Sort.Direction.Descending;
return SaleEntity.findAll(Sort.by(sortBy).direction(direction)).page(page, size).list();
}
public Uni<List<SaleEntity>> getBySellerId(Integer id) {
return SaleEntity.find("#SaleEntity.getBySellerId", id).list();
}
public Uni<List<SaleEntity>> getByEventId(Integer id) {
return SaleEntity.find("#SaleEntity.getByEventId", id).list();
}
}

Resource Class

Lastly, each table in the PostgreSQL TICKIT database has a corresponding resource class. According to the Quarkus documentation, all the operations defined within PanacheEntityBase are available on your repository, so using it is exactly the same as using the active record pattern, except you need to inject it. We inject the corresponding repository class into the resource class, exposing all the available methods of the repository and PanacheRepositoryBase. For example, note the custom listWithPaging method below, which was declared in the SaleRepository class.

A partial list of methods exposed by injecting the repository class into the resource class

Similar to the repository class, each method defined in the resource class also returns a SmallRye Mutiny (io.smallrye.mutinyUni<T>.

The repository defines HTTP methods (POSTGETPUT, and DELETE) corresponding to CRUD operations on the database (Create, Read, Update, and Delete). The methods are annotated with the corresponding javax.ws.rs annotation, indicating the type of HTTP request they respond to. The javax.ws.rs package contains high-level interfaces and annotations used to create RESTful service resources, such as our Quarkus application.

The POSTPUT, and DELETE annotated methods all have the io.quarkus.hibernate.reactive.panache.common.runtime package’s ReactiveTransactional annotation associated with them. We use this annotation on methods to run them in a reactive Mutiny.Session.Transation. If the annotated method returns a Uni, which they do, this has precisely the same behavior as if the method was enclosed in a call to Mutiny.Session.withTransaction(java.util.function.Function). If the method call fails, the complete transaction is rolled back.

package com.tickit.sale;
import io.quarkus.hibernate.reactive.panache.common.runtime.ReactiveTransactional;
import io.smallrye.mutiny.Uni;
import org.jboss.resteasy.reactive.ResponseStatus;
import javax.inject.Inject;
import javax.ws.rs.*;
import javax.ws.rs.core.MediaType;
import java.util.List;
@Path("sales")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class SaleResource {
@Inject
SaleRepository saleRepository;
@GET
public Uni<List<SaleEntity>> list(
@QueryParam("sort_by") String sortBy,
@QueryParam("order_by") String orderBy,
@QueryParam("page") int page,
@QueryParam("size") int size
) {
return saleRepository.listWithPaging(sortBy, orderBy, page, size);
}
@GET
@Path("{id}")
public Uni<SaleEntity> get(Integer id) {
return SaleEntity.findById(id);
}
@POST
@ResponseStatus(201)
@ReactiveTransactional
public Uni<SaleEntity> create(SaleEntity sale) {
return SaleEntity.persist(sale).replaceWith(sale);
}
@PUT
@Path("{id}")
@ReactiveTransactional
public Uni<SaleEntity> update(Integer id, SaleEntity sale) {
return SaleEntity.<SaleEntity>findById(id).onItem().ifNotNull().invoke(
entity -> {
entity.quantitySold = sale.quantitySold;
entity.pricePaid = sale.pricePaid;
entity.commission = sale.commission;
entity.saleTime = sale.saleTime;
entity.listing = sale.listing;
entity.buyer = sale.buyer;
}
);
}
@DELETE
@Path("{id}")
@ReactiveTransactional
public Uni<Void> delete(Integer id) {
return SaleEntity.deleteById(id).replaceWithVoid();
}
@GET
@Path("/event/{id}")
public Uni<List<SaleEntity>> getByEventId(Integer id) {
return saleRepository.getByEventId(id);
}
@GET
@Path("/listing/{id}")
public Uni<List<SaleEntity>> getByListingId(Integer id) {
return SaleEntity.list("listid", id);
}
@GET
@Path("/buyer/{id}")
public Uni<List<SaleEntity>> getByBuyerId(Integer id) {
return SaleEntity.list("buyerid", id);
}
@GET
@Path("/seller/{id}")
public Uni<List<SaleEntity>> getBySellerId(Integer id) {
return saleRepository.getBySellerId(id);
}
@GET
@Path("/count")
public Uni<Long> count() {
return SaleEntity.count();
}
}

Developer Experience

Quarkus has several features to enhance the developer experience. Features include Dev ServicesDev UIlive reload of code without requiring a rebuild and restart of the application, continuous testing where tests run immediately after code changes have been saved, configuration profiles, Hibernate ORM, JUnit, and REST Assured integrations. Using these Quarkus features, it’s easy to develop and test Quarkus applications.

Configuration Profiles

Similar to Spring, Quarkus works with configuration profiles. According to RedHat, you can use different configuration profiles depending on your environment. Configuration profiles enable you to have multiple configurations in the same application.properties file and select between them using a profile name. Quarkus recognizes three default profiles:

  • dev: Activated in development mode
  • test: Activated when running tests
  • prod: The default profile when not running in development or test mode

In the application.properties file, the profile is prefixed using %environment. format. For example, when defining Quarkus’ log level as INFO, you add the common quarkus.log.level=INFO property. However, to change only the test environment’s log level to DEBUG, corresponding to the test profile, you would add a property with the %test. prefix, such as %test.quarkus.log.level=DEBUG.

Dev Services

Quarkus supports the automatic provisioning of unconfigured services in development and test mode, referred to as Dev Services. If you include an extension and do not configure it, then Quarkus will automatically start the relevant service using Test containers behind the scenes and wire up your application to use this service.

When developing your Quarkus application, you could create your own local PostgreSQL database, for example, with Docker:

docker run –name postgres-dev \
-p 5432:5432 \
-e POSTGRES_DB=tickit \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=postgres123 \
-d postgres:14.2-alpine3.15

And the corresponding application configuration properties:

quarkus.datasource.username=postgres
quarkus.datasource.password=postgres123
quarkus.datasource.reactive.url=vertx-reactive:postgresql://localhost:5432/tickit

Zero-Config Database

Alternately, we can rely on Dev Services, using a feature referred to as zero config setup. Quarkus provides you with a zero-config database out of the box; no database configuration is required. Quarkus takes care of provisioning the database, running your DDL and DML statements to create database objects and populate the database with test data, and finally, de-provisioning the database container when the development or test session is completed. The database Dev Services will be enabled when a reactive or JDBC datasource extension is present in the application and the database URL has not been configured.

Using the quarkusDev Gradle task, we can start the application running, as shown in the video below. Note the two new Docker containers that are created. Also, note the project’s import.sql SQL script is run automatically, executing all DDL and DML statements to prepare and populate the database.

Using the ‘quarkusDev’ Gradle task to start a Quarkus application’s API locally

Bootstrapping the TICKIT Database

When using Hibernate ORM with Quarkus, we have several options regarding how the database is handled when the Quarkus application starts. These are defined in the application.properties file. The quarkus.hibernate-orm.database.generation property determines whether the database schema is generated or not. drop-and-create is ideal in development mode, as shown above. This property defaults to none, however, if Dev Services is in use and no other extensions that manage the schema are present, this will default to drop-and-create. Accepted values: nonecreatedrop-and-createdropupdatevalidate. For development and testing modes, we are using Dev Services with the default value of drop-and-create. For this post, we assume the database and schema already exist in production.

A second property, quarkus.hibernate-orm.sql-load-script, provides the path to a file containing the SQL statements to execute when Hibernate ORM starts. In dev and test modes, it defaults to import.sql. Simply add an import.sql file in the root of your resources directory, Hibernate will be picked up without having to set this property. The project contains an import.sql script to create all database objects and a small amount of test data. You can also explicitly set different files for different profiles and prefix the property with the profile (e.g., %dev. or %test.).

%dev.quarkus.hibernate-orm.database.generation=drop-and-create
%dev.quarkus.hibernate-orm.sql-load-script=import.sql

Another option is Flyway, the popular database migration tool commonly used in JVM environments. Quarkus provides first-class support for using Flyway.

Dev UI

According to the documentation, Quarkus now ships with a new experimental Dev UI, which is available in dev mode (when you start Quarkus with Gradle’s quarkusDev task) at /q/dev by default. It allows you to quickly visualize all the extensions currently loaded, see their status and go directly to their documentation. In addition to access to loaded extensions, you can review logs and run tests in the Dev UI.

Quarkus Dev UI showing logs and tests

Configuration

From the Dev UI, you can access and modify the Quarkus application’s application configuration.

Quarkus Dev UI’s Config Editor

You also can view the configuration of Dev Services, including the running containers and no-config database config.

Dev Services configuration console

Quarkus REST Score Console

With RESTEasy Reactive extension loaded, you can access the Quarkus REST Score Console from the Dev UI. The REST Score Console shows endpoint performance through scores and color-coding: green, yellow, or red. RedHat published a recent blog that talks about the scoring process and how to optimize the performance endpoints. Three measurements show whether a REST reactive application can be optimized further.

Measurements of REST reactive application endpoints

Application Testing

Quarkus enables robust JVM-based and Native continuous testing by providing integrations with common test frameworks, such as including JUnitMockito, and REST Assured. Many of Quarkus’ testing features are enabled through annotations, such as QuarkusTestResourceQuarkusTestQuarkusIntegrationTest, and TransactionalQuarkusTest.

Quarkus supports the use of mock objects using two different approaches. You can either use CDI alternatives to mock out a bean for all test classes or use QuarkusMock to mock out beans on a per-test basis. This includes integration with Mockito.

The REST Assured integration is particularly useful for testing the Quarkus microservice API application. According to their website, REST Assured is a Java DSL for simplifying testing of REST-based services. It supports the most common HTTP request methods and can be used to validate and verify the response of these requests. REST Assured uses the given()when()then() methods of testing made popular as part of Behavior-Driven Development (BDD).

@Test
void listWithQueryParams() {
List<CategoryEntity> category = given()
.when()
.get("v1/categories?page=2&size=4&sort_by=id")
.then()
.statusCode(Response.Status.OK.getStatusCode())
.extract()
.as(new TypeRef<>() {});
Assertions.assertEquals(category.size(), 4);
Assertions.assertEquals(category.get(0).id, 5);
}
view raw test.java hosted with ❤ by GitHub

The tests can be run using the the quarkusTest Gradle task. The application contains a small number of integration tests to demonstrate this feature.

Quarkus application test results report

Swagger and OpenAPI

Quarkus provides the Smallrye OpenAPI extension compliant with the MicroProfile OpenAPI specification, which allows you to generate an API OpenAPI v3 specification and expose the Swagger UI. The /q/swagger-ui resource exposes the Swagger UI, allowing you to visualize and interact with the Quarkus API’s resources without having any implementation logic in place.

Swagger UI showing the Quarkus application’s API resources

Resources can be tested using the Swagger UI without writing any code.

Testing the Quarkus application’s API resource in the Swagger UI

OpenAPI Specification (formerly Swagger Specification) is an API description format for REST APIs. The /q/openapi resource allows you to generate an OpenAPI v3 specification file. An OpenAPI file allows you to describe your entire API.

OpenAPI v3 specification is accessible via the Quarkus application’s API resource

The OpenAPI v3 specification can be saved as a file and imported into applications like Postman, the API platform for building and using APIs.

Importing the OpenAPI file for the Quarkus microservice into Postman
Using the OpenAPI API specification in Postman to interact with the API’s resources

GitOps with GitHub Actions

For this post, GitOps is used to continuously test, build, package, and deploy the Quarkus microservice application to Kubernetes. Specifically, the post uses GitHub Actions. GitHub Actions is a continuous integration and continuous delivery (CI/CD) platform that allows you to automate your build, test, and deployment pipelines. Workflows are defined in the .github/workflows directory in a repository, and a repository can have multiple workflows, each of which can perform a different set of tasks.

Results of the GitHub Action Workflows

Two GitHub Actions are associated with this post’s GitHub repository. The first action, build-test.yml, natively builds and tests the source code in a native Mandrel container on each push to GitHub. The second action (shown below), docker-build-push.yml, builds and containerizes the natively-built executable, pushes it to Docker’s Container Registry (docker.io), and finally deploys the application to Kubernetes. This action is triggered by pushing a new Git Tag to GitHub.

Git Tags associated with the Quarkus application that triggers a deployment

There are several Quarkus configuration properties included in the action’s build step. Alternately, these properties could be defined in the application.properties file. However, I have decided to include them as part of the Gradle build task since they are specific to the type of build and container registry and Kubernetes platform I am pushing to artifacts.

name: Quarkus Native Docker Build, Push, Deploy
on:
push:
tags:
"*.*.*"
jobs:
build:
runs-on: ubuntu-latest
steps:
name: Check out the repo
uses: actions/checkout@v3
name: Set up JDK 17
uses: actions/setup-java@v3
with:
java-version: '17'
distribution: 'corretto'
cache: 'gradle'
name: Set the incremental Docker image tag
run: |
echo "RELEASE_VERSION=${GITHUB_REF:10}" >> $GITHUB_ENV
env | sort
name: Validate Gradle wrapper
uses: gradle/wrapper-validation-action@e6e38bacfdf1a337459f332974bb2327a31aaf4b
name: Build and push Quarkus native Docker image
uses: gradle/gradle-build-action@0d13054264b0bb894ded474f08ebb30921341cee
with:
arguments: |
build
-Dquarkus.package.type=native
-Dquarkus.native.builder-image=quay.io/quarkus/ubi-quarkus-mandrel:22.1.0.0-Final-java17
-Dquarkus.docker.dockerfile-native-path=src/main/docker/Dockerfile.native-micro
-Dquarkus.native.container-build=true
-Dquarkus.container-image.group=${GITHUB_REPOSITORY_OWNER}
-Dquarkus.container-image.tag=${{ env.RELEASE_VERSION }}
-Dquarkus.kubernetes.replicas=4
-Dquarkus.kubernetes-config.secrets=tickit
-Dquarkus.kubernetes-config.secrets.enabled=true
-Dquarkus.kubernetes.service-type=node-port
-Dquarkus.kubernetes.node-port=32319
-Dquarkus.kubernetes.part-of=tickit-app
-Dquarkus.kubernetes.version=1.0.0
-Dquarkus.kubernetes.name=tickit-srv
-Dquarkus.kubernetes.resources.requests.memory=64Mi
-Dquarkus.kubernetes.resources.requests.cpu=250m
-Dquarkus.kubernetes.resources.limits.memory=128Mi
-Dquarkus.kubernetes.resources.limits.cpu=500m
-Dquarkus.container-image.username=${{ secrets.DOCKERHUB_USERNAME }}
-Dquarkus.container-image.password=${{ secrets.DOCKERHUB_PASSWORD }}
-Dquarkus.container-image.push=true
–info
name: Display Kubernetes resources
run: cat build/kubernetes/kubernetes.yml
name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v1
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
name: Apply resources
uses: kodermax/kubectl-aws-eks@master
env:
KUBE_CONFIG_DATA: ${{ secrets.KUBE_CONFIG_DATA }}
KUBECTL_VERSION: "v1.23.6"
IAM_VERSION: "0.5.8"
with:
args: apply -f build/kubernetes/kubernetes.yml -n tickit
name: Get Kubernetes resources
uses: kodermax/kubectl-aws-eks@master
env:
KUBE_CONFIG_DATA: ${{ secrets.KUBE_CONFIG_DATA }}
KUBECTL_VERSION: "v1.23.6"
IAM_VERSION: "0.5.8"
with:
args: get all -n tickit
name: Upload Kubernetes artifact
uses: actions/upload-artifact@v3
with:
name: kubernetes-artifact
path: build/kubernetes/kubernetes.yml

Kubernetes Resources

The Kubernetes resources YAML file, created by the Quarkus build, is also uploaded and saved as an artifact in GitHub by the final step in the GitHub Action.

Kubernetes file saved as GitHub Action Artifact for reference

Quarkus automatically generates ServiceAccountRoleRoleBindingServiceDeployment resources.

apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
app.quarkus.io/build-timestamp: 2022-06-05 – 23:49:30 +0000
labels:
app.kubernetes.io/part-of: tickit-app
app.kubernetes.io/version: 1.0.0
app.kubernetes.io/name: tickit-srv
name: tickit-srv
apiVersion: v1
kind: Service
metadata:
annotations:
app.quarkus.io/build-timestamp: 2022-06-05 – 23:49:30 +0000
prometheus.io/scrape: "true"
prometheus.io/path: /q/metrics
prometheus.io/port: "8080"
prometheus.io/scheme: http
labels:
app.kubernetes.io/name: tickit-srv
app.kubernetes.io/part-of: tickit-app
app.kubernetes.io/version: 1.0.0
name: tickit-srv
spec:
ports:
name: http
nodePort: 32319
port: 80
targetPort: 8080
selector:
app.kubernetes.io/name: tickit-srv
app.kubernetes.io/part-of: tickit-app
app.kubernetes.io/version: 1.0.0
type: NodePort
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: view-secrets
rules:
apiGroups:
""
resources:
secrets
verbs:
get
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: tickit-srv-view
roleRef:
kind: ClusterRole
apiGroup: rbac.authorization.k8s.io
name: view
subjects:
kind: ServiceAccount
name: tickit-srv
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: tickit-srv-view-secrets
roleRef:
kind: Role
apiGroup: rbac.authorization.k8s.io
name: view-secrets
subjects:
kind: ServiceAccount
name: tickit-srv
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
app.quarkus.io/build-timestamp: 2022-06-05 – 23:49:30 +0000
prometheus.io/scrape: "true"
prometheus.io/path: /q/metrics
prometheus.io/port: "8080"
prometheus.io/scheme: http
labels:
app.kubernetes.io/part-of: tickit-app
app.kubernetes.io/version: 1.0.0
app.kubernetes.io/name: tickit-srv
name: tickit-srv
spec:
replicas: 3
selector:
matchLabels:
app.kubernetes.io/part-of: tickit-app
app.kubernetes.io/version: 1.0.0
app.kubernetes.io/name: tickit-srv
template:
metadata:
annotations:
app.quarkus.io/build-timestamp: 2022-06-05 – 23:49:30 +0000
prometheus.io/scrape: "true"
prometheus.io/path: /q/metrics
prometheus.io/port: "8080"
prometheus.io/scheme: http
labels:
app.kubernetes.io/part-of: tickit-app
app.kubernetes.io/version: 1.0.0
app.kubernetes.io/name: tickit-srv
spec:
containers:
env:
name: KUBERNETES_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
image: garystafford/tickit-srv:1.1.3
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /q/health/live
port: 8080
scheme: HTTP
initialDelaySeconds: 0
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 10
name: tickit-srv
ports:
containerPort: 8080
name: http
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /q/health/ready
port: 8080
scheme: HTTP
initialDelaySeconds: 0
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 10
resources:
limits:
cpu: 500m
memory: 128Mi
requests:
cpu: 250m
memory: 64Mi
serviceAccountName: tickit-srv
view raw kubernetes.yml hosted with ❤ by GitHub

Choosing a Kubernetes Platform

The only cloud provider-specific code is in the second GitHub action.

jobs:
build:
runs-on: ubuntu-latest
steps:
name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v1
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
name: Apply resources
uses: kodermax/kubectl-aws-eks@master
env:
KUBE_CONFIG_DATA: ${{ secrets.KUBE_CONFIG_DATA }}
KUBECTL_VERSION: "v1.23.6"
IAM_VERSION: "0.5.8"
with:
args: apply -f build/kubernetes/kubernetes.yml -n tickit

In this case, the application is being deployed to an existing Amazon Elastic Kubernetes Service (Amazon EKS), a fully managed, certified Kubernetes conformant service from AWS. These steps can be easily replaced with steps to deploy to other Cloud platforms, such as Microsoft’s Azure Kubernetes Service (AKS) or Google Cloud’s Google Kubernetes Engine (GKE).

GitHub Secrets

Some of the properties use GitHub environment variables, and others use secure GitHub repository encrypted secrets. Secrets are used to secure Docker credentials used to push the Quarkus application image to Docker’s image repository, AWS IAM credentials, and the base64 encoded contents of the kubeconfig file required to deploy to Kubernetes on AWS when using the kodermax/kubectl-aws-eks@master GitHub action.

GitHub secure GitHub encrypted repository secrets for GitHub Actions

Docker

Reviewing the configuration properties included in the action’s build step, note the Mandrel container used to build the native Quarkus application, quay.io/quarkus/ubi-quarkus-mandrel:22.1.0.0-Final-java17. Also, note the project’s Docker file is used to build the final Docker image, pushed to the image repository, and then used to provision containers on Kubernetes, src/main/docker/Dockerfile.native-micro. This Dockerfile uses the quay.io/quarkus/quarkus-micro-image:1.0 base image to containerize the native Quarkus application.

FROM quay.io/quarkus/quarkus-micro-image:1.0
WORKDIR /work/
RUN chown 1001 /work \
&& chmod "g+rwX" /work \
&& chown 1001:root /work
COPY –chown=1001:root build/*-runner /work/application
EXPOSE 8080
USER 1001
CMD ["./application", "-Dquarkus.http.host=0.0.0.0"]

The properties also define the image’s repository name and tag (e.g., garystafford/tickit-srv:1.1.0).

Docker Image Registry showing the Quarkus application image
Docker Image Registry showing the latest Quarkus application image tags

Kubernetes

In addition to creating the ticket Namespace in advance, a Kubernetes secret is pre-deployed to the ticket Namespace. The GitHub Action also requires a Role and RoleBinding to deploy the workload to the Kubernetes cluster. Lastly, a HorizontalPodAutoscaler (HPA) is used to automatically scale the workload.

export NAMESPACE=tickit# Namespace
kubectl create namespace ${NAMESPACE}# Role and RoleBinding for GitHub Actions to deploy to Amazon EKS
kubectl apply -f kubernetes/github_actions_role.yml -n ${NAMESPACE}# Secret
kubectl apply -f kubernetes/secret.yml -n ${NAMESPACE}# HorizontalPodAutoscaler (HPA)
kubectl apply -f kubernetes/tickit-srv-hpa.yml -n ${NAMESPACE}

As part of the configuration properties included in the action’s build step, note the use of Kubernetes secrets.

-Dquarkus.kubernetes-config.secrets=tickit
-Dquarkus.kubernetes-config.secrets.enabled=true

This secret contains base64 encoded sensitive credentials and connection values to connect to the Production PostgreSQL database. For this post, I have pre-built an Amazon RDS for PostgreSQL database instance, created the ticket database and required database objects, and lastly, imported the sample data included in the GitHub repository, garystafford/tickit-srv-data.

apiVersion: v1
kind: Secret
metadata:
name: tickit
type: Opaque
data:
DB_USERNAME: Y2hhbmdlbWU=
DB_PASSWORD: Y2hhbmdlbWVhbHNv
DB_HOST: Y2hhbmdlLm1lLnVzLWVhc3QtMS5yZHMuYW1hem9uYXdzLmNvbQ==
DB_PORT: NTQzMg==
DB_DATABASE: dGlja2l0
view raw secret.yml hosted with ❤ by GitHub

The five keys seen in the Secret are used in the application.properties file to provide access to the Production PostgreSQL database from the Quakus application.

%prod.quarkus.datasource.username=${DB_USERNAME}
%prod.quarkus.datasource.password=${DB_PASSWORD}
%prod.quarkus.datasource.reactive.url=vertx-reactive:postgresql://${DB_HOST}:${DB_PORT}/${DB_DATABASE}
%prod.quarkus.hibernate-orm.database.generation=none
%prod.quarkus.hibernate-orm.sql-load-script=no-file

An even better alternative to using Kubernetes secrets on Amazon EKS is AWS Secrets and Configuration Provider (ASCP) for the Kubernetes Secrets Store CSI DriverAWS Secrets Manager stores secrets as files mounted in Amazon EKS pods.

AWS Architecture

The GitHub Action pushes the application’s image to Docker’s Container Registry (docker.io), then deploys the application to Kubernetes. Alternately, you could use AWS’s Amazon Elastic Container Registry (Amazon ECR). Amazon EKS pulls the image from Docker as it creates the Kubernetes Pod containers.

There are many ways to route traffic from a requestor to the Quarkus application running on Kubernetes. For this post, the Quarkus application is exposed as a Kubernetes Service on a NodePort. For this post, I have registered a domain, example-api.com, with Amazon Route 53 and a corresponding TLS certificate with AWS Certificate Manager. Inbound requests to the Quarkus application are directed to a subdomain, ticket.example-api.com using HTTPS or port 443. Amazon Route 53 routes those requests to a Layer 7 application load balancer (ALB). The ALB then routes those requests to the Amazon EKS Kubernetes cluster on the NodePort using simple round-robin load balancing. Requests will be routed automatically by Kubernetes to the appropriate worker node and Kubernetes pod. The response then traverses a similar path back to the requestor.

High-level AWS architecture diagram of Quarkus application’s Production environment

Results

If the GitHub action is successful, any push of code changes to GitHub results in the deployment of the application to Kubernetes.

Resources deployed to the ticket Namespace within the Kubernetes cluster

We can also view the deployed Quarkus application resources using the Kubernetes Dashboard.

Quarkus Application pod viewed in Kubernetes Dashboard

Metrics

The post’s Quarkus application implements the micrometer-registry-prometheus extension. The Micrometer metrics library exposes runtime and application metrics. Micrometer defines a core library, providing a registration mechanism for metrics and core metric types.

Sample of the metrics exposed by the Quarkus application API’s metrics resource

Using the Micrometer extension, a metrics resource is exposed at /q/metrics, which can be scraped and visualized by tools such as Prometheus. AWS offers its fully-managed Amazon Managed Service for Prometheus (AMP), which easily integrates with Amazon EKS.

Graph of HTTP Server Requests scraped by Prometheus from Quarkus Application

Using Prometheus as a datasource, we can build dashboards in Grafana to observe the Quarkus Application metrics. Similar to AMP, AWS offers its fully managed Amazon Managed Grafana (AMG).

Example of Grafana dashboard built from Quarkus Application metrics via Prometheus

Centralized Log Management

According to Quarkus documentation, internally, Quarkus uses JBoss Log Manager and the JBoss Logging facade. You can use the JBoss Logging facade inside your code or any of the supported Logging APIs, including JDK java.util.logging (aka JUL), JBoss LoggingSLF4J, and Apache Commons Logging. Quarkus will send them to JBoss Log Manager.

There are many ways to centralize logs. For example, you can send these logs to open-source centralized log management systems like GraylogElastic Stack, fka ELK (Elasticsearch, Logstash, Kibana), EFK (Elasticsearch, Fluentd, Kibana), and OpenSearch with Fluent Bit.

If you are using Kubernetes, the simplest way is to send logs to the console and integrate a central log manager inside your cluster. Since the Quarkus application in this post is running on Amazon EKS, I have chosen Amazon OpenSearch Service with Fluent Bit, an open-source and multi-platform Log Processor and Forwarder. Fluent Bit is fully compatible with Docker and Kubernetes environments. Amazon provides an excellent workshop on installing and configuring Amazon OpenSearch Service with Fluent Bit.

Amazon OpenSearch showing debug logs from the Quarkus application
Amazon OpenSearch logs filtered for errors thrown by the Quarkus application

Conclusion

As we learned in this post, Quarkus, the ‘Supersonic Subatomic Java’ framework, is a cloud-native, Kubernetes-native, container first, microservices first framework for writing Java applications. We observed how to build, test, and deploy a RESTful Quarkus Native application to Kubernetes.

Quarkus has capabilities and features well beyond this post’s scope. In a future post, we will explore other abilities of Quarkus, including observability, GraphQL integration, caching, database proxying, tracing and debugging, message queues, data pipelines, and streaming analytics.


This blog represents my own viewpoints and not of my employer, Amazon Web Services (AWS). All product names, logos, and brands are the property of their respective owners. All diagrams and illustrations are property of the author.

, , , , , , ,

Leave a comment

Eventual Consistency with Spring for Apache Kafka: Part 2 of 2

Using Spring for Apache Kafka to manage a Distributed Data Model in MongoDB across multiple microservices

As discussed in Part One of this post, given a modern distributed system composed of multiple microservices, each possessing a sub-set of a domain’s aggregate data, the system will almost assuredly have some data duplication. Given this duplication, how do we maintain data consistency? In this two-part post, we explore one possible solution to this challenge — Apache Kafka and the model of eventual consistency.

Part Two

In Part Two of this post, we will review how to deploy and run the storefront API components in a local development environment running on Kubernetes with Istio, using minikube. For simplicity’s sake, we will only run a single instance of each service. Additionally, we are not implementing custom domain names, TLS/HTTPS, authentication and authorization, API keys, or restricting access to any sensitive operational API endpoints or ports, all of which we would certainly do in an actual production environment.

To provide operational visibility, we will add Yahoo’s CMAK (Cluster Manager for Apache Kafka), Mongo ExpressKialiPrometheus, and Grafana to our system.

View of Storefront API traffic from Kiali

Prerequisites

This post will assume a basic level of knowledge of Kubernetes, minikube, Docker, and Istio. Furthermore, the post assumes you have already installed recent versions of minikube, kubectl, Docker, and Istio. Meaning, that the kubectl, istioctl, docker, and minikube commands are all available from the terminal.

Currently installed version of the required applications

For this post demonstration, I am using an Apple MacBook Pro running macOS as my development machine. I have the latest versions of Docker Desktop, minikube, kubectl, and Istio installed as of May 2021.

Source Code

The source code for this post is open-source and is publicly available on GitHub. Clone the GitHub project using the following command:

clone --branch 2021-istio \
--single-branch --depth 1 \
https://github.com/garystafford/storefront-demo.git

Minikube

Part of the Kubernetes project, minikube is local Kubernetes, focusing on making it easy to learn and develop for Kubernetes. Minikube quickly sets up a local Kubernetes cluster on macOS, Linux, and Windows. Given the number of Kubernetes resources we will be deploying to minikube, I would recommend at least 3 CPUs and 4–5 GBs of memory. If you choose to deploy multiple observability tools, you may want to increase both of these resources if you can afford it. I maxed out both CPUs and memory several times while setting up this demonstration, causing temporary lock-ups of minikube.

minikube --cpus 3 --memory 5g --driver=docker start start

The Docker driver allows you to install Kubernetes into an existing Docker install. If you are using Docker, please be aware that you must have at least an equivalent amount of resources allocated to Docker to apportion to minikube.

Before continuing, confirm minikube is up and running and confirm the current context of kubectl is minikube.

minikube status
kubectl config current-context

The statuses should look similar to the following:

Use the eval below command to point your shell to minikube’s docker-daemon. You can confirm this by using the docker image ls and docker container ls command to view running Kubernetes containers on minikube.

eval $(minikube -p minikube docker-env)
docker image ls
docker container ls

The output should look similar to the following:

You can also check the status of minikube from Docker Desktop. Minikube is running as a container, instantiated from a Docker image, gcr.io/k8s-minikube/kicbase. View the container’s Stats, as shown below.

Istio

Assuming you have downloaded and configured Istio, install it onto minikube. I currently have Istio 1.10.0 installed and have theISTIO_HOME environment variable set in my Oh My Zsh .zshrc file. I have also set Istio’s bin/ subdirectory in my PATH environment variable. The bin/ subdirectory contains the istioctl executable.

echo $ISTIO_HOME                                                                
> /Applications/Istio/istio-1.10.0
where istioctl
> /Applications/Istio/istio-1.10.0/bin/istioctl
istioctl version

> client version: 1.10.0
control plane version: 1.10.0
data plane version: 1.10.0 (4 proxies)

Istio comes with several built-in configuration profiles. The profiles provide customization of the Istio control plane and of the sidecars for the Istio data plane.

istioctl profile list
> Istio configuration profiles:
default
demo
empty
external
minimal
openshift
preview
remote

For this demonstration, we will use the default profile, which installs istiod and an istio-ingressgateway. We will not require the use of an istio-egressgateway, since all components will be installed locally on minikube.

istioctl install --set profile=default -y
> ✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Installation complete

Minikube Tunnel

kubectl get svc istio-ingressgateway -n istio-system

To associate an IP address, run the minikube tunnel command in a separate terminal tab. Since it requires opening privileged ports 80 and 443 to be exposed, this command will prompt you for your sudo password.

Services of the type LoadBalancer can be exposed by using the minikube tunnel command. It must be run in a separate terminal window to keep the LoadBalancer running. We previously created the istio-ingressgateway. Run the following command and note that the status of EXTERNAL-IP is <pending>. There is currently no external IP address associated with our LoadBalancer.

minikube tunnel

Rerun the previous command. There should now be an external IP address associated with the LoadBalancer. In my case, 127.0.0.1.

kubectl get svc istio-ingressgateway -n istio-system

The external IP address shown is the address we will use to access the resources we chose to expose externally on minikube.

Minikube Dashboard

Once again, in a separate terminal tab, open the Minikube Dashboard (aka Kubernetes Dashboard).

minikube dashboard

The dashboard will give you a visual overview of all your installed Kubernetes components.

Minikube Dashboard showing the istio-system namespace

Namespaces

Kubernetes supports multiple virtual clusters backed by the same physical cluster. These virtual clusters are called namespaces. For this demonstration, we will use four namespaces to organize our deployed resources: dev, mongo, kafka, and storefront-kafka-project. The dev namespace is where we will deploy our Storefront API’s microservices: accounts, orders, and fulfillment. We will deploy MongoDB and Mongo Express to the mongo namespace. Lastly, we will use the kafka and storefront-kafka-project namespaces to deploy Apache Kafka to minikube using Strimzi, a Cloud Native Computing Foundation sandbox project, and CMAK.

kubectl apply -f ./minikube/resources/namespaces.yaml

Automatic Sidecar Injection

In order to take advantage of all of Istio’s features, pods in the mesh must be running an Istio sidecar proxy. When you set the istio-injection=enabled label on a namespace and the injection webhook is enabled, any new pods created in that namespace will automatically have a sidecar added to them. Labeling the dev namespace for automatic sidecar injection ensures that our Storefront API’s microservices — accounts, orders, and fulfillment— will have Istio sidecar proxy automatically injected into their pods.

kubectl label namespace dev istio-injection=enabled

MongoDB

Next, deploy MongoDB and Mongo Express to the mongo namespace on minikube. To ensure a successful connection to MongoDB from Mongo Express, I suggest giving MongoDB a chance to start up fully before deploying Mongo Express.

kubectl apply -f ./minikube/resources/mongodb.yaml -n mongo
sleep 60
kubectl apply -f ./minikube/resources/mongo-express.yaml -n mongo

To confirm the success of the deployments, use the following command:

kubectl get services -n mongo

Or use the Kubernetes Dashboard to confirm deployments.

Mongo Express UI Access

For parts of your application (for example, frontends) you may want to expose a Service onto an external IP address outside of your cluster. Kubernetes ServiceTypes allows you to specify what kind of Service you want; the default is ClusterIP.

Note that while MongoDB uses the ClusterIP, Mongo Express uses NodePort. With NodePort, the Service is exposed on each Node’s IP at a static port (the NodePort). You can contact the NodePort Service, from outside the cluster, by requesting <NodeIP>:<NodePort>.

In a separate terminal tab, open Mongo Express using the following command:

minikube service --url mongo-express -n mongo

You should see output similar to the following:

Click on the link to open Mongo Express. There should already be three MongoDB operational databases shown in the UI. The three Storefront databases and collections will be created automatically, later in the post: accounts, orders, and fulfillment.

Apache Kafka using Strimzi

Next, we will install Apache Kafka and Apache Zookeeper into the kafka and storefront-kafka-project namespaces on minikube, using Strimzi. Since Strimzi has a great, easy-to-use Quick Start guide, I will not detail