Using Ahana Cloud for Presto to perform analytics on AWS using both Apache Hive and AWS Glue as metastores
Introduction
The following series of five videos is an extended version of the demonstration featured in the October 2021 webinar, Build an Open Data Lake on AWS with Presto. An on-demand copy of the live webinar is available on Ahana.io, featuring Dipti Borkar (Ahana Co-Founder and CPO) and me.
In the demonstration, we build a data lake on AWS using a combination of Ahana Cloud for Presto, Apache Hive, Apache Superset, Amazon S3, AWS Glue, and Amazon Athena. We then analyze the data in Apache Superset using Ahana Cloud for Presto.

Demonstration
The demonstration is divided into five YouTube videos (playlist):
Source Code
All source code for this post and the previous posts in this series is open-sourced and located on GitHub. In the webinar and the videos, the Apache Hive and AWS Glue data catalog tables contain an _athena or _presto suffix. For clarity, in the source code, I have changed those to indicate the metastore they are associated with, _hive or _glue, since either set of tables can be queried by Presto. Additionally, in the webinar and the videos, the raw data files were uploaded to Amazon S3 in uncompressed CSV format; this is unnecessary. Both CTAS SQL statements expect GZIP-compressed CSV files. To save time and cost, upload the compressed files, as they are, to Amazon S3.
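Before uploading, it can be worth confirming that the local files really are GZIP-compressed and glancing at the first few rows without writing an uncompressed copy to disk. The following is a minimal sketch using only the Python standard library. The helper names is_gzip and preview are hypothetical, and UTF-8 encoding of the data files is an assumption; adjust it if the files use a different encoding.

```python
import gzip
from itertools import islice
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every GZIP stream starts with these two bytes


def is_gzip(path):
    """Return True if the file begins with the GZIP magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def preview(path, n=3):
    """Return the first n lines, decompressing on the fly (UTF-8 assumed)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


if __name__ == "__main__":
    # File names taken from the demonstration's file list.
    for name in ("moma_public_artists.txt.gz", "moma_public_artworks.txt.gz"):
        if Path(name).exists():
            print(name, "gzip:", is_gzip(name))
            for row in preview(name, 2):
                print("  ", row)
```

If is_gzip returns True for both files, they can be uploaded to Amazon S3 unchanged.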
The following files are used in the demonstration:
README.md: Instructions for demo
ahana_demo_glue_artists.sql: AWS Glue SQL statements
ahana_demo_glue_artworks.sql: AWS Glue SQL statements
ahana_demo_hive.sql: Apache Hive SQL statements
joins.sql: Simple SQL join statement
superset_charts.sql: SQL statements for Superset charts
moma_public_artists.txt.gz: Compressed raw artists data
moma_public_artworks.txt.gz: Compressed raw artworks data
This blog represents my own viewpoints and not those of my employer, Amazon Web Services (AWS). All product names, logos, and brands are the property of their respective owners.