Video Demonstration: Lakehouse Automation on AWS with Apache Airflow

Programmatically load data into Amazon Redshift and unload it to an Amazon S3-based data lake using Apache Airflow

Introduction

In the following video demonstration, we will learn how to programmatically load data into Amazon Redshift and unload it to an Amazon S3-based data lake using Apache Airflow. Because we are on AWS, we will use the fully managed Amazon Managed Workflows for Apache Airflow (Amazon MWAA). Using Airflow, we will first COPY raw data into staging tables, then merge that staging data into a series of tables. We will then load incremental data into Redshift on a regular schedule. Next, we will join and aggregate data from several tables and UNLOAD the resulting dataset to the Amazon S3-based data lake. Lastly, we will catalog the data in S3 using AWS Glue and query it with Amazon Athena.
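To make the COPY, merge, and UNLOAD steps more concrete, here is a minimal sketch of the kind of SQL each step might issue, built as plain Python strings. It is not taken from the project's source code: the table names, S3 paths, key column, and IAM role ARN are all hypothetical placeholders, and the merge uses the common Redshift delete-then-insert pattern, which may differ from what the demonstration's SQL files actually do.

```python
from typing import List

# Hypothetical IAM role; replace with a role your Redshift cluster can assume.
IAM_ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"


def copy_to_staging(staging_table: str, s3_uri: str, iam_role: str = IAM_ROLE) -> str:
    """Build a COPY statement that loads raw CSV files from S3 into a staging table."""
    return (
        f"COPY {staging_table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS CSV;"
    )


def merge_from_staging(target: str, staging: str, key: str) -> List[str]:
    """Build delete-then-insert statements that merge staging rows into the target."""
    return [
        "BEGIN;",
        # Remove target rows that are about to be replaced by staged rows.
        f"DELETE FROM {target} USING {staging} WHERE {target}.{key} = {staging}.{key};",
        # Insert every staged row (both new and replacement rows).
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "END;",
        # Empty the staging table once the merge has been committed.
        f"TRUNCATE {staging};",
    ]


def unload_to_data_lake(query: str, s3_prefix: str, iam_role: str = IAM_ROLE) -> str:
    """Build an UNLOAD statement that writes a query result to S3 as Parquet."""
    # UNLOAD takes the query as a quoted string, so inner single quotes are doubled.
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS PARQUET\n"
        "ALLOWOVERWRITE;"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(copy_to_staging("staging_sales", "s3://example-bucket/raw/sales/"))
    print("\n".join(merge_from_staging("sales", "staging_sales", "salesid")))
    print(unload_to_data_lake(
        "SELECT * FROM venue WHERE venuestate = 'NV'",
        "s3://example-bucket/lake/venue_",
    ))
```

In an Airflow DAG, strings like these would typically be handed to a SQL-executing task, one task per step, so that a failed COPY or merge stops the run before anything is unloaded to the data lake.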

Architecture and workflow demonstrated in the video

Demonstration

For best results, view at 1080p HD on YouTube

Source Code

The source code for this demonstration, including the Airflow DAGs, SQL statements, and data files, is open-sourced and located on GitHub.

DAGs

The DAGs included in the GitHub project are:

redshift_demo__01_create_tables.py
redshift_demo__02_initial_load.py
redshift_demo__03_incremental_load.py
redshift_demo__04_unload_data.py
redshift_demo__05_catalog_and_query.py
redshift_demo__06_run_dags_01_to_05.py
redshift_demo__06B_run_dags_01_to_05.py (alternate version with an external notifications module)
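As the file names suggest, the sixth DAG runs DAGs 01 through 05 in order. The following is a rough sketch of how such an orchestration DAG could be written, not the project's actual implementation. It assumes Airflow 2.x (where `schedule_interval` is accepted), assumes each DAG ID equals its file name without the `.py` suffix, and uses a made-up DAG ID, `example_run_dags_01_to_05`, for the orchestrator itself.

```python
from datetime import datetime
from typing import List

# File names as listed above; using the file stem as the DAG ID is an assumption.
DAG_FILES = [
    "redshift_demo__01_create_tables.py",
    "redshift_demo__02_initial_load.py",
    "redshift_demo__03_incremental_load.py",
    "redshift_demo__04_unload_data.py",
    "redshift_demo__05_catalog_and_query.py",
]


def dag_ids(files: List[str]) -> List[str]:
    """Derive one DAG ID per file by dropping the .py suffix (assumed convention)."""
    return [name[:-3] if name.endswith(".py") else name for name in files]


def build_orchestration_dag(dag_id: str = "example_run_dags_01_to_05"):
    """Return a DAG that triggers DAGs 01-05 one after another."""
    # Imported here so the helpers above can be used without Airflow installed.
    from airflow import DAG
    from airflow.operators.trigger_dagrun import TriggerDagRunOperator

    with DAG(
        dag_id=dag_id,
        start_date=datetime(2021, 12, 1),
        schedule_interval=None,  # no schedule; run only when triggered
        catchup=False,
    ) as dag:
        previous = None
        for child_id in dag_ids(DAG_FILES):
            trigger = TriggerDagRunOperator(
                task_id=f"trigger_{child_id}",
                trigger_dag_id=child_id,
                wait_for_completion=True,  # wait until the child DAG run finishes
            )
            if previous is not None:
                previous >> trigger  # enforce the 01 -> 05 ordering
            previous = trigger
    return dag
```

In a real DAG file, the result would need to be assigned at module level (for example, `dag = build_orchestration_dag()`) so the Airflow scheduler can discover it. Setting `wait_for_completion=True` is the design choice that matters here: without it, each trigger task would succeed as soon as the child run is queued, and the five DAGs would effectively start at the same time.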

Demonstration DAGs as seen in MWAA Airflow UI

This blog represents my own viewpoints and not those of my employer, Amazon Web Services (AWS). All product names, logos, and brands are the property of their respective owners.

Amazon Redshift, Analytics, Apache Airflow, AWS, DAGs, Datawarehouse

This entry was posted on December 2, 2021, 6:00 pm and is filed under Analytics, AWS, Build Automation, Cloud, DevOps, Python, SQL, Technology Consulting.

Programmatic Ponderings

Gary Stafford
