Demo: Creating an ETL solution using AWS Glue

Reading Time: 3 minutes
A sample job that reads from an S3 bucket and writes to Amazon RDS for MySQL

Create a crawler and connections

First, upload any CSV file to your S3 bucket to serve as the source for our demo.
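If you don't have a CSV handy, a small one can be generated locally; the column names, values, and bucket name below are made up for the demo:

```python
import csv

# Hypothetical sample data; any CSV works as the crawler source.
rows = [
    {"id": 1, "name": "alpha", "price": 9.99},
    {"id": 2, "name": "beta", "price": 4.50},
]

with open("sample_source.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name", "price"])
    writer.writeheader()
    writer.writerows(rows)

# Upload it with the AWS CLI (bucket name is an assumption):
#   aws s3 cp sample_source.csv s3://my-glue-demo-bucket/input/
```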

Source data

Go to AWS Glue, click on Crawlers, then Add crawler.

Create or select an IAM role and set the schedule. Choose a one-time (run-on-demand) schedule for the demo.

Run the crawler
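The same console steps can be done programmatically with the Glue API. The sketch below only builds the request parameters (the crawler, role, database, and bucket names are assumptions); the boto3 calls are commented out because they need AWS credentials:

```python
# Request parameters for creating the demo crawler (all names are assumptions).
crawler_config = {
    "Name": "demo-csv-crawler",
    "Role": "AWSGlueServiceRole-demo",  # IAM role from the console step
    "DatabaseName": "demo_db",          # Data Catalog database to write metadata into
    "Targets": {"S3Targets": [{"Path": "s3://my-glue-demo-bucket/input/"}]},
    # Omitting "Schedule" makes the crawler run on demand.
}

# Requires AWS credentials, so shown commented out:
# import boto3
# glue = boto3.client("glue")
# glue.create_crawler(**crawler_config)
# glue.start_crawler(Name=crawler_config["Name"])
```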

Check the logs.

This should create the table metadata in the Glue Data Catalog.

If any column type needs to be changed or if a new column needs to be added, it can be done at this point.

Create a connection to the target database

To connect to MySQL, click on Connections, then Add connection. Give the connection any name, set the connection type to JDBC, provide the connection parameters (JDBC URL, username and password, VPC and subnet information), and click Connect.
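The JDBC URL for a MySQL target follows a fixed format; the host and database names below are placeholders for your own RDS endpoint:

```python
# RDS endpoint details (these values are assumptions for the demo).
host = "demo-db.abc123xyz.us-east-1.rds.amazonaws.com"
port = 3306
database = "demo_db"

# MySQL JDBC URL format expected in the "Add connection" form.
jdbc_url = f"jdbc:mysql://{host}:{port}/{database}"
```

Paste this URL, plus the username/password and the VPC/subnet of the RDS instance, into the connection form.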

Build an ETL job in AWS Glue

To build your first job, click on Jobs and then Add job.

Provide a job name and an IAM role, and keep the defaults.

Then choose the data source on the next screen.

Then choose the target database.

On the next screen, we can add or remove target columns, remap fields, and so on.

All transformations, including sorting and format changes, can be done in the Python script generated on the next screen.

Run an ETL job in AWS Glue

The screen above has a Run job option, which executes the job. Job completion can be tracked in the Glue console under Jobs. For more detailed logs, go to CloudWatch Logs.

Verify the data in target table

Execute SELECT * FROM DEMO_TABLE LIMIT 10; and SELECT COUNT(*) FROM DEMO_TABLE; to validate the data.
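The same checks can be run from Python; the connection call is commented out since it needs your RDS endpoint and a MySQL driver (mysql-connector-python is one option, not the only one):

```python
# Verification queries; DEMO_TABLE is the target table from this demo.
checks = [
    "SELECT * FROM DEMO_TABLE LIMIT 10;",
    "SELECT COUNT(*) FROM DEMO_TABLE;",
]

# Running them requires the RDS endpoint and credentials, e.g.:
# import mysql.connector
# conn = mysql.connector.connect(
#     host="demo-db.abc123xyz.us-east-1.rds.amazonaws.com",  # assumption
#     user="admin", password="...", database="demo_db",
# )
# cur = conn.cursor()
# for q in checks:
#     cur.execute(q)
#     print(cur.fetchall())
```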