Create a crawler and connections
First, upload a CSV file to your S3 bucket; it will serve as the source for this demo.
Go to AWS Glue, click Crawlers, then Add Crawler.
Create or select an IAM role and set the schedule. Choose Run on demand (a one-time run) for the demo.
Run the crawler
Check the logs.
This creates the table metadata in the Glue Data Catalog.
If any column type needs to be changed, or a new column needs to be added, it can be done at this point.
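The console steps above can also be scripted. Below is a minimal sketch of defining and starting the same crawler through the Glue API; the crawler name, role ARN, database name, and S3 path are placeholders you would replace with your own values.

```python
# Sketch: creating and starting the demo crawler via the Glue API instead of
# the console. All names below (role ARN, bucket, database) are placeholders.

def crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for glue.create_crawler()."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # No Schedule key: the crawler only runs when started on demand.
    }

req = crawler_request(
    "demo-crawler",
    "arn:aws:iam::123456789012:role/GlueDemoRole",  # placeholder role
    "demo_db",
    "s3://my-demo-bucket/input/",                   # placeholder bucket
)

# With AWS credentials configured, you would then run:
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_crawler(**req)
#   glue.start_crawler(Name="demo-crawler")
```

A crawler created this way shows up in the console exactly like one created by hand, so you can still inspect and edit the inferred schema there.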
Create a connection to the target table
To connect to MySQL, click Connections, then Add connection. Give the connection a name, set the connection type to JDBC, provide the connection parameters (JDBC URL, username and password, VPC and subnet information), and click Connect.
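The same JDBC connection can be defined through the Glue API. This is only a sketch: the URL, username, password, subnet, and security group values are placeholders for your environment.

```python
# Sketch: defining the MySQL JDBC connection via glue.create_connection().
# Every value below is a placeholder for your own environment.

def connection_request(name, jdbc_url, user, password, subnet_id, sg_ids):
    """Build the keyword arguments for glue.create_connection()."""
    return {
        "ConnectionInput": {
            "Name": name,
            "ConnectionType": "JDBC",
            "ConnectionProperties": {
                "JDBC_CONNECTION_URL": jdbc_url,
                "USERNAME": user,
                "PASSWORD": password,
            },
            "PhysicalConnectionRequirements": {
                "SubnetId": subnet_id,
                "SecurityGroupIdList": sg_ids,
            },
        }
    }

req = connection_request(
    "demo-mysql-connection",
    "jdbc:mysql://demo-db.example.internal:3306/demo",  # placeholder host
    "glue_user",
    "secret",
    "subnet-0abc1234",
    ["sg-0abc1234"],
)

# With a configured client:
#   glue.create_connection(**req)
```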
Build an ETL job in AWS Glue
To build your first job, click Jobs and then Add job.
Provide a job name and an IAM role, and keep the defaults.
Then choose the data source on the next screen.
Then choose the target database.
On the next screen, you can add or remove target columns, remap fields, and so on.
All transformations, including sorting and format changes, can be done in the Python script generated on the next screen.
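The generated script operates on Glue DynamicFrames in PySpark; as a plain-Python illustration of the same kinds of transforms (a date-format change and a sort), here is a minimal sketch on in-memory CSV rows. The column names and sample data are made up for the demo.

```python
# Illustration only: the real Glue job applies these transforms with the
# DynamicFrame/PySpark API. Sample columns and values are hypothetical.
import csv
import io
from datetime import datetime

raw = """id,name,signup
3,carol,01/15/2023
1,alice,03/02/2022
2,bob,11/30/2022
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Format change: rewrite MM/DD/YYYY dates as ISO 8601 (YYYY-MM-DD).
for r in rows:
    r["signup"] = datetime.strptime(r["signup"], "%m/%d/%Y").strftime("%Y-%m-%d")

# Sort by the integer id before writing to the target.
rows.sort(key=lambda r: int(r["id"]))
```

In the generated Glue script, the equivalent places to hook in are the mapping step (for renames and type changes) and the DataFrame conversion (for sorts and custom logic).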
Run an ETL job in AWS Glue
The screen above has an option to run the job; clicking it executes the job. Job completion can be monitored in the Glue console under Jobs. To see more detailed logs, go to CloudWatch Logs.
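Instead of watching the console, the run can be started and polled from code. A sketch, assuming a job named demo-etl-job (a placeholder):

```python
# Sketch: starting a Glue job run and polling its state. The job name is a
# placeholder for whatever you named your job.

# Glue job runs end in one of these terminal states.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}

def is_finished(state):
    """True once a Glue job run has reached a terminal state."""
    return state in TERMINAL_STATES

# With AWS credentials configured, you would run:
#   import boto3
#   glue = boto3.client("glue")
#   run_id = glue.start_job_run(JobName="demo-etl-job")["JobRunId"]
#   run = glue.get_job_run(JobName="demo-etl-job", RunId=run_id)["JobRun"]
#   ...poll until is_finished(run["JobRunState"]). Detailed logs land in
#   CloudWatch under /aws-glue/jobs/output and /aws-glue/jobs/error.
```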
Verify the data in target table
Execute SELECT * FROM DEMO_TABLE LIMIT 10; and SELECT COUNT(*) FROM DEMO_TABLE; to validate the data.
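These checks can also be run from Python. A sketch, with placeholder connection details; any MySQL client library (pymysql is used here as an example) would work.

```python
# Sketch: composing the two validation queries for a given target table.
# Connection details in the comment below are placeholders.

def validation_queries(table):
    """Return the sample-rows query and the row-count query for a table."""
    return [
        f"SELECT * FROM {table} LIMIT 10;",
        f"SELECT COUNT(*) FROM {table};",
    ]

queries = validation_queries("DEMO_TABLE")

# Against a live target database:
#   import pymysql
#   conn = pymysql.connect(host="demo-db.example.internal", user="glue_user",
#                          password="secret", database="demo")
#   with conn.cursor() as cur:
#       for q in queries:
#           cur.execute(q)
#           print(cur.fetchall())
```

If the row count matches the source CSV and the sampled rows look right, the ETL job loaded the data correctly.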