hands-on lab
Using AWS Glue for ETL Workloads
Difficulty: Beginner
Duration: Up to 1 hour
Students: 533
Rating: 5/5
Description
AWS Glue is a serverless data integration service that you can use to discover, prepare, transfer, and integrate your data. AWS Glue jobs are commonly used for Extract, Transform, and Load (ETL) tasks that support analytics, data migration, and machine learning activities.
Knowing how to build and run AWS Glue jobs will make you more effective at working with data in the AWS Cloud.
In this hands-on lab, you will examine the source data, implement an AWS Glue job, and verify the results of an example ETL workload.
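To make the three ETL stages concrete, here is a minimal, stdlib-only Python sketch of the pattern. It does not use AWS Glue or Apache Spark: the inline CSV text stands in for a hypothetical source object in Amazon S3, and the in-memory list stands in for a target such as a DynamoDB table. The field names (`order_id`, `amount`, `currency`) and the cleaning rules are illustrative assumptions, not the lab's actual data or logic.

```python
import csv
import io

# Hypothetical source data standing in for a CSV object stored in Amazon S3.
SOURCE_CSV = """order_id,amount,currency
1001,19.99,usd
1002,5.00,eur
1003,,usd
"""


def extract(raw_text):
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with no amount, cast types, upper-case currency codes."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "currency": row["currency"].upper(),
        })
    return cleaned


def load(records, target):
    """Load: append records to an in-memory target (a stand-in for a real sink)."""
    target.extend(records)
    return len(records)


if __name__ == "__main__":
    sink = []
    written = load(transform(extract(SOURCE_CSV)), sink)
    print(written, sink)
```

In a real AWS Glue job, the same three steps would typically read from and write to data stores through Glue's Spark-based APIs rather than plain Python lists.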
Learning objectives
Upon completion of this beginner-level lab, you will be able to:
- Use the Amazon S3 and Amazon DynamoDB consoles to view source data
- Implement an AWS Glue job using Python and Apache Spark
- Run an AWS Glue job with a supplied parameter
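AWS Glue passes job parameters to a Python script as `--name value` pairs on the command line, and a real Glue script usually reads them with `getResolvedOptions(sys.argv, [...])` from `awsglue.utils`. Because `awsglue` is only available inside the Glue runtime, the sketch below uses a hypothetical stand-in, `resolve_options`, built on `argparse` to show the idea. The `--target_table` parameter is an assumed example, not a parameter defined by this lab.

```python
import argparse


def resolve_options(argv, option_names):
    """Hypothetical stand-in for awsglue.utils.getResolvedOptions.

    Reads each named `--option value` pair from argv and ignores any other
    arguments the runtime may have added. Exits if a named option is missing.
    """
    # allow_abbrev=False prevents partial option names from matching.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for name in option_names:
        parser.add_argument(f"--{name}", required=True)
    known, _unknown = parser.parse_known_args(argv[1:])
    return vars(known)


if __name__ == "__main__":
    # Simulated argv: the job name, a hypothetical --target_table parameter,
    # and an unrelated argument that should be ignored.
    fake_argv = ["script.py", "--JOB_NAME", "demo-job",
                 "--target_table", "orders", "--extra", "x"]
    args = resolve_options(fake_argv, ["JOB_NAME", "target_table"])
    print(args["JOB_NAME"], args["target_table"])
```

When starting the job, such a parameter would be supplied with its `--` prefix, for example as a job parameter key in the AWS Glue console or in the `Arguments` map of the boto3 `start_job_run` call.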
Intended audience
- Candidates for the AWS Certified Data Engineer - Associate certification
- Cloud Architects
- Data Engineers
- DevOps Engineers
- Machine Learning Engineers
- Software Engineers
Prerequisites
Familiarity with the following will be beneficial but is not required:
- AWS Glue
- The Python scripting language
- Amazon DynamoDB
The following content can be used to fulfill the prerequisites:
[Diagrams: Environment before; Environment after]
Covered topics
Lab steps
- Logging In to the Amazon Web Services Console
- Examining the Data Sources
- Implementing a Glue ETL Job
- Running a Glue ETL Job