AWS Glue Studio is a visual interface for creating, running, and managing AWS Glue jobs. Its drag-and-drop editor generates Python or Scala scripts that perform extract, transform, and load (ETL) operations on data stored in AWS and third-party data sources. AWS Glue Studio addresses the following needs of data engineers:
- Accessibility: A visual interface simplifies the ETL job creation process for users without extensive programming experience.
- Productivity: The most common ETL tasks can be performed quickly and efficiently using the visual interface.
- Debugging: Glue Studio data previews and job run logs help users identify and resolve issues in their ETL jobs.
- Integration: Glue Studio integrates with other AWS services, such as Amazon S3, Amazon RDS, and Amazon Redshift, to facilitate data processing and transformation.
In this lab, you will create an ETL job in AWS Glue Studio that transforms data stored in an Amazon S3 bucket. You will configure and run the job, and observe the transformed data using Amazon Athena.
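To make the extract, transform, and load pattern concrete before you build the job, here is a minimal plain-Python sketch of the three stages. It is not the script that Glue Studio generates (generated scripts rely on the AWS Glue libraries and Apache Spark). The sample data, column names, and function names are all hypothetical, and an in-memory CSV string stands in for an object in an Amazon S3 bucket.

```python
import csv
import io

# Hypothetical sample input standing in for a CSV object in an S3 bucket.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,carol,7.25
"""

FIELDS = ["order_id", "customer", "amount"]


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast column types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"],
            "amount": float(row["amount"]),
        })
    return cleaned


def load(rows):
    """Load: serialize the cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


result = load(transform(extract(RAW_CSV)))
print(result)
```

In the lab, each of these stages corresponds roughly to a node in the visual job: a source node reads from Amazon S3, transform nodes reshape the data, and a target node writes the output.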
Learning objectives
Upon completion of this beginner-level lab, you will be able to:
- Configure a Visual ETL job in AWS Glue Studio
- Generate an AWS Glue job script
- Query an AWS Glue Data Catalog table using Amazon Athena
Intended audience
- Candidates for the AWS Certified Data Engineer - Associate certification
- Cloud Architects
- Data Engineers
- DevOps Engineers
- Machine Learning Engineers
- Software Engineers
Prerequisites
Familiarity with the following will be beneficial but is not required:
- AWS Glue
- Amazon S3
- Amazon Athena
- Python scripting language
The following content can be used to fulfill the prerequisites: