hands-on lab

Developing an ETL Job in AWS Glue Studio

Difficulty: Beginner
Duration: Up to 45 minutes
Students: 48
Rating: 5/5
On average, students complete this lab in 30 minutes.
Get guided in a real environment: Practice with a step-by-step scenario in a real, provisioned environment.
Learn and validate: Use validations to check your solutions every step of the way.
See results: Track your knowledge and monitor your progress.

Description

The AWS Glue Studio visual interface is used to create, run, and manage AWS Glue jobs. Glue Studio's drag-and-drop interface generates Python or Scala scripts that perform extract, transform, and load (ETL) operations on data stored in AWS and third-party data sources. AWS Glue Studio aims to solve the following challenges faced by data engineers:

  • Accessibility: A visual interface simplifies the ETL job creation process for users without extensive programming experience.
  • Productivity: The most common ETL tasks can be performed quickly and efficiently using the visual interface.
  • Debugging: Glue Studio data previews and job run logs help users identify and resolve issues in their ETL jobs.
  • Integration: Glue Studio integrates with other AWS services, such as Amazon S3, Amazon RDS, and Amazon Redshift, to facilitate data processing and transformation.

In this lab, you will create an ETL job in AWS Glue Studio that transforms data stored in an Amazon S3 bucket. You will configure and run the job, and observe the transformed data using Amazon Athena.
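
To make the extract, transform, and load stages mentioned above concrete, here is a minimal plain-Python sketch that uses only the standard library. It is not the script that AWS Glue Studio generates and it does not call any AWS API. The sample data, column names, and function names are all hypothetical and chosen purely for illustration.

```python
import csv
import io

# Hypothetical sample input, standing in for a CSV object stored in an S3 bucket.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,not_a_number
3,carol,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast column types, tidy names, and drop rows that fail to parse."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a valid number
        cleaned.append(
            {
                "order_id": int(row["order_id"]),
                "customer": row["customer"].title(),
                "amount": amount,
            }
        )
    return cleaned


def load(rows):
    """Load: serialize the cleaned rows to CSV text (a stand-in for writing to a target)."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["order_id", "customer", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def run_job(text):
    """Chain the three stages, in the order a source -> transform -> target job would."""
    return load(transform(extract(text)))


if __name__ == "__main__":
    print(run_job(RAW_CSV))
```

In the lab itself, the equivalent stages are configured as nodes in the visual editor (a source, one or more transforms, and a target), and Glue Studio generates the job script for you.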

Learning objectives

Upon completion of this beginner-level lab, you will be able to:

  • Configure a Visual ETL job in AWS Glue Studio
  • Generate an AWS Glue job script
  • Query an AWS Glue Data Catalog table using Amazon Athena

Intended audience

  • Candidates for the AWS Certified Data Engineer - Associate certification
  • Cloud Architects
  • Data Engineers
  • DevOps Engineers
  • Machine Learning Engineers
  • Software Engineers

Prerequisites

Familiarity with the following will be beneficial but is not required:

  • AWS Glue
  • Amazon S3
  • Amazon Athena
  • Python scripting language

Lab steps

  1. Logging In to the Amazon Web Services Console
  2. Exploring the AWS Glue Data Catalog Table
  3. Building an AWS Glue Visual ETL Job
  4. Running an AWS Glue Visual ETL Job