hands-on lab

Performing Data Quality Checks Using an Amazon Athena Notebook

Difficulty: Beginner
Duration: Up to 1 hour
Students: 12

Description

Amazon Athena is an AWS service that lets you query data stored in Amazon S3 using SQL. When your analysis calls for more complex calculations, you can use an Amazon Athena notebook to process your data with Apache Spark.

Learning to use an Amazon Athena notebook is useful for anyone who wants to strengthen their data analysis skills.

In this hands-on lab, you will connect to an Amazon Athena notebook session and use it to perform data quality checks.
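
To give a rough idea of what "data quality checks" can mean in practice, here is a minimal sketch in plain Python. It is not the lab's actual notebook code: the sample rows, the column names (`order_id`, `customer`, `amount`), and the helper functions are all hypothetical, chosen only to illustrate three common checks (missing values, duplicate keys, and out-of-range values).

```python
from collections import Counter

# Hypothetical sample rows; the lab's real dataset and columns are not shown here.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 25.0},
    {"order_id": 2, "customer": None, "amount": 40.5},
    {"order_id": 2, "customer": "bob", "amount": 13.0},
    {"order_id": 3, "customer": "carol", "amount": -5.0},
]


def null_counts(rows, columns):
    """Count missing (None) values for each listed column."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}


def duplicate_keys(rows, key):
    """Return the key values that appear in more than one row."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def out_of_range(rows, column, minimum):
    """Return rows whose non-null value in `column` is below `minimum`."""
    return [r for r in rows if r.get(column) is not None and r[column] < minimum]


# Collect the results of all three checks into one small report.
report = {
    "null_counts": null_counts(rows, ["order_id", "customer", "amount"]),
    "duplicate_order_ids": duplicate_keys(rows, "order_id"),
    "negative_amounts": len(out_of_range(rows, "amount", 0)),
}
print(report)
```

In a Spark-backed notebook, the same checks would typically be written as DataFrame filters and grouped counts rather than Python loops, so that they scale to data stored in Amazon S3.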

Learning objectives

Upon completion of this beginner-level lab, you will be able to:

  • Examine an Amazon Athena workgroup for use with an Amazon Athena notebook
  • Connect to an Amazon Athena notebook session
  • Use a Jupyter Notebook interface to perform data quality checks

Intended audience

  • Candidates for the AWS Certified Data Engineer - Associate certification
  • Cloud Architects
  • Data Engineers
  • Machine Learning Engineers

Prerequisites

Familiarity with the following will be beneficial but is not required:

  • Amazon Athena
  • Apache Spark
  • Jupyter notebooks

Lab steps

  • Logging In to the Amazon Web Services Console
  • Using an Athena Notebook