hands-on lab
Implementing an ETL Pipeline with AWS SDK for Pandas
Difficulty: Beginner
Duration: Up to 1 hour
Students: 2
Description
AWS SDK for Pandas (formerly AWS Data Wrangler) is an open-source Python library from AWS that connects Pandas DataFrames to AWS data services such as Amazon S3, Amazon Athena, and the AWS Glue Data Catalog. Built upon the popular Pandas library, it is performant and designed to be used at scale.
Learning how to use AWS SDK for Pandas will benefit anyone who wants to analyze and manipulate data in the AWS cloud.
In this hands-on lab, you will explore how to access different data stores using the library, and you will implement an AWS Lambda function that uses it to process transaction data in real time.
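To give a feel for what such a Lambda function might look like, here is a minimal sketch. It is not the lab's actual solution: it assumes the function is triggered by Amazon S3 event notifications, that each incoming object is a JSON Lines file of transactions, and that OUTPUT_PATH (a hypothetical bucket and prefix) is where the processed data should land as Parquet.

```python
from urllib.parse import unquote_plus

# Hypothetical destination; replace with a bucket and prefix you own.
OUTPUT_PATH = "s3://example-output-bucket/transactions/"


def s3_paths_from_event(event):
    """Return an s3:// URI for every object named in an S3 event notification."""
    paths = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        paths.append(f"s3://{bucket}/{key}")
    return paths


def lambda_handler(event, context):
    # Imported lazily so the helper above can be exercised without the library.
    import awswrangler as wr

    paths = s3_paths_from_event(event)
    if not paths:
        return {"rows": 0}

    # Assumption: the input files are JSON Lines (one transaction per line).
    df = wr.s3.read_json(path=paths, lines=True)

    # Append the batch to a Parquet dataset under the assumed output prefix.
    wr.s3.to_parquet(df=df, path=OUTPUT_PATH, dataset=True, mode="append")
    return {"rows": len(df)}
```

Keeping the event parsing in its own function means that part can be checked locally without AWS credentials, while the read and write calls only run inside Lambda.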
Learning objectives
Upon completion of this beginner-level lab, you will be able to:
- Use a JupyterLab Notebook
- Install and use the AWS SDK for Pandas library
- Update an AWS Lambda function using the AWS CLI
- Query data using Amazon Athena
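As a rough illustration of the library and Athena objectives, the sketch below previews a table through Amazon Athena. It assumes the library has been installed with pip install awswrangler and that AWS credentials are configured; the database and table names passed in are placeholders, and preview_sql and preview_table are hypothetical helpers rather than part of the library.

```python
def preview_sql(table, limit=10):
    """Build a small preview query for the given table.

    The table name is interpolated into the SQL text, so it is
    validated as a plain identifier first.
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return f'SELECT * FROM "{table}" LIMIT {int(limit)}'


def preview_table(database, table, limit=10):
    """Run the preview query in Athena and return a Pandas DataFrame."""
    # Imported lazily so preview_sql can be used without the library installed.
    import awswrangler as wr

    return wr.athena.read_sql_query(sql=preview_sql(table, limit), database=database)
```

For example, preview_table("my_database", "transactions") would return the first ten rows as a DataFrame, provided a table with that name exists in the AWS Glue Data Catalog.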
Intended audience
- Candidates for the AWS Certified Data Engineer - Associate certification
- Data Engineers
- DevOps Engineers
- Machine Learning Engineers
- Software Engineers
Prerequisites
Familiarity with the following will be beneficial but is not required:
- The Python scripting language
- AWS Lambda
- Amazon S3
Lab steps
Exploring the AWS SDK for Pandas Library
Developing an Extract AWS Lambda Function
Logging In to the Amazon Web Services Console
Examining the Pipeline and Extracted Data