Implementing a Searchable Amazon S3 Data Lake
Description
AWS Glue is a data integration service that data analytics professionals can use to discover, cleanse, catalog, and transform data from different sources. Because these capabilities are consolidated into one centralized service, you can do all of this work in a single place.
Learning how to use AWS Glue to work with data will help you become more effective at creating and using data lakes in the public AWS cloud.
In this lab, you will implement an AWS Lambda function that processes order data as it is uploaded to Amazon S3, and you will see how to configure AWS Glue to make searching the data more efficient.
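As a rough illustration of what "processing order data" might involve, the sketch below flattens a nested order document into one flat record per line item and derives a date-partitioned S3 key. The field names (`order_id`, `customer`, `items`, `placed_at`), the `orders` prefix, and the `year=/month=/day=` layout are assumptions made for this example and are not taken from the lab itself.

```python
import json
from datetime import datetime, timezone


def normalize_order(raw: dict) -> list[dict]:
    """Flatten one nested order into one flat record per line item.

    The input shape is hypothetical; adapt the field names to the real data.
    """
    # Convert the timestamp to UTC so every record uses the same time zone.
    placed = datetime.fromisoformat(raw["placed_at"]).astimezone(timezone.utc)
    return [
        {
            "order_id": str(raw["order_id"]),
            "customer_id": str(raw["customer"]["id"]),
            "sku": item["sku"],
            "quantity": int(item["quantity"]),
            "unit_price": float(item["unit_price"]),
            "placed_at": placed.isoformat(),
        }
        for item in raw["items"]
    ]


def partitioned_key(record: dict, prefix: str = "orders") -> str:
    """Build a Hive-style partitioned object key from a normalized record."""
    placed = datetime.fromisoformat(record["placed_at"])
    return (
        f"{prefix}/year={placed:%Y}/month={placed:%m}/day={placed:%d}/"
        f"{record['order_id']}.json"
    )


def to_json_lines(records: list[dict]) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


example = {
    "order_id": 1001,
    "customer": {"id": 42},
    "placed_at": "2024-06-05T23:30:00-05:00",
    "items": [
        {"sku": "A-1", "quantity": "2", "unit_price": "9.99"},
        {"sku": "B-7", "quantity": 1, "unit_price": 24.5},
    ],
}
records = normalize_order(example)
key = partitioned_key(records[0])
body = to_json_lines(records)
```

In a Lambda function, the handler would typically read the uploaded object, pass its parsed contents to a function like `normalize_order`, and write the result back to Amazon S3 under the partitioned key. Writing one JSON object per line keeps the output readable by the JSON SerDes that Amazon Athena supports.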
Learning Objectives
Upon completion of this beginner-level lab, you will be able to:
- Use an AWS Lambda function to normalize JSON data
- Use Amazon EventBridge to invoke an AWS Lambda function in response to an event
- Configure an AWS Glue table to use a partition index
- Search data stored in Amazon S3 with Amazon Athena
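The objectives above can be sketched as a few small helpers. This is only an outline under stated assumptions: the bucket, database, table, and index names are hypothetical placeholders, and the partition keys (`year`, `month`) are assumed rather than taken from the lab. The event shape follows the "Object Created" events that Amazon S3 sends to Amazon EventBridge.

```python
import json


def s3_object_created_pattern(bucket: str) -> dict:
    """EventBridge event pattern matching object uploads to one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket]}},
    }


def object_location(event: dict) -> tuple[str, str]:
    """Extract (bucket, key) from an S3 event delivered through EventBridge."""
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]


def partition_index_request(database: str, table: str, keys: list[str],
                            index_name: str = "by_date") -> dict:
    """Keyword arguments for the AWS Glue CreatePartitionIndex operation."""
    return {
        "DatabaseName": database,
        "TableName": table,
        "PartitionIndex": {"Keys": list(keys), "IndexName": index_name},
    }


def athena_query(database: str, table: str, year: str, month: str) -> str:
    """Query that filters on partition columns so fewer partitions are read."""
    return (
        f'SELECT order_id, sku, quantity FROM "{database}"."{table}" '
        f"WHERE year = '{year}' AND month = '{month}' LIMIT 10"
    )


# Hypothetical names used only for illustration.
pattern = s3_object_created_pattern("example-orders-bucket")
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "example-orders-bucket"},
        "object": {"key": "incoming/order-1001.json"},
    },
}
bucket, key = object_location(sample_event)
request = partition_index_request("example_db", "orders", ["year", "month"])
query = athena_query("example_db", "orders", "2024", "06")
pattern_json = json.dumps(pattern)
```

With boto3 available, the pattern would typically be passed as `EventPattern=pattern_json` to the EventBridge `put_rule` call, and the request as `boto3.client("glue").create_partition_index(**request)`. The bucket also needs EventBridge notifications enabled before Amazon S3 emits these events.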
Intended Audience
- Candidates for the AWS Certified Data Analytics - Specialty certification
- Cloud Architects
- Data Engineers
- DevOps Engineers
- Machine Learning Engineers
- Software Engineers
Prerequisites
Familiarity with the following will be beneficial but is not required:
- AWS Glue
- Data Lakes
- AWS Lambda
- Amazon EventBridge
- Amazon Athena
The following content can be used to fulfill the prerequisites:
- Developing Serverless ETL with AWS Glue
- Understanding Data Lakes in AWS
- Understanding AWS Lambda to Run & Scale Your Code
- Connecting Application Data using Amazon EventBridge
- Analyzing Data with Amazon Athena
Updates
June 5th, 2024 - Updated the instructions and screenshots to reflect the latest UI
February 15th, 2023 - Updated the Lambda implementation step with a test event