Sessionizing Clickstream Data with Amazon Kinesis and Managed Apache Flink
Description
Amazon Managed Service for Apache Flink is a fully managed service that enables you to analyze streaming data in real time using SQL and other tools. The service scales automatically to match your usage. There is no infrastructure to manage, and you pay only for what you use.
Use cases for Amazon Managed Service for Apache Flink include:
- Streaming extract, transform, and load (ETL) jobs
- Real-time log analysis
- Ad-tech and digital marketing analysis
Amazon Managed Service for Apache Flink lets you apply your existing SQL skills and integrates with other AWS services. You can deliver your results to any destination supported by Kinesis Data Streams or Kinesis Data Firehose, and you can use a Lambda function to deliver to external or unmanaged destinations.
In this lab, you will learn how to use Amazon Kinesis and Amazon Managed Service for Apache Flink to sessionize sample clickstream data and output it to DynamoDB using an AWS Lambda function.
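To make the term concrete, sessionizing means grouping each user's click events into sessions separated by periods of inactivity. The following is a minimal plain-Python sketch of that idea, not the Flink job used in the lab. The field names `user_id` and `ts` (a timestamp in seconds) and the 30-second inactivity gap are assumptions chosen for illustration.

```python
from collections import defaultdict


def sessionize(events, gap_seconds=30):
    """Group click events into per-user sessions split by inactivity gaps.

    `events` is an iterable of dicts with the assumed keys `user_id` and `ts`.
    A new session starts when a user's next click arrives more than
    `gap_seconds` after their previous click.
    """
    # Collect every timestamp seen for each user.
    by_user = defaultdict(list)
    for event in events:
        by_user[event["user_id"]].append(event["ts"])

    sessions = []
    for user_id, timestamps in by_user.items():
        timestamps.sort()
        start = prev = timestamps[0]
        clicks = 1
        for ts in timestamps[1:]:
            if ts - prev > gap_seconds:
                # The gap was too long: close the current session.
                sessions.append(
                    {"user_id": user_id, "start": start, "end": prev, "clicks": clicks}
                )
                start, clicks = ts, 0
            prev = ts
            clicks += 1
        # Close the user's final session.
        sessions.append(
            {"user_id": user_id, "start": start, "end": prev, "clicks": clicks}
        )
    return sessions


if __name__ == "__main__":
    sample = [
        {"user_id": "u1", "ts": 0},
        {"user_id": "u1", "ts": 10},
        {"user_id": "u1", "ts": 100},
        {"user_id": "u2", "ts": 5},
    ]
    for session in sessionize(sample):
        print(session)
```

In this sample, user `u1` produces two sessions because the click at 100 seconds arrives more than 30 seconds after the click at 10 seconds. A streaming engine performs the same grouping continuously over an unbounded stream rather than over a list held in memory.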
Learning Objectives
This is a beginner-level lab. Upon completion, you will be able to:
- Use Amazon Kinesis and Amazon Managed Service for Apache Flink to analyze clickstream data
- Create an AWS Lambda function that adds records to an Amazon DynamoDB table
- Configure Amazon Kinesis to send results to your AWS Lambda function
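As a rough preview of the second and third objectives, the sketch below shows one way a Lambda function might decode records delivered by a Kinesis data stream and write them to DynamoDB. It is an illustration under stated assumptions, not the function you build in the lab: the table name `clickstream-sessions` is hypothetical, and the code assumes each record's payload is a JSON object whose keys already include the table's primary key.

```python
import base64
import json
from decimal import Decimal


def decode_kinesis_records(event):
    """Return the JSON payloads carried in a Kinesis-triggered Lambda event."""
    items = []
    for record in event.get("Records", []):
        # Kinesis delivers each record's data base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        # The boto3 DynamoDB resource rejects Python floats, so parse
        # fractional numbers as Decimal.
        items.append(json.loads(payload, parse_float=Decimal))
    return items


def write_items(table, items):
    """Write each item with put_item and return how many were written."""
    for item in items:
        table.put_item(Item=item)
    return len(items)


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be exercised
    # locally without the AWS SDK installed.
    import boto3

    # Hypothetical table name, chosen for illustration only.
    table = boto3.resource("dynamodb").Table("clickstream-sessions")
    written = write_items(table, decode_kinesis_records(event))
    return {"written": written}
```

Keeping the decoding and writing steps in separate functions makes it possible to test them with a stand-in table object before deploying the handler.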
Intended Audience
- Candidates for the AWS Certified Data Analytics Specialty exam
- Data Engineers
- Cloud Engineers
Prerequisites
Familiarity with Data Analytics, SQL, the Bash shell, and the Python programming language will be beneficial but is not required.
The following courses can be used to fulfill the prerequisites:
- Analytics Fundamentals for AWS
- Introduction to SQL
- Linux Command Line Byte Session
- Python for Beginners
Updates
August 31st, 2023 - Updated the instructions and screenshots to reflect the latest UI
July 17th, 2023 - Updated the lab to use Zeppelin and Apache Flink
March 12th, 2023 - Resolved an issue that caused the lab to fail to set up on rare occasions
October 14th, 2022 - Updated instructions due to updates in the Kinesis Data Stream data retention timeframes
September 8th, 2022 - Updated the instructions and screenshots to reflect the UI
May 31st, 2022 - Updated screenshots and instructions to reflect the UI
December 8th, 2021 - Updated lab step instructions to correct a grammatical error
December 6th, 2021 - Updated lab step instructions to reflect latest user interface changes
September 7th, 2021 - Updated lab step instructions to reflect latest user interface changes
March 11th, 2021 - Updated AWS Lambda lab step to reflect latest user interface changes
January 22nd, 2021 - Updated AWS Lambda lab step to reflect latest user interface changes