Aggregating Data with Amazon Managed Streaming for Apache Kafka (MSK)
Description
Amazon Managed Streaming for Apache Kafka (also known as Amazon MSK) is an event streaming platform capable of handling trillions of events per day. Although Apache Kafka was originally designed as a type of message queue, it has proven useful in many other use cases.
This managed offering from AWS makes it simple to set up and manage Apache Kafka clusters reliably. You don't need to worry about provisioning servers or keeping them patched and up to date. Amazon MSK integrates with existing AWS technology. Storage is secure and durable, and monitoring is handled by Amazon CloudWatch.
In this hands-on lab, you will see how to create a cluster configuration for an Amazon MSK cluster. You will connect to an Amazon MSK cluster and create some Topics. Finally, you will create a simple application using the Faust streaming library that populates the Topics and aggregates the data.
Please note, this lab creates an Amazon MSK cluster which can take over twenty minutes to finish setting up. Please make sure you have enough time available before starting this lab.
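To give a sense of what waiting for the cluster involves, the following is a minimal sketch of a polling loop, not the commands used in the lab itself. It assumes the boto3 library and configured AWS credentials; the function names, the poll interval, and the timeout are illustrative choices rather than lab values.

```python
import time


def get_msk_cluster_state(cluster_arn):
    """Return the cluster state (for example 'CREATING' or 'ACTIVE').

    Assumes boto3 is installed and AWS credentials are configured.
    boto3 is imported lazily so the polling helper below has no AWS dependency.
    """
    import boto3

    client = boto3.client("kafka")
    return client.describe_cluster(ClusterArn=cluster_arn)["ClusterInfo"]["State"]


def wait_until_active(get_state, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll get_state() until it returns 'ACTIVE'.

    Returns the number of seconds spent sleeping between polls.
    Raises RuntimeError if the cluster reports 'FAILED', or TimeoutError
    once timeout_seconds has elapsed without reaching 'ACTIVE'.
    """
    waited = 0
    while True:
        state = get_state()
        if state == "ACTIVE":
            return waited
        if state == "FAILED":
            raise RuntimeError("MSK cluster creation failed")
        if waited >= timeout_seconds:
            raise TimeoutError(f"cluster still in state {state} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

With a real cluster, a call might look like wait_until_active(lambda: get_msk_cluster_state(arn)), where arn is the ARN of your own cluster. Passing the state lookup in as a function keeps the loop easy to try out without an AWS account.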
Learning Objectives
Upon completion of this beginner-level lab, you will be able to:
- Create an Amazon MSK Cluster Configuration
- Create a Topic in an Amazon MSK cluster using the Apache Kafka command-line tools
- Implement a Python script that aggregates Topic data
- Retrieve data from Topics using the command line and Python
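As a rough illustration of what aggregating Topic data means, the sketch below groups a sequence of events by key and keeps a count and a running total for each key. It uses plain Python rather than Faust or a live Kafka cluster, and the event fields ("account" and "amount") are hypothetical, not the schema used in the lab.

```python
from collections import defaultdict


def aggregate(events):
    """Group events by account and return per-account count and total.

    Each event is a dict with hypothetical 'account' and 'amount' fields,
    standing in for messages that would be read from a Topic.
    """
    table = defaultdict(lambda: {"count": 0, "total": 0})
    for event in events:
        entry = table[event["account"]]
        entry["count"] += 1
        entry["total"] += event["amount"]
    return dict(table)


# Example events standing in for messages consumed from a Topic
sample_events = [
    {"account": "a1", "amount": 10},
    {"account": "b2", "amount": 5},
    {"account": "a1", "amount": 7},
]

result = aggregate(sample_events)
print(result)
```

A stream processing library performs the same kind of per-key bookkeeping, but does so continuously as new messages arrive and stores the table so it survives restarts.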
Intended Audience
- Data Engineers
- Cloud Engineers
Prerequisites
Familiarity with the following will be beneficial but is not required:
- Amazon Managed Streaming for Apache Kafka
- The Linux Bash shell
- The AWS command-line interface
- Python
The following courses and lab can be used to fulfill the prerequisites:
- Fundamentals of Amazon MSK (Amazon Managed Streaming for Apache Kafka)
- Linux Command Line Byte Session
- Python for Beginners
- Introduction to the AWS CLI
Updates
May 3rd, 2024 - Resolved deployment issue
February 22nd, 2022 - Updated the instructions and screenshots to reflect the latest UI
August 31st, 2021 - Resolved an issue with the commands used to wait for the MSK cluster to become active
August 30th, 2021 - Emphasized the warning about the time it takes to create the MSK cluster
June 18th, 2021 - Clarified some instructions