hands-on lab

Aggregating Data with Amazon Managed Streaming for Apache Kafka (MSK)

Difficulty: Beginner
Duration: Up to 2 hours
Students: 1,474
Rating: 4.3/5
Get guided in a real environment: Practice with a step-by-step scenario in a real, provisioned environment.
Learn and validate: Use validations to check your solutions every step of the way.
See results: Track your knowledge and monitor your progress.

Description

Amazon Managed Streaming for Apache Kafka (also known as Amazon MSK) is a managed service for Apache Kafka, an event streaming platform capable of handling trillions of events per day. Although Apache Kafka was originally designed as a type of message queue, it has proven useful in many other use cases.

This managed offering from AWS makes it simple to set up and manage Apache Kafka clusters reliably. You don't need to provision servers or keep them patched and up to date. Amazon MSK integrates with existing AWS services. Storage is secure and durable, and monitoring is handled by Amazon CloudWatch.

In this hands-on lab, you will see how to create a cluster configuration for an Amazon MSK cluster. You will connect to an Amazon MSK cluster and create some Topics. You will then create a simple application using the Faust streaming library that populates the Topics and aggregates the data.
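
To give a feel for what "aggregating Topic data" means, here is a minimal sketch in plain Python. It is not the lab's Faust application: it only illustrates the per-key running total that a streaming table typically maintains. The function name `aggregate_totals` and the sample events are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_totals(events: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum event values per key, consuming the events one at a time."""
    totals: Dict[str, float] = defaultdict(float)
    for key, value in events:
        # Each incoming event updates the running total for its key.
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical (key, value) pairs standing in for messages read from a Topic.
    sample_events = [("eu", 10.0), ("us", 5.0), ("eu", 2.5)]
    print(aggregate_totals(sample_events))
```

In a streaming library the events would arrive continuously from a Topic rather than from a list, but the update rule per event is the same idea.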

Please note: this lab creates an Amazon MSK cluster, which can take over twenty minutes to finish setting up. Please make sure you have enough time available before starting this lab.
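
Because cluster creation takes a while, scripts usually poll until the cluster reports that it is active. The sketch below shows one way to do that in Python. It is an illustration, not the commands the lab itself uses. The helper names `wait_until_active` and `msk_state_reader` are hypothetical, and the boto3 path assumes boto3 is installed and AWS credentials are configured.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_state() until it returns "ACTIVE".

    Returns False if the state becomes "FAILED" or the timeout elapses.
    """
    waited = 0.0
    while waited <= timeout_seconds:
        state = get_state()
        if state == "ACTIVE":
            return True
        if state == "FAILED":
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    return False


def msk_state_reader(cluster_arn: str) -> Callable[[], str]:
    """Build a get_state callable that asks Amazon MSK for the cluster state."""
    import boto3  # assumption: boto3 is installed; imported lazily on purpose

    client = boto3.client("kafka")
    return lambda: client.describe_cluster(ClusterArn=cluster_arn)["ClusterInfo"]["State"]
```

With a real cluster ARN, a call might look like `wait_until_active(msk_state_reader(cluster_arn))`. Passing the state reader as a callable keeps the polling logic testable without AWS access.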

Learning Objectives

Upon completion of this beginner-level lab, you will be able to:

  • Create an Amazon MSK Cluster Configuration
  • Create a Topic in an Amazon MSK cluster using the Apache Kafka command-line tools
  • Implement a Python script that aggregates Topic data
  • Retrieve data from Topics using the command-line and Python
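
As a rough illustration of the first objective, an Amazon MSK cluster configuration is essentially a set of Kafka broker properties in `key=value` form. The sketch below renders a Python mapping into that form and, optionally, submits it with boto3. The helper names and the example property values are hypothetical and are not taken from the lab. The boto3 call assumes boto3 is installed and AWS credentials are configured.

```python
from typing import Mapping


def render_server_properties(settings: Mapping[str, object]) -> bytes:
    """Render broker settings as key=value lines, one per line, sorted by key."""
    lines = []
    for key in sorted(settings):
        value = settings[key]
        if isinstance(value, bool):
            # Kafka properties use lowercase true/false.
            value = str(value).lower()
        lines.append(f"{key}={value}")
    return ("\n".join(lines) + "\n").encode("utf-8")


def create_msk_configuration(name: str, settings: Mapping[str, object]) -> str:
    """Create an MSK cluster configuration and return its ARN."""
    import boto3  # assumption: boto3 is installed; imported lazily on purpose

    client = boto3.client("kafka")
    response = client.create_configuration(
        Name=name,
        ServerProperties=render_server_properties(settings),
    )
    return response["Arn"]


if __name__ == "__main__":
    # Hypothetical example values; the lab's actual configuration may differ.
    example = {"auto.create.topics.enable": False, "num.partitions": 3}
    print(render_server_properties(example).decode("utf-8"))
```

Rendering and submitting are kept separate so the properties text can be inspected before anything is created in an AWS account.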

Intended Audience

  • Data Engineers
  • Cloud Engineers

Prerequisites

Familiarity with the following will be beneficial but is not required:

  • Amazon Managed Streaming for Apache Kafka
  • The Linux Bash shell
  • The AWS command-line interface
  • Python

Updates

May 3rd, 2024 - Resolved deployment issue

February 22nd, 2022 - Updated the instructions and screenshots to reflect the latest UI

August 31st, 2021 - Resolved an issue with the commands used to wait for the MSK cluster to become active

August 30th, 2021 - Emphasized the warning about the time it takes to create the MSK cluster

June 18th, 2021 - Clarified some instructions

Lab steps

Logging In to the Amazon Web Services Console
Creating an Amazon MSK Cluster Configuration
Connecting to the Virtual Machine using EC2 Instance Connect
Creating Topics using the Apache Kafka Command-line Interface
Populating and Processing Topic Data
Visualizing Your Aggregated Data
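
For the "Creating Topics using the Apache Kafka Command-line Interface" step, the sketch below shows how a topic-creation command for Apache Kafka's `kafka-topics.sh` tool might be assembled from Python. It is an assumption-laden illustration, not the lab's exact command: the script path, default values, and function name are hypothetical, and older Kafka releases connect with `--zookeeper` instead of `--bootstrap-server`.

```python
import shlex
from typing import List


def kafka_topics_create_args(
    bootstrap_servers: str,
    topic: str,
    partitions: int = 1,
    replication_factor: int = 3,
    script: str = "bin/kafka-topics.sh",  # assumption: path relative to the Kafka install
) -> List[str]:
    """Build the argument list for creating a topic with kafka-topics.sh."""
    if partitions < 1 or replication_factor < 1:
        raise ValueError("partitions and replication_factor must be at least 1")
    return [
        script,
        "--create",
        "--bootstrap-server", bootstrap_servers,
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]


if __name__ == "__main__":
    # Hypothetical broker address and topic name, shown only for illustration.
    args = kafka_topics_create_args("broker-1.example.com:9092", "example-topic")
    print(shlex.join(args))
```

The argument list can be passed to `subprocess.run(args, check=True)` on a machine that has the Kafka command-line tools installed and network access to the brokers.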