hands-on lab

Building a Data Pipeline in DC/OS

Difficulty: Intermediate
Duration: Up to 1 hour
This lab is currently under maintenance and unavailable. We are actively working to resolve this issue and we apologize for any inconvenience.

DC/OS was declared end-of-life on October 31, 2021, and this content is no longer maintained.

Get guided in a real environment: Practice with a step-by-step scenario in a real, provisioned environment.
Learn and validate: Use validations to check your solutions every step of the way.
See results: Track your knowledge and monitor your progress.

Description

Notice: DC/OS has been declared end-of-life. The lab instructions have been updated to the final (end-of-life) release. Due to limitations in DC/OS, the final lab step now simulates the real-time analysis of tweets.

It is relatively simple to create powerful data pipelines in DC/OS. In this Lab, you will learn how to perform streaming data analytics by building a data pipeline in DC/OS that combines multiple services and a Twitter-like application. You will review many of the fundamental concepts in using DC/OS along the way, including installing packages, using Marathon-LB to load balance traffic, and working with virtual IPs.
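To make the Marathon-LB and virtual IP concepts concrete, the sketch below builds a minimal Marathon app definition as a Python dict. It attaches a named VIP label (`VIP_0`) to a port and, optionally, the `HAPROXY_GROUP: external` label that Marathon-LB uses to expose an app publicly, then derives the internal DNS address that other services would use to reach the VIP. The app id `tweeter`, the command, the resource sizes, and VIP port 10000 are hypothetical placeholders, not the lab's actual configuration, and the address helper only handles flat (slash-free) VIP names.

```python
import json


def marathon_app(app_id: str, cmd: str, vip_port: int, external: bool) -> dict:
    """Build a minimal Marathon app definition (all values are hypothetical)."""
    name = app_id.strip("/")
    app = {
        "id": f"/{name}",
        "cmd": cmd,
        "instances": 1,
        "cpus": 0.25,
        "mem": 256,
        # Port 0 lets Marathon assign a host port; the VIP_0 label gives the
        # port a stable, load-balanced name inside the cluster.
        "portDefinitions": [
            {"port": 0, "protocol": "tcp", "labels": {"VIP_0": f"/{name}:{vip_port}"}}
        ],
        "labels": {},
    }
    if external:
        # Marathon-LB instances in the "external" group pick up this app.
        app["labels"]["HAPROXY_GROUP"] = "external"
    return app


def vip_address(vip_label: str, framework: str = "marathon") -> str:
    """Map a named VIP label such as '/tweeter:10000' to its cluster DNS address."""
    name, port = vip_label.lstrip("/").rsplit(":", 1)
    return f"{name}.{framework}.l4lb.thisdcos.directory:{port}"


if __name__ == "__main__":
    app = marathon_app("tweeter", "./run-app.sh", vip_port=10000, external=True)
    print(json.dumps(app, indent=2))
    print(vip_address(app["portDefinitions"][0]["labels"]["VIP_0"]))
```

The printed JSON is the kind of file you could pass to `dcos marathon app add`, and other tasks in the cluster would connect to `tweeter.marathon.l4lb.thisdcos.directory:10000` instead of a specific agent IP and port.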

Lab Objectives

Upon completion of this Lab you will be able to:

  • Install DC/OS packages with custom options using the DC/OS CLI
  • Deploy a data pipeline using Kafka, Cassandra, and a social networking app
  • Use the Zeppelin package and DC/OS Spark to perform basic streaming analytics on the data pipeline
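Installing a package "with custom options" means passing a JSON options file to the DC/OS CLI through `dcos package install <package> --options=<file>`. The sketch below writes such a file and returns the command line as an argument list without running it. The `{"service": {"name": ...}}` options shown in the usage are a hypothetical example; check each package's configuration schema for the options it actually accepts.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def install_command(package: str, options: dict, workdir: Path) -> list[str]:
    """Write package options to JSON and return the dcos CLI argv to install it."""
    options_path = workdir / f"{package}-options.json"
    options_path.write_text(json.dumps(options, indent=2))
    # --yes skips the interactive confirmation prompt.
    return ["dcos", "package", "install", package, f"--options={options_path}", "--yes"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical options; a real run would pass argv to subprocess.run(argv, check=True).
        argv = install_command("kafka", {"service": {"name": "kafka"}}, Path(tmp))
        print(" ".join(argv))
```

Keeping the options in a file, rather than typing them interactively, makes the installation repeatable and easy to review.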

Lab Prerequisites

You should be familiar with:

  • Basic and intermediate DC/OS concepts including Virtual IPs and Marathon-LB
  • Working at the command-line in Linux
  • AWS services (optional), to understand the architecture of the pre-created DC/OS cluster

Lab Environment

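Before completing the Lab instructions, the environment will look as follows:

[Diagram: lab environment before completing the Lab instructions]

After completing the Lab instructions, the environment should look similar to:

[Diagram: lab environment after completing the Lab instructions]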

Updates

January 19th, 2022 - Updated lab instructions to reflect the latest (end of life) DC/OS experience

August 1st, 2021 - Resolved an issue preventing the DC/OS cluster from provisioning

October 2nd, 2020 - Replaced CoreOS virtual machines (no longer available in AWS) with CentOS

January 10th, 2019 - Added a validation Lab Step to check the work you perform in the Lab


Lab steps

Logging In to the Amazon Web Services Console
Understanding the DC/OS Cluster Architecture
Connecting to the DC/OS Cluster NAT Instance using SSH
Installing the DC/OS CLI on Linux
Installing the Required Packages in the DC/OS Cluster
Running the Tweeter Application
Simulating Analyzing Tweets in Real-Time with Zeppelin
Validate AWS Lab
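The "Simulating Analyzing Tweets in Real-Time" step stands in for a streaming job. As a rough, plain-Python illustration of the idea (not the lab's Zeppelin or Spark code), the sketch below splits a stream of tweets into fixed-size micro-batches and emits hashtag counts over a sliding window of the most recent batches, similar in spirit to windowed aggregation in Spark Streaming. The sample tweets, batch size, and window length are hypothetical.

```python
import re
from collections import Counter, deque
from typing import Deque, Iterable, Iterator, List

HASHTAG = re.compile(r"#(\w+)")


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an incoming stream of tweets into fixed-size micro-batches."""
    batch: List[str] = []
    for tweet in stream:
        batch.append(tweet)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def windowed_hashtag_counts(
    stream: Iterable[str], batch_size: int, window_batches: int
) -> Iterator[Counter]:
    """Yield case-insensitive hashtag counts over the last `window_batches` batches."""
    window: Deque[Counter] = deque(maxlen=window_batches)  # old batches fall off
    for batch in micro_batches(stream, batch_size):
        window.append(Counter(tag.lower() for t in batch for tag in HASHTAG.findall(t)))
        yield sum(window, Counter())


if __name__ == "__main__":
    tweets = ["Loving #DCOS", "#kafka and #dcos", "Spark #streaming", "#kafka rocks"]
    for counts in windowed_hashtag_counts(tweets, batch_size=2, window_batches=2):
        print(counts.most_common())
```

In the real pipeline the stream would come from a Kafka topic rather than a Python list, and Spark would distribute the per-batch counting across the cluster.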