hands-on lab

Transforming Data With Apache Spark and Amazon EMR

Difficulty: Beginner
Duration: Up to 1 hour and 30 minutes
Students: 358
Rating: 5/5
Get guided in a real environment: Practice with a step-by-step scenario in a real, provisioned environment.
Learn and validate: Use validations to check your solutions every step of the way.
See results: Track your knowledge and monitor your progress.

Description

Amazon EMR (formerly known as Amazon Elastic MapReduce) is a big data platform that supports many popular open-source data processing frameworks, including Apache Spark. Amazon EMR simplifies the configuration, provisioning, and scaling of clusters for data analysis and processing workloads.

Learning how to use Amazon EMR is useful for anyone who wants to understand how big data processing is performed in real-world environments.

In this hands-on lab, you will tour an Amazon EMR cluster, place data and a script in a location accessible to Amazon EMR, submit a workload to an Amazon EMR cluster, and examine the results.

Note: An Amazon EMR cluster takes approximately ten minutes to be created and become usable. Ensure you have enough time available before starting the lab.

Learning objectives

Upon completion of this beginner-level lab, you will be able to:

  • Understand the configuration of an Amazon EMR cluster
  • Upload a script and data file to an Amazon S3 bucket
  • Submit work to a cluster by adding a step
  • Inspect the results of an Amazon EMR step
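
The upload and submit objectives above can also be carried out programmatically. The following is a minimal sketch, not the lab's own procedure: it assumes the boto3 library is installed and AWS credentials are configured, and the object keys (scripts/transform.py, input/data.json, output/), the function names, and the idea that the script takes an input and an output location as positional arguments are all hypothetical choices made for illustration.

```python
def build_spark_step(script_uri, input_uri, output_uri, name="Transform data with Spark"):
    """Return an EMR step definition that runs spark-submit on the cluster.

    The two trailing arguments are passed to the script itself. Their meaning
    (input location, output location) is an assumption of this sketch.
    """
    return {
        "Name": name,
        # Keep the cluster running if the step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step run a command such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                input_uri,
                output_uri,
            ],
        },
    }


def upload_and_submit(cluster_id, bucket, script_path, data_path):
    """Upload a script and a data file to S3, then add a step to the cluster."""
    import boto3  # imported lazily so build_spark_step works without boto3

    s3 = boto3.client("s3")
    # Hypothetical object keys; use whatever layout suits your bucket.
    s3.upload_file(script_path, bucket, "scripts/transform.py")
    s3.upload_file(data_path, bucket, "input/data.json")

    step = build_spark_step(
        f"s3://{bucket}/scripts/transform.py",
        f"s3://{bucket}/input/data.json",
        f"s3://{bucket}/output/",
    )

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

Keeping the step definition in its own function means it can be inspected or printed before anything is sent to AWS. The same dictionary structure corresponds to what the console collects when a step is added by hand.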

Intended audience

  • Candidates for the AWS Certified Data Engineer - Associate certification
  • Cloud Architects
  • Data Engineers
  • DevOps Engineers
  • Machine Learning Engineers

Prerequisites

Familiarity with the following will be beneficial but is not required:

  • Amazon EMR
  • Amazon Simple Storage Service (S3)
  • The Python scripting language
  • The JavaScript Object Notation (JSON) data format

Lab steps

  • Logging In to the Amazon Web Services Console
  • Touring an Amazon EMR Cluster
  • Uploading Files to Amazon S3
  • Submitting a Job to an Amazon EMR Cluster
  • Examining the Results
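
Once a step has been submitted, its state determines when the results are ready to examine. The sketch below shows one way to poll for that state outside the console. It is an illustration under stated assumptions rather than part of the lab: the function name, the polling interval, and the poll limit are hypothetical, and emr_client is expected to behave like a boto3 EMR client, whose describe_step call reports the step state.

```python
import time

# Step states after which no further progress will happen.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def wait_for_step(emr_client, cluster_id, step_id,
                  poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Poll an EMR step until it reaches a terminal state and return that state.

    Raises TimeoutError if the step is still pending or running after
    max_polls checks. The sleep parameter exists so callers can substitute
    a different wait function.
    """
    for _ in range(max_polls):
        response = emr_client.describe_step(ClusterId=cluster_id, StepId=step_id)
        state = response["Step"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"Step {step_id} did not finish after {max_polls} checks")
```

With a real cluster, the client would typically be created with boto3.client("emr") and the function called with the cluster ID and the step ID returned when the step was added. A returned state of COMPLETED suggests the output location is ready to inspect; any other terminal state points to the step's logs.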