Amazon EMR (formerly known as Amazon Elastic MapReduce) is a big data platform that supports many popular open-source data processing frameworks, including Apache Spark. Amazon EMR simplifies the configuration, provisioning, and scaling of clusters for data analysis and processing workloads.
Learning how to use Amazon EMR is useful for anyone who wants to understand how big data processing is performed in the real world.
In this hands-on lab, you will tour an Amazon EMR cluster, place data and a script in a location accessible to Amazon EMR, submit a workload to an Amazon EMR cluster, and examine the results.
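As a rough illustration of the "submit a workload" part of this flow, the sketch below builds a Spark step definition and hands it to a running cluster with boto3. It is not the lab's own procedure. It assumes that the data and script have been placed in Amazon S3 (the lab only says "a location accessible to Amazon EMR"), and the bucket, paths, cluster ID, and region are hypothetical placeholders.

```python
def build_spark_step(script_uri, input_uri, output_uri, name="Lab Spark step"):
    """Return an EMR step definition that runs a PySpark script via spark-submit.

    The URIs are hypothetical; the script is assumed to take the input and
    output locations as its two positional arguments.
    """
    return {
        "Name": name,
        # Keep the cluster running if the step fails so it can be inspected.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step run a command such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                input_uri,
                output_uri,
            ],
        },
    }


def submit_step(cluster_id, step, region="us-west-2"):
    """Submit a step to an existing cluster and return the new step ID.

    Requires boto3 and AWS credentials; the region is an assumed default.
    """
    import boto3  # imported here so building a step needs no AWS setup

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    step = build_spark_step(
        "s3://example-bucket/scripts/job.py",
        "s3://example-bucket/input/",
        "s3://example-bucket/output/",
    )
    print(step["HadoopJarStep"]["Args"])
```

Once a step like this completes, the results would be examined at the output location passed to the script.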
Please note that an Amazon EMR cluster takes approximately ten minutes to create and become usable. Please ensure you have enough time available before starting the lab.
Upon completion of this beginner-level lab, you will be able to:
Familiarity with the following will be beneficial but is not required:
The following content can be used to fulfill the prerequisites: