This training lesson begins with an introduction to the concepts of Distributed Machine Learning. We'll discuss the reasons as to why and when you should consider training your machine learning model within a distributed environment.
Apache Spark
We’ll introduce you to Apache Spark and how it can be used to perform machine learning both at scale and speed. Apache Spark is an open-source cluster-computing framework.
Amazon Elastic Map Reduce
We’ll introduce you to Amazon’s Elastic MapReduce service, or EMR for short. EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data. EMR can be easily configured to host Apache Spark.
Spark MLlib
We’ll introduce you to MLlib which is Spark’s machine learning module. We’ll discuss how MLlib can be used to perform various machine learning tasks. For this lesson, we'll focus our attention on decision trees as a machine learning method which the MLlib module supports. A decision tree is a type of supervised machine learning algorithm used often for classification problems.
AWS Glue
We’ll introduce you to AWS Glue. AWS Glue is a fully managed extract, transform, and load service, ETL for short. We’ll show you how AWS Glue can be used to prepare our datasets before they are used to train our machine learning models.
Demonstration
Finally, we’ll show you how to use each of the aforementioned services together to launch an EMR cluster configured and pre-installed with Apache Spark for the purpose of training a machine learning model using a decision tree. This demonstration will provide an end-to-end solution that provides machine learning predictive capabilities.
Intended Audience
The intended audience for this lesson includes:
Learning Objectives
By completing this lesson, you will:
Prerequisites
The following prerequisites will be both useful and helpful for this lesson:
Lesson Agenda
The agenda for the remainder of this lesson is as follows:
Feedback
If you have thoughts or suggestions for this lesson, please contact Cloud Academy at support@cloudacademy.com.