Introduction to Google Cloud Dataproc
Difficulty: Intermediate
Duration: 1 minute and 29 seconds
Students: 4
Google Cloud Dataproc is a fully managed service for running popular big data frameworks such as Apache Hadoop and Apache Spark. In this lesson, we will cover the basics, including cluster creation and job execution.
Learning Objectives
Understand the fundamentals of Apache Hadoop and Spark.
Use Dataproc to create Hadoop/Spark clusters.
Run Dataproc jobs on a cluster.
Run serverless Dataproc jobs.
Define and execute data pipelines using workflow templates.
Intended Audience
Data Professionals
Machine Learning Engineers
Anyone preparing for a Google Cloud certification
Prerequisites
Access to a Google Cloud Platform (GCP) account