Welcome to QA's learning platform (formerly Cloud Academy).

Introduction to Google Cloud Dataproc

Difficulty: Intermediate
Duration: 1 minute and 29 seconds
Students: 4

Google Cloud Dataproc is a fully managed service for running popular big data frameworks such as Apache Hadoop and Apache Spark. In this lesson, we will cover the basics, including cluster creation and job execution.

Learning Objectives 

  1. Understand the fundamentals of Apache Hadoop and Spark. 

  2. Use Dataproc to create Hadoop/Spark clusters. 

  3. Run Dataproc jobs on a cluster. 

  4. Run serverless Dataproc jobs. 

  5. Define and execute data pipelines using workflow templates. 

Intended Audience 

  • Data Professionals 

  • Machine Learning Engineers 

  • Anyone preparing for a Google Cloud certification 

Prerequisites 

  • Access to a GCP account