Hands-on lab

PySpark - How to build a Machine Learning Pipeline

Difficulty: Beginner

Duration: Up to 1 hour

Students: 481

Rating: 4.4/5


On average, students complete this lab in 15 minutes.

Get guided in a real environment: practice with a step-by-step scenario in a real, provisioned environment.

Learn and validate: use validations to check your solutions every step of the way.

See results: track your knowledge and monitor your progress.

Description

In this hands-on lab, you will deepen your knowledge of PySpark, the Python API for Apache Spark and a popular tool for large-scale data analysis and modeling. You will learn how to build a machine learning pipeline with PySpark, evaluate it with appropriate metrics, and tune its hyperparameters.

The lab will challenge your machine learning skills. By the end of it, you should have a practical understanding of how to build data analysis pipelines with PySpark.

Learning Objectives

Upon completion of this lab, you will be able to:

fit a logistic regression model in PySpark;
perform cross-validation in PySpark;
evaluate model performance;
perform inference on new, unseen data.

Intended Audience

This lab is intended for:

Those interested in performing data analysis with Python.
Anyone involved in data science and engineering pipelines.

Prerequisites

You should possess:

An intermediate understanding of Python.
Basic knowledge of SQL.
Basic knowledge of the Pandas library.

Updates

May 9th, 2023 - Updated the instructions for logging in.

Covered topics

Machine Learning

Development

Artificial Intelligence

Python

Lab steps

PySpark - Machine Learning Pipeline

Lab Rules

Lab rules apply
