In this hands-on lab, you will put into practice your knowledge of PySpark, the Python API for Apache Spark and a popular choice for big data analysis and modeling. You will learn how to create a machine learning pipeline with PySpark, evaluate it with suitable metrics, and tune the model.
Your machine learning skills will be challenged, and by the end of this lab you should have a solid, practical understanding of how PySpark is used to build data analysis pipelines.
You will learn how to:
fit a Logistic Regression model in PySpark;
perform cross-validation in PySpark;
evaluate model performance;
perform inference on new, unseen data.
This lab is intended for:
You should possess:
Updates
May 9th, 2023 - Updated instructions to address logging in