Working with Pandas

About

Pandas is a Python library for analyzing and manipulating structured data, and it is often used for ETL work. It is built on top of NumPy, and its DataFrame is designed to mimic R data frames. It provides a convenient way to handle tabular data and supports many SQL-style operations, including group-by and join. It also works well with many other data science packages, including visualization libraries such as Matplotlib and Seaborn.
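As a rough illustration of the SQL-style operations mentioned above, here is a minimal sketch of a group-by followed by a join. The `orders` and `customers` tables and their column names are hypothetical and are not taken from the lesson's dataset.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [20.0, 35.0, 15.0, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cleo"],
})

# Roughly: SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id
totals = orders.groupby("customer_id", as_index=False)["amount"].sum()
totals = totals.rename(columns={"amount": "total"})

# Roughly: ... INNER JOIN customers USING (customer_id)
report = totals.merge(customers, on="customer_id", how="inner")
print(report)
```

Here `merge` is used for the join because it matches on a named column; `DataFrame.join` is the index-based alternative.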

In this lesson, we are going to explore Pandas and show you how it can be used as a powerful data manipulation tool. We'll start off by looking at arrays, queries, and dataframes, and then we'll look specifically at the groupby function before rounding off the lesson by looking at how the merge and join methods can be used in Pandas.
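To give a flavor of the first topics, the sketch below wraps a NumPy array in a DataFrame and filters it with a query. The values and the column names `id` and `score` are assumptions chosen for illustration; the lesson itself may use different data.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen only for illustration.
values = np.array([[1, 10.5], [2, 7.25], [3, 12.0]])

# Wrap the NumPy array in a DataFrame with labeled columns.
df = pd.DataFrame(values, columns=["id", "score"])

# Filter rows with a query expression (equivalent to df[df["score"] > 10]).
high = df.query("score > 10")
print(high)
```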

If you have any feedback relating to this lesson, feel free to tell us about it at support@cloudacademy.com.

Learning Objectives

Understand the fundamentals of the Pandas library in Python and how it is used to handle data
Learn how to work with arrays, queries, and dataframes
Learn how to use the groupby, merge, and join methods in Pandas

Intended Audience

This lesson is intended for data engineers, data scientists, or anyone who wants to use the Pandas library in Python for handling data.

Prerequisites

To get the most out of this lesson, you should already have a good working knowledge of Python and data visualization techniques.

Resources

The dataset(s) used in the lesson can be found in the following GitHub repo: https://github.com/cloudacademy/practical-data-science-python

This content is developed in partnership with QA
