Working with Pandas
Pandas is a Python package for working with structured (tabular) data and is widely used for ETL tasks. It is built on top of NumPy and was designed to provide functionality similar to R's data frames. It offers a convenient way to handle tabular data and supports many SQL-style operations, including group-by and join. It also works well with many other data science packages, including visualization libraries such as Matplotlib and Seaborn.
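As a rough illustration of the points above, the following minimal sketch builds a DataFrame from NumPy arrays and then performs the pandas equivalents of a SQL `GROUP BY` and `JOIN`. The `sales` and `stores` tables, their column names, and their values are hypothetical and are not taken from the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sales table built from NumPy arrays (illustrative data only).
sales = pd.DataFrame(
    {
        "store_id": np.array([1, 2, 1, 3]),
        "amount": np.array([10.0, 20.0, 5.0, 7.5]),
    }
)

# Each column is backed by a NumPy array.
print(type(sales["amount"].to_numpy()))

# SQL: SELECT store_id, SUM(amount) AS amount FROM sales GROUP BY store_id
totals = sales.groupby("store_id", as_index=False)["amount"].sum()

# Hypothetical lookup table of store locations.
stores = pd.DataFrame({"store_id": [1, 2, 3], "city": ["Leeds", "York", "Bath"]})

# SQL: ... INNER JOIN stores ON totals.store_id = stores.store_id
report = totals.merge(stores, on="store_id", how="inner")
print(report)
```

Here `groupby` plays the role of `GROUP BY` and `merge` plays the role of `JOIN`; the `how` argument selects the join type (`"inner"`, `"left"`, `"right"`, or `"outer"`).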
In this lesson, we are going to explore Pandas and show you how it can be used as a powerful data manipulation tool. We'll start by looking at arrays, queries, and dataframes, then focus on the groupby method, and finish by showing how the merge and join methods can be used to combine data in Pandas.
If you have any feedback relating to this lesson, feel free to tell us about it at support@cloudacademy.com.
Learning Objectives
- Understand the fundamentals of the Pandas library in Python and how it is used to handle data
- Learn how to work with arrays, queries, and dataframes
- Learn how to use the groupby, merge, and join methods in Pandas
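To give a feel for two of the objectives listed above, here is a minimal sketch of filtering rows with `DataFrame.query` and combining tables with `DataFrame.join`. The `employees` and `departments` tables are hypothetical examples, not part of the lesson materials.

```python
import pandas as pd

# Hypothetical employee table (illustrative data only).
employees = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cai", "Dee"],
        "dept": ["eng", "ops", "eng", "hr"],
        "salary": [120, 90, 110, 80],
    }
)

# Hypothetical department table, indexed by department code.
departments = pd.DataFrame({"dept": ["eng", "ops"], "floor": [3, 1]}).set_index("dept")

# query(): keep rows matching a boolean expression, similar to SQL WHERE.
well_paid = employees.query("salary >= 90")

# join(): match the caller's "dept" column against the other frame's index.
# how="left" keeps every employee; Dee's "hr" has no match, so floor is NaN.
with_floor = employees.join(departments, on="dept", how="left")
print(with_floor)
```

In short, `merge` joins on columns by default, while `join` matches against the other DataFrame's index, which is often more concise when one table is already keyed by its index.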
Intended Audience
This lesson is intended for data engineers, data scientists, or anyone who wants to use the Pandas library in Python for handling data.
Prerequisites
To get the most out of this lesson, you should already have a good working knowledge of Python and data visualization techniques.
Resources
The dataset(s) used in the lesson can be found in the following GitHub repo: https://github.com/cloudacademy/practical-data-science-python