Pandas is Python’s ETL package for structured data. It is built on top of NumPy and designed to mimic the functionality of R data frames. It provides a convenient way to handle tabular data and supports many SQL-style operations, including group-by and join. It is also compatible with many other data science packages, including visualization packages such as Matplotlib and Seaborn.
In this lesson, we are going to explore Pandas and show you how it can be used as a powerful data manipulation tool. We'll start off by looking at arrays, queries, and dataframes, and then we'll look specifically at the groupby function before rounding off the lesson by looking at how the merge and join methods can be used in Pandas.
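As a quick taste of the group-by and merge operations mentioned above, here is a minimal sketch. The two small tables and their column names (`orders`, `customers`, `customer_id`, `amount`, `name`) are made up for illustration and are not taken from the lesson's datasets.

```python
import pandas as pd

# Hypothetical toy data, invented for this sketch.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [20.0, 35.5, 12.5, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cho"],
})

# Roughly equivalent to:
#   SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id
totals = orders.groupby("customer_id", as_index=False)["amount"].sum()

# Roughly equivalent to an inner JOIN on customer_id.
report = totals.merge(customers, on="customer_id", how="inner")

print(report)
```

The result has one row per customer, with the summed `amount` next to the customer's `name`. Passing `as_index=False` keeps `customer_id` as an ordinary column, which makes the later merge on that column straightforward.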
If you have any feedback relating to this lesson, feel free to tell us about it at support@cloudacademy.com.
This lesson is intended for data engineers, data scientists, or anyone who wants to use the Pandas library in Python for handling data.
To get the most out of this lesson, you should already have a good working knowledge of Python and data visualization techniques.
The dataset(s) used in the lesson can be found in the following GitHub repo: https://github.com/cloudacademy/practical-data-science-python