In this workshop, we will learn some basic commands in Pandas, an expansive Python library for working with tabular data such as CSV files. You can think of Pandas as a more powerful version of Excel that runs inside the Python environment, where you can wrangle, clean, analyze, and visualize data. Knowing how to use Pandas is important if you plan to work with datasets that contain qualitative or quantitative data.
Throughout the lessons in this workshop, we will use the Pandas library in Jupyter Notebooks to analyze a dataset on refugee arrivals in the United States between 2005 and 2015. Specifically, you will learn how to:
- Import Pandas and read in a CSV file as a DataFrame
- Explore your data, including displaying and sampling the data
- Clean your data, including checking for duplicates and converting data types
- Review and interpret summary statistics
- Filter your data, including renaming, selecting, dropping, and adding columns
- Analyze your data by sorting columns, grouping columns, and counting values
- Visualize your data with basic bar charts, pie charts, and time series
- Write a DataFrame to a CSV file
- Build your Pandas skills with the Pandas documentation and other resources
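As a rough preview, the steps listed above can be sketched in a few lines of Pandas. This is a minimal sketch, not the workshop's actual notebook: the column names (`year`, `origin`, `dest_state`, `arrivals`) and every value in the sample are made up for illustration, and the data is read from an in-memory string rather than the workshop's CSV file. Plotting is left out because it also requires Matplotlib.

```python
import io

import pandas as pd

# Hypothetical stand-in for the workshop CSV: column names and values
# are invented for illustration and are not real figures.
csv_text = """year,origin,dest_state,arrivals
2005,Somalia,Minnesota,120
2005,Somalia,Minnesota,120
2006,Iraq,Texas,45
2006,Burma,Texas,80
2007,Iraq,Michigan,60
"""

# Read the CSV into a DataFrame (pd.read_csv also accepts a file path).
df = pd.read_csv(io.StringIO(csv_text))

# Explore: show the first rows and the column data types.
print(df.head())
print(df.dtypes)

# Clean: drop exact duplicate rows and convert the year to a datetime.
df = df.drop_duplicates()
df["year"] = pd.to_datetime(df["year"].astype(str), format="%Y")

# Summary statistics for a numeric column.
print(df["arrivals"].describe())

# Rename a column, then select only the rows for one state.
df = df.rename(columns={"dest_state": "state"})
texas = df[df["state"] == "Texas"]

# Analyze: total arrivals per origin (largest first) and rows per state.
by_origin = df.groupby("origin")["arrivals"].sum().sort_values(ascending=False)
state_counts = df["state"].value_counts()
print(by_origin)
print(state_counts)

# Write the cleaned data as CSV. With no path, to_csv returns a string;
# passing a file path instead writes the file to disk.
csv_out = df.to_csv(index=False)
```

Each step here corresponds to one item in the list, and the lessons cover these operations one at a time on the real dataset.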
Read before you get started:
- Short Introduction to Jupyter Notebooks
- A Beginner’s Tutorial to Jupyter Notebooks
- Guide To Data Cleaning: Definition, Benefits, Components, And How To Clean Your Data
- Introduction to Python
Projects related to Data Analysis with Python and Pandas:
- The Simplest Data Science Project Using Pandas & Matplotlib
- Performing Sentiment Analysis Using Twitter Data
- Introduction to Data Visualization in Python with Pandas
Cheat sheets:


