5. Data exploration – Introduction to Data Wrangling, Cleaning, Analysis, and Visualization with Python and Pandas

Display data

To display the DataFrame, we can run a cell containing only the DataFrame's variable name:

refugee_df

Let’s take a look at a few elements in this DataFrame:

Index
- The bold ascending numbers in the far left-hand column of the DataFrame are called the Pandas Index. You can select rows based on the Index.
- By default, the Index is a sequence of numbers starting with zero. However, you can change the Index to something else, such as one of the columns in your dataset.
- The default Index gives each row a unique ID. Note that Pandas does not require Index labels to be unique, so an Index you set from a column may contain duplicates.
Truncation
- The DataFrame is truncated, as signaled by the ellipses (…) in the middle of every column.
- The DataFrame is truncated because we set our default display settings to 100 rows, so any DataFrame with more than 100 rows is shown in truncated form. To display all the rows, we would need to alter Pandas’ default display settings again.
Rows x Columns
- Pandas reports how many rows and columns are in this dataset at the bottom of the output. Our DataFrame has 121,245 rows × 5 columns.
NaN
- NaN ("Not a Number") is the value Pandas uses to mark missing data.
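Each of these elements can also be inspected in code. The sketch below uses a tiny made-up DataFrame (the column names country, year, and count are hypothetical stand-ins, not the real columns of refugee_df) to select a row by its Index label, set a column as the Index with .set_index(), read the rows × columns count from .shape, temporarily lift the row limit with pd.option_context(), and count NaN values with .isna().

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for refugee_df: column names and values are made up.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "year": [2019, 2019, 2020, 2020],
    "count": [120.0, np.nan, 75.0, 310.0],
})

# The default Index is a RangeIndex: 0, 1, 2, ...
print(df.index)        # RangeIndex(start=0, stop=4, step=1)

# Select a row by its Index label.
print(df.loc[2])

# Replace the default Index with one of the columns.
by_country = df.set_index("country")
print(by_country.loc["C"])

# Rows x Columns, the same numbers Pandas reports below the displayed DataFrame.
print(df.shape)        # (4, 3)

# Temporarily remove the row limit; on a large DataFrame such as refugee_df
# this would print every row, which can produce very long output.
with pd.option_context("display.max_rows", None):
    print(df)

# Missing values appear as NaN; .isna() marks each one as True.
print(df["count"].isna().sum())  # 1
```

Using pd.option_context() rather than pd.set_option() keeps the change local to the with block, so the 100-row default set earlier stays in effect afterwards.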

We can also display the first n rows of the DataFrame with the .head() method. If you don't pass a number, .head() shows the first 5 rows.

refugee_df.head(2)

refugee_df.head(15)
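A minimal sketch of .head() on a hypothetical 20-row DataFrame (made-up data standing in for refugee_df). It shows that .head() returns a new DataFrame containing the first n rows, that n defaults to 5 when omitted, and that the original DataFrame is left unchanged.

```python
import pandas as pd

# Hypothetical data: 20 made-up rows standing in for refugee_df.
df = pd.DataFrame({"year": range(2000, 2020), "count": range(20)})

first_two = df.head(2)        # rows with Index labels 0 and 1
first_default = df.head()     # n defaults to 5
first_fifteen = df.head(15)

print(first_two)
print(len(first_default))     # 5
print(len(df))                # 20, the original DataFrame is not modified
```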

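We can also look at a random sample of n rows with the .sample() method. Because the rows are chosen at random, you will likely see a different set of rows each time you run the cell.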

refugee_df.sample(15)
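Since .sample() picks rows at random, its output changes between runs. The sketch below, again on made-up data rather than refugee_df, shows the random_state parameter, which fixes the seed so a sample can be reproduced, and the frac parameter, which draws a proportion of rows instead of a fixed count. Sampled rows keep their original Index labels, so you can tell where each one came from.

```python
import pandas as pd

# Hypothetical data: 20 made-up rows standing in for refugee_df.
df = pd.DataFrame({"year": range(2000, 2020), "count": range(20)})

# The same random_state produces the same sample every time.
sample_a = df.sample(15, random_state=42)
sample_b = df.sample(15, random_state=42)
print(sample_a.equals(sample_b))   # True

# Sampled rows keep their original Index labels.
print(sample_a.index.tolist())

# frac=0.1 draws 10% of the rows: 2 of the 20 here.
ten_percent = df.sample(frac=0.1, random_state=0)
print(len(ten_percent))            # 2
```

By default .sample() draws without replacement, so no row appears twice in one sample.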

Terms used in this lesson:

.head(): .head() is a method in the Pandas library that returns the first n rows of a DataFrame (5 by default).

.sample(): .sample() is a method in the Pandas library that returns a random sample of n rows from a DataFrame (1 by default).

NaN: NaN ("Not a Number") is the value Pandas uses to mark missing data.