7. Summary statistics

Calculate summary statistics

To calculate summary statistics for the columns in our DataFrame, we can use the .describe() method. By default, however, it only summarizes columns with numerical data. To include all columns, we can pass include='all'. We also pass datetime_is_numeric=True so that datetime values are summarized as numeric. Note that pandas 2.0 removed the datetime_is_numeric argument because datetime values are now always treated as numeric, so on pandas 2.0 or later we leave it out.

refugee_df.describe()
refugee_df.describe(include='all', datetime_is_numeric=True)
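
To see what the two calls return without loading the full dataset, here is a minimal sketch on a toy DataFrame. The name toy_df and every row in it are invented for illustration; only the column names follow our refugee_df. The try/except fallback is one assumed way to handle the removal of datetime_is_numeric in pandas 2.0, not the only one.

```python
import pandas as pd

# Toy stand-in for refugee_df. These rows are invented for illustration only.
toy_df = pd.DataFrame({
    "year": pd.to_datetime(["2005-01-01", "2005-01-01", "2006-01-01", "2006-01-01"]),
    "origin": ["Iraq", "Somalia", "Iraq", "Burma"],
    "dest_state": ["California", "Texas", "California", "Texas"],
    "dest_city": ["San Diego", "Houston", "San Diego", "Dallas"],
    "arrivals": [10, 3, 7, 4],
})

# Default call: text columns such as origin are left out.
numeric_summary = toy_df.describe()

# Summarize every column. Older pandas accepts datetime_is_numeric;
# pandas 2.0+ raises TypeError for it, so we fall back to the plain call.
try:
    full_summary = toy_df.describe(include="all", datetime_is_numeric=True)
except TypeError:
    full_summary = toy_df.describe(include="all")

print(numeric_summary)
print(full_summary)
```

In the second table, text columns get count, unique, top, and freq rows, while numeric columns get mean, std, min, max, and the quartiles. Cells that do not apply to a column are filled with NaN.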

What can we glean from these summary statistics? 

  • Looking at the year column, we get confirmation that our data starts in 2005 and ends in 2015. 
  • Looking at the origin column, we learn that refugees who were resettled in the U.S. during the 2005 – 2015 period came from 113 unique countries of origin. Iraq is the most common value, meaning it appears in more rows than any other country of origin.
  • Looking at the dest_state column, we learn that California is the destination state that appears in the most rows for the 2005 – 2015 period. This is a count of rows, not a total of arrivals, so it does not by itself show that California resettled the most refugees. We also notice that there are 52 unique states in the dataset, which suggests the column includes entries beyond the 50 states, possibly Washington, D.C. and Puerto Rico. We will need to investigate this further in a moment.
  • Looking at the dest_city column, we can see that, among the 2,850 unique cities, Denver is the city that appears in the most rows for the 2005 – 2015 period. Again, this is a count of rows rather than a total number of refugees resettled.
  • Looking at the arrivals column, we can see that the mean number of refugees resettled from a given country, in a given year, to a given state/city location was 5.5, which is to say about 5 to 6 refugees. The maximum number of refugees resettled from the same country, in the same year, to the same state/city location was 2,813.
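
The top value that .describe() reports for a text column is the value that appears in the most rows, which is not necessarily the value with the largest total in the arrivals column. The sketch below illustrates the difference and shows one way to list the unique values in dest_state. The toy_df rows are invented for illustration and are not taken from our dataset.

```python
import pandas as pd

# Invented rows: California appears most often, but Texas has the largest total.
toy_df = pd.DataFrame({
    "dest_state": ["California", "California", "California",
                   "Texas", "District of Columbia"],
    "dest_city": ["San Diego", "Los Angeles", "Sacramento",
                  "Houston", "Washington"],
    "arrivals": [2, 1, 3, 50, 4],
})

# What .describe() calls "top": the value found in the most rows.
row_counts = toy_df["dest_state"].value_counts()
most_frequent_state = row_counts.idxmax()

# Total arrivals per state: group the rows by state, then sum arrivals.
total_arrivals = (
    toy_df.groupby("dest_state")["arrivals"].sum().sort_values(ascending=False)
)
largest_total_state = total_arrivals.idxmax()

# List the distinct values to check for entries that are not U.S. states.
unique_states = sorted(toy_df["dest_state"].unique())

print(most_frequent_state)
print(largest_total_state)
print(unique_states)
```

Running the same three steps on refugee_df would show whether the most frequent state is also the one with the most arrivals, and which entries account for the 52 unique values in dest_state.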

Lesson 8