Gapminder World Data

The Gapminder Foundation is a non-profit that promotes sustainable development through the increased use and understanding of statistics. The organisation gathers information about how people live in different countries, tracked over time across several indicators.

The Project

For this project, four variables are investigated: income per person (GDP/capita, PPP$ inflation-adjusted), fixed-line subscribers (per 100 people), cell phones (per 100 people) and broadband subscribers (per 100 people). An additional dataset is used to supplement country-level geographical data. Further details on these metrics and how they were collected can be found in the References below.
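The combination of indicator data with a geographical lookup can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes each indicator arrives as a wide table (one row per country, one column per year), and the frame names, column names and values are all hypothetical.

```python
import pandas as pd

# Hypothetical wide-format indicator table: one row per country,
# one column per year (the layout is an assumption, values are made up).
income_wide = pd.DataFrame({
    "country": ["A", "B"],
    "2000": [1000.0, 2000.0],
    "2001": [1100.0, 2100.0],
})

# Hypothetical geographical lookup used to supplement country-level data.
geo = pd.DataFrame({
    "country": ["A", "B"],
    "continent": ["Europe", "Africa"],
})

# Reshape to long format: one row per (country, year) observation.
income_long = income_wide.melt(
    id_vars="country", var_name="year", value_name="income"
)
income_long["year"] = income_long["year"].astype(int)

# Attach the continent to every observation; a left join keeps all
# indicator rows even if a country is absent from the lookup.
merged = income_long.merge(geo, on="country", how="left")
print(merged)
```

A long layout of this kind makes it straightforward to group by year or continent later in the analysis.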

The analysis is conducted in a Jupyter Notebook running a Python kernel.

The Code and the Report

Summary of Main Findings

  1. Income has grown steadily across the world since the turn of the century.

  2. As a primary mode of communication, fixed-line connections have declined in usage across the world.

  3. After explosive growth, phone line connections have begun to show signs of saturation.

  4. Broadband connections continue to grow as a mode of communication across the globe.

  5. Across the different channels of communication compared, phone line connections display the most equitable distribution across countries.

  6. Europe is the most connected region in the world across all of the variables tracked.

  7. Africa is the only continent to display a declining trend in the number of broadband connections.

  8. There is a positive correlation between income and the number of fixed-line and broadband connections.

  9. There is also evidence suggesting that the number of fixed-line connections a country possesses is positively correlated with the number of broadband connections.
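The correlation findings in the list above could be checked with a pairwise correlation matrix. The sketch below uses the Pearson coefficient, which is an assumption, since the write-up does not state which measure was used. The column names and values are made up for illustration.

```python
import pandas as pd

# Made-up values for four hypothetical countries in a single year.
df = pd.DataFrame({
    "income": [1000, 5000, 20000, 40000],   # GDP per capita
    "fixed_line": [1.0, 8.0, 30.0, 45.0],   # subscribers per 100 people
    "broadband": [0.1, 2.0, 15.0, 30.0],    # subscribers per 100 people
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr(method="pearson")
print(corr.round(2))
```

Values close to 1 indicate a strong positive linear association. A correlation of this kind does not by itself establish that higher income causes more connections.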

Limitations

Treatment of Missing Values: Missing values were interpolated under the assumption that each indicator changes linearly between observed values. Formal statistical techniques could be applied to assess the validity of this assumption.
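Linear interpolation of this kind can be sketched with pandas. The example is illustrative only: the frame, the column names and the per-country grouping are assumptions, not the notebook's actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data with gaps, sorted by country and year.
df = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "broadband": [np.nan, 2.0, np.nan, 4.0, 1.0, np.nan, np.nan, 4.0],
})

# Interpolate within each country so that values never leak across
# country boundaries. method="linear" treats the rows as equally spaced.
df["broadband_filled"] = df.groupby("country")["broadband"].transform(
    lambda s: s.interpolate(method="linear")
)
print(df)
```

With the default settings, gaps between two observed values are filled, but a gap at the start of a series stays missing because there is no earlier observation to interpolate from.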

Better yet, an investigation into the causes of the missing values may reveal systematic bias. In other words, an assessment could be made of whether the values are missing at random. It may be that some missing values are placeholders for zero. For example, it is entirely plausible that the missing value for Afghanistan in 1998 under broadband connectivity is another way of stating that broadband was absent from the country at that time. In that case, replacing the missing value with zero would be an accurate representation of reality.
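One simple first check is to see whether missingness clusters in particular years, which would suggest the values are not missing at random. The sketch below uses made-up data and hypothetical column names. It also shows zero-filling as an alternative treatment, which is only appropriate where a missing value really does mean "no service".

```python
import numpy as np
import pandas as pd

# Made-up broadband data for two hypothetical countries.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [1998, 1999, 2000] * 2,
    "broadband": [np.nan, np.nan, 0.5, np.nan, 0.2, 0.9],
})

# Share of countries with a missing value in each year.
missing_by_year = df["broadband"].isna().groupby(df["year"]).mean()
print(missing_by_year)

# Alternative treatment: read missing values as "no broadband yet".
zero_filled = df["broadband"].fillna(0.0)
```

In this toy data every value is missing in the earliest year and none in the latest, a pattern that would be consistent with the technology not yet being available rather than with random gaps.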

Outlier Treatment: The numerical summary, as well as the plot of distributions, revealed outliers. Suffice it to say here that an entire literature has developed around investigating the causes and proper treatment of outliers.
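As one illustration of how outliers might be flagged, the sketch below applies the common 1.5 x IQR rule to made-up values. This is a conventional rule of thumb and not necessarily the criterion behind the summary and plots described above.

```python
import pandas as pd

# Made-up indicator values with one obviously extreme observation.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])

# Interquartile range and the conventional 1.5 * IQR fences.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Observations outside the fences are flagged, not removed.
outliers = values[(values < lower) | (values > upper)]
print(outliers)
```

Flagging rather than dropping keeps the decision about treatment (removal, transformation, or retention) explicit and reviewable.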

References

  1. https://www.gapminder.org/data/

  2. https://www.gapminder.org/data/documentation/gd001/

  3. https://data.worldbank.org/indicator/IT.NET.BBND.P2

  4. https://data.worldbank.org/indicator/IT.CEL.SETS.P2

  5. https://data.worldbank.org/indicator/IT.NET.USER.ZS
