Background

The main aim of this post is to show how the Seaborn package can be used to simplify visualisation of statistical data.

The data and its visualisations tell an important story, but this time I will refrain from commenting on or interpreting that story directly, as this is a highly emotive subject.

Dataset

This dataset is a derivative of Reinhart et al.'s Global Financial Stability dataset, which can be found online at: https://www.hbs.edu/behavioral-finance-and-financial-stability/data/Pages/global.aspx

The dataset will be valuable to those who seek to understand the dynamics of financial stability within the African context.

Context

The dataset focuses on the banking, debt, financial, inflation and systemic crises that occurred from 1860 to 2014 in 13 African countries: Algeria, Angola, Central African Republic, Ivory Coast, Egypt, Kenya, Mauritius, Morocco, Nigeria, South Africa, Tunisia, Zambia and Zimbabwe.

Acknowledgements

  • Kaggle Dataset (22 Nov 2019)

    • Reinhart, C., Rogoff, K., Trebesch, C. and Reinhart, V. (2019) Global Crises Data by Country. [online] https://www.hbs.edu/behavioral-finance-and-financial-stability/data. Available at: https://www.hbs.edu/behavioral-finance-and-financial-stability/data/Pages/global.aspx [Accessed: 17 July 2019].
  • Kaggle kernel (22 Nov 2019)


Now, without further ado…

Topographic Map of Africa

Import Relevant Libraries

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

import seaborn as sns
sns.set('notebook')
sns.set_style('darkgrid')

Load data

raw = pd.read_csv('african_crises.csv', index_col='year', parse_dates=True)
raw.info()  # info() prints its summary itself and returns None, so no display() is needed
raw.sample(3)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1059 entries, 1870-01-01 to 2013-01-01
Data columns (total 13 columns):
case                               1059 non-null int64
cc3                                1059 non-null object
country                            1059 non-null object
systemic_crisis                    1059 non-null int64
exch_usd                           1059 non-null float64
domestic_debt_in_default           1059 non-null int64
sovereign_external_debt_default    1059 non-null int64
gdp_weighted_default               1059 non-null float64
inflation_annual_cpi               1059 non-null float64
independence                       1059 non-null int64
currency_crises                    1059 non-null int64
inflation_crises                   1059 non-null int64
banking_crisis                     1059 non-null object
dtypes: float64(3), int64(7), object(3)
memory usage: 115.8+ KB

case cc3 country systemic_crisis exch_usd domestic_debt_in_default sovereign_external_debt_default gdp_weighted_default inflation_annual_cpi independence currency_crises inflation_crises banking_crisis
year
1955-01-01 69 ZMB Zambia 0 0.000714 0 0 0.0 3.571429 0 0 0 no_crisis
1984-01-01 35 KEN Kenya 0 15.781300 0 0 0.0 20.667000 1 0 1 no_crisis
2001-01-01 56 ZAF South Africa 0 12.126500 0 0 0.0 5.700000 1 1 0 no_crisis
  • No missing data :)
  • Mostly numerical data
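
The two observations above can also be checked directly rather than read off the info() summary. Here is a minimal sketch on a made-up toy frame (the values, and the deliberately missing entry, are assumptions for illustration and are not taken from the crises dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the crises data (values are made up).
toy = pd.DataFrame({
    'country': ['Kenya', 'Zambia', 'Egypt'],
    'exch_usd': [15.78, np.nan, 0.35],
    'banking_crisis': ['no_crisis', 'crisis', 'no_crisis'],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
# Names of the columns pandas treats as numeric
numeric_columns = toy.select_dtypes(include='number').columns.tolist()

print(missing_per_column)
print(numeric_columns)
```

On the real dataframe, the same two calls would be raw.isna().sum() and raw.select_dtypes(include='number').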

Correlation Matrix

LabelEncoder

Let’s see if any of the features are correlated:

  • Let’s first convert the non-numerical columns to numerical.

Enter: LabelEncoder

from sklearn.preprocessing import LabelEncoder

numerical = raw.copy()  # work on a copy so the raw dataframe stays untouched
num = LabelEncoder()
# encode each text column as integers (the encoder is refitted for every column)
for col in ['cc3', 'country', 'banking_crisis']:
    numerical[col] = num.fit_transform(numerical[col])
numerical.head()
case cc3 country systemic_crisis exch_usd domestic_debt_in_default sovereign_external_debt_default gdp_weighted_default inflation_annual_cpi independence currency_crises inflation_crises banking_crisis
year
1870-01-01 1 3 0 1 0.052264 0 0 0.0 3.441456 0 0 0 0
1871-01-01 1 3 0 0 0.052798 0 0 0.0 14.149140 0 0 0 1
1872-01-01 1 3 0 0 0.052274 0 0 0.0 -3.718593 0 0 0 1
1873-01-01 1 3 0 0 0.051680 0 0 0.0 11.203897 0 0 0 1
1874-01-01 1 3 0 0 0.051308 0 0 0.0 -3.848561 0 0 0 1
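
One thing worth knowing when reading the encoded banking_crisis column: LabelEncoder numbers the labels in sorted order, so the label that sorts first gets code 0. The sketch below reproduces that ordering with np.unique so it runs without scikit-learn. It assumes the column holds the two labels 'crisis' and 'no_crisis' (only 'no_crisis' is visible in the sample above, so 'crisis' is an assumption).

```python
import numpy as np
import pandas as pd

# Made-up labels; 'crisis' is an assumed label name.
labels = pd.Series(['no_crisis', 'crisis', 'no_crisis', 'no_crisis'])

# Sorted unique labels plus the integer code of each row,
# which is the same ordering LabelEncoder uses.
classes, codes = np.unique(labels.to_numpy(), return_inverse=True)
print(list(classes), list(codes))  # 'crisis' gets 0, 'no_crisis' gets 1

# An explicit mapping keeps 1 meaning "crisis", like the other 0/1 indicator columns.
explicit = labels.map({'no_crisis': 0, 'crisis': 1})
print(list(explicit))
```

With the sorted ordering, a banking crisis is coded 0, which would flip the sign of any correlation computed against the other crisis indicators.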

Correlation

corr = numerical.corr()
corr
case cc3 country systemic_crisis exch_usd domestic_debt_in_default sovereign_external_debt_default gdp_weighted_default inflation_annual_cpi independence currency_crises inflation_crises banking_crisis
case 1.000000 0.964105 0.990553 0.010991 -0.231976 0.128358 -0.039262 -0.032981 0.044762 0.021858 0.095339 0.006405 0.023652
cc3 0.964105 1.000000 0.946147 -0.012692 -0.312222 0.134268 -0.082447 -0.007799 0.048917 0.012709 0.090759 0.003644 0.041981
country 0.990553 0.946147 1.000000 0.015586 -0.198953 0.155659 -0.000455 -0.041843 0.049184 0.013308 0.097166 0.016491 0.014667
systemic_crisis 0.010991 -0.012692 0.015586 1.000000 0.202687 0.122158 0.249850 0.005274 0.106452 0.147083 0.112751 0.172562 -0.853702
exch_usd -0.231976 -0.312222 -0.198953 0.202687 1.000000 0.005253 0.422890 -0.040726 -0.011947 0.126034 -0.056472 -0.063783 -0.168775
domestic_debt_in_default 0.128358 0.134268 0.155659 0.122158 0.005253 1.000000 0.464751 -0.029874 0.151832 0.109120 0.227585 0.224429 -0.225797
sovereign_external_debt_default -0.039262 -0.082447 -0.000455 0.249850 0.422890 0.464751 1.000000 0.345919 0.072609 0.228192 0.199428 0.187930 -0.263992
gdp_weighted_default -0.032981 -0.007799 -0.041843 0.005274 -0.040726 -0.029874 0.345919 1.000000 -0.004535 0.078936 0.016970 0.017630 -0.026545
inflation_annual_cpi 0.044762 0.048917 0.049184 0.106452 -0.011947 0.151832 0.072609 -0.004535 1.000000 0.016569 0.076590 0.080060 -0.098860
independence 0.021858 0.012709 0.013308 0.147083 0.126034 0.109120 0.228192 0.078936 0.016569 1.000000 0.086376 -0.022548 -0.159620
currency_crises 0.095339 0.090759 0.097166 0.112751 -0.056472 0.227585 0.199428 0.016970 0.076590 0.086376 1.000000 0.393376 -0.166859
inflation_crises 0.006405 0.003644 0.016491 0.172562 -0.063783 0.224429 0.187930 0.017630 0.080060 -0.022548 0.393376 1.000000 -0.235852
banking_crisis 0.023652 0.041981 0.014667 -0.853702 -0.168775 -0.225797 -0.263992 -0.026545 -0.098860 -0.159620 -0.166859 -0.235852 1.000000

Matrix

Let’s draw a heatmap of the correlation matrix to make the correlations easier to read:

plt.figure(figsize=(20, 20))
sns.heatmap(corr, cmap='coolwarm', annot=True, fmt='.2f', annot_kws={'size': 18})
plt.show()

Correlation Matrix

We see that the first 3 columns are highly correlated with each other. This is simply because they contain the same information. We will thus have to drop two and keep one as an identifier for country. For visualisation purposes, we will need the country name.

  • Drop case and cc3, as they contain the same information as country
  • Textual data: Country name and banking crisis: y/n.
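
Redundant columns like these can also be picked out programmatically instead of by eye. The following is a rough sketch on toy data; the column name country_code and the 0.9 threshold are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Toy data: 'case' and 'country_code' carry the same information; 'exch_usd' does not.
toy = pd.DataFrame({
    'case': [1, 1, 2, 2, 3, 3],
    'country_code': [10, 10, 20, 20, 30, 30],
    'exch_usd': [0.5, 3.0, 1.0, 0.2, 2.5, 0.7],
})

corr = toy.corr()
# Keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
# Turn the matrix into a (row, column) -> correlation series.
pairs = upper.stack()
# Pairs whose absolute correlation exceeds the (arbitrary) threshold.
redundant = pairs[pairs.abs() > 0.9]
print(redundant)
```

Only the case / country_code pair survives the filter here, mirroring what the heatmap shows for case, cc3 and country.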

However, when it comes to predictive analysis, we will have to either encode the country column, or instead use the case column, as it is already numerical!

For now, we go back to the raw variable, drop the case and cc3 columns and perform more EDA.

raw.drop(columns=['case', 'cc3'], inplace=True)  # country remains as the identifier

Trends

Exchange Rate vs. USD

The USD is a standard unit of comparison for a currency’s strength. We start by observing patterns, over time, in each country’s exchange rate to the dollar.

countries = raw.country.unique()  # countries in the dataset

plt.figure(figsize=(18, 18))  # create an empty figure with these dimensions
plt.suptitle('Currency exchange rate vs. USD over time')  # one title for the whole figure
for ind, country in enumerate(countries):
    plt.subplot(5, 3, ind + 1)  # add one panel per country
    sub = raw[raw.country == country]  # rows for this country only
    exch = sub.exch_usd  # country's exchange rate to the dollar

    sns.lineplot(data=exch, label=str(country), marker='o', color=np.random.rand(3,))  # plot the trend
    plt.ylabel('Exchange rate vs. USD')
    # first year in which the country is flagged as independent
    independence = sub[sub.independence == 1].index.min()
    plt.axvline(independence, color='green', linestyle='--', label='Independence')

    plt.legend(loc='best')
plt.show()

Exchange rate to USD
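
The independence year used for the dashed line in each panel can also be computed for all countries in one go. This is a small sketch on made-up data (countries 'A' and 'B' and their dates are hypothetical), assuming, as in the post, a 'year' datetime index and a 0/1 independence flag:

```python
import pandas as pd

# Made-up data: country A is flagged independent from 1961, country B from 1962.
toy = pd.DataFrame(
    {
        'country': ['A', 'A', 'A', 'B', 'B', 'B'],
        'independence': [0, 1, 1, 0, 0, 1],
    },
    index=pd.to_datetime(['1960', '1961', '1962', '1960', '1961', '1962'], format='%Y'),
)
toy.index.name = 'year'

# Keep only the independent years, then take the earliest one per country.
independent = toy[toy.independence == 1].reset_index()
first_year = independent.groupby('country')['year'].min()
print(first_year)
```

A country with no independent years would simply be absent from the result, so it is worth checking for that before drawing the vertical line.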

Inflation Rate

plt.figure(figsize=(18, 18))  # create an empty figure with these dimensions
plt.suptitle('Annual inflation rate')  # one title for the whole figure
for ind, country in enumerate(countries):
    plt.subplot(5, 3, ind + 1)  # add one panel per country
    sub = raw[raw.country == country]  # rows for this country only
    infl = sub.inflation_annual_cpi  # country's annual CPI inflation
    # plot the trend, leaving out the last 10 observations
    sns.lineplot(data=infl[0:-10], label=str(country), marker='o', color=np.random.rand(3,))
    plt.ylabel('Annual inflation rate')
    # first year in which the country is flagged as independent
    independence = sub[sub.independence == 1].index.min()
    plt.axvline(independence, color='green', linestyle='--', label='Independence')

    plt.legend(loc='best')
plt.show()

Annual average Inflation rate

Average exchange rate vs. independence

plt.figure(figsize=(10, 8))
rawn = raw[raw.index < '1930']  # observations before 1930
plt.title('Exchange rates vs. USD for African countries before 1930')
sns.scatterplot(x=rawn.index, y=rawn.exch_usd, hue=rawn.independence)
plt.show()

Exchange rates before independence

plt.figure(figsize=(10, 8))
rawn = raw[raw.index > '1930']  # observations after 1930
plt.title('Exchange rates vs. USD for African countries after 1930')
sns.scatterplot(x=rawn.index, y=rawn.exch_usd, hue=rawn.independence)
plt.show()

Exchange rates after independence

plt.figure(figsize=(15,15))
rawn = raw.copy()

#plt.title('exchange rates vs. USD for African countries 1860 - 2014')
sns.scatterplot(x=rawn.index, y=rawn.exch_usd, hue=rawn.independence)
plt.show()

Independent vs. colonised exchange rates

In general, we see that the value of African currencies plummeted on average after independence (their exchange rates to the dollar rose), which generally occurred in the middle of the 20th century.

This is confirmed by the plot below, which shows a steeper rise in exchange rates after independence.

rawn['years'] = rawn.index.year.astype(float)  # numeric year for the regression

# lmplot creates its own figure, sized via height and aspect,
# so no separate plt.figure() or rcParams setting is needed
sns.lmplot(x='years', y='exch_usd', hue='independence', data=rawn,
           markers=['*', '.'], height=10, aspect=1)
plt.legend(loc='best')
plt.show()

Independent vs. colonised exchange rates
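
The regression plot draws one fitted straight line per independence group. To put a number on "steeper", the slope of each line can be estimated separately. Below is a minimal sketch with np.polyfit on made-up values (the numbers are invented purely to show the mechanics and say nothing about the real data):

```python
import numpy as np
import pandas as pd

# Made-up observations: a flat trend before independence, a steep one after.
toy = pd.DataFrame({
    'years': [1900.0, 1910.0, 1920.0, 1960.0, 1970.0, 1980.0],
    'exch_usd': [1.0, 1.1, 1.2, 5.0, 15.0, 25.0],
    'independence': [0, 0, 0, 1, 1, 1],
})

slopes = {}
for flag, group in toy.groupby('independence'):
    # Least-squares straight line: exch_usd = slope * years + intercept
    slope, intercept = np.polyfit(group['years'], group['exch_usd'], deg=1)
    slopes[int(flag)] = slope
print(slopes)
```

Each slope is the change in exchange rate per year within that group, so a larger value for the independent group corresponds to the steeper line in the plot.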


sns.lmplot(x='years', y='inflation_annual_cpi', hue='independence', data=rawn,
           markers=['*', '.'], height=10, aspect=1)
plt.ylim(-100000, 100000)
plt.show()

Independent vs. colonised annual average inflation

The annual inflation rates also illustrate this point. However, removing Zimbabwe from this analysis might yield a more sensible result:

rawz = rawn.copy()
rawz = rawz[rawz.country!='Zimbabwe']

sns.lmplot(x='years', y='inflation_annual_cpi', hue='independence', data=rawz,
           markers=['*', '.'], height=10, aspect=1)
plt.ylim(-100, 200)
plt.show()

Annual Average inflation, without Zimbabwe

Crises and debt counts for each of the countries

# integer and text (non-continuous) columns, excluding country
counts = raw.select_dtypes(include=['int64', 'object']).columns.drop('country')

plt.figure(figsize=(25, 25))
plt.suptitle('Debt and crises for each country')  # one title for the whole figure
for ind, count in enumerate(counts):
    plt.subplot(4, 2, ind + 1)  # add one panel per indicator column
    plt.title(count)
    sns.countplot(y=raw.country, hue=raw[count])
plt.show()

Crises and debt counts
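
The numbers behind each count plot can be tabulated as well, which helps when the bars are hard to compare by eye. Here is a small sketch using pd.crosstab on made-up rows (the values are for illustration only):

```python
import pandas as pd

# Made-up rows: one currency crisis year for Kenya, none for Egypt.
toy = pd.DataFrame({
    'country': ['Kenya', 'Kenya', 'Kenya', 'Egypt', 'Egypt'],
    'currency_crises': [0, 1, 0, 0, 0],
})

# Rows are countries, columns are the indicator values, cells are row counts.
table = pd.crosstab(toy['country'], toy['currency_crises'])
print(table)
```

Each cell corresponds to the length of one bar in the matching count plot panel.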

That’s all for now. Thank you for reading!