Compare Johns Hopkins and RKI data for Germany¶
In [1]:
%config InlineBackend.figure_formats = ['svg']
%matplotlib inline
# or try "%matplotlib notebook" for more interactive exploration
import datetime
import matplotlib.pyplot as plt
import pandas as pd
import oscovida as ov
In [2]:
ov.display_binder_link("compare-rki-and-johns-hopkins-data.ipynb")
In [3]:
print(f"Notebook created:  01 November 2020.")
print(f"Notebook executed: {datetime.datetime.now().strftime('%d %B %Y')}.")
Load data from Johns Hopkins University¶
In [4]:
cases_jh, deaths_jh = ov.get_country_data("Germany")
Load data from RKI¶
In [5]:
germany = ov.fetch_data_germany()
germany.head()
Out[5]:
The meaning of the entries in the RKI data is explained at https://www.arcgis.com/home/item.html?id=f10774f1c63e40168479a1feb6c7ca74
Here, we look at one particular item: the date attached to each row reported in the file. There are two:
- Meldedatum: date on which the health authorities became aware of the case
- Refdatum: reference date, which is the "Erkrankungsdatum" (date of getting sick) or, if that is not known, the Meldedatum

In the Johns Hopkins University data, there is only one date attached to the numbers of cases and deaths for the whole of Germany.
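To make the effect of the two date columns concrete, here is a minimal sketch on invented toy rows. The helper `cumulative_by` and the numbers in `toy` are hypothetical and not part of oscovida or the RKI file; only the column names `Meldedatum`, `Refdatum` and `AnzahlFall` are taken from the data used in this notebook. Grouping the same rows by either date column should give the same final total, but attach the counts to different days.

```python
import pandas as pd

# Hypothetical toy rows mimicking the RKI layout (values invented for illustration)
toy = pd.DataFrame({
    "Meldedatum": ["2020-03-10", "2020-03-10", "2020-03-12"],
    "Refdatum":   ["2020-03-07", "2020-03-10", "2020-03-09"],
    "AnzahlFall": [2, 1, 3],
})


def cumulative_by(frame, date_column, value_column):
    """Sum value_column per day of date_column and accumulate over time."""
    dates = pd.to_datetime(frame[date_column])
    daily = frame[value_column].groupby(dates).sum()  # groupby sorts the dates
    return daily.cumsum()


by_mel = cumulative_by(toy, "Meldedatum", "AnzahlFall")
by_ref = cumulative_by(toy, "Refdatum", "AnzahlFall")

print(by_mel)
print(by_ref)
```

Both series end at the same total, while the Refdatum series starts earlier because the onset of illness precedes the report to the health authorities.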
In [6]:
g2 = germany.set_index(pd.to_datetime(germany['Meldedatum']))
g2.index.name = 'date'
# Sum only the two count columns per Meldedatum, then accumulate over time
g3 = g2.groupby('date')[["AnzahlFall", "AnzahlTodesfall"]].sum()
cases_rki_mel = g3["AnzahlFall"].cumsum()
deaths_rki_mel = g3["AnzahlTodesfall"].cumsum()
In [7]:
g4 = germany.set_index(pd.to_datetime(germany['Refdatum']))
g4.index.name = 'date'
# Sum only the two count columns per Refdatum, then accumulate over time
g5 = g4.groupby('date')[["AnzahlFall", "AnzahlTodesfall"]].sum()
cases_rki_ref = g5["AnzahlFall"].cumsum()
deaths_rki_ref = g5["AnzahlTodesfall"].cumsum()
Comparative plots¶
In [8]:
fig, ax = plt.subplots()
ax.plot(cases_rki_ref.index, cases_rki_ref.values, "-", 
        label="RKI Refdatum", color="yellow")
ax.plot(cases_rki_mel.index, cases_rki_mel.values, "-", 
        label="RKI Meldedatum", color="red")
ax.plot(cases_jh.index, cases_jh.values, ":", 
        label="JHU", linewidth=2,  color="blue")
ax.legend()
ax.set_title("cases")
fig.autofmt_xdate()
In [9]:
fig, ax = plt.subplots()
ax.bar(cases_rki_ref.index, cases_rki_ref.diff().values, alpha=1.0, 
       label="RKI Refdatum", color="yellow")
ax.bar(cases_rki_mel.index, cases_rki_mel.diff().values, alpha=0.5, 
       label="RKI Meldedatum", color="red")
ax.bar(cases_jh.index, cases_jh.diff().values, alpha=0.3, 
       label="JHU", color="blue")
ax.legend()
ax.set_title("daily new cases")
ax.set_xlim(left = pd.to_datetime("2020-03-01"))
fig.autofmt_xdate()
Observations cases¶
- The total number of cases is similar for the JHU and RKI data
- The JHU data is more similar to the RKI data by Meldedatum than by Refdatum
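The visual impression that one curve is "more similar" to another could be backed by a number. The following is a rough sketch, not the method used in this notebook: the helper `mean_abs_gap` is hypothetical and the three series contain invented values. It compares two cumulative series on the dates they share and returns the mean absolute difference.

```python
import pandas as pd


def mean_abs_gap(a, b):
    """Mean absolute difference of two series on the dates both share."""
    a_common, b_common = a.align(b, join="inner")
    return (a_common - b_common).abs().mean()


# Invented cumulative counts, for illustration only
days = pd.date_range("2020-03-01", periods=5, freq="D")
jhu = pd.Series([10, 20, 35, 50, 70], index=days)
mel = pd.Series([10, 22, 36, 52, 70], index=days)
ref = pd.Series([18, 33, 49, 66, 70], index=days)

gap_mel = mean_abs_gap(jhu, mel)
gap_ref = mean_abs_gap(jhu, ref)
print(gap_mel, gap_ref)
```

Applied to the real data, one would call it as `mean_abs_gap(cases_jh, cases_rki_mel)` and `mean_abs_gap(cases_jh, cases_rki_ref)`, assuming all three series carry comparable daily datetime indices.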
In [10]:
fig, ax = plt.subplots()
ax.plot(deaths_rki_ref.index, deaths_rki_ref.values, "-", 
        label="RKI Refdatum", color="yellow")
ax.plot(deaths_rki_mel.index, deaths_rki_mel.values, "-", 
        label="RKI Meldedatum", color="red")
ax.plot(deaths_jh.index, deaths_jh.values, ":", 
        label="JHU", color="blue")
ax.legend()
ax.set_title("Deaths")
fig.autofmt_xdate()
In [11]:
fig, ax = plt.subplots()
ax.bar(deaths_rki_ref.index, deaths_rki_ref.diff().values, alpha=1.0, 
       label="RKI Refdatum", color="yellow")
ax.bar(deaths_rki_mel.index, deaths_rki_mel.diff().values, alpha=0.5, 
       label="RKI Meldedatum", color="red")
ax.bar(deaths_jh.index, deaths_jh.diff().values, alpha=0.3, 
       label="JHU", color="blue")
ax.legend()
ax.set_title("daily new deaths")
ax.set_xlim(left = pd.to_datetime("2020-03-01"), 
            right=pd.to_datetime("2020-06-01"))
fig.autofmt_xdate()
Observations deaths¶
The reported deaths differ in the dates they are associated with:
- RKI Refdatum has the dates furthest in the past
- RKI Meldedatum follows this (with several days delay, see plot above)
- The JHU data has even more recent dates attached

The difference between the case numbers reported by JHU and RKI is smaller than the difference between the death numbers.
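The "several days delay" noted above could be estimated rather than read off the plot. The sketch below is one simple way to do so under stated assumptions: the helper `estimate_lag_days` is hypothetical, it expects two cumulative series with timezone-naive daily datetime indices, and it picks the whole-day shift that minimises the mean absolute difference. The demonstration uses invented data with a known 3-day delay.

```python
import pandas as pd


def estimate_lag_days(early, late, max_lag=14):
    """Return the delay (in days) of `early` that best matches `late`."""
    # Put both cumulative series on one gap-free daily index
    start = min(early.index.min(), late.index.min())
    end = max(early.index.max(), late.index.max())
    days = pd.date_range(start, end, freq="D")
    early_d = early.reindex(days).ffill().fillna(0)
    late_d = late.reindex(days).ffill().fillna(0)

    errors = {}
    for lag in range(max_lag + 1):
        shifted = early_d.shift(lag).fillna(0)  # delay `early` by `lag` days
        errors[lag] = (shifted - late_d).abs().mean()
    return min(errors, key=errors.get)


# Invented cumulative series; `late` is `early` delayed by exactly 3 days
index = pd.date_range("2020-03-01", periods=20, freq="D")
early = pd.Series(range(1, 21), index=index).cumsum()
late = pd.Series(early.values, index=index + pd.Timedelta(days=3))

lag = estimate_lag_days(early, late)
print(lag)
```

On the real data, a call such as `estimate_lag_days(deaths_rki_ref, deaths_rki_mel)` would give a single summary number; since the actual delay varies from case to case, this should be read as a rough average only.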