Compare Johns Hopkins and RKI data for Germany

%config InlineBackend.figure_formats = ['svg']
# or try "%matplotlib notebook" for more interactive exploration

import datetime
import matplotlib.pyplot as plt
import pandas as pd
import oscovida as ov
print("Notebook created:  01 November 2020.")
print(f"Notebook executed: {datetime.datetime.now().strftime('%d %B %Y')}.")
Notebook created:  01 November 2020.
Notebook executed: 07 March 2023.

Load data from Johns Hopkins University:

cases_jh, deaths_jh = ov.get_country_data("Germany")

Load data from RKI:

germany = ov.fetch_data_germany()
FID IdBundesland Bundesland Landkreis Altersgruppe Geschlecht AnzahlFall AnzahlTodesfall Meldedatum IdLandkreis Datenstand NeuerFall NeuerTodesfall Refdatum NeuGenesen AnzahlGenesen IstErkrankungsbeginn Altersgruppe2
2020-10-28 1 1 Schleswig-Holstein SK Flensburg A15-A34 M 1 0 2020/10/28 00:00:00 1001 07.03.2023, 00:00 Uhr 0 -9 2020/01/19 00:00:00 0 1 1 Nicht übermittelt
2020-03-19 2 1 Schleswig-Holstein SK Flensburg A15-A34 M 1 0 2020/03/19 00:00:00 1001 07.03.2023, 00:00 Uhr 0 -9 2020/03/13 00:00:00 0 1 1 Nicht übermittelt
2020-03-21 3 1 Schleswig-Holstein SK Flensburg A15-A34 M 1 0 2020/03/21 00:00:00 1001 07.03.2023, 00:00 Uhr 0 -9 2020/03/13 00:00:00 0 1 1 Nicht übermittelt
2020-03-19 4 1 Schleswig-Holstein SK Flensburg A15-A34 M 1 0 2020/03/19 00:00:00 1001 07.03.2023, 00:00 Uhr 0 -9 2020/03/16 00:00:00 0 1 1 Nicht übermittelt
2020-03-14 5 1 Schleswig-Holstein SK Flensburg A35-A59 M 1 0 2020/03/14 00:00:00 1001 07.03.2023, 00:00 Uhr 0 -9 2020/03/16 00:00:00 0 1 1 Nicht übermittelt

The meaning of the entries in the RKI data is explained at

Here, we look at one particular item: the date attached to each row reported in the file. There are two:

  • Meldedatum: date at which the health authorities became aware of the case
  • Refdatum: reference date, which is the "Erkrankungsdatum" (date of getting sick) or, if that is not known, the Meldedatum.

From Johns Hopkins University, there is only one date available for the numbers of cases and deaths for the whole of Germany.
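To make the distinction concrete, here is a minimal sketch on a toy table. The three rows are invented for illustration; only the column names and the date format are taken from the RKI output above. The same cases land on different days depending on which date column is used for grouping, while the total stays the same.

```python
import pandas as pd

# Toy rows in the layout of the RKI table (values invented for illustration).
toy = pd.DataFrame({
    "Meldedatum": ["2020/03/19 00:00:00", "2020/03/21 00:00:00", "2020/03/21 00:00:00"],
    "Refdatum":   ["2020/03/13 00:00:00", "2020/03/13 00:00:00", "2020/03/21 00:00:00"],
    "AnzahlFall": [1, 2, 1],
})
date_format = "%Y/%m/%d %H:%M:%S"
toy["Meldedatum"] = pd.to_datetime(toy["Meldedatum"], format=date_format)
toy["Refdatum"] = pd.to_datetime(toy["Refdatum"], format=date_format)

# Same cases, attributed to different days depending on the date column used.
by_meldedatum = toy.groupby("Meldedatum")["AnzahlFall"].sum()
by_refdatum = toy.groupby("Refdatum")["AnzahlFall"].sum()

# Days between reference date and notification date, per row.
# A value of 0 is what we expect when Refdatum falls back to Meldedatum.
toy["delay_days"] = (toy["Meldedatum"] - toy["Refdatum"]).dt.days

print(by_meldedatum)
print(by_refdatum)
print(toy["delay_days"].tolist())
```

The column `delay_days` is a hypothetical helper column introduced here and is not part of the RKI data.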

# Aggregate by Meldedatum (date the health authorities became aware of the case).
# Only the two count columns are summed; the text columns are not needed here.
g2 = germany.set_index(pd.to_datetime(germany['Meldedatum']))
g2.index.name = 'date'
g3 = g2.groupby('date')[["AnzahlFall", "AnzahlTodesfall"]].sum()
cases_rki_mel = g3["AnzahlFall"].cumsum()
deaths_rki_mel = g3["AnzahlTodesfall"].cumsum()

# Aggregate by Refdatum (date of getting sick, or Meldedatum if not known).
g4 = germany.set_index(pd.to_datetime(germany['Refdatum']))
g4.index.name = 'date'
g5 = g4.groupby('date')[["AnzahlFall", "AnzahlTodesfall"]].sum()
cases_rki_ref = g5["AnzahlFall"].cumsum()
deaths_rki_ref = g5["AnzahlTodesfall"].cumsum()

Comparative plots

# Accumulated cases from the three sources
fig, ax = plt.subplots()
ax.plot(cases_rki_ref.index, cases_rki_ref.values, "-",
        label="RKI Refdatum", color="yellow")
ax.plot(cases_rki_mel.index, cases_rki_mel.values, "-",
        label="RKI Meldedatum", color="red")
ax.plot(cases_jh.index, cases_jh.values, ":",
        label="JHU", linewidth=2, color="blue")
ax.legend()

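# Daily new cases: difference of the accumulated numbers from one day to the next
fig, ax = plt.subplots()
ax.bar(cases_rki_ref.index, cases_rki_ref.diff().values, alpha=1.0,
       label="RKI Refdatum", color="yellow")
ax.bar(cases_rki_mel.index, cases_rki_mel.diff().values, alpha=0.5,
       label="RKI Meldedatum", color="red")
ax.bar(cases_jh.index, cases_jh.diff().values, alpha=0.3,
       label="JHU", color="blue")
ax.set_title("daily new cases")
ax.set_xlim(left=pd.to_datetime("2020-03-01"))
ax.legend()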

Observations cases

  • The total number of cases for JHU and RKI data is similar
  • The JHU data is more similar to the Meldedatum than the Refdatum in the RKI data
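One way to put a number on "more similar" is the mean absolute difference between two daily series on the dates they share. The sketch below is an illustration under assumptions: `mean_abs_diff` is a hypothetical helper (not part of oscovida), and the three short series are invented toy data standing in for `cases_jh`, `cases_rki_mel` and `cases_rki_ref`.

```python
import pandas as pd

def mean_abs_diff(a: pd.Series, b: pd.Series) -> float:
    """Mean absolute difference of two series on their common index values."""
    a_aligned, b_aligned = a.align(b, join="inner")
    return float((a_aligned - b_aligned).abs().mean())

# Invented accumulated case counts, indexed by date (toy data only).
dates = pd.date_range("2020-03-01", periods=5, freq="D")
jhu = pd.Series([0, 10, 30, 60, 100], index=dates)
meldedatum = pd.Series([0, 12, 33, 62, 101], index=dates)
refdatum = pd.Series([20, 45, 80, 110, 130], index=dates)

# Compare daily new cases (first differences of the accumulated numbers).
diff_mel = mean_abs_diff(jhu.diff().dropna(), meldedatum.diff().dropna())
diff_ref = mean_abs_diff(jhu.diff().dropna(), refdatum.diff().dropna())
print(f"JHU vs Meldedatum: {diff_mel}, JHU vs Refdatum: {diff_ref}")
```

With the notebook's data, the same helper could be applied to `cases_jh.diff().dropna()` and the two RKI series, assuming all three have a datetime index.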

# Accumulated deaths from the three sources
fig, ax = plt.subplots()
ax.plot(deaths_rki_ref.index, deaths_rki_ref.values, "-",
        label="RKI Refdatum", color="yellow")
ax.plot(deaths_rki_mel.index, deaths_rki_mel.values, "-",
        label="RKI Meldedatum", color="red")
ax.plot(deaths_jh.index, deaths_jh.values, ":",
        label="JHU", color="blue")
ax.legend()

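# Daily new deaths: difference of the accumulated numbers from one day to the next
fig, ax = plt.subplots()
ax.bar(deaths_rki_ref.index, deaths_rki_ref.diff().values, alpha=1.0,
       label="RKI Refdatum", color="yellow")
ax.bar(deaths_rki_mel.index, deaths_rki_mel.diff().values, alpha=0.5,
       label="RKI Meldedatum", color="red")
ax.bar(deaths_jh.index, deaths_jh.diff().values, alpha=0.3,
       label="JHU", color="blue")
ax.set_title("daily new deaths")
ax.set_xlim(left=pd.to_datetime("2020-03-01"))
ax.legend()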

Observations deaths

The deaths reported by the three sources differ in the dates they are associated with:

  • RKI Refdatum data has the dates furthest in the past
  • RKI Meldedatum follows this (with a delay of several days, see plot above)
  • The JHU data has even more recent dates attached

The difference between the case numbers reported by JHU and RKI is smaller than the difference between the reported deaths.
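The delay between two series can be estimated roughly by shifting one against the other and keeping the shift with the smallest mean absolute difference. The following is a minimal sketch of that idea, not the method used in this notebook. `best_shift_days` is a hypothetical helper, the bump-shaped series is synthetic, and the approach assumes both series have a gap-free daily index.

```python
import numpy as np
import pandas as pd

def best_shift_days(reference: pd.Series, other: pd.Series, max_shift: int = 30) -> int:
    """Return the shift in days that brings `other` closest to `reference`.

    A positive result means `other` lags behind `reference` by that many days.
    Assumes both series have a daily index without gaps.
    """
    best_shift, best_err = 0, np.inf
    for shift in range(-max_shift, max_shift + 1):
        # shift(-k) moves the values of `other` k days earlier
        shifted = other.shift(-shift)
        a, b = reference.align(shifted, join="inner")
        err = (a - b).abs().mean()  # NaN values at the edges are skipped
        if err < best_err:
            best_shift, best_err = shift, err
    return int(best_shift)

# Synthetic daily series: a single bump, and a copy delayed by 5 days.
dates = pd.date_range("2020-03-01", periods=40, freq="D")
reference = pd.Series(np.exp(-0.5 * ((np.arange(40) - 15) / 4) ** 2), index=dates)
delayed = reference.shift(5).fillna(0.0)

estimated_delay = best_shift_days(reference, delayed, max_shift=10)
print(estimated_delay)
```

Applied to the daily new deaths (for example `deaths_rki_ref.diff()` against `deaths_jh.diff()`), the result would only be a rough indication, since real data is noisy and the delay is unlikely to be constant over time.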