Germany: Comparing data from Johns Hopkins University and Robert Koch Institute

In [1]:
%config InlineBackend.figure_formats = ['svg']
import datetime
import numpy as np
import pandas as pd
import oscovida as ov
import matplotlib.pyplot as plt
# clear the local cache, i.e. force re-download of data sets
# ov.clear_cache()
ov.display_binder_link("2022-germany-rki-overview.ipynb")

print(f"Last executed: {datetime.datetime.today()}")
Last executed: 2022-09-19 17:11:58.027494

Get data from Johns Hopkins University (JHU)

In [2]:
cases_jhu, deaths_jhu = ov.get_country_data("Germany")

Get data from Robert-Koch Institute (RKI)

In [3]:
germany = ov.fetch_data_germany()

# As we want the total numbers for Germany, we need to accumulate over all
# districts (Landkreise) and the various rows for each date.
# We use 'Meldedatum' as this is expected to be closest to the JHU data
# See https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/e408ccf8878541a7ab6f6077a42fd811_0/about
g2 = germany.set_index(pd.to_datetime(germany['Meldedatum']))
g2.index.name = 'date'
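# Sum only the two count columns over all rows that share a date
g3 = g2.groupby('date')[["AnzahlFall", "AnzahlTodesfall"]].sum()
# g3 already has one row per date, so we can accumulate directly
cases_rki = g3["AnzahlFall"].cumsum()
deaths_rki = g3["AnzahlTodesfall"].cumsum()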
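The aggregation step can be checked on a small example. The following is a minimal sketch on a made-up table that only mimics the column names used above (`Meldedatum`, `AnzahlFall`, `AnzahlTodesfall`); the `Landkreis` values and all numbers are invented for illustration and are not RKI data.

```python
import pandas as pd

# Toy stand-in for the RKI table: several rows per reporting date,
# one per (hypothetical) district. All values are made up.
toy = pd.DataFrame({
    "Meldedatum": ["2022-09-01", "2022-09-01", "2022-09-02", "2022-09-02", "2022-09-03"],
    "Landkreis": ["LK A", "LK B", "LK A", "LK B", "LK A"],
    "AnzahlFall": [3, 2, 4, 1, 5],
    "AnzahlTodesfall": [0, 1, 0, 0, 2],
})

# Use the reporting date as a datetime index named 'date'
toy = toy.set_index(pd.to_datetime(toy["Meldedatum"]))
toy.index.name = "date"

# Sum over all districts per date, then accumulate over time
daily = toy.groupby("date")[["AnzahlFall", "AnzahlTodesfall"]].sum()
toy_cases = daily["AnzahlFall"].cumsum()
toy_deaths = daily["AnzahlTodesfall"].cumsum()

print(toy_cases.tolist())   # [5, 10, 15]
print(toy_deaths.tolist())  # [1, 1, 3]
```

The result is one cumulative value per date, which is the shape of series that `cases_rki` and `deaths_rki` have in the cell above.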

Overview plot Germany with RKI data

The overview plot for Germany (http://oscovida.github.io/html/Germany.html) is based on JHU data (and for completeness attached below). Here we provide the same observables but based on the accumulated RKI data.

We expect the RKI data to severely underestimate the number of deaths in the most recent week(s) - see discussion here.

In [4]:
ov.overview(country="Germany", data=(cases_rki, deaths_rki), weeks=5);
[Figure: overview plot for Germany based on RKI data, last 5 weeks, last data point from 2022-09-17. Panels: 7-day incidence rate per 100K people (latest value 248.5); daily new cases and daily new deaths with rolling 7-day mean (also normalised per 100K); R & growth factor (based on cases); R & growth factor (based on deaths); doubling time of cases and deaths (rolling mean).]

Overview plot Germany with JHU data (last 5 weeks)

This is the 'normal' plot that is shown on the OSCOVIDA pages, i.e. at http://oscovida.github.io/html/Germany.html :

In [5]:
ov.overview(country="Germany", weeks=5);
[Figure: overview plot for Germany based on JHU data, last 5 weeks, last data point from 2022-09-18. Panels: 7-day incidence rate per 100K people (latest value 274.3); daily new cases and daily new deaths with rolling 7-day mean (also normalised per 100K); R & growth factor (based on cases); R & growth factor (based on deaths); doubling time of cases and deaths (rolling mean).]

Comparison of data from JHU and RKI: cases (last 5 weeks)

In [6]:
fig, ax = plt.subplots(figsize=(10, 4))
ov.plot_daily_change(ax, cases_jhu[-7*5:], color="C1", labels=["JHU Germany", "cases"])
ov.plot_daily_change(ax, cases_rki[-7*5:], color="C3", labels=["RKI Germany", "cases"])
fig.autofmt_xdate()
[Figure: daily change in cases for Germany, 2022-08-13 to 2022-09-17. Curves: JHU Germany new cases and RKI Germany new cases, each with a rolling 7-day mean.]

This deviation is unusual (March 2022): in the past, the RKI data showed a greater reporting lag than the JHU data.
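To put a number on such a deviation rather than reading it off the plot, one could compare the smoothed daily changes of the two cumulative series on their shared dates. The sketch below is an illustration only: `daily_difference` is a hypothetical helper (not part of oscovida), the trailing 7-day window is an assumption that need not match the smoothing used by `ov.plot_daily_change`, and the two input series are synthetic.

```python
import pandas as pd

def daily_difference(cumulative_a, cumulative_b, window=7):
    """Smoothed daily change of series a minus that of series b, on shared dates."""
    # Keep only the dates present in both series
    both = pd.concat({"a": cumulative_a, "b": cumulative_b}, axis=1, join="inner")
    # Cumulative numbers -> daily changes
    daily = both.diff().dropna()
    # Trailing rolling mean to damp weekday effects
    smooth = daily.rolling(window).mean().dropna()
    return smooth["a"] - smooth["b"]

# Synthetic cumulative series: 100 vs. 90 new cases per day (made-up numbers)
dates = pd.date_range("2022-08-01", periods=14, freq="D")
series_a = pd.Series(range(100, 1500, 100), index=dates)
series_b = pd.Series(range(90, 1350, 90), index=dates)

difference = daily_difference(series_a, series_b)
print(difference)
```

With the notebook's data, the call would be `daily_difference(cases_jhu, cases_rki)`, assuming both series carry a datetime index.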

Comparison of data from JHU and RKI: deaths (complete pandemic)

In [7]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 4))
ov.plot_daily_change(ax, deaths_jhu, color="C0", labels=["JHU Germany", "deaths"])
ov.plot_daily_change(ax, deaths_rki, color="C4", labels=["RKI Germany", "deaths"])
fig.autofmt_xdate()
[Figure: daily change in deaths for Germany, 2020-01 to 2022-09. Curves: JHU Germany new deaths and RKI Germany new deaths, each with a rolling 7-day mean.]

The time delay in the reported deaths is well understood: JHU data use the date on which the death was reported, whereas RKI data use the best available estimate of when the person was infected (so the day of death is not visible in that data). See detailed discussion at https://oscovida.github.io/2020-germany-reporting-delay-meldeverzug.html
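The effect of the two dating conventions can be illustrated with a few synthetic records. This is a sketch under stated assumptions: the column names `case_date` and `death_reported` are hypothetical, the dates are made up, and `case_date` simply stands in for the earlier date to which a death is attributed.

```python
import pandas as pd

# Synthetic records: each death has an earlier date attached to the case
# and a later date on which the death itself was reported (all made up).
records = pd.DataFrame({
    "case_date": pd.to_datetime(
        ["2022-09-01", "2022-09-01", "2022-09-05", "2022-09-08"]),
    "death_reported": pd.to_datetime(
        ["2022-09-12", "2022-09-15", "2022-09-20", "2022-09-25"]),
})

# Only deaths already reported on the day we look at the data are known
today = pd.Timestamp("2022-09-16")
known = records[records["death_reported"] <= today]

# Convention 1: count deaths on the day they were reported
by_report_date = known.groupby("death_reported").size()
# Convention 2: count deaths on the earlier, case-related date
by_case_date = known.groupby("case_date").size()

print(by_report_date)
print(by_case_date)
```

In the second view, the most recent date with any deaths lies two weeks before `today`, and deaths belonging to the later cases will only be added once they are reported. This is why the most recent weeks look too low when deaths are attributed to the earlier date.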

Overview plot Germany with RKI data (complete pandemic)

The overview plot for Germany (http://oscovida.github.io/html/Germany.html) is based on JHU data (and for completeness attached below). Here we provide the same observables but based on the accumulated RKI data.

We expect the RKI data to severely underestimate the number of deaths in the most recent week(s) - see discussion above.

In [8]:
ov.overview(country="Germany", data=(cases_rki, deaths_rki));
[Figure: overview plot for Germany based on RKI data, complete pandemic (Jan 2020 to Sep 2022), last data point from 2022-09-17. Panels: 7-day incidence rate per 100K people (latest value 248.5); daily change normalised per 100K; R & growth factor (based on cases).]