Germany: Comparing data from Johns Hopkins University and Robert Koch Institute

In [1]:
%config InlineBackend.figure_formats = ['svg']
import datetime
import numpy as np
import pandas as pd
import oscovida as ov
import matplotlib.pyplot as plt
# clear the local cache, i.e. force re-download of data sets
# ov.clear_cache()
ov.display_binder_link("2022-germany-rki-overview.ipynb")

print(f"Last executed: {datetime.datetime.today()}")
Last executed: 2023-01-26 16:53:48.287862

Get data from Johns Hopkins University (JHU)

In [2]:
cases_jhu, deaths_jhu = ov.get_country_data("Germany")

Get data from Robert Koch Institute (RKI)

In [3]:
germany = ov.fetch_data_germany()

# As we want the total numbers for Germany, we need to accumulate over all
# districts (Landkreise) and various rows for each date.
# We use 'Meldedatum' as this is expected to be closest to the JHU data
# See https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/e408ccf8878541a7ab6f6077a42fd811_0/about
g2 = germany.set_index(pd.to_datetime(germany['Meldedatum']))
g2.index.name = 'date'
# Sum only the two count columns per date; selecting them explicitly
# avoids the pandas FutureWarning about numeric_only.
g3 = g2.groupby('date')[["AnzahlFall", "AnzahlTodesfall"]].sum()
# g3 already has one row per date, so only the cumulative sum is needed.
cases_rki = g3["AnzahlFall"].cumsum()
deaths_rki = g3["AnzahlTodesfall"].cumsum()
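The aggregation step can be tried on a small synthetic table. This is only a sketch: the column names `Meldedatum`, `AnzahlFall` and `AnzahlTodesfall` are taken from the cell above, while the `Landkreis` column, the district labels and all numbers are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for the RKI table: several rows per reporting date,
# one per (hypothetical) district. All values are invented.
rows = pd.DataFrame({
    "Meldedatum": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-03", "2023-01-03"],
    "Landkreis": ["A", "B", "A", "A", "B"],
    "AnzahlFall": [3, 2, 4, 1, 5],
    "AnzahlTodesfall": [0, 1, 0, 0, 2],
})

# Use the reporting date as a DatetimeIndex named 'date'.
indexed = rows.set_index(pd.to_datetime(rows["Meldedatum"]))
indexed.index.name = "date"

# Sum only the two count columns per date, then accumulate over time.
daily = indexed.groupby("date")[["AnzahlFall", "AnzahlTodesfall"]].sum()
cumulative = daily.cumsum()

print(cumulative)
```

Selecting the count columns before summing keeps text columns such as the district name out of the aggregation.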

Overview plot Germany with RKI data

The overview plot for Germany (http://oscovida.github.io/html/Germany.html) is based on JHU data (and, for completeness, is shown below). Here we provide the same observables, but based on the accumulated RKI data.

We expect the RKI data to severely underestimate the number of deaths in the most recent week(s) - see discussion here.

In [4]:
ov.overview(country="Germany", data=(cases_rki, deaths_rki), weeks=5);
[Figure: overview plot for Germany based on RKI data, last 5 weeks, last data point from 2023-01-25. Panels: 7-day incidence rate per 100K people (annotated value: 74.4); daily new cases and daily new deaths with rolling 7-day means, also normalised per 100K; R and growth factor based on cases; R and growth factor based on deaths; doubling time in days for cases and deaths (rolling mean).]
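As a rough guide to reading the first panel, the 7-day incidence rate can be sketched as the number of new cases in a trailing 7-day window, scaled to 100,000 people. The function name `seven_day_incidence`, the synthetic series and the population figure below are assumptions for illustration; the exact computation inside `ov.overview` may differ.

```python
import pandas as pd


def seven_day_incidence(cumulative_cases: pd.Series, population: int) -> pd.Series:
    """New cases summed over a trailing 7-day window, per 100,000 people."""
    daily_new = cumulative_cases.diff()
    return daily_new.rolling(7).sum() / population * 100_000


# Synthetic cumulative series: exactly 100 new cases every day.
dates = pd.date_range("2023-01-01", periods=10, freq="D")
cumulative = pd.Series([100 * i for i in range(1, 11)], index=dates)

# The population value is hypothetical, not the one used by oscovida.
incidence = seven_day_incidence(cumulative, population=1_000_000)
print(incidence.iloc[-1])
```

The first seven entries are undefined because a full week of daily changes is not yet available.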

Overview plot Germany with JHU data (last 5 weeks)

This is the 'normal' plot that is shown on the OSCOVIDA pages, i.e. at http://oscovida.github.io/html/Germany.html :

In [5]:
ov.overview(country="Germany", weeks=5);
[Figure: overview plot for Germany based on JHU data, last 5 weeks, last data point from 2023-01-25. Panels: 7-day incidence rate per 100K people (annotated value: 78.4); daily new cases and daily new deaths with rolling 7-day means, also normalised per 100K; R and growth factor based on cases; R and growth factor based on deaths; doubling time in days for cases and deaths (rolling mean).]
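The growth-factor and doubling-time panels can be read with the following minimal sketch in mind. It assumes the common definitions (growth factor as the ratio of consecutive daily changes, doubling time as ln 2 divided by the log of the day-to-day ratio of cumulative counts); whether oscovida uses exactly these formulas, and how it smooths them, is not confirmed here.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2023-01-01", periods=6, freq="D")
# Synthetic cumulative counts that double every day.
cumulative = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0, 32.0], index=dates)

# Daily growth factor: ratio of today's new cases to yesterday's new cases.
daily_new = cumulative.diff()
growth_factor = daily_new / daily_new.shift(1)

# Doubling time in days, from the day-to-day ratio of the cumulative counts.
ratio = cumulative / cumulative.shift(1)
doubling_time = np.log(2) / np.log(ratio)

print(growth_factor.dropna().tolist())
print(doubling_time.dropna().tolist())
```

Under this definition, a cumulative count that grows only slightly from day to day has a ratio close to 1, so the logarithm is close to 0 and the doubling time becomes very large, which is consistent with the values in the thousands of days on the doubling-time axes.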

Comparison of data from JHU and RKI: cases (last 5 weeks)

In [6]:
fig, ax = plt.subplots(figsize=(10, 4))
ov.plot_daily_change(ax, cases_jhu[-7*5:], color="C1", labels=["JHU Germany", "cases"])
ov.plot_daily_change(ax, cases_rki[-7*5:], color="C3", labels=["RKI Germany", "cases"])
fig.autofmt_xdate()
[Figure: daily change in cases for Germany, 2022-12-21 to 2023-01-25. Curves: JHU Germany new cases and RKI Germany new cases, each with a rolling 7-day mean.]

This deviation is unusual (March 2022): in the past, the RKI data showed a greater reporting lag than the JHU data.
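To make the notion of a reporting lag concrete, here is a small sketch that derives daily changes and a centred 7-day rolling mean from two synthetic cumulative series, one of which reports the same counts two days later. The helper `daily_change`, the series names and the two-day lag are assumptions for illustration; `ov.plot_daily_change` may smooth the data differently.

```python
import pandas as pd


def daily_change(cumulative: pd.Series, window: int = 7) -> pd.DataFrame:
    """Daily differences of a cumulative series and their centred rolling mean."""
    daily = cumulative.diff()
    smooth = daily.rolling(window, center=True).mean()
    return pd.DataFrame({"daily": daily, "rolling_mean": smooth})


dates = pd.date_range("2023-01-01", periods=14, freq="D")
# Source A: 100 new cases per day (synthetic).
source_a = pd.Series(range(0, 1400, 100), index=dates)
# Source B: the same counts, but reported two days later.
source_b = source_a.shift(2).fillna(0)

change_a = daily_change(source_a)
change_b = daily_change(source_b)

# The lagging source is behind by two days' worth of cases at the end.
gap = (source_a - source_b).iloc[-1]
print(gap)
```

With a constant daily rate, the smoothed curves eventually coincide and the lag shows up only as a constant offset in the cumulative totals; with changing rates, the lagging source appears as a time-shifted copy of the other.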

Comparison of data from JHU and RKI: deaths (complete pandemic)

In [7]:
fig, ax = plt.subplots(figsize=(10, 4))
ov.plot_daily_change(ax, deaths_jhu, color="C0", labels=["JHU Germany", "deaths"])
ov.plot_daily_change(ax, deaths_rki, color="C4", labels=["RKI Germany", "deaths"])
fig.autofmt_xdate()
[Figure: daily change in deaths for Germany, 2020-01 to 2023-01. Curves: JHU Germany new deaths and RKI Germany new deaths, each with a rolling 7-day mean.]

The time delay in the reported deaths is well understood: the JHU data use the date on which the death was reported, whereas the RKI data use the best available estimate of when the person was infected (so the date of death is not visible in the RKI data). See the detailed discussion at https://oscovida.github.io/2020-germany-reporting-delay-meldeverzug.html
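The effect of the two date conventions can be illustrated with a tiny hypothetical line list. The column names `case_date` and `death_report_date`, the cutoff and all dates are invented; they only stand in for "the earlier date attached to the case" and "the date the death was reported".

```python
import pandas as pd

# Hypothetical line list: each death carries the earlier date attached to
# the case and the later date on which the death itself was reported.
deaths = pd.DataFrame({
    "case_date": pd.to_datetime(
        ["2023-01-02", "2023-01-03", "2023-01-03", "2023-01-10"]
    ),
    "death_report_date": pd.to_datetime(
        ["2023-01-16", "2023-01-18", "2023-01-20", "2023-01-24"]
    ),
})

# Count deaths per day under each convention.
by_case_date = deaths.groupby("case_date").size()
by_report_date = deaths.groupby("death_report_date").size()

# Deaths attributed to the most recent period under each convention.
cutoff = pd.Timestamp("2023-01-15")
recent_by_case_date = by_case_date[by_case_date.index >= cutoff].sum()
recent_by_report_date = by_report_date[by_report_date.index >= cutoff].sum()

print(recent_by_case_date, recent_by_report_date)
```

Both conventions count the same total number of deaths, but dating them by the earlier event moves them back in time, so the most recent weeks look emptier until later reports fill them in.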

Overview plot Germany with RKI data (complete pandemic)

The overview plot for Germany (http://oscovida.github.io/html/Germany.html) is based on JHU data (and, for completeness, is shown below). Here we provide the same observables, but based on the accumulated RKI data.

We expect the RKI data to severely underestimate the number of deaths in the most recent week(s) - see discussion above.

In [8]:
ov.overview(country="Germany", data=(cases_rki, deaths_rki));
[Figure: overview plot for Germany based on RKI data, complete pandemic (Jan 2020 to Jan 2023), last data point from 2023-01-25. Panels include: 7-day incidence rate per 100K people (annotated value: 74.4); daily change normalised per 100K; R and growth factor based on cases.]