Germany: Comparing data from Johns Hopkins University and Robert Koch Institute

In [1]:
%config InlineBackend.figure_formats = ['svg']
import datetime
import numpy as np
import pandas as pd
import oscovida as ov
import matplotlib.pyplot as plt
# clear the local cache, i.e. force re-download of data sets
# ov.clear_cache()
ov.display_binder_link("2022-germany-rki-overview.ipynb")

print(f"Last executed: {datetime.datetime.today()}")
Last executed: 2023-03-07 16:49:27.699121

Get data from Johns Hopkins University (JHU)

In [2]:
cases_jhu, deaths_jhu = ov.get_country_data("Germany")

Get data from the Robert Koch Institute (RKI)

In [3]:
germany = ov.fetch_data_germany()

# As we want the total numbers for Germany, we need to accumulate over all
# districts (Landkreise) and the various rows for each date.
# We use 'Meldedatum' as this is expected to be closest to the JHU data.
# See https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/e408ccf8878541a7ab6f6077a42fd811_0/about
g2 = germany.set_index(pd.to_datetime(germany['Meldedatum']))
g2.index.name = 'date'
# Sum only the two count columns per date (avoids the numeric_only FutureWarning)
g3 = g2.groupby('date')[["AnzahlFall", "AnzahlTodesfall"]].sum()
# g3 already has one row per date, so a cumulative sum gives the running totals
cases_rki = g3["AnzahlFall"].cumsum()
deaths_rki = g3["AnzahlTodesfall"].cumsum()
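The aggregation step can be illustrated on a small, made-up table. This is a minimal sketch assuming that only the reporting date `Meldedatum` and the two count columns `AnzahlFall` and `AnzahlTodesfall` matter; the real RKI data set has many more columns, and the district names and numbers below are invented. Selecting the count columns before summing is what keeps pandas from trying to sum non-numeric columns.

```python
import pandas as pd

# Toy stand-in for the RKI table: several rows per reporting date.
# All values are invented for illustration only.
toy = pd.DataFrame({
    "Meldedatum": ["2023-03-01", "2023-03-01", "2023-03-02", "2023-03-02", "2023-03-03"],
    "Landkreis": ["district A", "district B", "district A", "district B", "district A"],
    "AnzahlFall": [10, 5, 8, 2, 7],
    "AnzahlTodesfall": [1, 0, 0, 1, 0],
})

# Index by reporting date, as in the cell above
toy = toy.set_index(pd.to_datetime(toy["Meldedatum"]))
toy.index.name = "date"

# One row per date: sum the count columns over all districts
daily = toy.groupby("date")[["AnzahlFall", "AnzahlTodesfall"]].sum()

# Running totals, i.e. cumulative series comparable to the JHU data
cases_total = daily["AnzahlFall"].cumsum()
deaths_total = daily["AnzahlTodesfall"].cumsum()

print(cases_total.tolist())   # [15, 25, 32]
print(deaths_total.tolist())  # [1, 2, 2]
```

The text column `Landkreis` is silently ignored because only the two count columns are selected before `.sum()`.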

Overview plot Germany with RKI data (last 5 weeks)

The overview plot for Germany (http://oscovida.github.io/html/Germany.html) is based on JHU data (included below for completeness). Here we show the same observables, computed from the accumulated RKI data.

We expect the RKI data to severely underestimate the number of deaths in the most recent week(s); see the discussion of reporting delays below.
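The top panel of the overview shows the 7-day incidence rate per 100K people. The following is a minimal sketch of how such a number can be derived from a cumulative case series, assuming the definition "new cases in the last 7 days per 100,000 inhabitants" and one row per day. The function name `seven_day_incidence` and the population value are hypothetical, and this is not necessarily how `ov.overview` computes the quantity internally.

```python
import pandas as pd

def seven_day_incidence(cumulative: pd.Series, population: float) -> pd.Series:
    """New cases over the preceding 7 days per 100,000 people.

    `cumulative` is a cumulative case count with one row per day;
    `population` is a caller-supplied (assumed) population size.
    """
    # Difference over 7 rows = number of new cases in the last 7 days
    new_last_7_days = cumulative.diff(periods=7)
    return new_last_7_days * 100_000 / population

# Synthetic example: 1,000 new cases per day in a population of 1 million
dates = pd.date_range("2023-02-01", periods=10, freq="D")
cumulative = pd.Series(range(1000, 11000, 1000), index=dates)
incidence = seven_day_incidence(cumulative, population=1_000_000)
print(incidence.iloc[-1])  # 700.0
```

The first 7 values are `NaN` because no full 7-day window is available yet.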

In [4]:
ov.overview(country="Germany", data=(cases_rki, deaths_rki), weeks=5);
[Figure: overview plot, Germany, last 5 weeks, RKI data, last data point from 2023-03-06. Panels: 7-day incidence rate per 100K people (latest value 78.5); daily new cases and new deaths with rolling 7-day mean, absolute and normalised per 100K; R and daily growth factor based on cases and on deaths; doubling time of cases and deaths in days.]

Overview plot Germany with JHU data (last 5 weeks)

This is the standard plot shown on the OSCOVIDA pages at http://oscovida.github.io/html/Germany.html:

In [5]:
ov.overview(country="Germany", weeks=5);
[Figure: overview plot, Germany, last 5 weeks, JHU data, last data point from 2023-03-06. Panels: 7-day incidence rate per 100K people (latest value 80.4); daily new cases and new deaths with rolling 7-day mean, absolute and normalised per 100K; R and daily growth factor based on cases and on deaths; doubling time of cases and deaths in days.]

Comparison of data from JHU and RKI: cases (last 5 weeks)

In [6]:
fig, ax = plt.subplots(figsize=(10, 4))
# Plot only the last 5 weeks (7 * 5 days) of each cumulative series
ov.plot_daily_change(ax, cases_jhu.iloc[-7*5:], color="C1", labels=["JHU Germany", "cases"])
ov.plot_daily_change(ax, cases_rki.iloc[-7*5:], color="C3", labels=["RKI Germany", "cases"])
fig.autofmt_xdate()
[Figure: daily new cases in Germany from JHU and RKI data, each with its rolling 7-day mean, February to early March 2023.]

This deviation is unusual (March 2022): previously, the RKI data showed a greater reporting lag than the JHU data.
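To put a number on the deviation between the two curves, one could compare their rolling 7-day means directly. The sketch below mirrors what the plot displays (daily differences of a cumulative series and their 7-day rolling mean), but it is our own illustration, not the oscovida implementation; the helper names `rolling_daily_change` and `relative_deviation` are hypothetical. With the real data, `cases_rki` and `cases_jhu` from above would be passed instead of the synthetic series.

```python
import pandas as pd

def rolling_daily_change(cumulative: pd.Series, window: int = 7) -> pd.Series:
    """Daily new counts from a cumulative series, smoothed with a rolling mean."""
    daily = cumulative.diff()            # new counts per day (first value is NaN)
    return daily.rolling(window).mean()  # NaN until a full window is available

def relative_deviation(a: pd.Series, b: pd.Series, window: int = 7) -> pd.Series:
    """(a - b) / b of the smoothed daily changes, on the dates both series share."""
    ma, mb = rolling_daily_change(a, window).align(
        rolling_daily_change(b, window), join="inner")
    return (ma - mb) / mb

# Synthetic example: source A reports 10% more new cases per day than source B
dates = pd.date_range("2023-02-01", periods=14, freq="D")
b = pd.Series([100.0 * i for i in range(1, 15)], index=dates)  # 100 new per day
a = b * 1.1                                                     # 110 new per day
dev = relative_deviation(a, b)
print(round(dev.iloc[-1], 3))  # 0.1
```

A positive value means that source `a` reports more new cases per day than source `b` on that date.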

Comparison of data from JHU and RKI: deaths (complete pandemic)

In [7]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 4))
ov.plot_daily_change(ax, deaths_jhu, color="C0", labels=["JHU Germany", "deaths"])
ov.plot_daily_change(ax, deaths_rki, color="C4", labels=["RKI Germany", "deaths"])
fig.autofmt_xdate()
[Figure: daily new deaths in Germany from JHU and RKI data, each with its rolling 7-day mean, complete pandemic (2020 to March 2023).]

The time delay in the reported deaths is well understood: the JHU data use the date on which a death was reported, whereas the RKI data use the best available estimate of the date on which the person was infected (so the date of death itself is not visible in these data). See the detailed discussion at https://oscovida.github.io/2020-germany-reporting-delay-meldeverzug.html
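Why backdating makes the most recent weeks of the RKI death series too low can be illustrated with a toy model. It assumes, purely for illustration, a constant 100 deaths per reference date and a fixed, invented distribution of the delay between the reference date and the day a death becomes known; the real delays in the RKI data are not modelled here, and `known_on` is a hypothetical helper. Counting only what has been reported by the download day shows complete totals for old dates and incomplete totals for recent ones.

```python
import numpy as np

# Invented numbers: for every reference date, 100 deaths occur in total;
# 10 become known on the same day, 30 after 7 days, 30 after 14 days,
# 20 after 21 days and 10 after 28 days.
delays = np.array([0, 7, 14, 21, 28])
deaths_by_delay = np.array([10, 30, 30, 20, 10])

def known_on(ref_day: int, snapshot_day: int) -> int:
    """Deaths attributed to `ref_day` that have been reported by `snapshot_day`."""
    elapsed = snapshot_day - ref_day
    return int(deaths_by_delay[delays <= elapsed].sum())

n_days = 60
snapshot = n_days - 1  # data set downloaded on the last day
observed = [known_on(day, snapshot) for day in range(n_days)]

print(observed[:3])             # [100, 100, 100]  older dates are complete
print(observed[snapshot - 7])   # 40
print(observed[-3:])            # [10, 10, 10]     recent dates: same-day reports only
```

In this model only reference dates at least 28 days before the download are complete, which is qualitatively the pattern described for the RKI deaths.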

Overview plot Germany with RKI data (complete pandemic)

As for the 5-week plot above, the overview plot for Germany (http://oscovida.github.io/html/Germany.html) is based on JHU data; here we show the same observables for the complete pandemic, computed from the accumulated RKI data.

We expect the RKI data to severely underestimate the number of deaths in the most recent week(s); see the discussion above.

In [8]:
ov.overview(country="Germany", data=(cases_rki, deaths_rki));
[Figure: overview plot, Germany, complete pandemic, RKI data, last data point from 2023-03-06. Panels include: 7-day incidence rate per 100K people (latest value 78.5); daily change of cases and of deaths normalised per 100K; R and daily growth factor based on cases.]