In [24]:
%config InlineBackend.figure_formats = ['svg']
import oscovida as ov

ov.display_binder_link("tutorial-overview-graphs.ipynb")

Overview function explained

The most important function of the oscovida package is overview. It takes the following parameters:

  1. country — a country to analyse (mandatory, str);
  2. region — a region of the country (optional, str);
  3. subregion — a subregion of the country (optional, str);
  4. savefig — whether to save the figure (optional, bool, default is False);
  5. dates — a range of dates in the format "2020-05-15:2020-10-20" (optional, str);
  6. weeks — how many of the most recent weeks to show (optional, int, default is zero, which means "all");
  7. data — an external data source, a pair of pd.Series (optional, see below).

The function returns a triple: (pyplot graph, a pandas series for cases, a pandas series for deaths).

This function provides six graphs:

In [25]:
ov.overview('Russia');
[Figure: overview plots for Russia, last data point from 2021-03-23. Panels: 7-day incidence rate (per 100K people); daily change in new cases with rolling 7-day mean; daily change in new deaths with rolling 7-day mean; R & growth factor (based on cases); R & growth factor (based on deaths); doubling time in days for cases and deaths (rolling mean).]

Let's see how exactly we obtain all these graphs.

Under the hood we

  • retrieve the data with the get_country_data() function (see the tutorial)
  • optionally narrow the time range, using either weeks=N for the last N weeks or dates="2020-05-01:2020-10-01" for a specific range of dates. Note that dates and weeks cannot be used together.
  • finally, we feed this data for cases and deaths to a set of plotting functions:
    • plot_incidence_rate for 7-day incidence rate per 100 thousand inhabitants (plot 1)
    • plot_daily_change for daily changes (plots 2 and 3, see the tutorial)
    • plot_reproduction_number for R-value and the growth factor (plots 4 and 5, see the tutorial)
    • plot_doubling_time for the doubling times (plot 6, see the tutorial).
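To make the first plot concrete, here is a minimal sketch of how a 7-day incidence rate per 100 thousand inhabitants can be computed from a cumulative case series. The function name seven_day_incidence and the population argument are assumptions made for this illustration; the calculation inside plot_incidence_rate may differ in detail.

```python
import numpy as np
import pandas as pd


def seven_day_incidence(cumulative_cases: pd.Series, population: int) -> pd.Series:
    """New cases summed over a trailing 7-day window, per 100,000 inhabitants.

    Hypothetical helper for illustration, not part of oscovida.
    """
    daily_new = cumulative_cases.diff()        # cumulative counts -> daily new cases
    weekly_sum = daily_new.rolling(7).sum()    # total over the trailing 7 days
    return weekly_sum * 100_000 / population


# Synthetic data: 100 new cases every day in a population of one million.
index = pd.date_range("2020-03-01", periods=14, freq="D")
cumulative = pd.Series(np.arange(1, 15) * 100, index=index)

incidence = seven_day_incidence(cumulative, population=1_000_000)
print(incidence.iloc[-1])  # 700 cases per week per million, i.e. 70 per 100K
```

The first seven values of the result are NaN, because a full 7-day window of daily differences is only available from the eighth day onwards.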

This is how the overview function fetches and trims the data when weeks is given:

In [26]:
country = "Iran"
weeks = 30
cases, deaths = ov.get_country_data(country)
cases = cases[- weeks * 7:]   # cut off unwanted data
deaths = deaths[- weeks * 7:] # cut off unwanted data

The variables cases and deaths are Pandas time series: each one pairs a date index with the number of COVID cases or deaths recorded up to that date:

In [27]:
cases
Out[27]:
2020-08-26     365606
2020-08-27     367796
2020-08-28     369911
2020-08-29     371816
2020-08-30     373570
               ...   
2021-03-19    1786265
2021-03-20    1793805
2021-03-21    1801065
2021-03-22    1808422
2021-03-23    1815712
Freq: D, Name: Iran cases, Length: 210, dtype: object
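To make the structure concrete, the sketch below rebuilds the first five values from the output above as a standalone series (no download needed) and shows how to read the index, look up a single date, and derive daily new cases. It assumes the counts are cumulative, which the steadily increasing values suggest.

```python
import pandas as pd

# First five values copied from the output above.
index = pd.date_range("2020-08-26", periods=5, freq="D")
cases = pd.Series([365606, 367796, 369911, 371816, 373570],
                  index=index, name="Iran cases")

print(cases.index[0])                         # first date, a pd.Timestamp
print(cases.loc[pd.Timestamp("2020-08-27")])  # value for one specific date

# Differences between consecutive days give the daily new cases.
daily_new = cases.diff().dropna().astype(int)
print(daily_new)
```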

And here is an example with dates:

In [28]:
country = "Germany"
region = "Hamburg"
dates = "2020-09-15:2020-10-25"
cases, deaths = ov.get_country_data(country, region)

date_start, date_end = dates.split(':')
cases = cases[date_start:date_end]
deaths = deaths[date_start:date_end]

ov.overview(country=country, region=region, dates=dates);
[Figure: overview plots for Hamburg, Germany, last data point from 2020-10-25. Panels: 7-day incidence rate (per 100K people); daily change in new cases with rolling 7-day mean; daily change in new deaths with rolling 7-day mean; R & growth factor (based on cases); R & growth factor (based on deaths); doubling time in days for cases and deaths (rolling mean).]
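Both ways of narrowing the range (the last N weeks, or an explicit date range) can be wrapped in one small helper that also enforces the rule that the two options are mutually exclusive. The function narrow_time_range is a hypothetical name used for illustration; it mirrors the slicing shown in the cells above rather than reproducing oscovida's internal code.

```python
import pandas as pd


def narrow_time_range(series, weeks=0, dates=None):
    """Return the last `weeks` weeks, or the "start:end" range in `dates`.

    Hypothetical helper for illustration, not part of oscovida.
    """
    if weeks and dates:
        raise ValueError("Use either weeks or dates, not both.")
    if weeks:
        return series.iloc[-weeks * 7:]           # positional slice: last N*7 days
    if dates:
        date_start, date_end = dates.split(":")
        return series.loc[date_start:date_end]    # label slice, both ends included
    return series                                 # weeks=0 and no dates: keep all


# Synthetic daily series covering the whole of 2020 (366 days).
index = pd.date_range("2020-01-01", "2020-12-31", freq="D")
series = pd.Series(range(len(index)), index=index)

last_two_weeks = narrow_time_range(series, weeks=2)
may_window = narrow_time_range(series, dates="2020-05-15:2020-05-20")
print(len(last_two_weeks), len(may_window))
```

Note that slicing by date labels includes both end points, so the May window above contains six days.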

External data source

One may pass an external data object to overview for visualisation. The data object should contain a pair of Pandas series (one for cases and one for deaths). Each series must be indexed by dates, that is, by pd.Timestamp values such as those in a pd.DatetimeIndex. Here is an example with artificial data:

In [29]:
import numpy as np
import pandas as pd

days = 100
country = "Narnia"
dates = pd.date_range("2020-03-01", periods=days, freq='D')
data1 = np.exp(np.linspace(1,15,days)).astype(int)  # artificial cases; values do not have to be integers
data2 = np.exp(np.linspace(1,5,days)).astype(int)   # artificial deaths; values do not have to be integers

c = pd.Series(data1, index=pd.DatetimeIndex(dates))
d = pd.Series(data2, index=pd.DatetimeIndex(dates))

c.name = f"{country} cases"
d.name = f"{country} deaths"

ov.overview(country=country, data=(c,d));
[Figure: overview plots for Narnia, last data point from 2020-06-08. Panels: 7-day incidence rate; daily change in new cases with rolling 7-day mean; daily change in new deaths with rolling 7-day mean; R & growth factor (based on cases); R & growth factor (based on deaths); doubling time in days for cases and deaths (rolling mean).]
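Before handing your own data to overview, it can help to check that it has the shape described in this section. The sketch below is one possible check; check_external_data is a hypothetical helper, not an oscovida function, and the library may apply different or additional checks of its own.

```python
import pandas as pd


def check_external_data(data):
    """Verify that `data` is a (cases, deaths) pair of date-indexed pd.Series.

    Hypothetical helper for illustration, not part of oscovida.
    """
    cases, deaths = data  # fails if `data` is not a pair
    for label, series in (("cases", cases), ("deaths", deaths)):
        if not isinstance(series, pd.Series):
            raise TypeError(f"{label} must be a pd.Series")
        if not isinstance(series.index, pd.DatetimeIndex):
            raise TypeError(f"{label} must be indexed by dates (pd.DatetimeIndex)")
    return cases, deaths


# A small valid pair of series with a daily date index.
dates = pd.date_range("2020-03-01", periods=3, freq="D")
c = pd.Series([1, 2, 4], index=dates, name="Narnia cases")
d = pd.Series([0, 0, 1], index=dates, name="Narnia deaths")

checked_cases, checked_deaths = check_external_data((c, d))
print(checked_cases.name, checked_deaths.name)
```

A series with a plain integer index, such as pd.Series([1, 2, 4]), would be rejected by this check with a TypeError.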

Other tutorials

You can find more tutorials here.