Quick study to investigate oscillations in reported infections in Germany. Here is the plot of the data in question:
import oscovida
import numpy as np
import matplotlib.pyplot as plt
%config InlineBackend.figure_formats = ['svg']
oscovida.display_binder_link("2020-05-10-notebook-weekly-fluctuations-in-data-from-germany.ipynb")
# get data
cases, deaths = oscovida.get_country_data("Germany")
# plot daily changes
fig, ax = plt.subplots(figsize=(8, 4))
oscovida.plot_daily_change(ax, cases, 'C1')
The working assumption is that during the weekend fewer numbers are captured or reported. The analysis below seems to confirm this.
We compute a discrete Fourier transform of the data, and expect a peak at a frequency corresponding to a period of 7 days.
Data selection¶
We start with data from 1st March as numbers before were small. It is convenient to take a number of days that can be divided by seven (for alignment of the freuency axis in Fourier space, so we choose 63 days from 1st of March):
data = cases['2020-03-01':'2020-05-03']
# compute daily change
diff = data.diff().dropna()
# plot data points (corresponding to bars in figure above:)
fig, ax = plt.subplots()
ax.plot(diff.index, diff, '-C1',
label='daily new cases Germany')
fig.autofmt_xdate() # avoid x-labels overlap
# How many data points (=days) have we got?
diff.size
diff2 = diff.resample("24h").asfreq() # ensure we have one data point every day
diff2.size
Compute the frequency spectrum¶
fig, ax = plt.subplots()
# compute power density spectrum
change_F = abs(np.fft.fft(diff2))**2
# determine appropriate frequencies
n = change_F.size
freq = np.fft.fftfreq(n, d=1)
# We skip values at indices 0, 1 and 2: these are large because we have a finite
# sequence and not substracted the mean from the data set
# We also only plot the the first n/2 frequencies as for high n, we get negative
# frequencies with the same data content as the positive ones.
ax.plot(freq[3:n//2], change_F[3:n//2], 'o-C3')
ax.set_xlabel('frequency [cycles per day]');
A signal with oscillations on a weekly basis would correspond to a frequency of 1/7 as frequency is measured in per day
. We thus expect the peak above to be at 1/7 $\approx 0.1428$.
We can show this more easily by changing the frequency scale from cycles per day to cycles per week:
fig, ax = plt.subplots()
ax.plot(freq[3:n//2] * 7, change_F[3:n//2], 'o-C3')
ax.set_xlabel('frequency [cycles per week]');
In other words: there as a strong component of the data with a frequency corresponding to one week.
This is the end of the notebook.
Fourier transform basics¶
A little playground to explore properties of discrete Fourier transforms.
time = np.linspace(0, 4, 1000)
signal_frequency = 3 # choose this freely
signal = np.sin(time * 2 * np.pi * signal_frequency)
fourier = np.abs(np.fft.fft(signal))
# compute frequencies in fourier spectrum
n = signal.size
timestep = time[1] - time[0]
freqs = np.fft.fftfreq(n, d=timestep)
fig, ax = plt.subplots()
ax.plot(time, signal, 'oC9', label=f'signal, frequency={signal_frequency}')
ax.set_xlabel('time')
ax.legend()
fig, ax = plt.subplots()
ax.plot(freqs[0:n//2][:20], fourier[0:n//2][0:20], 'o-C8', label="Fourier transform")
ax.legend()
ax.set_xlabel('frequency');
oscovida.display_binder_link("2020-05-10-notebook-weekly-fluctuations-in-data-from-germany.ipynb")