Given the ongoing coronavirus pandemic, I have found myself staring at the Worldometers website (https://www.worldometers.info/coronavirus/) for the past few weeks to get a better understanding of the magnitude of the havoc that COVID-19 has wreaked on the world. Initially, I noticed the COVID-19 cases and deaths creeping up gradually, but the creep soon turned into acceleration and, of late, the numbers have started galloping.

Around the same time, I came across a tweet that my dad had posted on the 23rd of March:

Picture 1: A snapshot of my father’s tweet

Thereafter, I kept an eye on the Worldometers curves (a snapshot of a curve can be seen in the tweet itself) and was stunned when the deaths reached approximately 27,000 four days after his tweet!

This, along with the countless discussions I have with my friends after our Zoom classes, finally motivated me to analyze the COVID-19 data diligently, not for any particular purpose but simply to satisfy my intellectual curiosity. My first step was to choose the right indicator. At first, I thought of using the number of cases, but testing practices differ from country to country, and many cases go unreported in countries that lack adequate testing facilities, so the reported figures may not reflect the actual number of cases. For this reason, I decided to track deaths per day and cumulative deaths.

**Getting the data in place:**

I made a simple table in MS Excel to track both figures.

| Date | Cumulative deaths | Deaths for the day |
|---|---|---|
| Feb 16 | 1,775 | |
| Feb 17 | 1,873 | 98 |
| Feb 18 | 2,009 | 136 |
| Feb 19 | 2,126 | 117 |
| Feb 20 | 2,247 | 121 |
| Feb 21 | 2,360 | 113 |
| Feb 22 | 2,460 | 100 |
| Feb 23 | 2,618 | 158 |
| Feb 24 | 2,699 | 81 |
| Feb 25 | 2,763 | 64 |
| Feb 26 | 2,800 | 37 |
| Feb 27 | 2,858 | 58 |
| Feb 28 | 2,923 | 65 |
| Feb 29 | 2,977 | 54 |
| Mar 1 | 3,050 | 73 |
| Mar 2 | 3,117 | 67 |
| Mar 3 | 3,202 | 85 |
| Mar 4 | 3,285 | 83 |
| Mar 5 | 3,387 | 102 |
| Mar 6 | 3,494 | 107 |
| Mar 7 | 3,599 | 105 |
| Mar 8 | 3,827 | 228 |
| Mar 9 | 4,025 | 198 |
| Mar 10 | 4,296 | 271 |
| Mar 11 | 4,628 | 332 |
| Mar 12 | 4,981 | 353 |
| Mar 13 | 5,428 | 447 |
| Mar 14 | 5,833 | 405 |
| Mar 15 | 6,520 | 687 |
| Mar 16 | 7,162 | 642 |
| Mar 17 | 7,979 | 817 |
| Mar 18 | 8,951 | 972 |
| Mar 19 | 10,030 | 1,079 |
| Mar 20 | 11,386 | 1,356 |
| Mar 21 | 13,011 | 1,625 |
| Mar 22 | 14,640 | 1,629 |
| Mar 23 | 16,513 | 1,873 |
| Mar 24 | 18,894 | 2,381 |
| Mar 25 | 21,282 | 2,388 |
| Mar 26 | 24,073 | 2,791 |
| Mar 27 | 27,343 | 3,270 |
| Mar 28 | 30,861 | 3,518 |
| Mar 29 | 34,065 | 3,204 |
| Mar 30 | 37,774 | 3,709 |
| Mar 31 | 42,309 | 4,535 |
| Apr 1 | 47,192 | 4,883 |
| Apr 2 | 53,166 | 5,974 |
| Apr 3 | 59,145 | 5,979 |
| Apr 4 | | |
| Apr 5 | | |
| Apr 6 | | |
| Apr 7 | | |
| Apr 8 | | |

Table 1: Data on the cumulative deaths and daily deaths
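The "Deaths for the day" column is simply the difference between consecutive cumulative totals. As a minimal sketch of that step (in Python rather than Excel, with variable names of my own choosing), the last week of Table 1 can be checked like this:

```python
# Cumulative deaths for Mar 27 to Apr 3, copied from Table 1.
cumulative = [27343, 30861, 34065, 37774, 42309, 47192, 53166, 59145]

# Daily deaths = today's cumulative total minus yesterday's.
daily = [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]

print(daily)  # [3518, 3204, 3709, 4535, 4883, 5974, 5979]
```

The printed values match the Mar 28 to Apr 3 entries of the "Deaths for the day" column.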

Then, I plotted the charts for “cumulative deaths” and “deaths per day”:

Graph 1: Cumulative deaths from 16th February till 3rd April

Graph 2: Number of daily deaths from 16th February till 3rd April

**Modeling the equation:**

I tried to model an equation for the data using Excel's built-in trendline (equation of best fit) feature. Excel offers five types of standard curves for a best-fit equation:

- Exponential
- Logarithmic
- Linear
- Power
- Polynomial

From a quick eyeballing, I intuitively rejected the linear, logarithmic, and power curves. However, just to be sure, I tried each one out. The gap between the line of best fit and the data was too large for all three, which confirmed my intuition.

Ultimately, a polynomial curve suited the data. At first, I used a degree-2 polynomial (a parabola), which looked like this:

Graph 3: A degree-2 equation of best fit

However, as I increased the degree of the polynomial, the coefficient of determination (R²) moved closer to 1.
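I assume Excel's polynomial trendline is an ordinary least-squares fit. Under that assumption, the effect of raising the degree can be sketched in plain Python. The helper names `polyfit` and `r_squared` are my own, and the days are rescaled to the range 0 to 1 only to keep the arithmetic well behaved (this does not change R²).

```python
# Cumulative deaths from Table 1: Feb 16 (day 0) to Apr 3 (day 47).
cumulative = [
    1775, 1873, 2009, 2126, 2247, 2360, 2460, 2618, 2699, 2763, 2800, 2858,
    2923, 2977, 3050, 3117, 3202, 3285, 3387, 3494, 3599, 3827, 4025, 4296,
    4628, 4981, 5428, 5833, 6520, 7162, 7979, 8951, 10030, 11386, 13011,
    14640, 16513, 18894, 21282, 24073, 27343, 30861, 34065, 37774, 42309,
    47192, 53166, 59145,
]

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs

def r_squared(xs, ys, coeffs):
    """Coefficient of determination of the fitted polynomial."""
    mean = sum(ys) / len(ys)
    fitted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Rescale day numbers to [0, 1] for numerical stability.
last_day = len(cumulative) - 1
xs = [day / last_day for day in range(len(cumulative))]

r2 = {deg: r_squared(xs, cumulative, polyfit(xs, cumulative, deg)) for deg in (2, 4)}
for deg, value in r2.items():
    print(f"degree {deg}: R^2 = {value:.4f}")
```

Because a degree-4 polynomial contains every degree-2 polynomial as a special case, its R² on the same data can never be lower, so a higher R² by itself says little about how well the curve will forecast.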

Here is a degree-4 polynomial, which models the data with great precision:

Graph 4: A degree-4 polynomial of best fit (beginning Feb 16, 2020)

The equation of the degree-4 polynomial is

$$y = 0.0315x^{4} - 1.4216x^{3} + 20.471x^{2} + 10.49x + 1881.3$$

where y is the cumulative number of deaths to date and x is the number of days since February 16th.

To get a more tangible feel for the precision of my model, I substituted a few values of x (days elapsed) into the equation and calculated the percentage error between the predicted value and the actual value.

Start date: 16 February 2020

| Date | Days elapsed (x) | Predicted | Actual | Error |
|---|---|---|---|---|
| 25 March 2020 | 38 | 18719 | 21282 | -13.7% |
| 01 April 2020 | 45 | 42489 | 47192 | -11.1% |
| 05 April 2020 | 49 | 64860 | | |
| 10 April 2020 | 54 | 105004 | | |
| 15 April 2020 | 59 | 162252 | | |
| 30 April 2020 | 74 | 481715 | | |
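The percentages in the Error column correspond to the gap between predicted and actual deaths divided by the predicted value. A small sketch (the function name `pct_error` is mine) reproduces them and also shows the more common convention of dividing by the actual value:

```python
# (date, predicted, actual) rows copied from the table above.
rows = [
    ("25 March 2020", 18719, 21282),
    ("01 April 2020", 42489, 47192),
]

def pct_error(predicted, actual, base):
    """Signed percentage gap between prediction and reality, relative to `base`."""
    return (predicted - actual) / base * 100

for date, predicted, actual in rows:
    vs_predicted = pct_error(predicted, actual, predicted)
    vs_actual = pct_error(predicted, actual, actual)
    print(f"{date}: {vs_predicted:.1f}% of predicted, {vs_actual:.1f}% of actual")
```

Relative to the actual figures, the same gaps come to about -12.0% and -10.0%; either way the model under-predicts on both dates.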

I wondered whether a percentage error of this size was acceptable for using the equation to forecast future deaths.

I then realized that the data being analyzed comprised two main sections: the slow "creeping" rise and the "galloping" rise. This could be the reason for the high level of inaccuracy in the predictions. To find an equation better suited to the current scenario, I then used only the recent data points, from 5th March onward.

The chart and the equation are shown below:

Graph 5: a degree 4 polynomial of best fit (beginning Mar 5, 2020)

The equation this time around is

$$y = 0.0337x^{4} + 0.7715x^{3} + 5.0925x^{2} + 94.975x + 3296.3$$

where y is the cumulative number of deaths to date and x is the number of days since March 5th.

I repeated the task of finding the values of "y" for some selected values of "x":

Start date: 05 March 2020

| Date | Days elapsed (x) | Predicted | Actual | Error |
|---|---|---|---|---|
| 25 March 2020 | 20 | 18797 | 21282 | -13.2% |
| 01 April 2020 | 27 | 42668 | 47192 | -10.6% |
| 05 April 2020 | 31 | 65241 | | |
| 10 April 2020 | 36 | 105913 | | |
| 15 April 2020 | 41 | 164151 | | |
| 30 April 2020 | 56 | 491495 | | |
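The Predicted column can be reproduced by substituting x into the March 5 equation. Here is a minimal sketch using Horner's rule; the names `COEFFS` and `predict` are my own, and the coefficients are the rounded ones printed above:

```python
# Coefficients of the March 5 fit, highest power first.
COEFFS = [0.0337, 0.7715, 5.0925, 94.975, 3296.3]

def predict(days_since_mar5):
    """Evaluate the degree-4 polynomial with Horner's rule."""
    y = 0.0
    for c in COEFFS:
        y = y * days_since_mar5 + c
    return y

for x in (20, 27, 31, 36, 41, 56):
    print(x, round(predict(x)))
```

Rounded to whole numbers, the outputs are 18797, 42668, 65241, 105913, 164151, and 491495, matching the table.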

The error has reduced only marginally compared with the equation starting from February 16th. This suggests that Excel's trendline fit already works well on the full series and, when the curve has two or three distinct parts, is driven mostly by the more recent data. Still, I am happy to have reduced the error by about 0.5 percentage points, because in a situation like this, every digit counts.

I think the above equation (starting from March 5th) will be the best-fit equation for COVID-19 deaths for some time at least. I have modeled this equation using the data available. The model does not take into consideration any unforeseen events, such as the discovery of a vaccine or a community flare-up in Africa or India.

I am sure that a more accurate model using more rigorous mathematics can be developed; however, my model, based on publicly available data, only serves to help an individual grasp quantitatively the havoc that COVID-19 has wreaked across the world.

My ultimate motivation for writing this is to make you realize that the curve only goes upward from here unless we ALL take action. Let's be responsible citizens, not only of our own countries but also of the world. If the coronavirus hits the slums in the Indian subcontinent or Africa, the situation will worsen beyond control. Most poor people are following the lockdown, because of which they have lost their daily source of income and are struggling to obtain basic necessities. On the other hand, there are relatively privileged people sitting at home who have the means to help them indirectly. Whenever you see a "please donate" link on Instagram or any other social media platform, please do not scroll past it and dismiss it with "Oh, someone else will do it" or "Oh, I do not have the time". Make an impact. Something that may seem small and inconsequential to you can make a big difference to a family.

Through our smart actions, we alone can stop this Equation of mine from becoming the “Equation of Death”!
