Analyzing Rain Data with Time Series

Chennai is a city located on the coast of India. Since June 2019, Chennai has been facing a water shortage. There are four major water reservoirs for Chennai:
- Poondi
- Cholavaram
- Redhills
- Chembarambakkam
Someone posted data on these four reservoirs in the hope that someone could provide useful insight. Here is my attempt.
Note: This analysis refers to a data competition posted on Kaggle in September 2019.
Describing the Data
Before we begin our analysis, it is customary to get a top-level view of the data at hand.
File Name | X | Y |
chennai_reservior_levels.csv | Date (days) | Reservoir level, measured in mcft (million cubic feet) |
chennai_reservior_rainfall.csv | Date (days) | Rainfall, measured in mm with a rain gauge |
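To make the table concrete, here is a minimal sketch of how either file could be read using only the Python standard library. The column names (`Date` plus one column per reservoir) and the day-month-year date format are my assumptions about the files, not something stated above, so check them against the actual header row. The function name `load_daily_totals` is just an illustrative helper.

```python
import csv
from datetime import datetime

# Assumed column names; check them against the header row of the real files.
RESERVOIRS = ["POONDI", "CHOLAVARAM", "REDHILLS", "CHEMBARAMBAKKAM"]


def load_daily_totals(lines, date_format="%d-%m-%Y"):
    """Return a date-sorted list of (date, total across the four reservoirs).

    `lines` is an open file or any iterable of CSV lines.
    Assumes a 'Date' column plus one numeric column per reservoir.
    """
    rows = []
    for record in csv.DictReader(lines):
        day = datetime.strptime(record["Date"], date_format).date()
        total = sum(float(record[name]) for name in RESERVOIRS)
        rows.append((day, total))
    rows.sort(key=lambda row: row[0])  # keep the series in date order
    return rows
```

Passing an open file handle for the levels file gives total storage per day in mcft; passing the rainfall file gives total rainfall per day in mm. Summing across reservoirs is a simplification I chose for brevity, and keeping the four columns separate would work just as well.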
Reservoir Level vs Rainfall Time Series
The data we have is a time series: every observation is tied to a date, so the data can exhibit trend, seasonal, and cyclical patterns.
Below is a graph that plots rainfall against reservoir level. The reservoir level is drawn as a line, and the rainfall is drawn as red bars. The rainfall values are scaled up tenfold so that they are visible next to the much larger reservoir values.
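A plot like the one described could be produced roughly as follows. This is a sketch, not the code behind the original figure: the function names are hypothetical, and it assumes matplotlib is installed and that the dates, levels, and rainfall readings are already aligned lists of equal length.

```python
RAIN_SCALE = 10  # the tenfold factor mentioned in the text


def scale_rainfall(rain_mm, factor=RAIN_SCALE):
    """Multiply each rainfall reading so the bars are visible next to mcft levels."""
    return [value * factor for value in rain_mm]


def plot_overlay(dates, levels_mcft, rain_mm):
    """Draw the reservoir level as a line and the scaled rainfall as red bars."""
    # Imported here so that scale_rainfall can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(dates, levels_mcft, label="Reservoir level (mcft)")
    ax.bar(dates, scale_rainfall(rain_mm), color="red",
           label=f"Rainfall (mm x {RAIN_SCALE})")
    ax.set_xlabel("Date")
    ax.legend()
    return fig
```

Scaling keeps both series on one axis, which is simple but mixes units. A secondary y-axis via `ax.twinx()` would avoid the scaling altogether, at the cost of a busier plot.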
I think it’s safe to say that an increase in rainfall is followed by an increase in reservoir level… duh.
How much water accumulates in the dam after a Monsoon?
Suppose we can measure the amount by which the reservoir level changes during every rainfall season. Rainfall appears to come in spurts: a gradual increase in rain followed by a gradual decrease.
If we can find the single day on which rainfall peaks, that seasonal point is a good starting point from which to measure when the dam levels increase.
Later I will compute this measurement, but for now assume that the dam level increases on average by the value noted as [D_increase].
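One way the measurement could be set up is sketched here. It is only an illustration under several assumptions of mine: seasons are approximated by calendar years (which is crude if a rainy spell straddles the new year), the gain is taken as the highest level reached within a fixed window after the peak day, and the 60-day window is an arbitrary placeholder. The function name and the sample numbers are made up.

```python
from datetime import date, timedelta


def level_gain_after_peak(rain, levels, window_days=60):
    """For each calendar year, find the wettest day and measure how far the
    reservoir level rises in the `window_days` that follow it.

    `rain` and `levels` are dicts mapping date -> value (mm and mcft).
    Returns {year: gain_in_mcft}.
    """
    # Step 1: the wettest day of each calendar year.
    peak_day = {}
    for day, mm in rain.items():
        best = peak_day.get(day.year)
        if best is None or mm > rain[best]:
            peak_day[day.year] = day

    # Step 2: highest level in the window minus the level on the peak day.
    gains = {}
    for year, day in sorted(peak_day.items()):
        if day not in levels:
            continue  # no level reading on the peak day
        window = []
        for offset in range(window_days + 1):
            later = day + timedelta(days=offset)
            if later in levels:
                window.append(levels[later])
        gains[year] = max(window) - levels[day]
    return gains


# Made-up numbers, only to show the shape of the inputs.
rain = {date(2005, 12, 1): 5.0, date(2005, 12, 2): 80.0}
levels = {date(2005, 12, 2): 120.0, date(2005, 12, 20): 300.0}
print(level_gain_after_peak(rain, levels))
```

Averaging the per-year gains would give one candidate estimate for [D_increase].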
When in the year can we expect there to be a Monsoon?
This is seasonal data: each year, rainfall increases around December to January. A seasonal plot gives a better gauge.
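The numbers behind a seasonal plot can be approximated by pooling every year together and averaging rainfall per calendar month. The sketch below assumes the rainfall series is available as (date, rainfall in mm) pairs; the function name and the sample values are illustrative only.

```python
from collections import defaultdict
from datetime import date


def mean_rain_by_month(series):
    """Average daily rainfall for each calendar month, pooled across all years.

    `series` is an iterable of (date, rainfall_mm) pairs.
    Returns {month_number: mean_mm}, ordered by month.
    """
    buckets = defaultdict(list)
    for day, rain in series:
        buckets[day.month].append(rain)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


# Made-up readings, only to show the shape of the input.
sample = [(date(2004, 1, 1), 2.0), (date(2005, 1, 1), 4.0), (date(2004, 12, 1), 10.0)]
print(mean_rain_by_month(sample))
```

Months with the largest averages are where the wet season sits, which is what the seasonal plot shows visually.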
Our data follows a seasonal pattern of peaks and valleys. If we select the top 100 rainfall values from our dataset and average the days of the year on which they fall, we get a better approximation of the day of the year on which rainfall peaks.
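Here is a sketch of that top-100 averaging. One caveat: a plain arithmetic mean of day-of-year values misbehaves when the wet days straddle the new year (day 364 and day 3 would average to mid-year), so this version swaps in a circular mean. That substitution is my choice and not necessarily what produced the plot in this post. The input is assumed to be (date, rainfall in mm) pairs, and the function name is hypothetical.

```python
import math
from datetime import date


def typical_peak_day(series, top_n=100):
    """Circular mean of the day-of-year over the `top_n` wettest days.

    `series` is an iterable of (date, rainfall_mm) pairs.
    Returns a day-of-year between 1 and 365 (leap days are folded onto 1 Jan).
    """
    wettest = sorted(series, key=lambda row: row[1], reverse=True)[:top_n]
    # Place each day on a circle so that 31 Dec and 1 Jan sit next to each other.
    angles = [2 * math.pi * (day.timetuple().tm_yday - 1) / 365 for day, _ in wettest]
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_angle = math.atan2(mean_sin, mean_cos) % (2 * math.pi)
    # Convert the angle back to a 1-based day-of-year.
    return int(round(mean_angle * 365 / (2 * math.pi))) % 365 + 1


# Made-up readings, only to show the shape of the input.
sample = [(date(2005, 12, 30), 90.0), (date(2006, 1, 3), 85.0), (date(2005, 7, 1), 1.0)]
print(typical_peak_day(sample, top_n=2))
```

The result is a day-of-year number, so it still has to be mapped back to a calendar date when reporting it.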

The red circle marks the local maximum for each season. The green square marks the local minimum.
Eureka! Previously we inferred that the reservoir level increases by [D_increase].
We can now assert with statistical confidence that levels at all four reservoirs are likely to increase by a factor of [D_increase] during December and January.