## Analyzing Rain Data with Time Series

Chennai is a city on the coast of India. Since June 2019, Chennai has been facing a water shortage. The city has four major water reservoirs:

- Poondi
- Cholavaram
- Redhills
- Chembarambakkam

Someone posted data on these four reservoirs in the hope that someone could provide useful insight. Here is my attempt.

*Note: The following data analysis refers to a data competition posted on Kaggle in September 2019.*

### Describing the Data

Before we begin the analysis, it is customary to get a top-level view of the data at hand.

| File name | X (index) | Y (measurement) |
|---|---|---|
| `chennai_reservior_levels.csv` | Date (days) | Reservoir level, in mcft (million cubic feet) |
| `chennai_reservior_rainfall.csv` | Date (days) | Rainfall, in mm, measured with a rain gauge |
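A minimal sketch of loading either file with Python's standard `csv` module. The column names (`Date`, `POONDI`) and the `DD-MM-YYYY` date format are assumptions about the file layout, and `load_series` is a hypothetical helper, so check the real header before relying on it.

```python
import csv
import io
from datetime import datetime


def load_series(lines, column, date_format="%d-%m-%Y"):
    """Parse CSV rows into a chronologically sorted list of (date, float) pairs.

    Assumes a 'Date' column plus a numeric column named `column`; both the
    column names and the date format are assumptions about the file layout.
    """
    series = []
    for row in csv.DictReader(lines):
        day = datetime.strptime(row["Date"], date_format).date()
        series.append((day, float(row[column])))
    series.sort()  # make sure the time series is in date order
    return series


# Made-up rows standing in for chennai_reservior_levels.csv.
sample = io.StringIO("Date,POONDI\n02-01-2004,3.9\n01-01-2004,3.3\n")
levels = load_series(sample, "POONDI")
print(levels)
```

For the real file, pass an open file handle instead of the `StringIO` sample, e.g. `open("chennai_reservior_levels.csv", newline="")`.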

#### Reservoir Level vs Rainfall Time Series

Both datasets are time series: each observation is indexed by date. Because rainfall in Chennai follows the seasons, the data can exhibit trend, seasonal, and cyclical patterns.
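To pull the trend apart from the seasonal pattern, one common approach (classical decomposition, not necessarily what this analysis used) is to smooth the series with a centered moving average whose window equals the seasonal period. The sketch below is a hypothetical helper demonstrated on a toy series rather than the reservoir data.

```python
def centered_moving_average(values, window):
    """Centered moving average; positions without a full window are None.

    With `window` equal to the seasonal period (e.g. 365 for daily data
    with a yearly cycle), the seasonal swings average out and what is left
    estimates the trend.
    """
    if window % 2 == 0:
        raise ValueError("use an odd window so the average stays centered")
    half = window // 2
    result = [None] * len(values)
    running = sum(values[:window])  # sum of the first full window
    for i in range(half, len(values) - half):
        if i > half:
            # slide the window one step: add the new right edge, drop the old left edge
            running += values[i + half] - values[i - half - 1]
        result[i] = running / window
    return result


# Toy series: linear trend t plus a repeating pattern (1, -1, 0) of period 3.
toy = [t + (1, -1, 0)[t % 3] for t in range(6)]
trend = centered_moving_average(toy, 3)
print(trend)  # [None, 1.0, 2.0, 3.0, 4.0, None]
```

The smoothed values recover the linear trend exactly here; subtracting the trend from the original values leaves the seasonal part plus noise.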

Below is a graph that plots rainfall and reservoir level together. The reservoir level is drawn as a line plot, and the rainfall data is plotted as red bars. The rainfall values are scaled up tenfold so that their relationship to the reservoir level is visible.
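A sketch of the data preparation behind such a combined plot, using only the standard library. It assumes both series are lists of `(date, value)` pairs, and `align_for_overlay` is a hypothetical helper; the plotting call itself (for example matplotlib's `plot` for the line and `bar` for the rainfall) is left out.

```python
from datetime import date


def align_for_overlay(levels, rainfall, rain_scale=10):
    """Join two (date, value) series on shared dates for a combined plot.

    Rainfall is multiplied by `rain_scale` (10, matching the tenfold scaling
    described above) so the bars are easier to compare with the reservoir
    levels. Dates missing from either series are dropped.
    """
    rain_by_day = dict(rainfall)
    return [
        (day, level, rain_by_day[day] * rain_scale)
        for day, level in levels
        if day in rain_by_day
    ]


# Made-up values for illustration only.
levels = [(date(2019, 1, 1), 100.0), (date(2019, 1, 2), 105.0)]
rain = [(date(2019, 1, 2), 4.5)]
print(align_for_overlay(levels, rain))
```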

I think it’s safe to say that an increase in rainfall is followed by an increase in reservoir level… duh.
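One way to check the "rainfall first, reservoir later" pattern beyond eyeballing the plot is to correlate rainfall on a given day with the change in reservoir level over the following `lag` days. This is a sketch with hypothetical helper names; it assumes two equal-length daily lists aligned on the same dates, and the example data is synthetic.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(rain, level, lag):
    """Correlate rainfall on day t with the level change from day t to day t + lag.

    Assumes `rain` and `level` are equal-length daily lists aligned on the
    same dates (missing days would have to be filled in first).
    """
    if lag < 1 or len(rain) != len(level):
        raise ValueError("need lag >= 1 and equally long, aligned series")
    xs = rain[: len(rain) - lag]
    ys = [level[t + lag] - level[t] for t in range(len(level) - lag)]
    return pearson(xs, ys)


# Synthetic data: each day's rain shows up in the reservoir one day later.
rain = [0, 10, 0, 0, 20, 0, 0, 5, 0, 0]
level = [100]
for r in rain[:-1]:
    level.append(level[-1] + r)
print(round(lagged_correlation(rain, level, 1), 6))  # 1.0
```

On real data, computing this for a range of lags and looking for the strongest correlation gives a rough estimate of how long rain takes to reach the reservoirs.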

#### How much water accumulates in the dam after a monsoon?

Suppose we measure how much the reservoir level changes during each rainfall season. Rainfall appears to come in spurts: a gradual increase in rain followed by a gradual decrease.

If we can identify the single day on which rainfall peaks each season, that day is a good starting point from which to measure the increase in dam levels.

I will compute this measurement later; for now, assume that the dam level increases on average by a value (or factor) denoted [**D_increase**].
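One way the measurement described above could be computed, as a sketch: for each season, find the day of peak rainfall, measure how much the reservoir level rises over a fixed window after it, and average over seasons. The helper names, the 60-day window, and the July-to-June season boundary are all hypothetical choices. This version measures an absolute rise in mcft; dividing instead of subtracting would give the "factor" reading of D_increase.

```python
from datetime import date, timedelta
from statistics import mean


def season_of(day):
    """Hypothetical July-to-June season label, so December and January share a season."""
    return day.year if day.month >= 7 else day.year - 1


def estimate_d_increase(rainfall, levels, window_days=60):
    """Average rise in reservoir level in the `window_days` after each season's peak rainfall day.

    rainfall, levels: dicts mapping datetime.date -> value. The 60-day window
    is an arbitrary illustrative choice. Raises statistics.StatisticsError if
    no season has level readings on both required dates.
    """
    peaks = {}  # season -> date of that season's highest rainfall
    for day, rain in rainfall.items():
        season = season_of(day)
        if season not in peaks or rain > rainfall[peaks[season]]:
            peaks[season] = day
    increases = []
    for peak_day in peaks.values():
        later = peak_day + timedelta(days=window_days)
        if peak_day in levels and later in levels:
            increases.append(levels[later] - levels[peak_day])
    return mean(increases)


# Made-up numbers for two seasons, purely for illustration.
rainfall = {date(2018, 12, 1): 50.0, date(2018, 8, 1): 5.0, date(2019, 11, 20): 80.0}
levels = {
    date(2018, 12, 1): 100.0, date(2019, 1, 30): 400.0,
    date(2019, 11, 20): 200.0, date(2020, 1, 19): 440.0,
}
print(estimate_d_increase(rainfall, levels))  # 270.0
```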

#### When in the year can we expect a monsoon?

This is seasonal data. We can see that each year there is an increase in rainfall around **December** – **January**. We can get a better gauge by looking at a seasonal plot.
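A simple way to get the numbers behind a seasonal plot is to pool every year's daily rainfall by calendar month and average it. This is a simplified take on a seasonal plot (a full one draws one line per year); `monthly_profile` is a hypothetical helper and the sample values are made up.

```python
from collections import defaultdict
from datetime import date


def monthly_profile(rainfall):
    """Mean rainfall for each calendar month, pooled across all years.

    `rainfall` is a list of (date, value) pairs; plotting the result month by
    month gives a simple seasonal profile.
    """
    buckets = defaultdict(list)
    for day, rain in rainfall:
        buckets[day.month].append(rain)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


# Made-up values for illustration only.
sample = [(date(2018, 12, 1), 30.0), (date(2019, 12, 5), 10.0), (date(2019, 6, 1), 2.0)]
profile = monthly_profile(sample)
print(profile)                        # {6: 2.0, 12: 20.0}
print(max(profile, key=profile.get))  # 12
```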

Our data follows a seasonal pattern of peaks and valleys. If we select the 100 highest rainfall values from the dataset and average their dates, we get a better approximation of the day of the year on which peak rainfall occurs.
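A sketch of the top-100 averaging step. One caveat applies when the wet season straddles New Year: a plain arithmetic mean of day-of-year numbers breaks down (late December is around day 360, early January around day 5, and their mean lands in mid-year). The version below swaps in a circular mean to avoid this; that substitution, the helper name, and the 365.25-day circle are assumptions, not necessarily what the original analysis did.

```python
import math
from datetime import date


def peak_day_of_year(rainfall, top_n=100):
    """Approximate day of year (1-365) of the `top_n` wettest days, via a circular mean.

    Days of the year are mapped to angles on a circle before averaging, so
    late December and early January average to around New Year rather than
    to mid-year. The result is undefined if the days are spread evenly
    around the whole year.
    """
    wettest = sorted(rainfall, key=lambda pair: pair[1], reverse=True)[:top_n]
    angles = [2 * math.pi * (day.timetuple().tm_yday - 1) / 365.25 for day, _ in wettest]
    mean_angle = math.atan2(sum(map(math.sin, angles)), sum(map(math.cos, angles)))
    index = round((mean_angle % (2 * math.pi)) / (2 * math.pi) * 365.25) % 365
    return index + 1


# Made-up values: two wet days either side of New Year and one dry mid-year day.
sample = [(date(2019, 12, 31), 90.0), (date(2020, 1, 2), 80.0), (date(2019, 6, 15), 1.0)]
print(peak_day_of_year(sample, top_n=2))  # 1, i.e. 1 January
```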

The red circle marks the local maximum for each season. The green square marks the local minimum.
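Markers like these can be computed by grouping the series by season and taking each group's extremes. A sketch, assuming a list of `(date, value)` pairs and a hypothetical July-to-June season so a December-January peak stays within one season; `seasonal_extremes` is an illustrative name.

```python
from datetime import date


def seasonal_extremes(series):
    """Per-season maximum and minimum of a list of (date, value) pairs.

    Seasons run July to June (a hypothetical choice) so that a December to
    January peak is not split across two calendar years. Returns
    {season_start_year: ((max_date, max_value), (min_date, min_value))}.
    """
    groups = {}
    for day, value in series:
        season = day.year if day.month >= 7 else day.year - 1
        groups.setdefault(season, []).append((day, value))
    return {
        season: (max(pairs, key=lambda p: p[1]), min(pairs, key=lambda p: p[1]))
        for season, pairs in sorted(groups.items())
    }


# Made-up values for illustration only.
sample = [
    (date(2018, 12, 20), 40.0), (date(2019, 1, 5), 60.0),
    (date(2019, 4, 1), 0.0), (date(2019, 8, 1), 3.0),
]
for season, (wettest, driest) in seasonal_extremes(sample).items():
    print(season, wettest, driest)
```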

**Eureka!** Previously we saw that the reservoir level increases by [D_increase] after a rainfall peak.

We can now say with some confidence that the levels of all four reservoirs are likely to increase by [D_increase] during **December and January**.
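To back a claim like this with an actual interval, one option is a percentile bootstrap over the per-season increases. This is a sketch, not part of the original analysis: `bootstrap_ci` is a hypothetical helper, the sample numbers are invented, and with only a handful of seasons the interval will be wide and rough.

```python
import random


def bootstrap_ci(samples, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`.

    Resamples the data with replacement, records each resample's mean, and
    returns the alpha/2 and 1 - alpha/2 quantiles of those means.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(samples)
    means = sorted(sum(rng.choices(samples, k=n)) / n for _ in range(n_resamples))
    lower = means[int(alpha / 2 * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-season increases in mcft (not taken from the dataset).
increases = [300.0, 240.0, 310.0, 280.0, 260.0]
print(bootstrap_ci(increases))
```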