2020-07-12 / Blog 1

Calculating monthly, daily, and hourly means of time series data

When I started working with large streams of time series data, one of my first challenges was calculating temporal averages efficiently. If you measure any of the phenomena in the soil-plant-atmosphere interaction, e.g. air temperature, soil temperature, or net radiation, you will need to summarise your data as hourly, daily, or monthly averages.

We will use air temperature data from one of the lysimeter setups at the National Green Infrastructure Facility, which I instrumented within the research project Priming Laboratory EXperiments on infrastructure and Urban Systems (PLEXUS), to calculate hourly and daily means. You can watch the commissioning of the setup in the video on the right. Data used in this script can be accessed via the NGIF API.

First, we define the start and end of the time frame, then generate a vector of time steps using the seq() function, with the times formatted as POSIXct:

start <- as.POSIXct("2020-07-10 00:00:00", tz = "UTC")

end <- as.POSIXct("2020-07-12 00:00:00", tz = "UTC")

time_steps_1h <- seq(start, end, by = 60*60)

Then, we use the findInterval() function to find which time step each measurement belongs to:

which_step_1h <- findInterval(as.POSIXct(airtemp_raw[,1],tz="UTC"),time_steps_1h)
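To see what findInterval() returns, here is a minimal sketch on a handful of made-up time stamps (the values below are hypothetical and are not NGIF data). Each reading gets the index of the last time step that is less than or equal to it, so a reading taken exactly on the hour belongs to the hour that starts there.

```r
# Hypothetical example values, not NGIF data
start <- as.POSIXct("2020-07-10 00:00:00", tz = "UTC")
end   <- as.POSIXct("2020-07-10 03:00:00", tz = "UTC")
steps <- seq(start, end, by = 60 * 60)   # 00:00, 01:00, 02:00, 03:00

stamps <- as.POSIXct(c("2020-07-10 00:05:00",
                       "2020-07-10 00:55:00",
                       "2020-07-10 01:00:00",
                       "2020-07-10 02:30:00"), tz = "UTC")

# Index of the time step each reading falls into
idx <- findInterval(stamps, steps)
idx          # 1 1 2 3

# Start of the hour that each reading belongs to
steps[idx]
```

Readings taken before the first time step would get index 0, so it is worth checking that the time frame covers the whole data set.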

The last step is to use the aggregate() function to calculate the mean values:

airtemp_1h <- aggregate(airtemp_raw[,2],list(time_steps_1h[which_step_1h]),mean,na.rm=TRUE)

This piece of code gives us the 1-hourly means of the data frame that we imported. The same method gives other n-hourly means if the step passed to seq() is changed to 60*60*n. Fig. 1 shows the raw data, as well as the 1-, 3- and 6-hourly mean values of the air temperature.
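The three steps can be wrapped into one small function so that n becomes an argument. The sketch below is only an illustration: n_hourly_mean() is a hypothetical helper name, and the 15-minute sine-wave series is synthetic data standing in for airtemp_raw.

```r
# Hypothetical helper: mean of `values` over n-hour bins between start and end
n_hourly_mean <- function(times, values, start, end, n = 1) {
  steps <- seq(start, end, by = 60 * 60 * n)   # bin starts, n hours apart
  idx   <- findInterval(times, steps)          # bin index of each reading
  keep  <- idx > 0                             # drop readings before `start`
  out   <- aggregate(values[keep], list(steps[idx[keep]]), mean, na.rm = TRUE)
  names(out) <- c("time", "mean")
  out
}

# Synthetic 15-minute series (assumption: stands in for the real air temperature)
start <- as.POSIXct("2020-07-10 00:00:00", tz = "UTC")
end   <- as.POSIXct("2020-07-12 00:00:00", tz = "UTC")
times <- seq(start, end - 15 * 60, by = 15 * 60)
hours <- as.numeric(difftime(times, start, units = "hours"))
temp  <- 15 + 5 * sin(2 * pi * hours / 24)

airtemp_1h <- n_hourly_mean(times, temp, start, end, n = 1)
airtemp_3h <- n_hourly_mean(times, temp, start, end, n = 3)
airtemp_6h <- n_hourly_mean(times, temp, start, end, n = 6)
```

With two days of data this should give 48, 16 and 8 rows respectively, each labelled with the start time of its bin.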

We will use volumetric water content data from the same lysimeter setup at the National Green Infrastructure Facility to calculate daily and monthly means. Data used in this script can be accessed from the data repository of Newcastle University.

We define the daily time steps in the same way as the hourly time steps:

daily_steps <- seq(start,end,60*60*24)

As with the hourly data, we use the findInterval() function to find which daily time step each measurement belongs to, and then aggregate() to calculate the means.
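A minimal sketch of the daily calculation, mirroring the hourly one, could look like this. The data frame vwc_raw is a hypothetical stand-in with made-up values (time stamp in the first column, volumetric water content in the second); the names which_day and vwc_daily are also my own choice for this example.

```r
# Hypothetical stand-in for the volumetric water content data:
# column 1 = time stamp, column 2 = value (made-up numbers)
start <- as.POSIXct("2020-07-10 00:00:00", tz = "UTC")
end   <- as.POSIXct("2020-07-12 00:00:00", tz = "UTC")
vwc_raw <- data.frame(
  time = seq(start, end - 3600, by = 3600),   # hourly readings over two days
  vwc  = rep(c(0.30, 0.20), each = 24)        # day 1 = 0.30, day 2 = 0.20
)

daily_steps <- seq(start, end, 60 * 60 * 24)

# Which day each reading belongs to
which_day <- findInterval(as.POSIXct(vwc_raw[, 1], tz = "UTC"), daily_steps)

# Daily means, labelled with the start of each day
vwc_daily <- aggregate(vwc_raw[, 2], list(daily_steps[which_day]), mean, na.rm = TRUE)
```

Note that a fixed 24-hour step matches calendar days here because the times are in UTC; in a time zone with daylight saving changes the bins would drift by an hour.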

The only difference between the daily and monthly calculations is the way we define the time steps. We use the seq.Date() function for this purpose:

monthly_steps <- seq.Date(as.Date(start),as.Date(end),"month")
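For the monthly steps to be useful, the time frame has to span more than one month, and because seq.Date() counts months from the day of the month of the start date, starting on the first of a month gives calendar-month bins. The sketch below assumes a three-month time frame and a made-up daily series (vwc_raw, which_month and vwc_monthly are hypothetical names), with the time stamps converted to Date before calling findInterval().

```r
# Assumed time frame covering April to June 2020 (not the one used above)
start <- as.POSIXct("2020-04-01 00:00:00", tz = "UTC")
end   <- as.POSIXct("2020-07-01 00:00:00", tz = "UTC")

# Made-up daily readings whose value is simply the month number
days    <- seq(start, end - 86400, by = 86400)
vwc_raw <- data.frame(time = days,
                      vwc  = as.numeric(format(days, "%m")))

monthly_steps <- seq.Date(as.Date(start), as.Date(end), "month")

# Compare dates with dates: convert the time stamps before findInterval()
which_month <- findInterval(as.Date(vwc_raw[, 1], tz = "UTC"), monthly_steps)

# Monthly means, labelled with the first day of each month
vwc_monthly <- aggregate(vwc_raw[, 2], list(monthly_steps[which_month]), mean, na.rm = TRUE)
```

The result should have one row per month, with means of 4, 5 and 6 for this synthetic series.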

The scripts used in this post can be found here: Blog Scripts @ GitHub