Shifting and Diffing Columns in R Data Frames
This post shows how to shift and diff columns in R data frames. This is useful when a data frame holds absolute values and you want to analyze how they vary.
For this tutorial we will use a data frame with the forecast temperature in Genoa for a week in August:
day <- c("Fri", "Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri")
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)
t_min <- c(13, 14, 17, 18, 20, 18, 22, 20)
df <- data.frame(day, t_min, t_max)
df
Perform operations on rows
Computing values row by row is straightforward: add a column defined by the desired operation on the existing columns.
For instance, to get the difference between the maximum and minimum temperature, we can do as follows:
df$variation <- df$t_max - df$t_min
df
Diffing Values in a Column
To compute the variations of a variable, we can use the diff function.
The following code, for instance, computes the variations in the
maximum temperature from day to day. Notice that diff returns a vector
shorter than its input, so to insert the values in the data frame we
need to pad the initial value(s) with NA.
t_max_variation <- diff(df$t_max, 1)
df$t_max_variation <- c(NA, t_max_variation)
df
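The padding step generalizes to differences over more than one day. The sketch below wraps diff in a small helper; padded_diff is a hypothetical name introduced here for illustration, not a base R function, and it assumes the lag is smaller than the length of the vector.

```r
# Maximum temperatures from the example data frame.
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)

# padded_diff is a hypothetical helper, not part of base R.
# It calls diff() with the given lag and prepends `lag` NA values,
# so the result has the same length as the input vector.
# Assumption: lag is a positive integer smaller than length(x).
padded_diff <- function(x, lag = 1) {
  c(rep(NA, lag), diff(x, lag = lag))
}

one_day <- padded_diff(t_max)      # change with respect to the previous day
two_day <- padded_diff(t_max, 2)   # change with respect to two days earlier

one_day
two_day
```

Because the result has the same length as the input, it can be assigned directly to a new column, for example df$t_max_variation <- padded_diff(df$t_max).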
Other operations might require shifting the values of a column. For instance, to compute the percent variation in the maximum temperature, we first create a new column that replicates the maximum temperature shifted by one day, and then perform an operation on the columns of the data frame.
The head function (and its counterpart, tail) can be used to shift a vector. The
following code, for instance, takes all elements of
t_max but the last:
t_max_shifted <- head(df$t_max, -1)
t_max_shifted
We can now use the same trick we used earlier, padding with NA, to add
the shifted vector to the data frame.
df$t_max_shifted <- c(NA, head(df$t_max, -1))
df
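The same head and tail idea can be packaged so that a vector can be shifted in either direction. This is only a sketch: shift_vec is a hypothetical helper introduced here, not a base R function, and it assumes the shift is smaller than the length of the vector.

```r
# Maximum temperatures from the example data frame.
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)

# shift_vec is a hypothetical helper, not part of base R.
# n > 0 shifts values towards the end (each row sees an earlier value);
# n < 0 shifts values towards the start (each row sees a later value).
# Vacated positions are filled with NA.
# Assumption: abs(n) is smaller than length(x).
shift_vec <- function(x, n = 1) {
  if (n == 0) return(x)
  pad <- rep(NA, abs(n))
  if (n > 0) {
    c(pad, head(x, -n))   # drop the last n elements, pad at the front
  } else {
    c(tail(x, n), pad)    # drop the first |n| elements, pad at the end
  }
}

previous_day <- shift_vec(t_max, 1)
next_day <- shift_vec(t_max, -1)

previous_day
next_day
```

A negative first argument to head or tail means "all but the last (or first) n elements", which is what makes the shift possible without explicit indexing.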
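The relative variation in the maximum temperature can now be computed as an operation on columns. The result is a fraction rounded to two decimals, so a value of 0.07 corresponds to an increase of about 7%: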
df$t_perc_var <- round(df$t_max_variation / df$t_max_shifted, digits = 2)
df
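As a cross-check, the same relative variation can be computed in one step, without storing the intermediate columns. The sketch below works on the t_max vector directly; the names previous and perc_var are introduced here for illustration only.

```r
# Maximum temperatures from the example data frame.
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)

# Values of the previous day: all elements but the last.
previous <- head(t_max, -1)

# Day-to-day change divided by the previous day's value,
# rounded to two decimals and padded with NA for the first day.
perc_var <- c(NA, round(diff(t_max) / previous, digits = 2))

perc_var
```

Multiplying perc_var by 100 gives the variation as a percentage rather than a fraction.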