# Shifting and Diffing Columns in R Data Frames

## Goal

This post shows how to shift and diff columns in R data frames. This is useful when a data frame holds absolute values and you want to analyze how they vary from one row to the next.

## Setup

For this tutorial we will use a data frame with the forecast temperature in Genoa for a week in August:

```
day <- c("Fri", "Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri")
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)
t_min <- c(13, 14, 17, 18, 20, 18, 22, 20)
df <- data.frame(day, t_min, t_max)
df
```

## Performing Operations on Rows

Computing a value for each row is straightforward: add a new column defined by the desired operation on the existing columns.

For instance, to get the difference between the maximum and minimum temperature of each day, we can do as follows:

```
df$variation <- df$t_max - df$t_min
df
```
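The same row-wise computation can also be written with base R's `transform`, which lets the expression refer to columns by name. This is only an alternative spelling, shown on a reduced copy of the data frame so the snippet runs on its own:

```r
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)
t_min <- c(13, 14, 17, 18, 20, 18, 22, 20)
df <- data.frame(t_min, t_max)

# transform() evaluates the expression using the data frame's columns
# and returns a copy with the new column appended
df <- transform(df, variation = t_max - t_min)
df$variation
```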

## Diffing Values in a Column

To compute the **variations of a variable**, we can use the `diff` function.
The following code, for instance, computes the variations in the
maximum temperature from day to day. Notice that `diff` returns fewer
elements than its input (one fewer for a lag of one), so to insert the values
in the data frame we need to pad the initial value(s) with `NA`.

```
t_max_variation <- diff(df$t_max, 1)
df$t_max_variation <- c(NA, t_max_variation)
df
```
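How many `NA` values are needed depends on the lag: `diff` drops as many elements as the lag. As a minimal sketch, the helper `padded_diff` below (a made-up name, not part of base R) pads the result back to the length of its input. It assumes the lag is smaller than the length of the vector.

```r
# Maximum temperatures from the setup section
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)

# Hypothetical helper (not part of base R): difference at a given lag,
# left-padded with NA so the result has the same length as the input.
# Assumes lag < length(x).
padded_diff <- function(x, lag = 1) {
  c(rep(NA, lag), diff(x, lag = lag))
}

padded_diff(t_max)           # variation from the previous day
padded_diff(t_max, lag = 2)  # variation from two days earlier
```

With a lag of two, the first two entries are `NA`, because the first two days have no value two days earlier to compare against.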

## Shifting Values

Other operations might require **shifting** the values of a column. For
instance, to compute the percent variation in the maximum temperature,
we first create a new column that replicates the maximum temperature
shifted by one day, and then perform an operation on the columns of the data frame.

The function `head` (and `tail`) can be used to shift a vector. The
following code, for instance, takes all elements of `t_max` but the
last.

```
t_max_shifted <- head(df$t_max, -1)
t_max_shifted
```
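`tail` works the same way from the other end. As a small illustration on the same values (typed in by hand so the snippet is self-contained), dropping the first element and padding at the end shifts the vector in the opposite direction, pairing each day with the following one. The name `t_max_next` is made up for this example.

```r
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)

# head(x, -1): everything except the last element
head(t_max, -1)

# tail(x, -1): everything except the first element
tail(t_max, -1)

# Padding at the end instead of the start gives the next day's value
t_max_next <- c(tail(t_max, -1), NA)
t_max_next
```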

We can now use the same trick we used earlier to add `t_max_shifted`
to the data frame.

```
df$t_max_shifted <- c(NA, head(df$t_max, -1))
df
```

The relative variation in the maximum temperature can now be computed as an operation on columns. The result is a fraction of the previous day's value; multiply it by 100 to read it as a percentage:

```
df$t_perc_var <- round(df$t_max_variation / df$t_max_shifted, digits=2)
df
```
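Putting the steps together, the sketch below rebuilds a reduced data frame and derives the shifted, diffed, and relative columns in one go. It also checks that diffing and shifting agree, since the variation is simply the current value minus the shifted one. The column name `t_pct_var` and the scaling to a percentage with one decimal are choices made for this example only.

```r
day   <- c("Fri", "Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri")
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)
df <- data.frame(day, t_max)

df$t_max_shifted   <- c(NA, head(df$t_max, -1))  # previous day's maximum
df$t_max_variation <- c(NA, diff(df$t_max))      # change from the previous day

# Diffing gives the same result as subtracting the shifted column
same <- all.equal(df$t_max_variation, df$t_max - df$t_max_shifted)
same

# Relative change as a percentage, rounded to one decimal
# (t_pct_var is a name made up for this example)
df$t_pct_var <- round(100 * df$t_max_variation / df$t_max_shifted, 1)
df
```

The first row stays `NA` in every derived column, because the first day has no previous value to compare against.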