Date last run: 17Jun2020
Introduction
A question was asked in the RStudio Community:
Finding closest date relative to another date in a column and subtracting.
When I saw that I thought it was a good opportunity to use the rowwise()
function of the dplyr package. This is a stylized form of that problem.
Load packages and create example data
options(knitr.table.format='html')
HOQCutil::silent_library(c('dplyr','tidyr'))
df1 <- tibble::tribble(
~id, ~x, ~a, ~b, ~c ,
1L, 3.25, 2.1, NA , 3.4 ,
2L, 3.50, 3.45, 3.7, 3.4
)
id | x | a | b | c |
---|---|---|---|---|
1 | 3.25 | 2.10 | NA | 3.4 |
2 | 3.50 | 3.45 | 3.7 | 3.4 |
Determine differences with column x
For all columns except the id
and the x
column we want to find the column that has the smallest difference with the x
column. So the first thing we do is determine which are these columns (in dif_columns
) and calculate the differences. The difference columns get a suffix ‘_d’ :
dif_columns = setdiff(names(df1),c('id','x'))
df2 = df1 %>%
mutate(
across(
.cols = any_of(dif_columns) ,
.fns = function(x,y) ifelse(is.na(x),99999,abs(y-x)) ,
.names = "{col}_d" ,
df1$x)
)
id | x | a | b | c | a_d | b_d | c_d |
---|---|---|---|---|---|---|---|
1 | 3.25 | 2.10 | NA | 3.4 | 1.15 | 99999.0 | 0.15 |
2 | 3.50 | 3.45 | 3.7 | 3.4 | 0.05 | 0.2 | 0.10 |
Use rowwise() to find the smallest difference for each row
We use rowwise()
to calculate for each row the column (with ‘_d’ suffix) with the smallest difference. We do that in four steps:
- first determine the index of the smallest difference using the
min_i
function - from this index we find the name of the column that has the smallest difference
- determine the value of the smallest difference (not further used here) using the
min_v
function - determine the value of the column that has the smallest difference using the
min_o
function
By debugging the min_*
functions it became clear that the arguments of these function are the values of the columns that are passed. Because we don’t know the number of arguments beforehand, we have used the a = unlist(list(...))
construct to get these values in a vector.
min_i = function(...) {
# gives index of smallest number
a = unlist(list(...))
order(a)[1]
}
min_v = function(...) {
# gives value of smallest number
a = unlist(list(...))
a[order(a)[1]]
}
min_o = function(...) {
# picks value at index (data is index followed by values)
a = unlist(list(...))
i = a[1]
o = a[-1]
o[i]
}
dif2_columns = paste(dif_columns,"_d",sep="")
df3 = df2 %>%
rowwise() %>%
mutate(
min_ix = min_i (c_across(cols = any_of(dif2_columns))),
min_var = dif_columns[min_ix],
min_dif = min_v (c_across(cols = any_of(dif2_columns))),
min_org = min_o(min_ix,c_across(cols = any_of(dif_columns)))
)
id | x | a | b | c | a_d | b_d | c_d | min_ix | min_var | min_dif | min_org |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3.25 | 2.10 | NA | 3.4 | 1.15 | 99999.0 | 0.15 | 3 | c | 0.15 | 3.40 |
2 | 3.50 | 3.45 | 3.7 | 3.4 | 0.05 | 0.2 | 0.10 | 1 | a | 0.05 | 3.45 |
Remove the intermediary columns
df4 = df3 %>%
select(any_of(c('id','x',dif_columns,'min_var','min_org')))
id | x | a | b | c | min_var | min_org |
---|---|---|---|---|---|---|
1 | 3.25 | 2.10 | NA | 3.4 | c | 3.40 |
2 | 3.50 | 3.45 | 3.7 | 3.4 | a | 3.45 |
Session Info
This document was produced on 17Jun2020 with the following R environment:
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] tidyr_1.1.0 dplyr_1.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.4.6 knitr_1.28 magrittr_1.5 tidyselect_1.1.0
#> [5] HOQCutil_0.1.22 R6_2.4.1 rlang_0.4.6 stringr_1.4.0
#> [9] highr_0.8 tools_4.0.0 xfun_0.13 htmltools_0.4.0
#> [13] ellipsis_0.3.0 digest_0.6.25 tibble_3.0.1 lifecycle_0.2.0
#> [17] crayon_1.3.4 purrr_0.3.4 vctrs_0.3.0 glue_1.4.1
#> [21] evaluate_0.14 rmarkdown_2.1 stringi_1.4.6 compiler_4.0.0
#> [25] pillar_1.4.3 generics_0.0.2 pkgconfig_2.0.3