Skip to content

These are methods for the dplyr generics left_join(), right_join(), inner_join(), full_join(), anti_join(), and semi_join(). Left, right, inner, and anti join are translated to the [.data.table equivalent, full joins to data.table::merge.data.table(). Left, right, and full joins are in some cases followed by calls to data.table::setcolorder() and data.table::setnames() to ensure that column order and names match dplyr conventions. Semi-joins don't have a direct data.table equivalent.

Usage

# S3 method for dtplyr_step
left_join(x, y, ..., by = NULL, copy = FALSE, suffix = c(".x", ".y"))

Arguments

x, y

A pair of lazy_dt()s.

...

Other parameters passed onto methods.

by

A character vector of variables to join by.

If NULL, the default, *_join() will perform a natural join, using all variables in common across x and y. A message lists the variables so that you can check they're correct; suppress the message by supplying by explicitly.

To join by different variables on x and y, use a named vector. For example, by = c("a" = "b") will match x$a to y$b.

To join by multiple variables, use a vector with length > 1. For example, by = c("a", "b") will match x$a to y$a and x$b to y$b. Use a named vector to match different variables in x and y. For example, by = c("a" = "b", "c" = "d") will match x$a to y$b and x$c to y$d.

To perform a cross-join, generating all combinations of x and y, use by = character().

copy

If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.

suffix

If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.

Examples

library(dplyr, warn.conflicts = FALSE)

band_dt <- lazy_dt(dplyr::band_members)
instrument_dt <- lazy_dt(dplyr::band_instruments)

band_dt %>% left_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [3 x 3]
#> Call:   setcolorder(`_DT22`[`_DT21`, on = .(name), allow.cartesian = TRUE], 
#>     c(1L, 3L, 2L))
#> 
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 Mick  Stones  NA    
#> 2 John  Beatles guitar
#> 3 Paul  Beatles bass  
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% right_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [3 x 3]
#> Call:   `_DT21`[`_DT22`, on = .(name), allow.cartesian = TRUE]
#> 
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 John  Beatles guitar
#> 2 Paul  Beatles bass  
#> 3 Keith NA      guitar
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% inner_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [2 x 3]
#> Call:   `_DT21`[`_DT22`, on = .(name), nomatch = NULL, allow.cartesian = TRUE]
#> 
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 John  Beatles guitar
#> 2 Paul  Beatles bass  
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% full_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [4 x 3]
#> Call:   merge(`_DT21`, `_DT22`, all = TRUE, by.x = "name", by.y = "name", 
#>     allow.cartesian = TRUE)
#> 
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 John  Beatles guitar
#> 2 Keith NA      guitar
#> 3 Mick  Stones  NA    
#> 4 Paul  Beatles bass  
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

band_dt %>% semi_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [2 x 2]
#> Call:   `_DT21`[unique(`_DT21`[`_DT22`, which = TRUE, nomatch = NULL, 
#>     on = .(name)])]
#> 
#>   name  band   
#>   <chr> <chr>  
#> 1 John  Beatles
#> 2 Paul  Beatles
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% anti_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [1 x 2]
#> Call:   `_DT21`[!`_DT22`, on = .(name)]
#> 
#>   name  band  
#>   <chr> <chr> 
#> 1 Mick  Stones
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results