These are methods for the dplyr generics left_join()
, right_join()
,
inner_join()
, full_join()
, anti_join()
, and semi_join()
. Left, right,
inner, and anti join are translated to the [.data.table
equivalent,
full joins to data.table::merge.data.table()
.
Left, right, and full joins are in some cases followed by calls to
data.table::setcolorder()
and data.table::setnames()
to ensure that column
order and names match dplyr conventions.
Semi-joins don't have a direct data.table equivalent.
Arguments
- x, y
A pair of
lazy_dt()
s.- ...
Other parameters passed onto methods.
- by
A join specification created with
join_by()
, or a character vector of variables to join by.If
NULL
, the default,*_join()
will perform a natural join, using all variables in common acrossx
andy
. A message lists the variables so that you can check they're correct; suppress the message by supplyingby
explicitly.To join on different variables between
x
andy
, use ajoin_by()
specification. For example,join_by(a == b)
will matchx$a
toy$b
.To join by multiple variables, use a
join_by()
specification with multiple expressions. For example,join_by(a == b, c == d)
will matchx$a
toy$b
andx$c
toy$d
. If the column names are the same betweenx
andy
, you can shorten this by listing only the variable names, likejoin_by(a, c)
.join_by()
can also be used to perform inequality, rolling, and overlap joins. See the documentation at ?join_by for details on these types of joins.For simple equality joins, you can alternatively specify a character vector of variable names to join by. For example,
by = c("a", "b")
joinsx$a
toy$a
andx$b
toy$b
. If variable names differ betweenx
andy
, use a named character vector likeby = c("x_a" = "y_a", "x_b" = "y_b")
.To perform a cross-join, generating all combinations of
x
andy
, seecross_join()
.- copy
If
x
andy
are not from the same data source, andcopy
isTRUE
, theny
will be copied into the same src asx
. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.- suffix
If there are non-joined duplicate variables in
x
andy
, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
Examples
library(dplyr, warn.conflicts = FALSE)
band_dt <- lazy_dt(dplyr::band_members)
instrument_dt <- lazy_dt(dplyr::band_instruments)
band_dt %>% left_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [3 x 3]
#> Call: setcolorder(`_DT23`[`_DT22`, on = .(name), allow.cartesian = TRUE],
#> c(1L, 3L, 2L))
#>
#> name band plays
#> <chr> <chr> <chr>
#> 1 Mick Stones NA
#> 2 John Beatles guitar
#> 3 Paul Beatles bass
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% right_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [3 x 3]
#> Call: `_DT22`[`_DT23`, on = .(name), allow.cartesian = TRUE]
#>
#> name band plays
#> <chr> <chr> <chr>
#> 1 John Beatles guitar
#> 2 Paul Beatles bass
#> 3 Keith NA guitar
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% inner_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [2 x 3]
#> Call: `_DT22`[`_DT23`, on = .(name), nomatch = NULL, allow.cartesian = TRUE]
#>
#> name band plays
#> <chr> <chr> <chr>
#> 1 John Beatles guitar
#> 2 Paul Beatles bass
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% full_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [4 x 3]
#> Call: merge(`_DT22`, `_DT23`, all = TRUE, by.x = "name", by.y = "name",
#> allow.cartesian = TRUE)
#>
#> name band plays
#> <chr> <chr> <chr>
#> 1 John Beatles guitar
#> 2 Keith NA guitar
#> 3 Mick Stones NA
#> 4 Paul Beatles bass
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% semi_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [2 x 2]
#> Call: `_DT22`[unique(`_DT22`[`_DT23`, which = TRUE, nomatch = NULL,
#> on = .(name)])]
#>
#> name band
#> <chr> <chr>
#> 1 John Beatles
#> 2 Paul Beatles
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
band_dt %>% anti_join(instrument_dt)
#> Joining, by = "name"
#> Source: local data table [1 x 2]
#> Call: `_DT22`[!`_DT23`, on = .(name)]
#>
#> name band
#> <chr> <chr>
#> 1 Mick Stones
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results