This is a method for the dplyr distinct() generic. It is translated to data.table::unique.data.table().

Usage

# S3 method for dtplyr_step
distinct(.data, ..., .keep_all = FALSE)

Arguments

.data

A lazy_dt()

...

<data-masking> Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, will use all variables.

.keep_all

If TRUE, keep all variables in .data. If a combination of ... is not distinct, this keeps the first row of values.

Examples

library(dtplyr)  # provides lazy_dt()
library(dplyr, warn.conflicts = FALSE)
df <- lazy_dt(data.frame(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
))

df %>% distinct(x)
#> Source: local data table [10 x 1]
#> Call:   unique(`_DT6`[, .(x)])
#> 
#>       x
#>   <int>
#> 1     8
#> 2     3
#> 3    10
#> 4     1
#> 5     5
#> 6     6
#> # … with 4 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
df %>% distinct(x, y)
#> Source: local data table [60 x 2]
#> Call:   unique(`_DT6`)
#> 
#>       x     y
#>   <int> <int>
#> 1     8     7
#> 2     3     4
#> 3    10     7
#> 4     1    10
#> 5     5     7
#> 6     6     1
#> # … with 54 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
df %>% distinct(x, .keep_all = TRUE)
#> Source: local data table [10 x 2]
#> Call:   unique(`_DT6`, by = "x")
#> 
#>       x     y
#>   <int> <int>
#> 1     8     7
#> 2     3     4
#> 3    10     7
#> 4     1    10
#> 5     5     7
#> 6     6     1
#> # … with 4 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
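
The translations shown in the Call: lines above can be checked against direct data.table calls. The following is a minimal sketch, not part of the package documentation: the `set.seed()` call and the object names (`dt`, `lazy`, `by_x`, `keep_x`, `all_cols`) are assumptions introduced for illustration. It also covers the case where `...` is omitted, which the Arguments section describes as using all variables.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

set.seed(1)  # assumption: seed added only so this sketch is reproducible
dt <- data.table(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
)
lazy <- lazy_dt(dt)

# distinct(x) keeps only x; compare with unique() on the selected column
by_x <- lazy %>% distinct(x) %>% as.data.table()
stopifnot(identical(by_x$x, unique(dt[, .(x)])$x))

# .keep_all = TRUE keeps y from the first row of each x; compare with by = "x"
keep_x <- lazy %>% distinct(x, .keep_all = TRUE) %>% as.data.table()
stopifnot(identical(keep_x$y, unique(dt, by = "x")$y))

# With no variables supplied, uniqueness is determined by all columns
all_cols <- lazy %>% distinct() %>% as.data.table()
stopifnot(nrow(all_cols) == nrow(unique(dt)))

# show_query() prints the generated data.table call without evaluating it
lazy %>% distinct(x, .keep_all = TRUE) %>% show_query()
```

Comparing column vectors rather than whole tables keeps the checks independent of table attributes such as keys.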