This is a method for the dplyr distinct() generic. It is translated to data.table::unique.data.table().


# S3 method for dtplyr_step
distinct(.data, ..., .keep_all = FALSE)



.data: A lazy_dt().


...: <data-masking> Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row is preserved. If omitted, all variables in the data frame are used.


.keep_all: If TRUE, keep all variables in .data. If a combination of ... is not distinct, this keeps the first row of values.


library(dplyr, warn.conflicts = FALSE)
library(dtplyr)  # provides lazy_dt()

df <- lazy_dt(data.frame(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
))

df %>% distinct(x)
#> Source: local data table [10 x 1]
#> Call:   unique(`_DT7`[, .(x)])
#>       x
#>   <int>
#> 1    10
#> 2     4
#> 3     1
#> 4     5
#> 5     7
#> 6     2
#> # … with 4 more rows
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
df %>% distinct(x, y)
#> Source: local data table [64 x 2]
#> Call:   unique(`_DT7`)
#>       x     y
#>   <int> <int>
#> 1    10     8
#> 2     4     3
#> 3     1     8
#> 4     1     6
#> 5     5     7
#> 6     7    10
#> # … with 58 more rows
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
df %>% distinct(x, .keep_all = TRUE)
#> Source: local data table [10 x 2]
#> Call:   unique(`_DT7`, by = "x")
#>       x     y
#>   <int> <int>
#> 1    10     8
#> 2     4     3
#> 3     1     8
#> 4     5     7
#> 5     7    10
#> 6     2     7
#> # … with 4 more rows
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
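
The examples above print lazy results only. As a minimal sketch, assuming dplyr and dtplyr are installed, the following uses a small hand-written input (the names `dt` and `res` are illustrative, not part of the package) to show how the generated data.table call might be inspected with show_query() and how the result can be materialised with as_tibble().

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# Small deterministic input so the result is easy to check by eye
dt <- lazy_dt(data.frame(
  x = c(1, 1, 2, 2),
  y = c("a", "b", "c", "c"),
  stringsAsFactors = FALSE
))

# Inspect the data.table call that dtplyr generates, without running it
dt %>% distinct(x, .keep_all = TRUE) %>% show_query()

# Force computation and return an ordinary tibble;
# the first row for each value of x is kept
res <- dt %>% distinct(x, .keep_all = TRUE) %>% as_tibble()
res
```

Because the lazy_dt() pipeline is only evaluated on conversion, show_query() is a cheap way to check the translation before any data is processed.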