What is the difference between distinct and unique in R using dplyr?
What are the differences between distinct and unique in R using dplyr with respect to:
- Speed
- Capabilities (valid inputs, parameters, etc.) and uses
- Output
For example:
    library(dplyr)
    data(iris)

    # create data with duplicates
    iris_dup <- bind_rows(iris, iris)

    d <- distinct(iris_dup)
    u <- unique(iris_dup)

    all(d == u)  # returns TRUE

In this example, distinct and unique do the same thing. Are there cases where you should use one rather than the other? Are there any tricks or common uses for either?
Recommended answer
These functions can be used interchangeably, since each has an equivalent way to do what the other does. The main differences lie in speed and output format.
distinct() is a function from the dplyr package and accepts additional arguments. For example, the following snippet returns only the distinct combinations of a specified set of columns in the data frame:
    distinct(iris_dup, Petal.Width, Species)

unique() strictly returns the unique rows of a data frame. Every element of a row must match for two rows to be considered duplicates.
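As an illustration of how distinct() can be customized, the sketch below uses its `.keep_all` argument, which keeps every column of the first row found for each distinct combination instead of only the columns named. The variable names `two_cols` and `full_rows` are made up for this example.

```r
library(dplyr)

# Same duplicated data as in the question
iris_dup <- bind_rows(iris, iris)

# Deduplicate on two columns; only those two columns are returned
two_cols <- distinct(iris_dup, Petal.Width, Species)

# Same deduplication, but keep all columns of the first matching row
full_rows <- distinct(iris_dup, Petal.Width, Species, .keep_all = TRUE)

dim(two_cols)   # rows x 2 columns
dim(full_rows)  # same number of rows, all 5 iris columns
```

With unique() you would need a separate step, for example indexing with `!duplicated(...)` on the chosen columns, to get a comparable result.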
As Imo points out, unique() can achieve a similar result: subset the columns of interest into a temporary data frame, then find the unique rows of that. This process may be slower for large data frames.
    unique(iris_dup[c("Petal.Width", "Species")])

Both return the same values, with one small difference in the row numbers: distinct renumbers the result rows sequentially, whereas unique keeps the original row number of the first occurrence of each unique combination.
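The row-number difference can be checked directly. This is a minimal sketch that reuses the `iris_dup` data from the question; `d` and `u` are just illustrative names for the two results.

```r
library(dplyr)

iris_dup <- bind_rows(iris, iris)

d <- distinct(iris_dup, Petal.Width, Species)
u <- unique(iris_dup[c("Petal.Width", "Species")])

# distinct() renumbers the rows; unique() keeps the original row names
head(rownames(d))
head(rownames(u))

# Ignoring row names, the values are the same and in the same order
all(d == u)
```

If the original row positions matter later (for example, to look the rows up again in the source data), the unique() result carries that information in its row names.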
       Petal.Width    Species
    1          0.2     setosa
    2          0.4     setosa
    3          0.3     setosa
    4          0.1     setosa
    5          0.5     setosa
    6          0.6     setosa
    7          1.4 versicolor
    8          1.5 versicolor
    9          1.3 versicolor
    10         1.6 versicolor
    11         1.0 versicolor
    12         1.1 versicolor
    13         1.8 versicolor
    14         1.2 versicolor
    15         1.7 versicolor
    16         2.5  virginica
    17         1.9  virginica
    18         2.1  virginica
    19         1.8  virginica
    20         2.2  virginica
    21         1.7  virginica
    22         2.0  virginica
    23         2.4  virginica
    24         2.3  virginica
    25         1.5  virginica
    26         1.6  virginica
    27         1.4  virginica

Overall, both functions return the unique rows based on the combination of columns chosen. However, I am inclined to quote the dplyr documentation and state that distinct is faster.
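To check the speed claim on your own machine, a rough timing sketch like the following may help. The data set `big` is a hypothetical test case (iris stacked 1,000 times), and the timings will vary with hardware, R version, and dplyr version, so treat the numbers as indicative only.

```r
library(dplyr)

# Hypothetical larger test data: iris stacked 1,000 times (150,000 rows)
big <- bind_rows(rep(list(iris), 1000))

# Time each approach on the same two columns
t_distinct <- system.time(d <- distinct(big, Petal.Width, Species))
t_unique   <- system.time(u <- unique(big[c("Petal.Width", "Species")]))

t_distinct["elapsed"]
t_unique["elapsed"]
```

A single system.time() call is noisy; repeating the measurement several times gives a more trustworthy comparison.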