Why doesn't subset mind a missing subset argument for data frames?

Updated: 2024-10-23 03:27:08

Normally I wonder where mysterious errors come from but now my question is where a mysterious lack of error comes from.

Let

numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)

If I type

subset(numbers, )

(so I want to take some subset but forget to specify the subset-argument of the subset function) then R reminds me (as it should):

Error in subset.default(numbers, ) : argument "subset" is missing, with no default

However when I type

subset(frame,)

(so the same thing with a data.frame instead of a vector), it doesn't give an error but instead just returns the (full) dataframe.

What is going on here? Why don't I get my well deserved error message?

Best answer

tl;dr: The subset function calls different functions (has different methods) depending on the type of object it is fed. In the example above, subset(numbers, ) uses subset.default while subset(frame, ) uses subset.data.frame.


R has a couple of object-oriented systems built in. The simplest and most common is called S3. This OO programming style implements what Wickham calls "generic-function OO." Under this style of OO, a function called a generic function looks at the class of an object and then applies the proper method to the object. If no method exists for the object's class, the generic falls back to its default method, provided one is defined.

To get a better idea of how S3 and the other OO systems work, you might check out the relevant portion of the Advanced R site. The procedure of finding the proper method for an object is referred to as method dispatch. You can read more about this in the help file ?UseMethod.

As noted in the Details section of ?subset, the subset function "is a generic function." This means that subset examines the class of the object in the first argument and then uses method dispatch to apply the appropriate method to the object.
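As a minimal sketch of this dispatch, consider a toy generic. The names describe, describe.data.frame, and describe.default are hypothetical and exist only for this illustration; they are not part of R.

```r
numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)

# Hypothetical generic for illustration only: UseMethod() picks a method
# based on the class of the first argument.
describe <- function(x, ...) UseMethod("describe")

# Method used when the object inherits from class "data.frame".
describe.data.frame <- function(x, ...) "data.frame method"

# Fallback used when no class-specific method is found.
describe.default <- function(x, ...) "default method"

class(frame)       # "data.frame"
class(numbers)     # "numeric"
describe(frame)    # "data.frame method"
describe(numbers)  # "default method"
```

The same call, describe(...), ends up in two different functions purely because of the class of its first argument, which is what happens with subset(frame, ) and subset(numbers, ).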

The methods of a generic function are encoded as

<generic function name>.<class name>

and can be found using methods(<generic function name>). For subset, we get

methods(subset)
[1] subset.data.frame subset.default subset.matrix
see '?methods' for accessing help and source code

which indicates that if the object has class data.frame, then subset calls the subset.data.frame method (function). It is defined as below:

subset.data.frame
function (x, subset, select, drop = FALSE, ...)
{
    r <- if (missing(subset))
        rep_len(TRUE, nrow(x))
    else {
        e <- substitute(subset)
        r <- eval(e, x, parent.frame())
        if (!is.logical(r))
            stop("'subset' must be logical")
        r & !is.na(r)
    }
    vars <- if (missing(select))
        TRUE
    else {
        nl <- as.list(seq_along(x))
        names(nl) <- names(x)
        eval(substitute(select), nl, parent.frame())
    }
    x[r, vars, drop = drop]
}
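If you want to inspect a method yourself, one option is getS3method, which returns the method as an ordinary function. This is only a sketch, assuming a standard R installation where the utils package is available; the variable names df_method and default_method are chosen here for illustration.

```r
# Retrieve the two methods of subset as plain function objects.
df_method <- utils::getS3method("subset", "data.frame")
default_method <- utils::getS3method("subset", "default")

# Compare their argument lists: the data.frame method has extra
# arguments such as 'select' that the default method lacks.
names(formals(df_method))
names(formals(default_method))

# Print the source of the data.frame method.
print(df_method)
```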

Note that if the subset argument is missing, the first lines

r <- if (missing(subset)) rep_len(TRUE, nrow(x))

produce a vector of TRUE values, one for each row of the data.frame, and the last line

x[r, vars, drop = drop]

feeds this vector into the row index, which means that if you do not supply a subset argument, subset returns all of the rows of the data.frame.
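A short check of this behaviour, using the numbers and frame objects from the question (the names all_rows and some_rows are introduced here only for the example):

```r
numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)

# No subset argument: missing(subset) is TRUE inside subset.data.frame,
# so every row is kept.
all_rows <- subset(frame, )

# With a condition, only the rows where it evaluates to TRUE are kept.
some_rows <- subset(frame, numbers > 1)

nrow(all_rows)     # 3
some_rows$numbers  # 2 3
```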

As we can see from the output of the methods call, subset has no method specific to atomic vectors. This means, as your error message

Error in subset.default(numbers, )

indicates, that when you apply subset to a vector, R calls the subset.default method, which is defined as

subset.default
function (x, subset, ...)
{
    if (!is.logical(subset))
        stop("'subset' must be logical")
    x[subset & !is.na(subset)]
}

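Unlike subset.data.frame, subset.default never checks missing(subset). Its first statement, is.logical(subset), evaluates the argument, and because none was supplied, R itself raises the error: argument "subset" is missing, with no default. The stop call in the function body is not what produces this message.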
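The difference between the two methods can be reproduced with two toy functions. This is a sketch only: keep_all_if_missing and no_guard are hypothetical names, written to mirror the relevant lines of subset.data.frame and subset.default rather than to replace them.

```r
# Hypothetical function that mirrors the missing() guard in subset.data.frame.
keep_all_if_missing <- function(x, subset) {
  if (missing(subset)) return(x)  # no condition supplied: keep everything
  x[subset & !is.na(subset)]
}

# Hypothetical function that, like subset.default, uses 'subset' without
# checking missing() first.
no_guard <- function(x, subset) {
  if (!is.logical(subset)) stop("'subset' must be logical")
  x[subset & !is.na(subset)]
}

numbers <- c(1, 2, 3)
kept <- keep_all_if_missing(numbers)

# Capture the error messages instead of stopping the script.
msg_toy <- tryCatch(no_guard(numbers), error = function(e) conditionMessage(e))
msg_real <- tryCatch(subset(numbers, ), error = function(e) conditionMessage(e))

kept
msg_toy
msg_real
```

In both failing calls the error arises the first time the body touches the unsupplied argument, so a function only tolerates a missing argument if it tests missing() before using it.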

Published: 2023-08-07 02:54:00
Article link: https://www.elefans.com/category/jswz/34/1458448.html