I have a dataset where each individual (id) has an e_date, and since each individual can have more than one e_date, I'm trying to get the earliest date for each individual. So basically I would like a dataset with one row per id showing that id's earliest e_date value. I've used the aggregate function to find the minimum values, created a new variable combining the date and the id, and finally subset the original dataset based on the one containing the minimums, using the new variable. I've come to this:
new <- aggregate(e_date ~ id, data_full, min)
data_full["comb"] <- NULL
data_full$comb <- paste(data_full$id, data_full$e_date)
new["comb"] <- NULL
new$comb <- paste(new$lopnr, new$EDATUM)
data_fixed <- data_full[which(new$comb %in% data_full$comb), ]
The first problem is that the aggregate function doesn't seem to work at all: it reduces the number of rows, but when I view the data I can clearly see that some ids still appear more than once with different e_date values. In addition, the code gives me different results when I convert the date with as.Date instead of keeping its original format (integer). I think the answer is simple, but I'm stuck on this one.
Recommended answer
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(data_full)), order the rows by 'e_date', and, grouped by 'id', take the first row (head(.SD, 1L)).
library(data.table)
setDT(data_full)[order(e_date), head(.SD, 1L), by = id]
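As a small illustration of the data.table approach, here is a sketch on made-up data. The toy table `dt` and its values are assumptions for demonstration only, and it uses `which.min()` inside `.SD` as an alternative to ordering first; it assumes 'e_date' has already been converted to Date class.

```r
library(data.table)

# Hypothetical toy data: ids repeat with different dates
dt <- data.table(
  id     = c(1L, 1L, 2L, 2L, 2L),
  e_date = as.Date(c("2015-03-01", "2014-07-15",
                     "2016-01-10", "2015-12-31", "2016-05-05"))
)

# For each id, keep the row whose e_date is the minimum
# (which.min returns the first position if several rows tie)
earliest_dt <- dt[, .SD[which.min(e_date)], by = id]
print(earliest_dt)
```

Every other column of the row is carried along, which is what distinguishes this from aggregate(), which returns only the grouping column and the minimum.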
Or, using dplyr: after grouping by 'id', arrange by 'e_date' (assuming it is of Date class) and take the first row of each group with slice.
library(dplyr)
data_full %>%
  group_by(id) %>%
  arrange(e_date) %>%
  slice(1L)
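A variant of the same idea, sketched on made-up data, uses `slice_min()`, which is available in dplyr 1.0.0 and later. The toy data frame `toy` and its values are assumptions for illustration; `with_ties = FALSE` is chosen here so that exactly one row per id is returned even if the earliest date is repeated.

```r
library(dplyr)

# Hypothetical toy data: ids repeat with different dates
toy <- data.frame(
  id     = c(1L, 1L, 2L, 2L, 2L),
  e_date = as.Date(c("2015-03-01", "2014-07-15",
                     "2016-01-10", "2015-12-31", "2016-05-05"))
)

# Within each id, keep the single row with the smallest e_date
earliest_dplyr <- toy %>%
  group_by(id) %>%
  slice_min(e_date, n = 1, with_ties = FALSE) %>%
  ungroup()
print(earliest_dplyr)
```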
If we need a base R option, we can use ave:
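# ave() returns a numeric vector, so compare the within-id rank to 1 to get a logical row index
data_full[with(data_full, ave(as.numeric(e_date), id, FUN = function(x) rank(x, ties.method = "first")) == 1), ]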
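Another base R route, shown here as a sketch on made-up data, is to sort by id and date and then keep the first row seen for each id. The data frame `toy_base`, its `value` column, and all of its values are assumptions for illustration only.

```r
# Hypothetical toy data: ids repeat with different dates
toy_base <- data.frame(
  id     = c(1L, 1L, 2L, 2L, 2L, 3L),
  e_date = as.Date(c("2015-03-01", "2014-07-15", "2016-01-10",
                     "2015-12-31", "2016-05-05", "2013-09-09")),
  value  = c("a", "b", "c", "d", "e", "f")
)

# Sort by id, then by date, so the earliest row of each id comes first
sorted <- toy_base[order(toy_base$id, toy_base$e_date), ]

# duplicated() flags every repeat of an id after its first occurrence,
# so negating it keeps exactly one (the earliest) row per id
earliest_base <- sorted[!duplicated(sorted$id), ]
rownames(earliest_base) <- NULL
print(earliest_base)
```

This needs no extra packages and keeps all columns of the selected rows.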