Filtering rows based on different date requirements

I have a dataframe with three columns consisting of a site ID, the date of a sample and a measured value. Here is a theoretical dataset.

Dates <- data.frame(c(as.Date("2008-7-1"), rep(as.Date("2008-3-1"), times = 4), rep(as.Date("2008-9-1"), times = 4), as.Date("2008-9-8")))
Sites <- as.data.frame(as.factor(c("Site1", rep(c("Site1", "Site2", "Site3", "Site4"), 2), "Site1")))
Values <- data.frame(matrix(sample(0:50, 5*2, replace = TRUE), ncol = 1))
Dataframe <- cbind(Dates, Sites, Values)
colnames(Dataframe) <- c("date", "site", "value")

I am screening out specific samples that do not match certain criteria.

Firstly, I would like to select only spring and autumn samples, i.e. samples taken in March-May or September-November, meaning the first row in the dataframe would be removed. Is there a better way than the following:

library(dplyr)
Season_sequence <- c(seq(as.Date("2008-3-1"), as.Date("2008-5-31"), by = "days"),
                     seq(as.Date("2008-9-1"), as.Date("2008-11-30"), by = "days"))
`%datein%` <- function(x, y) (x %in% y)
Season_removed <- Dataframe %>% filter(date %datein% Season_sequence)

This works, but if I have samples spanning several years I am not sure how to quickly create a sequence that covers all of them.

Secondly, I do not want two samples from a specific site within a particular season (i.e. I do not want any replicate samples), meaning that the last row in the dataframe would be removed. I am not sure how to start with this one.

Best answer

For the first problem, you can create a column for the month (independent of the year) and filter on that column (here it is converted to numeric, but you could also keep the month names and select on those). For the second problem, you can use distinct:

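library(dplyr)
Dataframe %>%
  # extract the month number so the filter works for any year
  mutate(month = as.numeric(format(date, '%m'))) %>%
  filter(month %in% c(3, 4, 5, 9, 10, 11)) %>%
  # one row per month/site pair; .keep_all = TRUE retains date and value
  distinct(month, site, .keep_all = TRUE)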
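Note that distinct on month and site removes the last row in this example only because both September samples fall in the same calendar month. If the data covers several years, or a site is sampled in two different months of the same season, it may be closer to the question to deduplicate per site, year and season instead. A minimal sketch of that idea follows; the column names year and season, the result name Season_dedup, the fixed value column, and the choice to keep the earliest sample in each group are all assumptions for illustration, not part of the original answer.

```r
library(dplyr)

# Example data mirroring the question (values fixed here instead of sampled)
Dataframe <- data.frame(
  date  = as.Date(c("2008-07-01", rep("2008-03-01", 4),
                    rep("2008-09-01", 4), "2008-09-08")),
  site  = factor(c("Site1", rep(c("Site1", "Site2", "Site3", "Site4"), 2), "Site1")),
  value = c(12, 5, 40, 33, 8, 21, 17, 2, 49, 30)
)

Season_dedup <- Dataframe %>%
  mutate(
    year   = as.integer(format(date, "%Y")),
    month  = as.integer(format(date, "%m")),
    # Assumed season labels: months 3-5 are spring, 9-11 are autumn
    season = case_when(
      month %in% 3:5  ~ "spring",
      month %in% 9:11 ~ "autumn",
      TRUE            ~ NA_character_
    )
  ) %>%
  # drop samples outside spring and autumn
  filter(!is.na(season)) %>%
  # sort so that distinct() keeps the earliest sample in each group (an assumption)
  arrange(site, date) %>%
  distinct(site, year, season, .keep_all = TRUE)
```

With the example data this drops the July row and the second September sample for Site1, while the other rows are kept. If a different replicate should be kept (for example the latest one), changing the arrange() call is one way to control that.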
