Filtering rows based on different date requirements

Updated: 2024-10-27 01:27:24

I have a dataframe with three columns consisting of a site ID, the date of a sample and a measured value. Here is a theoretical dataset.

    Dates <- data.frame(c(as.Date("2008-7-1"),
                          rep(as.Date("2008-3-1"), times = 4),
                          rep(as.Date("2008-9-1"), times = 4),
                          as.Date("2008-9-8")))
    Sites <- as.data.frame(as.factor(c("Site1",
                                       rep(c("Site1", "Site2", "Site3", "Site4"), 2),
                                       "Site1")))
    Values <- data.frame(matrix(sample(0:50, 5*2, replace = TRUE), ncol = 1))
    Dataframe <- cbind(Dates, Sites, Values)
    colnames(Dataframe) <- c("date", "site", "value")

I am screening out specific samples that do not match certain criteria.

Firstly, I would like to select only spring and autumn samples, that is, samples taken in March-May or September-November, meaning the first row in the dataframe would be removed. Is there a better way than the following:

    library(dplyr)
    Season_sequence <- c(seq(as.Date("2008-3-1"), as.Date("2008-5-31"), by = "days"),
                         seq(as.Date("2008-9-1"), as.Date("2008-11-30"), by = "days"))
    `%datein%` <- function(x, y) (x %in% y)
    Season_removed <- Dataframe %>% filter(date %datein% Season_sequence)

This works, but if I have samples spanning several years, I am not sure how to quickly create a matching sequence.

Secondly, I do not want two samples from a specific site within a particular season (i.e. I do not want any replicate samples), meaning that the last row in the dataframe would be removed. I am not sure how to start with this one.

Best answer

For the first problem, you can create a month column (independent of the year) and filter on it (here the month is converted to numeric, but you could filter on month names instead). For the second problem, you can use distinct:

    Dataframe %>%
      mutate(month = as.numeric(format(date, '%m'))) %>%
      filter(month %in% c(3,4,5,9,10,11)) %>%
      distinct(month, site)
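
Two caveats about the pipeline above. `distinct(month, site)` returns only the `month` and `site` columns unless `.keep_all = TRUE` is passed, and it removes duplicates per calendar month rather than per season and year. The following is a minimal sketch of one way to handle several years and keep all columns. It is not part of the original answer; the `samples` data and the `year` and `season` columns are hypothetical names introduced for illustration, and it assumes that "no replicates" means keeping the earliest sample per site, year and season.

```r
library(dplyr)

# Hypothetical example data spanning two years (not from the original question)
samples <- data.frame(
  date  = as.Date(c("2008-07-01", "2008-03-01", "2008-04-15",
                    "2008-09-01", "2008-09-08", "2009-03-01")),
  site  = c("Site1", "Site1", "Site1", "Site1", "Site1", "Site1"),
  value = c(10, 20, 30, 40, 50, 60)
)

result <- samples %>%
  mutate(
    month  = as.numeric(format(date, "%m")),
    year   = as.numeric(format(date, "%Y")),
    # Label the season; months outside both ranges become NA
    season = case_when(
      month %in% 3:5  ~ "spring",
      month %in% 9:11 ~ "autumn"
    )
  ) %>%
  filter(!is.na(season)) %>%   # drop samples outside spring and autumn
  arrange(site, date) %>%      # sort so the earliest sample comes first
  # Keep the first row per site, year and season, with all columns
  distinct(site, year, season, .keep_all = TRUE)
```

With this example data, `result` keeps the rows dated 2008-03-01, 2008-09-01 and 2009-03-01. If a different replicate should be kept (for example the latest one), the `arrange()` step is the place to change the ordering.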

Published: 2023-07-14 23:38:00
Article link: https://www.elefans.com/category/jswz/34/1108332.html