I have some data prepared using the below code:
    # Data Preparation ----------------------
    library(lubridate)

    start_date <- "2018-10-30 00:00:00"
    start_date <- as.POSIXct(start_date, origin = "1970-01-01")
    dates <- c(start_date)
    for (i in 1:287) {
      dates <- c(dates, start_date + minutes(i * 10))
    }
    dates <- as.POSIXct(dates, origin = "1970-01-01")
    date_val <- format(dates, '%d-%m-%Y')
    weather.forecast.data <- data.frame(dateTime = dates, date = date_val, id = 'GH1', radiation = runif(288))
    weather.forecast.data$radiation[(weather.forecast.data$id == 'GH1') & (weather.forecast.data$date == '30-10-2018')] = NA
My task is to remove the rows of weather.forecast.data that belong to any combination of id and date for which every radiation value is missing.
I have code written with data.table that does this:
    library(data.table)

    setDT(weather.forecast.data)
    weather.forecast.data[, dateid := paste(date, id, sep = "__")]
    weather.forecast.data[, is_all_na := all(is.na(radiation)), dateid]
    weather.forecast.data = weather.forecast.data[!(is_all_na), !c('dateid', 'is_all_na'), with = FALSE]
I am trying to use dplyr functions and the pipe operator to make it more readable:
    library(dplyr)

    weather.forecast.data %>%
      mutate(dateid = paste(date, id, sep = "__")) %>%
      group_by(dateid) %>%
      summarise(is_all_na = all(is.na(radiation))) %>%
      filter(is_all_na) %>%
      select(dateid)
This retrieves the date and id combinations whose radiation values are all missing, but I am unable to remove the corresponding rows from the original data.
Recommended answer
There is no need to paste the columns together; group_by accepts multiple columns:
    library(dplyr)

    weather.forecast.data %>%
      group_by(date, id) %>%
      filter(!all(is.na(radiation)))
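This drops every row that belongs to a date and id group in which all radiation values are NA. Groups with at least one non-missing value are kept in full, including their NA rows.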
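To see the grouped filter at work, here is a minimal sketch on a small made-up data frame. The `toy` data, its values, and the names `kept`, `flagged` and `kept_join` are assumptions for illustration only, not part of the original question. The sketch also shows one possible way to finish the summarise-based pipeline from the question, using `anti_join` to remove the flagged groups, and it adds `ungroup()` because the result of a grouped `filter()` otherwise stays grouped.

```r
library(dplyr)

# Hypothetical toy data: two date-id groups, the first one entirely NA.
toy <- data.frame(
  date      = c("30-10-2018", "30-10-2018", "31-10-2018", "31-10-2018"),
  id        = "GH1",
  radiation = c(NA, NA, 0.4, NA)
)

# Approach from the answer: keep a group only if at least one
# radiation value is present, then drop the grouping.
kept <- toy %>%
  group_by(date, id) %>%
  filter(!all(is.na(radiation))) %>%
  ungroup()

# Alternative that keeps the summarise step from the question:
# first flag the all-NA groups, then remove them with an anti-join.
flagged <- toy %>%
  group_by(date, id) %>%
  summarise(is_all_na = all(is.na(radiation)), .groups = "drop") %>%
  filter(is_all_na)

kept_join <- anti_join(toy, flagged, by = c("date", "id"))

print(kept)
print(kept_join)
```

Both results should contain only the two rows for 31-10-2018, including the row with a single missing value. The grouped `filter()` is the shorter of the two; the `anti_join` variant may be preferable when the list of flagged groups is also needed for reporting.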