I have a data frame in R with two columns, temp and timeStamp. The temp values are recorded at regular intervals. A portion of the data frame looks like this:
I need to create a line chart showing how temp changes over time. As can be seen here, temp stays the same across several timeStamp values. These repeated values increase the size of the data file, and I want to remove them. The output should look like this:
Showing only the values where there is a change. I cannot think of a way to get this done in R. Any pointers in the right direction would be really helpful.
Recommended answer
One option is to use data.table. We convert the data.frame to a data.table by reference with setDT(df1). Grouped by temp, we keep the first and last observation of each group (.SD[c(1L, .N)]) when the group has more than one row (if (.N > 1)). If a group has only a single row, we keep that row as is (else .SD).
    library(data.table)
    setDT(df1)[, if (.N > 1) .SD[c(1L, .N)] else .SD, by = temp]
    #     temp val
    # 1: 22.50   1
    # 2: 22.50   4
    # 3: 22.37   5
    # 4: 22.42   6
    # 5: 22.42   7
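One caveat worth sketching: grouping by temp puts all equal values into one group, even when they are not adjacent in time. If a temperature can return to an earlier value later in the series, grouping by consecutive runs may be closer to what a line chart needs. The following is a minimal sketch under that assumption, using data.table's rleid() to number the runs. The data set df2 and the column name run are hypothetical and not part of the original question.

```r
library(data.table)

# Hypothetical data: 22.5 appears in two separate runs (rows 1-3 and rows 5-6)
df2 <- data.table(temp = c(22.5, 22.5, 22.5, 22.37, 22.5, 22.5), val = 1:6)

# rleid() gives each run of consecutive equal values its own id, so the
# two 22.5 runs become separate groups. unique(c(1L, .N)) keeps the first
# and last row of a run and collapses to one row when the run has length 1.
res <- df2[, .SD[unique(c(1L, .N))], by = .(run = rleid(temp))]

res$val
# [1] 1 3 4 5 6
```

With plain by = temp, rows 1, 2, 3, 5 and 6 would form a single group and only rows 1 and 6 of it would be kept, so the end of the first run and the start of the second would be lost.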
Or a base R option with duplicated. We check for duplicated values in temp (the result is a logical vector) and also check for duplicates from the reverse direction (fromLast = TRUE). Combining the two with & gives TRUE only for rows that are repeats in both directions, i.e. rows that are neither the first nor the last occurrence of their value. We negate that (!) and use it to subset the rows of df1.
    df1[!(duplicated(df1$temp) & duplicated(df1$temp, fromLast = TRUE)), ]
    #    temp val
    # 1 22.50   1
    # 4 22.50   4
    # 5 22.37   5
    # 6 22.42   6
    # 7 22.42   7
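Note that duplicated() looks at the whole column, so a temp value that recurs later in the series is treated as part of the same group. As a hedged alternative for that case, the sketch below compares each row only with its immediate neighbours and keeps the rows that start or end a run of equal values. The data frame df2 and the helper vectors starts and ends are hypothetical names, and the code assumes at least one row and no NA values in temp.

```r
# Hypothetical data: 22.5 appears in two separate runs (rows 1-3 and rows 5-6)
df2 <- data.frame(temp = c(22.5, 22.5, 22.5, 22.37, 22.5, 22.5), val = 1:6)
n <- nrow(df2)

# TRUE where a row differs from the previous row (start of a run);
# the first row always starts a run
starts <- c(TRUE, df2$temp[-1] != df2$temp[-n])

# TRUE where a row differs from the next row (end of a run);
# the last row always ends a run
ends <- c(df2$temp[-n] != df2$temp[-1], TRUE)

# Keep rows that are the first or last row of their run
res <- df2[starts | ends, ]

res$val
# [1] 1 3 4 5 6
```

On this data the duplicated() approach would keep only rows 1, 4 and 6, dropping the last row of the first 22.5 run and the first row of the second, which would change the shape of the plotted line.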
data
    df1 <- data.frame(temp = c(22.5, 22.5, 22.5, 22.5, 22.37, 22.42, 22.42),
                      val = 1:7)