Compute data.frame column averages by date

Updated: 2024-10-27 08:32:46

I have a data.frame in R where one column holds dates (many of which are duplicates) and another column holds a temperature recorded on that date. The columns in question look like this (but there are several thousand rows and a few other unnecessary columns):

    Date   | Temp
    -----------------
    1/2/13   34.4
    1/2/13   36.4
    1/2/13   34.3
    1/4/13   45.6
    1/4/13   33.5
    1/5/13   45.2

I need to find a way of getting a daily average for temperature. Ideally, I could tell R to loop through the data.frame and, for every date that matches, give me the average temperature for that day. I've been googling and I know loops in R are possible, but I can't wrap my head around this conceptually given what little I know about R code.

I know I can pull out a single column and average it (i.e. mean(data.frame[[2]])), but I'm utterly lost on how to tell R to match that mean to a single value located in the first column.

Additionally, how could I generate an average for every seven calendar days (regardless of how many entries exist for a single day)? So, a seven day rolling average, i.e. if my date range starts at 1/1/13 I'd get an average for all temps taken between 1/1/13 and 1/7/13, and then between 1/8/13 and 1/15/13 and so on...

Any assistance helping me grasp R loops is much appreciated. Thank you!

EDIT

Here's the output of dput(head(my.dataframe)). Please note: I edited down both "date" and "timestamp" because they both go on for several thousand entries otherwise:

    structure(list(RECID = 579:584,
        SITEID = c(101L, 101L, 101L, 101L, 101L, 101L),
        MONTH = c(6L, 6L, 6L, 6L, 6L, 6L),
        DAY = c(7L, 7L, 7L, 7L, 7L, 7L),
        DATE = structure(c(34L, 34L, 34L, 34L, 34L, 34L),
            .Label = c("10/1/2013", "10/10/2013", "10/11/2013", "10/12/2013",
                "10/2/2013", "10/3/2013", "10/4/2013", "10/5/2013", "10/6/2013",
                "10/7/2013", "10/8/2013", "10/9/2013", "6/10/2013", "6/11/2013",
                "9/9/2013"),
            class = "factor"),
        TIMESTAMP = structure(784:789,
            .Label = c("10/1/2013 0:00", "10/1/2013 1:00", "10/1/2013 10:00",
                "10/1/2013 11:00", "10/1/2013 12:00", "10/1/2013 13:00",
                "10/1/2013 14:00", "10/1/2013 15:00", "10/1/2013 16:00",
                "10/1/2013 17:00", "10/1/2013 18:00", "10/1/2013 19:00",
                "10/1/2013 2:00"),
            class = "factor"),
        TEMP = c(23.376, 23.376, 23.833, 24.146, 24.219, 24.05),
        X.C = c(NA, NA, NA, NA, NA, NA)),
        .Names = c("RECID", "SITEID", "MONTH", "DAY", "DATE", "TIMESTAMP", "TEMP", "X.C"),
        row.names = c(NA, 6L), class = "data.frame")

Best answer

    library(plyr)
    # Group the rows of df by Date and compute the mean Temp within each group
    ddply(df, .(Date), summarize, daily_mean_Temp = mean(Temp))

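This is a simple example of the split-apply-combine paradigm: split the rows by Date, apply mean() to each group's Temp values, and combine the results into one data.frame.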
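To make the three steps visible, here is a minimal base-R sketch that does by hand what ddply() does in one call. It assumes the Date and Temp column names from the question's example table; `temps_by_date`, `daily_means` and `daily` are illustrative names only.

```r
# Sample data copied from the example table in the question.
df <- data.frame(
  Date = c("1/2/13", "1/2/13", "1/2/13", "1/4/13", "1/4/13", "1/5/13"),
  Temp = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2)
)

# Split: one vector of temperatures per distinct date.
temps_by_date <- split(df$Temp, df$Date)

# Apply: take the mean of each group.
daily_means <- sapply(temps_by_date, mean)

# Combine: collect the per-date means into a data.frame.
daily <- data.frame(
  Date = names(daily_means),
  daily_mean_Temp = unname(daily_means)
)
print(daily)
```

No explicit loop is needed: sapply() visits each group for you.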

Alternative #1: as Ananda Mahto mentions, the dplyr package is a higher-performance rewrite of plyr. He shows the syntax.

Alternative #2: aggregate() is also functionally equivalent here; it just has fewer bells and whistles than plyr/dplyr.
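For reference, a hedged sketch of both alternatives on the question's sample data. The aggregate() call uses only what ships with R. The dplyr lines are one common way to write the same grouping and may differ from the syntax Ananda Mahto posted; they run only if dplyr is installed. The names `agg` and `by_date` are illustrative.

```r
# Sample data copied from the example table in the question.
df <- data.frame(
  Date = c("1/2/13", "1/2/13", "1/2/13", "1/4/13", "1/4/13", "1/5/13"),
  Temp = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2)
)

# Alternative #2: the formula reads "Temp grouped by Date".
agg <- aggregate(Temp ~ Date, data = df, FUN = mean)
print(agg)

# Alternative #1: the same grouping with dplyr, skipped if it is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  by_date <- dplyr::summarise(
    dplyr::group_by(df, Date),
    daily_mean_Temp = mean(Temp)
  )
  print(by_date)
}
```

Note that aggregate() keeps the original column name (Temp) for the averaged values, whereas the plyr and dplyr calls let you name the result column.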

Additionally, regarding 'generate an average for every 7 calendar days': do you mean 'average by week of year', or 'moving 7-day average (trailing/leading/centered)'?
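Until that is clarified, here is a sketch of two possible readings, using made-up data spanning two weeks (not taken from the question). The first bins readings into consecutive 7-calendar-day blocks counted from the earliest date, which is a third reading next to week-of-year. The second is a trailing 7-day moving average. Both average every reading in the window, so days with more entries carry more weight. The month/day/two-digit-year date format is an assumption based on the question's table.

```r
# Hypothetical readings, for illustration only.
df <- data.frame(
  Date = c("1/1/13", "1/1/13", "1/3/13", "1/8/13", "1/9/13", "1/9/13"),
  Temp = c(30, 32, 34, 40, 42, 44)
)

# Convert the text dates to Date objects so date arithmetic works.
df$Date <- as.Date(df$Date, format = "%m/%d/%y")

# Reading 1: consecutive 7-day blocks starting at the first date.
start <- min(df$Date)
df$block <- as.numeric(difftime(df$Date, start, units = "days")) %/% 7
block_means <- aggregate(Temp ~ block, data = df, FUN = mean)
block_means$block_start <- start + 7 * block_means$block
print(block_means)

# Reading 2: trailing window, i.e. each day together with the 6 days before it.
days <- sort(unique(df$Date))
trailing <- sapply(seq_along(days), function(i) {
  d <- days[i]
  mean(df$Temp[df$Date >= d - 6 & df$Date <= d])
})
rolling <- data.frame(Date = days, trailing_7day_mean = trailing)
print(rolling)
```

For a leading or centered window, only the two comparison bounds in the sapply() call change.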

Published: 2023-07-27 15:33:00

Article link: https://www.elefans.com/category/jswz/34/1292441.html