Create a dummy matrix where values from a column in a df are put into a matrix, where the value exists = 1 and the rest = NA or 0

Description of problem:

I have time series data with 11640 rows, where each row is the water table measured at that hour. I need code that takes my column of water table values (the column in my dataset is named "Watertable") and spreads it across the columns of a matrix. The "Watertable" column contains depth values from 0 to over 100. I'm only interested in values from 1 to 120 cm.

I have created the matrix with the same number of rows (i.e. 11640) and with 120 columns, where each column represents a depth below the surface (e.g. column 35, named X35, is 35 cm below the surface). The column names are 1 to 120 with an X in front of the number: X1, X2, X3, X4 ... X119, X120. Now I want to put the values from "Watertable" into the corresponding columns of my matrix.

Example:

If the "Watertable" value for a certain row is 58, I want all columns in my matrix for 58 and higher set to 1 (and the lower ones set to NA or 0). Note: as my data has decimal values, I want to round up if the fraction is .5 or more and round down if it is below .5, so 50.56781 = 51 and 50.34369 = 50.

Example of "Watertable" data:

> head(DATA1)
            DATE_TIME Watertable
1 2014-06-14 00:00:00   50.80874
2 2014-06-14 01:00:00   50.04499
3 2014-06-14 02:00:00   50.02677
4 2014-06-14 03:00:00   51.01249
5 2014-06-14 04:00:00   51.04969
6 2014-06-14 05:00:00   51.56349

What I want:

Date                 X1       ...      X50      X51      ...
2014-06-14 00:00:00  NA or 0  NA or 0  1        1        1
2014-06-14 01:00:00  NA or 0  NA or 0  1        1        1
2014-06-14 02:00:00  NA or 0  NA or 0  1        1        1
2014-06-14 03:00:00  NA or 0  NA or 0  NA or 0  1        1
2014-06-14 04:00:00  NA or 0  NA or 0  NA or 0  1        1
2014-06-14 05:00:00  NA or 0  NA or 0  NA or 0  NA or 0  1

I also have a column with date and time in my matrix, as I thought I would need it in my code.

Code for my matrix:

WT_U_mtx= matrix(NA, nrow=11640, ncol=101, byrow=FALSE)
s= seq(from=0, to=100, by=1)
colnames(WT_U_mtx) = s
WT_U_mtx= as.data.frame(WT_U_mtx)
names(WT_U_mtx) <- sub("", "X", names(WT_U_mtx))
WT_U_mtx= cbind(WT_U_mtx, DATA[,"DATE_TIME"])

What the raw matrix looks like:

tbl_df(WT_U_mtx)
Source: local data frame [11,640 x 122]

   X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
1  NA NA NA NA NA NA NA NA NA NA  NA  NA
2  NA NA NA NA NA NA NA NA NA NA  NA  NA
3  NA NA NA NA NA NA NA NA NA NA  NA  NA
4  NA NA NA NA NA NA NA NA NA NA  NA  NA
5  NA NA NA NA NA NA NA NA NA NA  NA  NA
6  NA NA NA NA NA NA NA NA NA NA  NA  NA
7  NA NA NA NA NA NA NA NA NA NA  NA  NA
8  NA NA NA NA NA NA NA NA NA NA  NA  NA
9  NA NA NA NA NA NA NA NA NA NA  NA  NA
10 NA NA NA NA NA NA NA NA NA NA  NA  NA
.. .. .. .. .. .. .. .. .. .. .. ... ...
Variables not shown: X13 (lgl), X14 (lgl), X15 (lgl), X16 (lgl), X17 (lgl), X18 (lgl), X19 (lgl), X20 (lgl), ..., X120 (lgl), DATA[, "DATE_TIME"] (time)

My try: I actually have no clue how to write the code, and I have been looking everywhere on the internet (including other questions on Stack Overflow). I assume I want some kind of "if" function. My apologies if the question is unclear or badly structured; I'm a rookie on this site and with RStudio overall.

Appreciate all help I can get! /Elin

Best answer

Make up data:

dd <- read.csv(text="
time,water.table
2014-06-14 00:00:00,50.80874
2014-06-14 01:00:00,50.04499
2014-06-14 02:00:00,50.02677
2014-06-14 03:00:00,51.01249
2014-06-14 04:00:00,51.04969
2014-06-14 05:00:00,51.56349")
dd$time <- as.POSIXct(dd$time)

Put together matrix:

maxlen <- 120
r <- round(dd$water.table)
## each row: (x-1) zeros followed by ones up to column maxlen
m <- sapply(r,function(x) rep(0:1,c(x-1,maxlen-x+1)))
m <- t(m)
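The same matrix can probably also be built in one vectorized step with outer(), which compares every rounded depth against every column index. This is only a sketch of an alternative, not part of the original answer; the name m2 is made up here, and the six values are copied from the sample data so the block runs on its own.

```r
maxlen <- 120
# Rounded sample depths: 51 50 50 51 51 52
r <- round(c(50.80874, 50.04499, 50.02677, 51.01249, 51.04969, 51.56349))

# m2[i, j] is TRUE when column depth j is at or below the rounded water table r[i];
# multiplying by 1L turns TRUE/FALSE into 1/0 while keeping the matrix shape.
m2 <- outer(r, seq_len(maxlen), "<=") * 1L

dim(m2)           ## 6 120
rowSums(m2 == 0)  ## 50 49 49 50 50 51
```

One difference worth knowing: rep() stops with an error when x-1 or maxlen-x+1 is negative (for example a rounded depth of 0), whereas the comparison above simply yields a row of all ones or all zeros for depths outside 1 to 120. Whether that is the desired behaviour for out-of-range rows is an assumption to check against the real data.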

Check:

dim(m)  ## 6 120
apply(m==0,1,sum)  ## number of zeros in each row
## [1] 50 49 49 50 50 51
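A caveat on the rounding step: base R's round() follows the IEC 60559 convention and sends an exact .5 to the nearest even number, so round(50.5) gives 50 rather than the 51 the question asks for. With measurements recorded to five decimals an exact .5 may never occur, but if it matters, a small helper along these lines could replace round(). The name round_half_up is hypothetical, and the sketch assumes non-negative depths.

```r
# Hypothetical helper: round halves upward, as the question requests.
round_half_up <- function(x) floor(x + 0.5)

x <- c(50.56781, 50.34369, 50.5, 51.5)
round_half_up(x)  ## 51 50 51 52
round(x)          ## 51 50 50 52  (50.5 goes to the even value 50)
```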

Combine:

dd2 <- data.frame(dd,m)
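To get closer to the layout shown in the question (a DATE_TIME column, columns named X1 to X120, and NA instead of 0), the same rep() idea can be adapted as below. This is a sketch under assumptions: DATA1 here is a three-row stand-in built from the sample rows, the names depth and result are made up, and every rounded depth is assumed to lie between 1 and 120.

```r
# Stand-in for the asker's DATA1, using the first three sample rows.
DATA1 <- data.frame(
  DATE_TIME = as.POSIXct(c("2014-06-14 00:00:00",
                           "2014-06-14 01:00:00",
                           "2014-06-14 02:00:00"), tz = "UTC"),
  Watertable = c(50.80874, 50.04499, 50.02677)
)

maxlen <- 120
depth <- round(DATA1$Watertable)  # 51 50 50

# Each row: (depth - 1) NAs followed by 1s up to column maxlen.
m <- t(sapply(depth, function(x) rep(c(NA, 1L), c(x - 1, maxlen - x + 1))))
colnames(m) <- paste0("X", seq_len(maxlen))

# Date column first, then the 120 depth columns.
result <- data.frame(DATE_TIME = DATA1$DATE_TIME, m)
result[, c("DATE_TIME", "X49", "X50", "X51")]
```

For the full data set the only change should be to use the real DATA1 instead of the stand-in.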
