R version 3.2.2 on Ubuntu 14.04
I am trying to read .csv data into R (two columns: "id" and "variable1") that contains "," as the thousands separator. So far, no problem. I am using read.csv2, and the data looks like this:
    > data <- read.csv2("data.csv", sep = ";", stringsAsFactors = FALSE, dec = ".")
    > data[1000:1010, ]
       id variable1
        1     2,001
    1,001     2,002
    1,002     2,001
    1,003     2,002
    1,004     2,001
    1,005     2,002
    1,006     2,001
    1,007     2,002
    1,008     2,001
    1,009     2,002
     1,01     2,001

After that, I first tried to use gsub() to remove the commas:
    data[, c("id", "variable1")] <- sapply(data[, c("id", "variable1")], function(x) {
      as.numeric(gsub("\\,", "", as.character(x)))
    })
    > data[1000:1010, ]
      id variable1
       1      2001
    1001      2002
    1002      2001
    1003      2002
    1004      2001
    1005      2002
    1006      2001
    1007      2002
    1008      2001
    1009      2002
     101      2001

I think my problem is already obvious in the first output: there is a thousands separator, but the trailing zeros are missing. For example, the number "1000" is displayed as just "1" and "1010" as "1,01" in the "id" variable (this is also how the values appear in the .csv file itself). Of course, R can't recognize this.
So my question is: Is there a way to tell R that every number must have three digits after the thousands separator, either when reading in the data or afterwards, so that I get the correct numbers? The data should look like this:
    > data[1000:1010, ]
      id variable1
    1000      2001
    1001      2002
    1002      2001
    1003      2002
    1004      2001
    1005      2002
    1006      2001
    1007      2002
    1008      2001
    1009      2002
    1010      2001

Edit: Thank you all for your answers. Unfortunately, the suggestions work for this example but not for my data, because I think I chose bad example rows. Other rows in the data can look like this:
           id1 variable1
    1        1     2,001
    999    999     1,102
    1000     1     2,001
    1001 1,001     2,002
    1002 1,002     2,001

Note that the number "1" appears twice. The first really is a "1", but the second should be "1000". So now I think I can't solve my problem with R. Maybe I need a better export of the original data, because the problem is already present in the .csv file.
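The ambiguous rows in the edit can only be resolved with outside knowledge about the data. As a rough sketch, and only under the assumption (not stated in the question) that the ids never decrease from one row to the next, a bare value that would break the ordering can be read as a multiple of 1000. The helper name `resolve_ids` is hypothetical, and values are assumed to contain at most one separator.

```r
# Hypothetical helper: resolve bare values such as "1" (1 or 1000?) by
# ASSUMING the ids never decrease from one row to the next.
resolve_ids <- function(x) {
  x <- gsub("[[:space:]]", "", as.character(x))
  out <- integer(length(x))
  prev <- -Inf
  for (i in seq_along(x)) {
    if (grepl(",", x[i], fixed = TRUE)) {
      parts <- strsplit(x[i], ",", fixed = TRUE)[[1]]
      trail <- if (length(parts) > 1) parts[2] else ""
      # Right-pad the digits after the separator with zeros to width 3
      trail <- substr(paste0(trail, "000"), 1, 3)
      out[i] <- as.integer(paste0(parts[1], trail))
    } else {
      n <- as.integer(x[i])
      # A bare value is either n itself or n * 1000 with its zeros dropped;
      # keep n unless that would make the sequence decrease.
      out[i] <- if (n >= prev) n else n * 1000L
    }
    prev <- out[i]
  }
  out
}

ids <- resolve_ids(c("1", "999", "1", "1,001", "1,002"))
print(ids)
```

This is a heuristic, not a general fix: it does not apply to a column such as "variable1" whose values are not ordered, and it cannot tell the two readings apart when the ordering allows both. A corrected export remains the safer route.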
Best answer
If "," is the only separator, i.e. all of the numbers are integers, you can set the dec argument of read.csv2 (or read.csv) to "," and multiply by 1000:

    data <- read.csv2(
      text = "id ; variable1
              1 ; 2,001
              1,008 ; 2,001
              1,009 ; 2,002
              1,01 ; 2,001
              1,3 ; 2,0",
      sep = ";",
      stringsAsFactors = FALSE,
      header = TRUE,
      dec = ","
    )

    > 1000*data
        id variable1
    1 1000      2001
    2 1008      2001
    3 1009      2002
    4 1010      2001
    5 1300      2000
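Multiplying a parsed decimal by 1000 goes through floating point, so the results may not be exactly integer-valued. As an alternative sketch (not part of the original answer), the missing zeros can be restored on the strings themselves before converting. The helper name `fix_thousands` is hypothetical. It assumes at most one separator per value (numbers below one million) and, unlike the multiplication above, leaves values without a separator unchanged.

```r
# Hypothetical helper: rebuild integers from strings whose trailing zeros
# after the thousands separator were dropped (e.g. "1,01" means 1010).
# ASSUMPTION: at most one "," per value; bare values are taken literally.
fix_thousands <- function(x) {
  x <- gsub("[[:space:]]", "", as.character(x))
  has_sep <- grepl(",", x, fixed = TRUE)
  lead <- sub(",.*$", "", x)                           # digits before the separator
  trail <- ifelse(has_sep, sub("^[^,]*,", "", x), "")  # digits after it
  # Right-pad the trailing digits with zeros to width 3
  trail <- substr(paste0(trail, "000"), 1, 3)
  ifelse(has_sep, as.integer(paste0(lead, trail)), as.integer(lead))
}

fixed <- fix_thousands(c("1", "1,008", "1,009", "1,01", "1,3", "2,0"))
print(fixed)
# 1 1008 1009 1010 1300 2000
```

To use it on the question's data, the columns would have to be read as character first (for example with colClasses = "character" in read.csv2) and then passed through the helper column by column.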