How to read .csv data containing thousand separators and special handling of zeros (in R)?

Updated: 2024-10-28 14:36:45

R version 3.2.2 on Ubuntu 14.04

I am trying to read .csv data into R (two columns: "id" and "variable1") that uses "," as a thousand separator. So far, no problem. I am using read.csv2, and the data looks like this:

    > data <- read.csv2("data.csv", sep = ";", stringsAsFactors = FALSE, dec = ".")
    > data[1000:1010, ]
       id variable1
        1     2,001
    1,001     2,002
    1,002     2,001
    1,003     2,002
    1,004     2,001
    1,005     2,002
    1,006     2,001
    1,007     2,002
    1,008     2,001
    1,009     2,002
     1,01     2,001
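A minimal sketch, assuming the file fits comfortably in memory: reading both columns with `colClasses = "character"` keeps every value exactly as written in the file, so a later cleaning step sees raw strings such as "1,01" instead of partially converted numbers. The inline `raw` text is a hypothetical stand-in for `data.csv`.

```r
# Hypothetical sample standing in for data.csv (";" separates the fields)
raw <- "id;variable1
1;2,001
1,001;2,002
1,01;2,001"

# read.csv2 defaults to sep = ";" and header = TRUE;
# colClasses = "character" stops any numeric conversion at read time
data <- read.csv2(text = raw, colClasses = "character")
str(data)
```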

Next, I tried to use gsub() to remove the commas:

    data[, c("id", "variable1")] <- sapply(data[, c("id", "variable1")],
        function(x) {as.numeric(gsub("\\,", "", as.character(x)))})
    > data[1000:1010, ]
      id variable1
       1      2001
    1001      2002
    1002      2001
    1003      2002
    1004      2001
    1005      2002
    1006      2001
    1007      2002
    1008      2001
    1009      2002
     101      2001

I think the problem is already visible in the first output: the thousand separator is there, but the trailing zeros are missing. For example, in the "id" variable the number 1000 appears only as "1" and 1010 as "1,01" (this is also how they appear in the .csv file itself). Naturally, R cannot recognize these as 1000 and 1010.
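To make the effect concrete, the sketch below uses a few hypothetical values as they appear in the file: deleting the comma cannot restore digits that are not in the text, so "1,01" becomes 101 rather than 1010, and a bare "1" stays 1. It uses `fixed = TRUE`, which matches the comma literally and behaves like the escaped pattern `"\\,"` in the code above.

```r
# Values as they appear in the file (trailing zeros already lost)
x <- c("1,001", "1,009", "1,01", "1")

# Removing the separator only concatenates the digits that are present
cleaned <- as.numeric(gsub(",", "", x, fixed = TRUE))
print(cleaned)
```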

So my question is: is there a way to tell R, when reading the data (or afterwards), that every number must have exactly three digits after the thousand separator, so that I get the correct numbers? The data should look like this:

    > data[1000:1010, ]
      id variable1
    1000      2001
    1001      2002
    1002      2001
    1003      2002
    1004      2001
    1005      2002
    1006      2001
    1007      2002
    1008      2001
    1009      2002
    1010      2001
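The three-digit rule asked for above can be expressed on the character strings: split each value at the comma and right-pad the part after it with zeros to exactly three digits. A minimal sketch, assuming each value contains at most one comma and that values without a comma are plain integers; `pad_thousands()` is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: "1,01" -> 1010, "1,3" -> 1300, "999" -> 999
pad_thousands <- function(x) {
  x <- trimws(x)
  has_sep <- grepl(",", x, fixed = TRUE)
  # Part before the comma (the whole string when there is no comma)
  whole <- sub(",.*$", "", x)
  # Part after the comma, right-padded with zeros and cut to three digits
  after <- ifelse(has_sep, sub("^[^,]*,", "", x), "")
  after3 <- substr(paste0(after, "000"), 1, 3)
  ifelse(has_sep,
         as.numeric(whole) * 1000 + as.numeric(after3),
         as.numeric(whole))
}

ids <- pad_thousands(c("1,001", "1,009", "1,01", "1,3", "2,0", "999"))
print(ids)
```

On a data frame read with `colClasses = "character"`, `data[] <- lapply(data, pad_thousands)` would convert both columns. Because the padding is done on strings and the arithmetic stays on whole numbers, no floating-point rounding is involved. A value with no comma at all, such as "1", is left as 1, so this cannot tell a real 1 from a 1000 that lost its ",000".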

Edit: Thank you all for your answers. Unfortunately, the suggestions work for this example but not for my data, because I think I chose bad example rows. Other rows in the data can look like this:

            id1 variable1
    1         1     2,001
    999     999     1,102
    1000      1     2,001
    1001  1,001     2,002
    1002  1,002     2,001

Note that the number "1" now appears twice: the first really is 1, but the second should be 1000. So now I think I can't solve my problem in R. Maybe I need a better export of the original data, because the problem also appears in the .csv file itself.
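If, and only if, the ids are known to increase strictly from row to row (an assumption the example rows suggest but the question does not state), the ambiguous values can at least be detected: a value smaller than its predecessor cannot be correct. The sketch below, with the hypothetical helper `repair_ids()`, multiplies such a value by 1000 when that restores the ordering and leaves everything else alone. It is a heuristic; a clean re-export of the data remains the safer fix.

```r
# Hypothetical helper; assumes the separators were already removed
# (e.g. "1,001" -> 1001), there are no NAs, and the true ids are
# strictly increasing in file order.
repair_ids <- function(ids) {
  out <- ids
  for (i in seq_along(out)[-1]) {
    # A drop below the previous id suggests a value like "1" that lost ",000"
    if (out[i] < out[i - 1] && out[i] * 1000 > out[i - 1]) {
      out[i] <- out[i] * 1000
    }
  }
  out
}

# The id column from the rows in the edit, after removing separators
print(repair_ids(c(1, 999, 1, 1001, 1002)))
```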

Best answer

If "," is the only separator, i.e. all of the numbers are integers, you can set the dec argument of read.csv2 (or read.csv) to "," so that "1,01" is parsed as 1.01, and then multiply by 1000:

    data <- read.csv2(text = "id ; variable1
    1 ; 2,001
    1,008 ; 2,001
    1,009 ; 2,002
    1,01 ; 2,001
    1,3 ; 2,0",
      sep = ";", stringsAsFactors = FALSE, header = TRUE, dec = ",")

    > 1000*data
        id variable1
    1 1000      2001
    2 1008      2001
    3 1009      2002
    4 1010      2001
    5 1300      2000
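The same idea can be applied to a character vector, for example one read with `colClasses = "character"`. A minimal sketch, assuming (as the answer does) that every value is an integer written with "," as the only separator: treat the comma as a decimal mark, multiply by 1000, and round before converting, because `as.integer()` truncates and a floating-point product is not guaranteed to land exactly on an integer. As in the answer's example, a bare "1" becomes 1000 here, which is exactly the case the question's edit says is ambiguous.

```r
x <- c("1", "1,008", "1,009", "1,01", "1,3")

# "," -> "." so as.numeric() reads a decimal, then scale and round
ids <- as.integer(round(1000 * as.numeric(sub(",", ".", x, fixed = TRUE))))
print(ids)
```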

Published: 2023-07-23 04:36:00
Link: https://www.elefans.com/category/jswz/34/1227551.html