Convert FIX message format ("Tag=Value") into CSV

Updated: 2024-10-23 13:28:12

I have a csv/log file of 35=S messages (Quote messages in "Tag=Value" format), and I need to extract the rates into a proper CSV file for data mining. This is not strictly FIX-related; it's more of an R question about how to clean a dataset.

The raw messages look something like this:

    190=1.1204 ,191=-0.000029,193=20141008,537=0 ,631=1.12029575,642=0.000145,10=56
    190=7.20425,191=0.000141 ,537=0 ,631=7.2034485,10=140 , ,
    190=1.26237,191=0 ,537=1 ,10=068 , , ,

First I need to get to an intermediate data set that looks like this, where the same tags are aligned:

    190=1.1204 ,191=-0.000029,193=20141008,537=0,631=1.12029575,642=0.000145,10=56
    190=7.20425,191=0.000141 ,            ,537=0,631=7.2034485 ,            ,10=140
    190=1.26237,191=0        ,            ,537=1,              ,            ,10=068

which in turn needs to be converted to this:

    190    ,191      ,193     ,537,631       ,642     ,10
    1.1204 ,-0.000029,20141008,0  ,1.12029575,0.000145,56
    7.20425,0.000141 ,        ,0  ,7.2034485 ,        ,140
    1.26237,0        ,        ,1  ,          ,        ,068

I'm in the midst of developing a bash script with awk, but I wonder if I can do this in R. At present, my greatest challenge is arriving at the intermediate table. To get from the intermediate to the final table, I thought of using R with the tidyr package, specifically the function 'separate'. If anybody can suggest a better approach, I'd greatly appreciate it!

Best answer

Another possibility. Start with the same scan as @Andrie, but also use the arguments strip.white and na.strings:

    x <- scan(text = "190=1.1204 ,191=-0.000029,193=20141008,537=0 ,631=1.12029575,642=0.000145,10=56
    190=7.20425,191=0.000141 ,537=0 ,631=7.2034485,10=140 , ,
    190=1.26237,191=0 ,537=1 ,10=068 , , ,",
              sep = ",", what = "character", strip.white = TRUE, na.strings = "")

    # remove NA
    x <- x[!is.na(x)]

Then use colsplit and dcast from the reshape2 package:

    library(reshape2)

    # split 'x' into two columns
    d1 <- colsplit(string = x, pattern = "=", names = c("x", "y"))

    # create an id variable, needed in dcast
    d1$id <- ave(d1$x, d1$x, FUN = seq_along)

    # reshape from long to wide
    d2 <- dcast(data = d1, id ~ x, value.var = "y")
    #   id  10     190       191      193 537      631      642
    # 1  1  56 1.12040 -0.000029 20141008   0 1.120296 0.000145
    # 2  2 140 7.20425  0.000141       NA   0 7.203449       NA
    # 3  3  68 1.26237  0.000000       NA   1       NA       NA
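One caveat about the id built with ave above: it numbers the n-th occurrence of each tag, so it matches the message number only if a tag never reappears after being absent from an earlier message. That holds for the sample data, but a real log may not be so tidy. Below is a minimal base-R sketch that uses the line number as the message id instead, assuming one message per line. The names raw, fields, long, tags and wide are made up for illustration.

```r
# Sample lines from the question, one message per line
raw <- c(
  "190=1.1204 ,191=-0.000029,193=20141008,537=0 ,631=1.12029575,642=0.000145,10=56",
  "190=7.20425,191=0.000141 ,537=0 ,631=7.2034485,10=140 , ,",
  "190=1.26237,191=0 ,537=1 ,10=068 , , ,"
)

# Split each line on commas, trim whitespace, drop empty fields
fields <- lapply(strsplit(raw, ",", fixed = TRUE), function(f) {
  f <- trimws(f)
  f[nzchar(f)]
})

# Long table: one row per "tag=value" field, id = line number
long <- data.frame(
  id    = rep(seq_along(fields), lengths(fields)),
  tag   = sub("=.*$", "", unlist(fields)),
  value = sub("^[^=]*=", "", unlist(fields)),
  stringsAsFactors = FALSE
)

# Columns in order of first appearance: 190, 191, 193, 537, 631, 642, 10
tags <- unique(long$tag)

# Wide character matrix; cells with no value stay NA
wide <- matrix(NA_character_, nrow = length(fields), ncol = length(tags),
               dimnames = list(NULL, tags))
wide[cbind(long$id, match(long$tag, tags))] <- long$value

# Write as CSV (to the console here); NA becomes an empty cell
write.csv(wide, stdout(), row.names = FALSE, na = "", quote = FALSE)
# 190,191,193,537,631,642,10
# 1.1204,-0.000029,20141008,0,1.12029575,0.000145,56
# 7.20425,0.000141,,0,7.2034485,,140
# 1.26237,0,,1,,,068
```

In practice raw would presumably come from readLines() on the log file. Values stay as character here, so the leading zero in 068 survives.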

Because you mentioned tidyr:

    library(tidyr)

    d1 <- separate(data = data.frame(x), col = x, into = c("x", "y"), sep = "=")
    d1$id <- ave(d1$x, d1$x, FUN = seq_along)
    spread(data = d1, key = x, value = y)
    #   id  10     190       191      193 537        631      642
    # 1  1  56  1.1204 -0.000029 20141008   0 1.12029575 0.000145
    # 2  2 140 7.20425  0.000141     <NA>   0  7.2034485     <NA>
    # 3  3 068 1.26237         0     <NA>   1       <NA>     <NA>

This retains the values as character. If you want numeric, you can set convert = TRUE in spread.
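Note that any numeric conversion drops a leading zero such as the one in 068. If tag 10 (the FIX checksum) should stay as text, one option is to convert only the other columns. Here is a rough base-R sketch, assuming a character data frame shaped like the spread output above. The names d, keep and conv are illustrative, and d holds only a few of the columns.

```r
# Character data frame standing in for part of the spread() result
d <- data.frame(
  id    = c("1", "2", "3"),
  `10`  = c("56", "140", "068"),
  `190` = c("1.1204", "7.20425", "1.26237"),
  `193` = c("20141008", NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

keep <- "10"                      # columns to leave as character
conv <- setdiff(names(d), keep)   # all other columns get converted

# type.convert() chooses integer or double per column;
# as.is = TRUE keeps non-numeric columns as character instead of factor
d[conv] <- lapply(d[conv], type.convert, as.is = TRUE)

str(d)
```

After this, column 10 still holds "068" as text, while the rate columns are numeric and ready for analysis.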

Published: 2023-07-24 10:52:00
Article link: https://www.elefans.com/category/jswz/34/1245006.html