Convert CSV file contents to Markdown

Background

The objective is to read from a CSV file and write the contents in a Markdown table format.

The application uses the R engine Renjin, which does not support knitr, kable, or pandoc.

Problem

The write.table command has an eol option, but no corresponding sol option. Thus for the following:

f <- read.csv('planning.csv')
write.table(
   format(f, digits=2), "",
   sep="|", row.names=F, col.names=F, quote=F, eol="|\n")
 

The output appears as follows:

Geothermal|1250.0|Electricity|0.0|
Houses|  13.7|Shelter|4.2|
Compostor|   1.2|Recycling|0.2|
 

But each line should appear with a | prefix, as follows:

|Geothermal|1250.0|Electricity|0.0|
|Houses|  13.7|Shelter|4.2|
|Compostor|   1.2|Recycling|0.2|
 

It should be possible to do something like (note the extra eol pipe):

write.table(
       format(f, digits=2), "",
       sep="|", row.names=F, col.names=F, quote=F, eol="|\n|")
 

Then capture everything as a string, concatenate a leading pipe, and finally trim the extraneous trailing pipe. That is, fix up output that would resemble:

Geothermal|1250.0|Electricity|0.0|
|Houses|  13.7|Shelter|4.2|
|Compostor|   1.2|Recycling|0.2|
|Fire Station|  -9.6|Protection|0.5|
|Roads|   0.0|Transport|0.9|
|
 

Such string manipulation doesn't seem very R-like, though.

Question

What is the most efficient way to transmogrify a CSV file into a Markdown format without relying on third-party libraries?

The Markdown flavour in question looks like:

|Header|Header|Header|
|---|---|---|
|Data|Data|Data|
|Data|Data|Data|
 

Hints for how to write only the header data and the table header separator are also welcome.

Best answer

Since you want to put it into markdown, I think it's safe to say that the table size is manageable, so performance is not a factor. (Edit #3: I had some small bugs relating to presence of row names, so to simplify things I'm going to remove them completely from the sample data.)

mtcars$rowname <- rownames(mtcars)
rownames(mtcars) <- NULL
mtcars <- mtcars[,c(ncol(mtcars), 1:(ncol(mtcars)-1))]
head(mtcars)
#             rowname  mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Now the work:

dashes <- paste(rep("---", ncol(mtcars)), collapse = "|")
txt <- capture.output(
  write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE)
)
txt2 <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
head(txt2)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"
# [2] "|---|---|---|---|---|---|---|---|---|---|---|---|"
# [3] "|Mazda RX4|21|6|160|110|3.9|2.62|16.46|0|1|4|4|"
# [4] "|Mazda RX4 Wag|21|6|160|110|3.9|2.875|17.02|0|1|4|4|"
# [5] "|Datsun 710|22.8|4|108|93|3.85|2.32|18.61|1|1|4|1|"
# [6] "|Hornet 4 Drive|21.4|6|258|110|3.08|3.215|19.44|1|0|3|1|"
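The question also asks how to write only the header and the separator row. A minimal sketch follows, assuming the column names should be used verbatim as header cells; `md_header` is a hypothetical helper name, not part of the original answer:

```r
# Hypothetical helper (not from the original answer): build only the
# Markdown header row and the separator row from a data frame's names.
md_header <- function(df) {
  header <- paste(names(df), collapse = "|")
  dashes <- paste(rep("---", ncol(df)), collapse = "|")
  # Wrap both rows in a leading and a trailing pipe.
  sprintf("|%s|", c(header, dashes))
}

md_header(data.frame(Name = "Roads", Cost = 0.0, Type = "Transport"))
# [1] "|Name|Cost|Type|" "|---|---|---|"
```

Because it only touches `names(df)`, this works on a data frame with zero rows as well.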

If you are concerned about alignment, you can check for character columns (and perhaps other types, as you see fit). This uses the alignment row of the Markdown table format to left-align character columns and right-align everything else:

(ischar <- vapply(mtcars, is.character, logical(1)))
# rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs      am    gear    carb
#    TRUE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE
dashes <- paste(ifelse(ischar, ":--", "--:"), collapse = "|")
txt <- capture.output(write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE))
txt2 <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
head(txt2)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"
# [2] "|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|"
# [3] "|Mazda RX4|21|6|160|110|3.9|2.62|16.46|0|1|4|4|"
# [4] "|Mazda RX4 Wag|21|6|160|110|3.9|2.875|17.02|0|1|4|4|"
# [5] "|Datsun 710|22.8|4|108|93|3.85|2.32|18.61|1|1|4|1|"
# [6] "|Hornet 4 Drive|21.4|6|258|110|3.08|3.215|19.44|1|0|3|1|"

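And when you're finally ready to save, use writeLines(txt2, "sometable.md") (or cat(txt2, file = "sometable.md", sep = "\n"); without sep = "\n", cat joins the rows with spaces on a single line).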
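Pulling these steps together for the original CSV use case, a minimal sketch might look like the following. The function name `csv_to_markdown` and its arguments are hypothetical, it has only been reasoned about for GNU R (behaviour under Renjin is an assumption), and it already includes the pipe escaping that matters once cell contents can contain `|`:

```r
# Sketch only: csv_to_markdown() and its argument names are hypothetical.
csv_to_markdown <- function(csv_path, md_path) {
  df <- read.csv(csv_path, stringsAsFactors = FALSE)
  ischar <- vapply(df, is.character, logical(1))
  if (any(ischar)) {
    # Escape literal pipes so they are not read as column separators.
    df[ischar] <- lapply(df[ischar], function(x) gsub("|", "&#124;", x, fixed = TRUE))
  }
  # Left-align character columns, right-align everything else.
  dashes <- paste(ifelse(ischar, ":--", "--:"), collapse = "|")
  txt <- capture.output(
    write.table(df, stdout(), quote = FALSE, sep = "|", row.names = FALSE)
  )
  lines <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
  writeLines(lines, md_path)  # one table row per line
  invisible(lines)
}

# Usage, with the file name from the question:
# csv_to_markdown("planning.csv", "planning.md")
```

Returning the lines invisibly keeps the function usable both for its side effect (the file) and for further processing of the rows.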

Edit #1: note that the other suggested answers (including mine above) do not address pipe symbols within the content:

mtcars$mpg[1] <- "2|1.0"
ischar <- vapply(mtcars, is.character, logical(1))
dashes <- paste(ifelse(ischar, ":--", "--:"), collapse = "|")
txt <- capture.output(write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE))
txt2 <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
head(txt2, n = 3)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"
# [2] "|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|"
# [3] "|Mazda RX4|2|1.0|6|160|110|3.9|2.62|16.46|0|1|4|4|"
###                ^ this is the problem

You can escape it manually in all character columns (and in factor columns too, if you have any):

ischar <- vapply(mtcars, is.character, logical(1))
mtcars[ischar] <- lapply(mtcars[ischar], function(x) gsub("\\|", "&#124;", x))
dashes <- paste(ifelse(ischar, ":--", "--:"), collapse = "|")
txt <- capture.output(write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE))
txt2 <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
head(txt2, n = 3)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"
# [2] "|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|"
# [3] "|Mazda RX4|2&#124;1.0|6|160|110|3.9|2.62|16.46|0|1|4|4|"
###                ^^^^^^ this is the pipe, interpreted correctly in markdown

This doesn't work well when the pipe is within a code block, though a workaround was suggested here: https://stackoverflow.com/a/17320389/3358272

At this point, as @alistaire suggested, you're somewhat reimplementing knitr::kable. For that matter, just grab knitr/R/table.R and use kable_markdown, which does the pipe-escaping for you. It takes a character matrix, not a data.frame, so call kable_markdown(as.matrix(mtcars)). You can't just grab the single function, as it uses several helper functions that are also in that file. You can certainly prune some functions, including kable itself, which requires functions in other files.

Edit #2: since you said Renjin doesn't support *apply functions (a comment suggests that is incorrect, but I'll continue for the sake of argument), here's a for-loop implementation that includes alignment and |-escaping:

mtcars$mpg[1] <- "2|1.0" # just a reminder that it's here
dashes <- rep("--:", length(mtcars))
for (i in seq_along(mtcars)) {
  if (is.character(mtcars[[i]]) || is.factor(mtcars[[i]])) {
    mtcars[[i]] <- gsub("\\|", "&#124;", mtcars[[i]])
    dashes[i] <- ":--"
  }
}
txt <- capture.output(write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE))
txt2 <- sprintf("|%s|", c(txt[1], paste(dashes, collapse = "|"), txt[-1]))
head(txt2, n = 3)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"
# [2] "|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|"
# [3] "|Mazda RX4|2&#124;1.0|6|160|110|3.9|2.62|16.46|0|1|4|4|"

For the record, my *apply and for-loop implementations perform effectively the same, while @alistaire's solution is roughly twice as fast (with mtcars):

Unit: microseconds
              expr      min        lq      mean    median        uq      max neval
     apply_noalign  917.881  947.9665 1031.9288  971.3060 1041.5050 1999.499   100
       apply_align  945.960  975.1350 1083.2856  995.7390 1063.7500 3523.101   100
 apply_align_pipes 1110.429 1148.5360 1255.5460 1176.9815 1275.2600 1905.778   100
           forloop 1188.104 1217.0950 1309.2549 1261.2205 1342.3600 2979.010   100
         alistaire  451.830  473.7105  511.5778  496.1370  518.5645  827.443   100
   alistaire_pipes  593.687  626.6900  718.6898  652.7645  700.5360 5460.970   100

I used his original function for alistaire and added a simple gsub for alistaire_pipes. There may be a more efficient way to do it, but (a) simple and straightforward is good, and (b) I think your tables will be small enough that raw performance will not be a driving force.
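For comparison, here is a sketch of a variant that avoids capture.output and write.table entirely by pasting the columns together row by row. This is an illustrative assumption about what a paste-based approach could look like, not @alistaire's actual code, and `df_to_md` is a hypothetical name:

```r
# Illustrative paste()-based variant; df_to_md() is a hypothetical name
# and this is not the benchmarked code from the other answer.
df_to_md <- function(df) {
  # Convert every column to character and escape literal pipes.
  cols <- lapply(df, function(x) gsub("|", "&#124;", as.character(x), fixed = TRUE))
  # Paste the columns together element-wise, giving one string per row.
  # unname() keeps a column called "sep" from clashing with paste()'s argument.
  rows <- do.call(paste, c(unname(cols), sep = "|"))
  header <- paste(names(df), collapse = "|")
  dashes <- paste(rep("---", ncol(df)), collapse = "|")
  sprintf("|%s|", c(header, dashes, rows))
}

# Usage: writeLines(df_to_md(mtcars), "sometable.md")
```

Numbers are rendered with as.character here, so their formatting may differ slightly from what write.table produces.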
