Assume a multi-line file with strings separated by one or more whitespaces. Assume further that groups of strings can be enclosed by double quotes.
    > cat file
    foo bar "foobar baz qux"
    foo "bar foobar baz" qux
    "foo bar foobar" baz qux   # multiple whitespaces in this line
If I wish to replace all whitespaces outside the double quotes with single tab characters using awk as listed below, I receive the following:
    awk '{OFS="\t"; FPAT="([^, ]+)|(\"[^\"]+\")"; $1=$1; print}' file
    # foo bar "foobar baz qux"   # In this line, strings inside the quote are separated by tabs
    # foo "bar foobar baz" qux
    # "foo bar foobar" baz qux
The problem only seems to be restricted to the line that ends with a double quote.
EDIT 1: To better visualize the issue at hand:
    awk '{OFS="\t"; FPAT="([^, ]+)|(\"[^\"]+\")"; $1=$1; print}' file | cat -A
    # foo^Ibar^I"foobar^Ibaz^Iqux"$
    # foo^I"bar foobar baz"^Iqux$
    # "foo bar foobar"^Ibaz^Iqux$
EDIT 2: It appears that both commands suggested in the answer section work fine unless a certain number or combination of non-letter characters are present in the input. Here is an example:
    > cat file
    foo_bar_baz foo foo_bar . Name=foo;product="bar baz qux"
    foo_bar_baz foo foo_bar . Name=foo;product="bar baz qux"
    foo_bar_baz foo foo_bar . Name=foo;product="bar baz qux"

    > awk -v FPAT='"[^"]*"|[^[:blank:]]+' -v OFS='\t' '{$1=$1} 1' file | cat -A
    foo_bar_baz^Ifoo^Ifoo_bar^I.^IName=foo;product="bar^Ibaz^Iqux"$
    foo_bar_baz^Ifoo^Ifoo_bar^I.^IName=foo;product="bar^Ibaz^Iqux"$
    foo_bar_baz^Ifoo^Ifoo_bar^I.^IName=foo;product="bar^Ibaz^Iqux"$

    > awk '{$1=$1}1' OFS='\t' FPAT='"[^"]+"|[^ ]+' file | cat -A
    foo_bar_baz^Ifoo^Ifoo_bar^I.^IName=foo;product="bar^Ibaz^Iqux"$
    foo_bar_baz^Ifoo^Ifoo_bar^I.^IName=foo;product="bar^Ibaz^Iqux"$
    foo_bar_baz^Ifoo^Ifoo_bar^I.^IName=foo;product="bar^Ibaz^Iqux"$
EDIT 3: The question posed in EDIT 2 is discussed further here: Replacing whitespace with single tab unless in double quotes - Part II
Recommended answer
Using gnu-awk you can do this easily:
    awk -v FPAT='"[^"]*"|[^[:blank:]]+' -v OFS='\t' '{$1=$1} 1' file
    foo bar "foobar baz qux"
    foo "bar foobar baz" qux
    "foo bar foobar" baz qux
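The EDIT 2 input breaks this FPAT because the quoted string starts in the middle of a token: at `Name=foo;product="bar` the `"[^"]*"` alternative cannot match (the token does not begin with a quote), so `[^[:blank:]]+` takes everything up to the first blank and the quoted part gets split. As a possible workaround that does not rely on FPAT at all, the sketch below splits each line on the double-quote character instead. It is not taken from the answer above, and it assumes quotes are balanced and never escaped. The names `sample.txt` and `tabify` are hypothetical, and the run of spaces in the third sample line is illustrative.

```shell
# Sample input modelled on the question's files (spacing is illustrative).
printf '%s\n' \
  'foo bar "foobar baz qux"' \
  'foo "bar foobar baz" qux' \
  '"foo   bar foobar" baz   qux' \
  'foo_bar_baz foo . Name=foo;product="bar baz qux"' > sample.txt

# tabify (hypothetical helper): use the double quote as field separator.
# Odd-numbered fields then lie outside quotes, even-numbered fields inside,
# so only the odd-numbered ones have their runs of blanks replaced by a tab.
tabify() {
    awk 'BEGIN { FS = OFS = "\"" }
    {
        for (i = 1; i <= NF; i += 2)
            gsub(/[ \t]+/, "\t", $i)
        print
    }' "$@"
}

tabify sample.txt
```

Because it uses only `FS`, `OFS` and `gsub`, this should also run on awk implementations other than gawk. One caveat: leading or trailing blanks outside quotes become a tab rather than being dropped.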