I have a data set that has three patterns:
First:
abrasion abrade:stem<>ion:suffix
abstainer abstain:stem<>er:suffix
abstention abstain:stem<>ion:suffix
Second:
inaccurate in:prefix<>accurate:stem
inactive in:prefix<>active:stem
Third:
incommunicable in:prefix<>communicate:stem<>able:suffix
incompatibility in:prefix<>compatible:stem<>ity:suffix
I need to convert the above to the following form, with the brackets matched the way the Penn Treebank does it (languagelog.ldc.upenn.edu/myl/PennTreebank1995.pdf):
First:
abrasion ((abrade:stem) ion:suffix)
abstainer ((abstain:stem)er:suffix)
abstention ((abstain:stem)ion:suffix)
Second:
inaccurate (in:prefix(accurate:stem))
inactive (in:prefix(active:stem))
Third:
incommunicable (in:prefix ((communicate:stem)able:suffix))
incompatibility (in:prefix ((compatible:stem)ity:suffix))
The code I am working on uses awk:
{
    n = gsub(/<>/,")",$2)
    s = sprintf("%*s",n,"")
    gsub(/ /,"(",s)
    print "(" $1, s "((" $2 "))"
}
Edit
More complex forms:
nationalistic national:stem<>ism:suffix<>ist:suffix<>ic:suffix
To:
nationalistic ((((national:stem) ism:suffix)ist:suffix)ic:suffix)
It does not produce the expected output shown in the examples.
Recommended answer
This should be general enough as it takes into account :stem, :prefix, and :suffix for matching:
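awk 'BEGIN{FS=OFS="\n"}{
    # gensub() is a gawk extension; with no target argument it works on $0
    a=gensub(/([a-zA-Z]*):stem/,"(\\1:stem)", "g");
    # wrap the bracketed stem together with a directly following suffix
    b=gensub(/(\([a-zA-Z]*:stem\))<>([a-zA-Z]*):suffix/,"(\\1\\2:suffix)", "g", a);
    # wrap a prefix together with everything after it
    c=gensub(/([a-zA-Z]*:prefix)<>(.*)/,"(\\1\\2)", "g", b);
    print c;
}' testfile
Demo here: ideone/U3ux91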
Edit
This should take care of multiple suffixes and prefixes:
awk 'BEGIN{FS=OFS="\n"}{
    # wrap the stem first
    a=gensub(/([a-zA-Z]*):stem/,"(\\1:stem)", "g");
    # keep wrapping while a suffix is still joined to the stem group by <>
    while ( a ~ /stem\)<>.*:suffix/) {
        a=gensub(/(\([a-zA-Z]*:stem\).*?)<>([a-zA-Z]*):suffix/,"(\\1\\2:suffix)", "g", a);
    }
    # then wrap the prefix around the result until no <> is left
    while ( a ~ /<>/) {
        a=gensub(/([a-zA-Z]*?:prefix)<>(.*)/,"(\\1\\2)", "g", a);
    }
    print a;
}' test
Demo here: ideone/U7LYXi (sorry if antinationalistic is not a word, but it is only there for the sake of testing...)
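For comparison, here is a minimal sketch of the same bracketing done without gensub(), so it should also run on a non-GNU awk. It is not the approach used above: it splits the second field on <> and builds the nesting from the stem outwards. The function name bracket is made up for this example. It assumes each line is "word analysis" with no whitespace inside the analysis, and it prints no space between a prefix and the group that follows it, as in the output of the commands above.

```shell
# Hypothetical helper: reads "word analysis" lines on stdin and prints the bracketed form.
bracket() {
    awk '{
        n = split($2, part, "<>")          # one morpheme per element, e.g. in:prefix, communicate:stem
        stem = 0
        for (i = 1; i <= n; i++)
            if (part[i] ~ /:stem$/) { stem = i; break }
        if (stem == 0) { print; next }     # assumption: lines without a stem pass through unchanged
        out = "(" part[stem] ")"
        for (i = stem + 1; i <= n; i++)    # suffixes wrap the stem group, left to right
            out = "(" out part[i] ")"
        for (i = stem - 1; i >= 1; i--)    # prefixes wrap the result, nearest to the stem first
            out = "(" part[i] out ")"
        print $1, out
    }'
}
```

Usage, with the more complex example from the question:

printf '%s\n' 'nationalistic national:stem<>ism:suffix<>ist:suffix<>ic:suffix' | bracket

which should print nationalistic ((((national:stem)ism:suffix)ist:suffix)ic:suffix). Suffixes are attached before prefixes so that the prefix ends up outermost, matching the third pattern in the question.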