I have a syntax tree saved in a text file in "LISP style", with opening and closing brackets that show the relations between nodes. I want to delete all the leaves. For example, I have " (Det the)" and I want it to become " Det". I'm not a regex expert, so I wonder how I could handle this in a more complex structure with nested brackets. An example tree (in my file it is all on one line):
(S (NP I) (VP (VP (V shot) (NP (Det an) (N elephant))) (PP (P in) (NP (Det my) (N pajamas)))))
I would like to get something like:
(S NP (VP (VP V (NP Det N)) (PP P (NP Det N))))

Best answer
Something like this?
import re
re.sub(r"\((\w+) (\w+)\)", r"\1", t)
where t is the variable holding your syntax tree.
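As a quick check, here is a small runnable sketch that applies the same pattern to the example tree from the question. It assumes Python 3; `LEAF_PAIR` and `strip_leaves` are hypothetical names chosen for illustration.

```python
import re

# A bracket pair holding exactly two word tokens, i.e. a preterminal
# such as "(Det an)"; group 1 captures the label.
LEAF_PAIR = re.compile(r"\((\w+) (\w+)\)")


def strip_leaves(tree: str) -> str:
    # One left-to-right pass; re.sub does not rescan the text it inserts.
    return LEAF_PAIR.sub(r"\1", tree)


t = "(S (NP I) (VP (VP (V shot) (NP (Det an) (N elephant))) (PP (P in) (NP (Det my) (N pajamas)))))"
print(strip_leaves(t))  # (S NP (VP (VP V (NP Det N)) (PP P (NP Det N))))
```

Run the substitution exactly once. Because re.sub does not rescan its own replacements, a unary chain such as "(NP (N dogs))" correctly becomes "(NP N)", whereas a second pass would reduce it to "NP" and lose the NP node.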
For unicode support, see the comments below.
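The pattern only matches leaves made of word characters, so brackets such as "(. .)" or "(N New-York)" are left untouched. If your trees contain such tokens, one alternative is to swap the single regex for a small tokenizer plus recursive-descent parser and drop the leaves from the parsed tree. The sketch below assumes well-formed input in which every bracket starts with a label; all function names are hypothetical.

```python
import re
from typing import List, Union

# A parsed node is either a bare token (str) or a list [label, child, ...].
Tree = Union[str, List["Tree"]]


def tokenize(s: str) -> List[str]:
    # Each bracket is its own token; any other run of non-space,
    # non-bracket characters (labels, words, punctuation) is one token.
    return re.findall(r"\(|\)|[^\s()]+", s)


def parse(tokens: List[str]) -> Tree:
    # "(NP (Det an) (N elephant))" -> ["NP", ["Det", "an"], ["N", "elephant"]]
    pos = 0

    def node() -> Tree:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok
        children: List[Tree] = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1  # consume the closing ")"
        return children

    return node()


def drop_leaves(tree: Tree) -> Tree:
    if isinstance(tree, str):
        return tree
    label, *children = tree
    subtrees = [c for c in children if isinstance(c, list)]
    if not subtrees:
        # Preterminal such as ["Det", "an"]: keep only its label.
        return label
    # Internal node: discard bare words, recurse into bracketed children.
    return [label] + [drop_leaves(c) for c in subtrees]


def to_string(tree: Tree) -> str:
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(to_string(c) for c in tree) + ")"


def remove_leaves(s: str) -> str:
    return to_string(drop_leaves(parse(tokenize(s))))


t = "(S (NP I) (VP (VP (V shot) (NP (Det an) (N elephant))) (PP (P in) (NP (Det my) (N pajamas)))))"
print(remove_leaves(t))                   # (S NP (VP (VP V (NP Det N)) (PP P (NP Det N))))
print(remove_leaves("(S (NP I) (. .))"))  # (S NP .)
```

The regex one-liner is enough when every leaf is a single word token; the parser costs more code but does not depend on what characters a leaf contains.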