I have been trying to use the dependency parse trees generated by CMU's TurboParser. It works flawlessly. The problem, however, is that there is very little documentation, and I need to understand the parser's output precisely. For example, the sentence "I solved the problem with statistics." generates the following output:
1 I _ PRP PRP _ 2 SUB
2 solved _ VBD VBD _ 0 ROOT
3 the _ DT DT _ 4 NMOD
4 problem _ NN NN _ 2 OBJ
5 with _ IN IN _ 2 VMOD
6 statistics _ NNS NNS _ 5 PMOD
7 . _ . . _ 2 P
I haven't found any documentation that explains what the various columns stand for, or how the indices in the second-to-last column (2, 0, 4, 2, ...) are created. I also have no idea why two columns are devoted to part-of-speech tags. Any help (or a link to external documentation) would be much appreciated.
P.S. If you want to try out their parser, here is their online demo.
P.P.S. Please do not suggest using Stanford's dependency parse output. I am interested in linear programming algorithms, which Stanford's NLP system does not use.
Recommended answer
I don't know TurboParser, but my guess is that the first number on each line is the ID of the token and the second number (the second-to-last column) is the ID of its governor, i.e. its head. That is, for your example:
solved( I, problem(the), with(statistics), . )
Actually, that's the CoNLL-X format. You can get more information here: ilk.uvt.nl/conll/#dataformat
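To make the column layout concrete, here is a minimal Python sketch that reads the output from the question and rebuilds the bracketed tree shown above. The column names follow the CoNLL-X shared task description (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL), where CPOSTAG and POSTAG are the coarse-grained and fine-grained part-of-speech tags and a HEAD of 0 marks the root. The helper names (`parse_conll`, `children_by_head`, `render`) are made up for this illustration and are not part of TurboParser, and splitting on whitespace is an assumption that holds for this sample only.

```python
from collections import defaultdict

# Column names from the CoNLL-X format; the output in the question shows only
# the first eight columns (the optional PHEAD and PDEPREL are absent).
COLUMNS = ["id", "form", "lemma", "cpostag", "postag", "feats", "head", "deprel"]

SAMPLE = """\
1 I _ PRP PRP _ 2 SUB
2 solved _ VBD VBD _ 0 ROOT
3 the _ DT DT _ 4 NMOD
4 problem _ NN NN _ 2 OBJ
5 with _ IN IN _ 2 VMOD
6 statistics _ NNS NNS _ 5 PMOD
7 . _ . . _ 2 P
"""


def parse_conll(text):
    """Turn each non-empty line into a dict keyed by column name."""
    tokens = []
    for line in text.splitlines():
        fields = line.split()  # assumption: no whitespace inside a field
        if not fields:
            continue
        token = dict(zip(COLUMNS, fields))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])  # 0 means "no governor" (root)
        tokens.append(token)
    return tokens


def children_by_head(tokens):
    """Group tokens under the ID of their governor."""
    children = defaultdict(list)
    for token in tokens:
        children[token["head"]].append(token)
    return children


def render(token, children):
    """Render a token and its dependents as form(dep1, dep2, ...)."""
    deps = children.get(token["id"], [])
    if not deps:
        return token["form"]
    inner = ", ".join(render(dep, children) for dep in deps)
    return token["form"] + "(" + inner + ")"


tokens = parse_conll(SAMPLE)
children = children_by_head(tokens)
root = children[0][0]  # the single token whose HEAD is 0
tree = render(root, children)
print(tree)  # solved(I, problem(the), with(statistics), .)
```

Read this way, the second-to-last column (2, 0, 4, 2, ...) is simply each token's governor ID: "I" attaches to token 2 ("solved"), "solved" attaches to 0 (the root), "the" attaches to token 4 ("problem"), and so on.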