Provoking the NLTK part-of-speech tagger to report a plural proper noun

Updated: 2024-10-25 14:30:33

Question

Let's try out Python's renowned part-of-speech tagger in the nltk package.

import nltk
# You might also need to run nltk.download('maxent_treebank_pos_tagger')
# even after installing nltk
string = 'Buddy Billy went to the moon and came Back with several Vikings.'
nltk.pos_tag(nltk.word_tokenize(string))

This gives me

[('Buddy', 'NNP'), ('Billy', 'NNP'), ('went', 'VBD'), ('to', 'TO'), ('the', 'DT'), ('moon', 'NN'), ('and', 'CC'), ('came', 'VBD'), ('Back', 'NNP'), ('with', 'IN'), ('several', 'JJ'), ('Vikings', 'NNS'), ('.', '.')]

You can interpret the codes here. I'm slightly disappointed that 'Back' got categorized as a proper noun (NNP), although the confusion is understandable. I'm more upset that 'Vikings' got called a simple plural noun (NNS) instead of a plural proper noun (NNPS). Can anyone come up with a single example of a brief input that leads to at least one NNPS tag?
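For anyone who wants to check a tagger result without reading it by eye, a tally of the tags is a quick sanity check. This is only a minimal sketch: it works on the output pasted above rather than calling NLTK, and `tagged` and `tag_counts` are illustrative names.

```python
from collections import Counter

# Tagger output copied from the question above.
tagged = [('Buddy', 'NNP'), ('Billy', 'NNP'), ('went', 'VBD'), ('to', 'TO'),
          ('the', 'DT'), ('moon', 'NN'), ('and', 'CC'), ('came', 'VBD'),
          ('Back', 'NNP'), ('with', 'IN'), ('several', 'JJ'),
          ('Vikings', 'NNS'), ('.', '.')]

# Count how often each tag occurs in the tagged sentence.
tag_counts = Counter(tag for _, tag in tagged)

print(tag_counts['NNP'])   # 3 (Buddy, Billy, Back)
print(tag_counts['NNS'])   # 1 (Vikings)
print(tag_counts['NNPS'])  # 0, no plural proper noun was reported
```

A `Counter` returns 0 for tags that never occur, so the missing NNPS shows up without any special handling.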

Recommended answer

There seem to be some problems with the tags in the NLTK Brown corpus, which marks plural proper nouns as NPS instead of NNPS (possibly the NLTK tagset is an updated or outdated tagset that differs from www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html).

Here is an example of plural proper nouns:

>>> from nltk.corpus import brown
>>> for sent in brown.tagged_sents():
...     # Stop at the first sentence that contains a plural proper noun (Brown tag NPS)
...     if any(pos == 'NPS' for word, pos in sent):
...         print(sent)
...         break
...
[(u'Georgia', u'NP'), (u'Republicans', u'NPS'), (u'are', u'BER'), (u'getting', u'VBG'), (u'strong', u'JJ'), (u'encouragement', u'NN'), (u'to', u'TO'), (u'enter', u'VB'), (u'a', u'AT'), (u'candidate', u'NN'), (u'in', u'IN'), (u'the', u'AT'), (u'1962', u'CD'), (u"governor's", u'NN$'), (u'race', u'NN'), (u',', u','), (u'a', u'AT'), (u'top', u'JJS'), (u'official', u'NN'), (u'said', u'VBD'), (u'Wednesday', u'NR'), (u'.', u'.')]
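The Brown corpus is annotated with its own tag set, in which NP and NPS mark singular and plural proper nouns; the Penn Treebank counterparts are NNP and NNPS. To line the two up for the proper-noun case only, something like the sketch below might help. `BROWN_TO_PENN` and `to_penn` are hypothetical names, not part of NLTK, and the mapping is deliberately partial: every other tag (including compound Brown tags) is passed through unchanged.

```python
# Illustrative mapping for proper-noun tags only (an assumption made for
# this sketch, not an NLTK API). All other tags are left untouched.
BROWN_TO_PENN = {'NP': 'NNP', 'NPS': 'NNPS'}


def to_penn(tagged_sent):
    """Rewrite Brown proper-noun tags as Penn Treebank tags."""
    return [(word, BROWN_TO_PENN.get(tag, tag)) for word, tag in tagged_sent]


# First three tokens of the Brown sentence shown above.
brown_prefix = [('Georgia', 'NP'), ('Republicans', 'NPS'), ('are', 'BER')]

print(to_penn(brown_prefix))
# [('Georgia', 'NNP'), ('Republicans', 'NNPS'), ('are', 'BER')]
```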

But if you tag with nltk.pos_tag, you'll get NNPS:

>>> for sent in brown.tagged_sents():
...     if any(pos == 'NPS' for word, pos in sent):
...         # Print only the words, dropping the Brown tags
...         print(" ".join(word for word, pos in sent))
...         break
...
Georgia Republicans are getting strong encouragement to enter a candidate in the 1962 governor's race , a top official said Wednesday .
>>> from nltk import pos_tag
>>> pos_tag("Georgia Republicans are getting strong encouragement to enter a candidate in the 1962 governor's race , a top official said Wednesday .".split())
[('Georgia', 'NNP'), ('Republicans', 'NNPS'), ('are', 'VBP'), ('getting', 'VBG'), ('strong', 'JJ'), ('encouragement', 'NN'), ('to', 'TO'), ('enter', 'VB'), ('a', 'DT'), ('candidate', 'NN'), ('in', 'IN'), ('the', 'DT'), ('1962', 'CD'), ("governor's", 'NNS'), ('race', 'NN'), (',', ','), ('a', 'DT'), ('top', 'JJ'), ('official', 'NN'), ('said', 'VBD'), ('Wednesday', 'NNP'), ('.', '.')]
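To pull out just the words carrying a given tag from a result like the one above, a small filter is enough. This is a sketch on the pasted output, so it runs without NLTK installed; `words_with_tag` and `penn_tagged` are illustrative names, not NLTK functions.

```python
def words_with_tag(tagged, tag):
    """Return the words in a list of (word, tag) pairs that carry `tag`."""
    return [word for word, t in tagged if t == tag]


# pos_tag output copied from the answer above.
penn_tagged = [('Georgia', 'NNP'), ('Republicans', 'NNPS'), ('are', 'VBP'),
               ('getting', 'VBG'), ('strong', 'JJ'), ('encouragement', 'NN'),
               ('to', 'TO'), ('enter', 'VB'), ('a', 'DT'), ('candidate', 'NN'),
               ('in', 'IN'), ('the', 'DT'), ('1962', 'CD'),
               ("governor's", 'NNS'), ('race', 'NN'), (',', ','), ('a', 'DT'),
               ('top', 'JJ'), ('official', 'NN'), ('said', 'VBD'),
               ('Wednesday', 'NNP'), ('.', '.')]

print(words_with_tag(penn_tagged, 'NNPS'))  # ['Republicans']
print(words_with_tag(penn_tagged, 'NNP'))   # ['Georgia', 'Wednesday']
```

With NLTK installed, the list returned by pos_tag could presumably be passed to the same helper directly, since it has the same (word, tag) shape.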

Published: 2023-10-23 01:53:42
Article link: https://www.elefans.com/category/jswz/34/1519390.html