Finding Proper Nouns Using NLTK WordNet

Updated: 2024-10-25 16:26:43

Problem description

Is there any way to find proper nouns using NLTK WordNet? That is, can I tag possessive nouns using NLTK WordNet?

Recommended answer

I don't think you need WordNet to find proper nouns; I suggest using the part-of-speech tagger pos_tag instead.

To find proper nouns, look for the NNP tag:

    from nltk.tag import pos_tag

    sentence = "Michael Jackson likes to eat at McDonalds"
    tagged_sent = pos_tag(sentence.split())
    # [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

    # Keep only the words tagged as singular proper nouns.
    propernouns = [word for word, pos in tagged_sent if pos == 'NNP']
    # ['Michael', 'Jackson', 'McDonalds']

You may not be fully satisfied with this, since Michael and Jackson are split into two tokens; in that case you may need something more sophisticated, such as a named-entity tagger.
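Short of a full named-entity tagger, one lightweight workaround is to merge runs of adjacent proper-noun tags into a single name. The sketch below assumes the input has the (word, tag) shape that pos_tag returns. The helper name group_proper_nouns is made up for illustration, and the tagged list is copied from the output quoted above rather than produced by running the tagger here. It also counts NNPS (the Penn Treebank tag for plural proper nouns) as a proper-noun tag, which the answer itself does not mention.

```python
from itertools import groupby

# Tags treated as proper nouns; including NNPS is an assumption of this sketch.
PROPER_TAGS = {"NNP", "NNPS"}

def group_proper_nouns(tagged_sent):
    """Join each run of consecutive proper-noun tokens into one string."""
    names = []
    for is_proper, run in groupby(tagged_sent, key=lambda pair: pair[1] in PROPER_TAGS):
        if is_proper:
            names.append(" ".join(word for word, _tag in run))
    return names

# Literal copy of the tagged output shown earlier (not re-run here).
tagged_sent = [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'),
               ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

print(group_proper_nouns(tagged_sent))
# ['Michael Jackson', 'McDonalds']
```

This is only a heuristic: two different names that happen to sit next to each other would be merged into one string, which is one reason a named-entity tagger may still be the better choice.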

As documented in the Penn Treebank tagset (www.mozart-oz/mogul/doc/lager/brill-tagger/penn.html), possessive endings are meant to carry the POS tag, so in principle you can simply look for that tag. In practice, though, the tagger often does not emit POS when the word is tagged NNP; with whitespace splitting as used here, a token such as "Jackson's" is never separated from its ending, so it receives a single tag.
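If the possessive ending is split off as its own token (which is where the POS tag applies), picking up possessives by tag is straightforward. The sketch below assumes such a tokenization. The tagged list is written by hand to illustrate the shape and is not real tagger output, and possessives_from_pos is a hypothetical helper name.

```python
def possessives_from_pos(tagged_sent):
    """Return (owner, ending) pairs for every token tagged POS."""
    found = []
    for i, (word, tag) in enumerate(tagged_sent):
        # A POS token is the possessive ending; the owner is the token before it.
        if tag == "POS" and i > 0:
            owner = tagged_sent[i - 1][0]
            found.append((owner, word))
    return found

# Hand-written example input (an assumption, not produced by running a tagger).
tagged_sent = [
    ("Daniel", "NNP"), ("Jackson", "NNP"), ("'s", "POS"),
    ("hamburger", "NN"), ("and", "CC"),
    ("Agnes", "NNP"), ("'", "POS"), ("fries", "NNS"),
]

print(possessives_from_pos(tagged_sent))
# [('Jackson', "'s"), ('Agnes', "'")]
```

Whether a real tagger assigns POS to these tokens depends on the tokenizer and the model, so it is worth inspecting the tagged output before relying on this.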

To find possessive nouns, check for str.endswith("'s") or str.endswith("s'"):

    from nltk.tag import pos_tag

    sentence = "Michael Jackson took Daniel Jackson's hamburger and Agnes' fries"
    tagged_sent = pos_tag(sentence.split())
    # [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'), ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]

    # Iterate over the whitespace-split words, not over the characters of the string.
    possessives = [word for word in sentence.split() if word.endswith("'s") or word.endswith("s'")]
    # ["Jackson's", "Agnes'"]
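If the owner's name is needed rather than the possessive form, the suffix can be stripped after the same endswith check. The sketch below restricts the check to tokens tagged NNP and reuses the tagged output quoted above as literal data (the tagger is not run here). strip_possessive is a hypothetical helper, not part of NLTK.

```python
def strip_possessive(word):
    """Return the word without a trailing 's or ', or None if it is not possessive."""
    if word.endswith("'s"):
        return word[:-2]
    if word.endswith("s'"):
        return word[:-1]
    return None

# Literal copy of the tagged output shown above (not re-run here).
tagged_sent = [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'),
               ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'),
               ("Agnes'", 'NNP'), ('fries', 'NNS')]

# Owners: proper-noun tokens that carry a possessive suffix, with the suffix removed.
owners = [
    strip_possessive(word)
    for word, tag in tagged_sent
    if tag == 'NNP' and strip_possessive(word) is not None
]

print(owners)
# ['Jackson', 'Agnes']
```

A plain suffix test is only a heuristic: it would also match a plural followed by a closing single quote, and it treats contractions such as "it's" as possessives unless the tag filter excludes them.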

Alternatively, you can use NLTK's ne_chunk, but it does not seem to add much unless you care about what kind of proper noun you get from the sentence:

    >>> from itertools import chain
    >>> from nltk.tree import Tree
    >>> from nltk.chunk import ne_chunk
    >>> [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
    [Tree('PERSON', [('Michael', 'NNP')]), Tree('PERSON', [('Jackson', 'NNP')]), Tree('PERSON', [('Daniel', 'NNP')])]
    >>> [i[0] for i in list(chain(*[chunk.leaves() for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]))]
    ['Michael', 'Jackson', 'Daniel']

Using ne_chunk is a little verbose, and it does not give you the possessives.

Published: 2023-10-23 01:53:24
Article link: https://www.elefans.com/category/jswz/34/1519389.html