Finding proper nouns using NLTK WordNet

Updated: 2024-10-25 16:26:43

Problem description

Is there any way to find proper nouns using NLTK WordNet? That is, can I tag possessive nouns using NLTK WordNet?

Recommended answer

I don't think you need WordNet to find proper nouns. I suggest using the part-of-speech tagger, pos_tag, instead.

To find proper nouns, look for the NNP tag:

    from nltk.tag import pos_tag

    sentence = "Michael Jackson likes to eat at McDonalds"
    tagged_sent = pos_tag(sentence.split())
    # [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

    propernouns = [word for word, pos in tagged_sent if pos == 'NNP']
    # ['Michael', 'Jackson', 'McDonalds']

You may not be fully satisfied with this, since Michael and Jackson are split into two tokens. In that case you may need something more sophisticated, such as a named entity tagger.
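Short of a full named entity tagger, one simple workaround is to join runs of adjacent NNP tokens. The sketch below is an assumption-laden illustration, not part of NLTK: `group_proper_nouns` is a hypothetical helper, and the tagged list is copied by hand from the output shown above rather than recomputed.

```python
from itertools import groupby

# Tagger output copied by hand from the example above (not recomputed here).
tagged_sent = [
    ('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'),
    ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP'),
]

def group_proper_nouns(tagged):
    """Join each run of adjacent NNP tokens into one space-separated string."""
    phrases = []
    # groupby splits the sequence into consecutive runs of NNP / non-NNP tokens.
    for is_nnp, run in groupby(tagged, key=lambda pair: pair[1] == 'NNP'):
        if is_nnp:
            phrases.append(' '.join(word for word, _ in run))
    return phrases

print(group_proper_nouns(tagged_sent))  # ['Michael Jackson', 'McDonalds']
```

Adjacency is only a heuristic: two unrelated proper nouns that happen to sit next to each other would be merged into one phrase.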

In principle, as documented in the Penn Treebank tagset (www.mozart-oz/mogul/doc/lager/brill-tagger/penn.html), you can find possessive nouns simply by looking for the POS tag. In practice, however, the tagger often does not produce a POS tag when the word is tagged NNP.
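When the possessive ending does arrive as its own token, looking for the POS tag is straightforward. The following is a minimal sketch on hand-written input: it assumes a Treebank-style tokenizer that splits "'s" from the noun (nltk.word_tokenize is expected to behave this way, but that is not verified here), and `possessors` is a hypothetical helper name.

```python
# Hand-written (word, tag) pairs, assuming the tokenizer emitted "'s" as a
# separate token. This is illustrative input, not actual tagger output.
tagged_sent = [
    ('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'),
    ('Daniel', 'NNP'), ('Jackson', 'NNP'), ("'s", 'POS'), ('hamburger', 'NN'),
]

def possessors(tagged):
    """Return every word that is immediately followed by a POS-tagged token."""
    return [
        word
        for (word, _), (_, next_tag) in zip(tagged, tagged[1:])
        if next_tag == 'POS'
    ]

print(possessors(tagged_sent))  # ['Jackson']
```

With whitespace splitting, as in `sentence.split()`, the ending stays attached to the noun and no POS-tagged token exists to look for.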

To find possessive nouns, check for str.endswith("'s") or str.endswith("s'"):

    from nltk.tag import pos_tag

    sentence = "Michael Jackson took Daniel Jackson's hamburger and Agnes' fries"
    tagged_sent = pos_tag(sentence.split())
    # [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'), ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]

    # Iterate over the words (not the characters) of the sentence.
    possessives = [word for word in sentence.split() if word.endswith("'s") or word.endswith("s'")]
    # ["Jackson's", "Agnes'"]
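A bare suffix test also matches contractions such as "It's". One way to narrow it down is to combine the suffix check with the NNP tag. This is a sketch under stated assumptions: the tagged list is written by hand, the PRP tag on "It's" is made up for illustration, and `proper_possessives` is a hypothetical helper.

```python
POSSESSIVE_SUFFIXES = ("'s", "s'")

# Hand-written (word, tag) pairs. The "It's" entry and its tag are assumed,
# added only to show a false positive of the plain suffix test.
tagged_sent = [
    ("It's", 'PRP'), ('Daniel', 'NNP'), ("Jackson's", 'NNP'),
    ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS'),
]

def proper_possessives(tagged):
    """Keep words that are tagged as proper nouns and end in a possessive suffix."""
    return [
        word
        for word, tag in tagged
        if tag in ('NNP', 'NNPS') and word.endswith(POSSESSIVE_SUFFIXES)
    ]

# The suffix test alone also picks up the contraction.
suffix_only = [word for word, _ in tagged_sent if word.endswith(POSSESSIVE_SUFFIXES)]
print(suffix_only)                      # ["It's", "Jackson's", "Agnes'"]
print(proper_possessives(tagged_sent))  # ["Jackson's", "Agnes'"]
```

The filter is only as good as the tags: a possessive common noun, or a proper noun the tagger mislabels, will be missed.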

Alternatively, you can use NLTK's ne_chunk, but it doesn't seem to add much unless you care about what kind of proper noun you get from the sentence:

    >>> from itertools import chain
    >>> from nltk.tree import Tree
    >>> from nltk.chunk import ne_chunk
    >>> [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
    [Tree('PERSON', [('Michael', 'NNP')]), Tree('PERSON', [('Jackson', 'NNP')]), Tree('PERSON', [('Daniel', 'NNP')])]
    >>> [i[0] for i in list(chain(*[chunk.leaves() for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]))]
    ['Michael', 'Jackson', 'Daniel']

Using ne_chunk is a little verbose, and it doesn't get you the possessives.


Published: 2023-10-23 01:53:24

Article link: https://www.elefans.com/category/jswz/34/1519389.html