Is there any way to find proper nouns using NLTK WordNet? I.e., can I tag possessive nouns using NLTK WordNet?
Recommended answer
I don't think you need WordNet to find proper nouns; I suggest using the part-of-speech tagger pos_tag instead.
To find proper nouns, look for the NNP tag:
    from nltk.tag import pos_tag

    sentence = "Michael Jackson likes to eat at McDonalds"
    tagged_sent = pos_tag(sentence.split())
    # [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

    # Keep only the words tagged as singular proper nouns.
    propernouns = [word for word, pos in tagged_sent if pos == 'NNP']
    # ['Michael', 'Jackson', 'McDonalds']

You may not be fully satisfied with this, since Michael and Jackson are split into two tokens; in that case you might need something more complex, such as a named-entity tagger.
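Short of a full named-entity tagger, one simple heuristic is to join runs of adjacent proper-noun tokens into a single name. The following is only a minimal sketch: the helper name group_proper_nouns is made up for illustration, the tagged list is copied from the example above rather than produced by calling the tagger, and treating NNPS (plural proper noun) like NNP is an assumption.

```python
from itertools import groupby

PROPER_TAGS = ("NNP", "NNPS")  # assumption: plural proper nouns are grouped too


def group_proper_nouns(tagged_tokens):
    """Join runs of adjacent proper-noun tokens into multiword names."""
    names = []
    # groupby yields consecutive runs of tokens that share the same key value.
    for is_proper, run in groupby(tagged_tokens, key=lambda pair: pair[1] in PROPER_TAGS):
        if is_proper:
            names.append(" ".join(word for word, _tag in run))
    return names


# Tagged tokens copied from the example above (not re-tagged here).
tagged_sent = [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'),
               ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

print(group_proper_nouns(tagged_sent))  # ['Michael Jackson', 'McDonalds']
```

This heuristic will also merge two distinct names that happen to sit next to each other, so it is not a substitute for real entity recognition.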
Strictly speaking, as documented in the Penn Treebank tagset (www.mozart-oz/mogul/doc/lager/brill-tagger/penn.html), you can find possessive nouns simply by looking for the POS tag. In practice, however, the tagger often doesn't emit a POS tag when the word is an NNP.
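For reference, here is roughly what the POS-tag approach could look like when the possessive marker does appear as its own token. This is a sketch under assumptions: the tagged list below is hand-written for illustration (it is not actual tagger output) and assumes a tokenizer that splits the clitic 's from the noun; the helper name possessors_from_pos_tags is hypothetical.

```python
def possessors_from_pos_tags(tagged_tokens):
    """Return each noun that is immediately followed by a POS-tagged token."""
    found = []
    # Walk over adjacent pairs: (current token, next token).
    for (word, tag), (_next_word, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if next_tag == "POS" and tag.startswith("NN"):
            found.append(word)
    return found


# Hand-written illustration, NOT real tagger output: it assumes the
# possessive marker was split off and tagged POS.
tagged = [("Daniel", "NNP"), ("Jackson", "NNP"), ("'s", "POS"),
          ("hamburger", "NN"), ("and", "CC"),
          ("Agnes", "NNP"), ("'", "POS"), ("fries", "NNS")]

print(possessors_from_pos_tags(tagged))  # ['Jackson', 'Agnes']
```

When the sentence is split on whitespace instead, the marker stays attached to the noun and no separate POS token exists to look for.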
To find possessive nouns, look for str.endswith("'s") or str.endswith("s'"):
    from nltk.tag import pos_tag

    sentence = "Michael Jackson took Daniel Jackson's hamburger and Agnes' fries"
    tagged_sent = pos_tag(sentence.split())
    # [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'), ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]

    # Iterate over the words, not the characters of the string.
    possessives = [word for word in sentence.split() if word.endswith("'s") or word.endswith("s'")]
    # ["Jackson's", "Agnes'"]

Alternatively, you can use NLTK's ne_chunk, but it doesn't seem to add much unless you care about what kind of proper noun you get from the sentence:
    >>> from itertools import chain
    >>> from nltk.tree import Tree
    >>> from nltk.chunk import ne_chunk
    >>> # tagged_sent comes from the previous example
    >>> [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
    [Tree('PERSON', [('Michael', 'NNP')]), Tree('PERSON', [('Jackson', 'NNP')]), Tree('PERSON', [('Daniel', 'NNP')])]
    >>> [i[0] for i in list(chain(*[chunk.leaves() for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]))]
    ['Michael', 'Jackson', 'Daniel']

Using ne_chunk is a little verbose, and it doesn't get you the possessives.
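Since ne_chunk does not surface the possessives, the suffix check can instead be combined with the tags to cut down on false positives and to recover the bare noun. This is a rough sketch rather than a definitive approach: the helper possessive_nouns and the NOUN_TAGS set are illustrative names, and the tagged list is copied from the output shown earlier rather than produced by running the tagger.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags


def possessive_nouns(tagged_tokens):
    """Return (token, possessor) pairs for noun tokens with a possessive suffix."""
    results = []
    for word, tag in tagged_tokens:
        if tag not in NOUN_TAGS:
            continue  # skip non-nouns such as contractions tagged otherwise
        if word.endswith("'s"):
            results.append((word, word[:-2]))  # strip the trailing 's
        elif word.endswith("s'"):
            results.append((word, word[:-1]))  # strip only the apostrophe
    return results


# Tagged tokens copied from the whitespace-split example (not re-tagged here).
tagged_sent = [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'),
               ('Daniel', 'NNP'), ("Jackson's", 'NNP'), ('hamburger', 'NN'),
               ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]

print(possessive_nouns(tagged_sent))
# [("Jackson's", 'Jackson'), ("Agnes'", 'Agnes')]
```

Note that a suffix check cannot tell a possessive "'s" from a contraction of "is" attached to a noun, so the results still need a sanity check.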