I am trying to extract proper nouns, such as names and organization names, from very small chunks of text like SMS messages. The basic parsers available with NLTK (see the question "Finding Proper Nouns using NLTK WordNet") are able to get the nouns, but the problem is proper nouns that do not start with a capital letter. In text like the following, names such as sumit are not recognized as proper nouns:
```python
>>> from nltk import pos_tag
>>> sentence = "i spoke with sumit and rajesh and Samit about the gridlock situation last night @ around 8 pm last nite"
>>> tagged_sent = pos_tag(sentence.split())
>>> print(tagged_sent)
[('i', 'PRP'), ('spoke', 'VBP'), ('with', 'IN'), ('sumit', 'NN'), ('and', 'CC'), ('rajesh', 'JJ'), ('and', 'CC'), ('Samit', 'NNP'), ('about', 'IN'), ('the', 'DT'), ('gridlock', 'NN'), ('situation', 'NN'), ('last', 'JJ'), ('night', 'NN'), ('@', 'IN'), ('around', 'IN'), ('8', 'CD'), ('pm', 'NN'), ('last', 'JJ'), ('nite', 'NN')]
```

Here the lowercase `sumit` is tagged `NN` and `rajesh` is tagged `JJ`, while the capitalized `Samit` is tagged `NNP`.
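To make the problem concrete, here is a small sketch that filters the reported tagger output for the Penn Treebank proper-noun tags `NNP` and `NNPS`. The tag list is copied from the session in the question rather than recomputed, so NLTK is not needed to run it; a different NLTK version or tagger model may tag some tokens differently.

```python
# Tagger output as reported in the question (copied, not recomputed).
tagged_sent = [
    ("i", "PRP"), ("spoke", "VBP"), ("with", "IN"), ("sumit", "NN"),
    ("and", "CC"), ("rajesh", "JJ"), ("and", "CC"), ("Samit", "NNP"),
    ("about", "IN"), ("the", "DT"), ("gridlock", "NN"), ("situation", "NN"),
    ("last", "JJ"), ("night", "NN"), ("@", "IN"), ("around", "IN"),
    ("8", "CD"), ("pm", "NN"), ("last", "JJ"), ("nite", "NN"),
]

# Penn Treebank tags for singular and plural proper nouns.
PROPER_NOUN_TAGS = {"NNP", "NNPS"}

# Keep only the words the tagger labelled as proper nouns.
proper_nouns = [word for word, tag in tagged_sent if tag in PROPER_NOUN_TAGS]
print(proper_nouns)  # ['Samit']
```

Only the capitalized `Samit` survives; `sumit` and `rajesh` are lost because the tagger relies heavily on capitalization.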
You might want to have a look at python-nameparser. It also tries to guess the capitalization of names. Sorry for the incomplete answer, but I don't have much experience using python-nameparser.
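The general idea of guessing capitalization before tagging can be illustrated without any third-party library. The sketch below is not python-nameparser's API. It assumes a hypothetical hand-made set of known lowercase names (`KNOWN_NAMES`) and a hypothetical helper `restore_name_case` that title-cases matching tokens so a tagger sees them in their usual form.

```python
from typing import Iterable, List, Set

# Hypothetical gazetteer of known first names (lowercase). In practice this
# would come from a much larger names list; it is an assumption for illustration.
KNOWN_NAMES: Set[str] = {"sumit", "rajesh", "samit"}


def restore_name_case(tokens: Iterable[str], known_names: Set[str] = KNOWN_NAMES) -> List[str]:
    """Capitalize tokens that match a known name; leave all other tokens unchanged."""
    return [tok.capitalize() if tok.lower() in known_names else tok for tok in tokens]


sentence = "i spoke with sumit and rajesh and Samit about the gridlock situation"
print(restore_name_case(sentence.split()))
```

The restored tokens could then be passed to `pos_tag`. Whether the tagger actually labels them `NNP` still depends on its model, so this is a heuristic that only helps for names the list already knows.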
Good luck!