Question
I am doing simple natural language processing with spaCy. I am filtering words by measuring the similarity between them.
I ran the following simple code, taken from the spaCy documentation, but the result does not match what the documentation shows.
import spacy

nlp = spacy.load('en_core_web_lg')
tokens = nlp('dog cat banana')

# Compare every token with every other token, including itself
for token1 in tokens:
    for token2 in tokens:
        sim = token1.similarity(token2)
        print("{:>6s}, {:>6s}: {}".format(token1.text, token2.text, sim))
The output is as follows.
dog, dog: 1.0
dog, cat: 2.307269867164827e-21
dog, banana: 0.0
cat, dog: 2.307269867164827e-21
cat, cat: 1.0
cat, banana: -0.04468117654323578
banana, dog: -7.828739256116838e+17
banana, cat: -8.242222286053048e+17
banana, banana: 1.0
In particular, the similarity between "dog" and "cat" should be about 0.8, but the value I get is extremely small.
In addition, the similarity between "dog" and "banana" is 0.0, while the similarity between "banana" and "dog" is -7.828739256116838e+17, so the result is not even symmetric.
I don't know how to fix this. Please help.
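Recommended answer
First, install the large English model (or all models). The command below is the spaCy 1.x form; on spaCy 2 and later, models are installed with python -m spacy download followed by the model name, for example en_core_web_md.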
python3 -m spacy.en.download all
Next, run the sample code from the documentation, loading the model with:
nlp = spacy.load('en_core_web_md')
If that doesn't work, try loading the following instead:
nlp = spacy.load('en')
After making these changes, the result matches the documentation.
python3 /tmp/c.py
dog, dog: 1.000000078333395
dog, cat: 0.8016855098942641
dog, banana: 0.2432764518408807
cat, dog: 0.8016855098942641
cat, cat: 1.0000001375986456
cat, banana: 0.2815436412709355
banana, dog: 0.2432764518408807
banana, cat: 0.2815436412709355
banana, banana: 1.000000107068369
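As a sanity check on results like these: spaCy's default similarity is a cosine similarity over word vectors, so a healthy score should lie in [-1, 1] (up to small floating-point error) and should be the same in both directions. The sketch below is plain Python and does not call spaCy; the helper names cosine_similarity and find_suspicious, the tolerance, and the toy vectors are assumptions made for illustration only. The scores fed to it are copied from the question's output.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_u * norm_v)


def find_suspicious(scores, tol=1e-6):
    """Return the (a, b) pairs whose score is outside [-1, 1] or differs from (b, a)."""
    bad = []
    for (a, b), s in scores.items():
        out_of_range = abs(s) > 1.0 + tol
        mirror = scores.get((b, a))
        asymmetric = mirror is not None and abs(s - mirror) > tol
        if out_of_range or asymmetric:
            bad.append((a, b))
    return sorted(bad)


# Toy vectors (hypothetical): cosine similarity does not depend on argument order.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.5]
assert abs(cosine_similarity(u, v) - cosine_similarity(v, u)) < 1e-12

# Off-diagonal scores copied from the output in the question.
question_scores = {
    ("dog", "cat"): 2.307269867164827e-21,
    ("cat", "dog"): 2.307269867164827e-21,
    ("dog", "banana"): 0.0,
    ("banana", "dog"): -7.828739256116838e+17,
    ("cat", "banana"): -0.04468117654323578,
    ("banana", "cat"): -8.242222286053048e+17,
}

print(find_suspicious(question_scores))
```

Every pair involving "banana" is flagged, either because the score is far outside [-1, 1] or because the two directions disagree. That pattern points to a problem with the loaded vectors or the installation rather than with the loop in the question, which is consistent with the fix above of reinstalling and reloading the model.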