BLEU Score (Bilingual Evaluation Understudy): Measuring Machine Translation Quality (BLEUScore, pycocoevalcap)

Updated: 2024-10-11 21:29:35

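BLEU stands for bilingual evaluation understudy: an auxiliary tool for assessing the quality of translation between two languages. It is used to evaluate machine translation quality. Its design rests on one idea: the closer a machine translation is to a professional human translation, the better it is. In practice, BLEU measures how similar two sentences are. The most direct way to check whether a translation preserves the meaning of the original sentence is to compare the machine translation with a standard human translation of the same sentence. If the two are very similar, the translation is considered successful.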

BLEU is computed from n-gram matches combined through a geometric mean. The procedure is as follows:

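For each sentence, compare every n-gram in the machine translation with the n-grams in the reference translations and count the matches. Each n-gram's count is clipped to the maximum number of times it appears in any single reference, so repeating a correct word cannot inflate the score. Dividing the clipped matches by the total number of n-grams in the candidate gives the modified precision p_n.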

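Assign a weight to each n-gram order. Standard BLEU-N uses uniform weights w_n = 1/N for n = 1 to N. BLEU-4, for example, gives unigrams, bigrams, trigrams and 4-grams a weight of 1/4 each; it does not favor 4-gram matches.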

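Combine the precisions with a weighted geometric mean, exp(Σ w_n · log p_n). With uniform weights this is the N-th root of the product of the N precisions, not of the raw match counts. Then multiply by a brevity penalty BP. If the candidate length c exceeds the effective reference length r, BP = 1; otherwise BP = exp(1 − r/c). Without this penalty, a very short candidate containing only a few correct words would score highly.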
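The same procedure written as formulas follows the standard BLEU definition. The symbols are:

- ŷ is the candidate.
- R is the set of references.
- G_n(ŷ) is the set of distinct n-grams in ŷ.
- c is the candidate length.
- r is the effective reference length, commonly the reference length closest to c.

For a whole corpus, the numerator and denominator of p_n, as well as c and r, are summed over all sentences before the last formula is applied.

```latex
p_n = \frac{\sum_{g \in G_n(\hat{y})} \min\!\left(\mathrm{Count}_{\hat{y}}(g),\ \max_{y \in R} \mathrm{Count}_{y}(g)\right)}{\sum_{g \in G_n(\hat{y})} \mathrm{Count}_{\hat{y}}(g)}

\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}

\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left(\sum_{n=1}^{N} w_n \log p_n\right), \qquad w_n = \frac{1}{N}
```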

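BLEU scores range from 0 to 1 (they are often reported multiplied by 100). They express how similar the machine translation is to the reference translations. In general, the higher the score, the closer the machine translation is to the references.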
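The steps above can be sketched in plain Python. This is a minimal illustration under my own assumptions, not the code of any library:

- The function names `ngram_counts`, `clipped_matches` and `sentence_bleu_sketch` are hypothetical.
- Weights are uniform (1/max_n).
- There is no smoothing.
- The effective reference length r is the reference length closest to the candidate's, with ties going to the shorter reference.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_matches(candidate, references, n):
    """Return (clipped n-gram matches, total candidate n-grams) for one sentence."""
    cand = ngram_counts(candidate, n)
    # Maximum count of each n-gram in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip each candidate count so a repeated word cannot be rewarded too often.
    matched = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return matched, sum(cand.values())


def sentence_bleu_sketch(candidate, references, max_n=4):
    """Sentence-level BLEU: uniform weights 1/max_n, no smoothing."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        matched, total = clipped_matches(candidate, references, n)
        if matched == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_sum += math.log(matched / total) / max_n
    c = len(candidate)
    # Effective reference length: closest to c, ties go to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(log_sum)


refs = ["the cat is on the mat".split()]
print(sentence_bleu_sketch("the cat is on the mat".split(), refs))      # 1.0
print(round(sentence_bleu_sketch("the cat".split(), refs, max_n=2), 4))  # 0.1353
```

The identical sentence reaches the upper bound of 1. Both bigram-level precisions of "the cat" are 1, yet its score is only e^(-2) ≈ 0.1353 because of the brevity penalty. With the default max_n=4 it would be 0, because a two-word candidate has no 3-grams or 4-grams.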

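Note that BLEU is only one machine translation metric and cannot fully represent translation quality. Because it only counts surface n-gram overlap, it does not capture problems with grammar, fluency or coherence. BLEU should therefore be used alongside other evaluation metrics when judging machine translation quality.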

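The following code was written for torchmetrics 0.6.2. It no longer works as written in 0.8.2, because later releases changed the input format of BLEUScore: it takes untokenized strings, with the predictions passed first.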
from torchmetrics import BLEUScore
# translate_corpus: a list of candidate translations, each one a list of tokens
translate_corpus = ['the cat is on the mat'.split()]
# reference_corpus: one entry per candidate; each entry is a list of
# tokenized reference translations (several references are allowed)
reference_corpus = [['there is a cat on the mat'.split(), 'a cat is on the mat'.split()]]
bleu_1 = BLEUScore(n_gram=1)  # default is n_gram=4
bleu_2 = BLEUScore(n_gram=2)
bleu_3 = BLEUScore(n_gram=3)
bleu_4 = BLEUScore(n_gram=4)
# torchmetrics 0.6.2 takes the references first, then the candidates
print(bleu_1(reference_corpus, translate_corpus))
print(bleu_2(reference_corpus, translate_corpus))
print(bleu_3(reference_corpus, translate_corpus))
print(bleu_4(reference_corpus, translate_corpus))
If the candidate and the reference are identical:
from torchmetrics import BLEUScore
# translate_corpus: a list of tokenized candidate translations
translate_corpus = ['the cat is on the mat'.split()]
# previous references:
# reference_corpus = [['there is a cat on the mat'.split(), 'a cat is on the mat'.split()]]
# now a single reference that is identical to the candidate
reference_corpus = [['the cat is on the mat'.split()]]
bleu_1 = BLEUScore(n_gram=1)  # default is n_gram=4
bleu_2 = BLEUScore(n_gram=2)
bleu_3 = BLEUScore(n_gram=3)
bleu_4 = BLEUScore(n_gram=4)
print(bleu_1(reference_corpus, translate_corpus))
print(bleu_2(reference_corpus, translate_corpus))
print(bleu_3(reference_corpus, translate_corpus))
print(bleu_4(reference_corpus, translate_corpus))
In this case, all four scores are 1.

To score several sentences together:
from torchmetrics import BLEUScore
# translate_corpus: a list of tokenized candidate translations (two sentences here)
translate_corpus = ['the cat is on the mat'.split(), 'the cat is on the mat'.split()]
# reference_corpus = [['there is a cat on the mat'.split(), 'a cat is on the mat'.split()]]
# reference_corpus: one list of tokenized references per candidate, in the same order
reference_corpus = [['the cat is on the mat'.split()], ['the cat is on the mat'.split()]]
bleu_1 = BLEUScore(n_gram=1)  # default is n_gram=4
bleu_2 = BLEUScore(n_gram=2)
bleu_3 = BLEUScore(n_gram=3)
bleu_4 = BLEUScore(n_gram=4)
print(bleu_1(reference_corpus, translate_corpus))
print(bleu_2(reference_corpus, translate_corpus))
print(bleu_3(reference_corpus, translate_corpus))
print(bleu_4(reference_corpus, translate_corpus))
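When several sentences are scored together, corpus-level BLEU does not average per-sentence scores. Instead, it sums four quantities over all sentences and then applies the formula once:

- the clipped n-gram matches,
- the candidate n-gram totals,
- the candidate lengths c,
- the reference lengths r.

The sketch below illustrates that standard definition in plain Python. The name `corpus_bleu_sketch` is hypothetical, and the code does not claim to reproduce torchmetrics' internals. It shows that a corpus of one perfect and one very short candidate scores differently from the average of its sentence scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu_sketch(candidates, references_list, max_n=4):
    """Corpus BLEU: clipped counts and lengths are summed over all sentences
    before the precisions and the brevity penalty are computed."""
    matched = [0] * max_n
    total = [0] * max_n
    c_total = r_total = 0
    for cand, refs in zip(candidates, references_list):
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            max_ref = Counter()
            for ref in refs:
                for gram, count in ngrams(ref, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            matched[n - 1] += sum(min(k, max_ref[g]) for g, k in cand_counts.items())
            total[n - 1] += sum(cand_counts.values())
        c_total += len(cand)
        # closest reference length for this sentence (ties go to the shorter one)
        r_total += min((abs(len(ref) - len(cand)), len(ref)) for ref in refs)[1]
    if min(matched) == 0:
        return 0.0
    log_p = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    bp = 1.0 if c_total > r_total else math.exp(1 - r_total / c_total)
    return bp * math.exp(log_p)


cands = ["the cat is on the mat".split(), "the cat".split()]
refs = [["the cat is on the mat".split()], ["the cat is on the mat".split()]]
corpus = corpus_bleu_sketch(cands, refs, max_n=2)
# Sentence-level scores: the same function applied to one-sentence corpora.
per_sentence = [corpus_bleu_sketch([c], [r], max_n=2) for c, r in zip(cands, refs)]
print(round(corpus, 4))                                  # 0.6065
print(round(sum(per_sentence) / len(per_sentence), 4))   # 0.5677
```

Pooling the lengths (c = 8, r = 12) makes the brevity penalty e^(-0.5) for the whole corpus. Averaging applies e^(-2) to the short sentence alone, which is why the two numbers differ.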

Using pycocoevalcap.bleu:

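from pycocoevalcap.bleu.bleu import Bleu
# Reference texts: id -> list of one or more reference sentences
references = {"image1": ["there is a cat on the mat.", "a cat is on the mat."],
              # "image2": ["A dog is running in the park.", "Children are playing in the playground."]
              }
# Generated texts: id -> list containing exactly one hypothesis
hypotheses = {"image1": ["the cat is on the mat."],
              # "image2": ["A dog is playing in the field."]
              }
# Create the Bleu scorer (BLEU-1 to BLEU-4)
bleu_scorer = Bleu(n=4)
# Compute BLEU; Bleu splits sentences on whitespace, so "mat." counts as one token here
bleu_scores, _ = bleu_scorer.compute_score(references, hypotheses)
# bleu_scores is a list: [BLEU-1, BLEU-2, BLEU-3, BLEU-4]
print("BLEU scores:", bleu_scores)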
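Data in the tokenized-list form used in the torchmetrics 0.6.2 examples has to be converted to the dictionary form used with pycocoevalcap. `compute_score` takes the references dict first and the hypotheses dict second. Both dicts must have the same keys, and each hypothesis list must contain exactly one string. The helper `to_coco_format` below is a hypothetical name of my own. It only joins tokens with spaces, which matches how Bleu splits sentences on whitespace. The actual pycocoevalcap call is left commented out.

```python
def to_coco_format(translate_corpus, reference_corpus):
    """Hypothetical helper: convert tokenized candidates and references
    (torchmetrics 0.6.2 style) into the two dicts pycocoevalcap's Bleu expects."""
    if len(translate_corpus) != len(reference_corpus):
        raise ValueError("need exactly one list of references per candidate")
    gts, res = {}, {}
    for i, (cand, refs) in enumerate(zip(translate_corpus, reference_corpus)):
        if not refs:
            raise ValueError(f"candidate {i} has no reference")
        key = str(i)                                # keys only need to match across both dicts
        res[key] = [" ".join(cand)]                 # exactly one hypothesis per key
        gts[key] = [" ".join(ref) for ref in refs]  # one or more references per key
    return gts, res


translate_corpus = ["the cat is on the mat".split()]
reference_corpus = [["there is a cat on the mat".split(), "a cat is on the mat".split()]]
gts, res = to_coco_format(translate_corpus, reference_corpus)
print(res)  # {'0': ['the cat is on the mat']}
print(gts)  # {'0': ['there is a cat on the mat', 'a cat is on the mat']}
# With pycocoevalcap installed (not executed here):
# from pycocoevalcap.bleu.bleu import Bleu
# bleu_scores, _ = Bleu(n=4).compute_score(gts, res)
```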

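Using NLTK:
from nltk.translate.bleu_score import sentence_bleu
# Reference translations: a list of tokenized references
ref = [['this', 'is', 'a', 'test']]
# Machine translation (tokenized)
mt = ['this', 'is', 'a', 'test', 'too']
# Sentence-level BLEU (default: BLEU-4 with uniform weights, no smoothing)
score = sentence_bleu(ref, mt)
print(score)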
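Working through this example by hand shows where NLTK's number comes from. NLTK's defaults are BLEU-4, uniform weights of 1/4 and no smoothing. Under these defaults, the clipped precisions are 4/5, 3/4, 2/3 and 1/2. The 5-token candidate is longer than the 4-token reference, so the brevity penalty is 1. The score is therefore (0.8 × 0.75 × 2/3 × 0.5)^(1/4) = 0.2^(1/4) ≈ 0.6687, which is what sentence_bleu should print.

```python
import math

# Candidate "this is a test too" (5 tokens) vs. reference "this is a test" (4 tokens).
# Clipped matches / candidate n-grams for n = 1..4:
p = [4 / 5, 3 / 4, 2 / 3, 1 / 2]
c, r = 5, 4
bp = 1.0 if c > r else math.exp(1 - r / c)               # candidate is longer: no penalty
bleu = bp * math.exp(sum(math.log(pn) for pn in p) / 4)  # uniform weights 1/4
print(round(bleu, 4))  # 0.6687
```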

Reference: "浅谈用Python计算文本BLEU分数" (A brief look at computing text BLEU scores with Python), Tencent Cloud+ Community

Published: 2024-03-08 20:41:00

Permalink: https://www.elefans.com/category/jswz/34/1722265.html