Significance Test

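Significance tests fall into two main categories:

Statistical Significance Test

Measure: p-value < 0.05

Purpose: test whether the original distribution and the target distribution differ significantly

Practical Significance Test

Measure: effect size (Cohen's d)

Purpose: quantify how large the difference between the original distribution and the target distribution is

The paper "NLPStatTest: A Toolkit for Comparing NLP System Performance" argues that in NLP, testing practical significance is necessary in addition to statistical significance.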

2.2.3 Effect Size Estimation

In most experimental NLP papers employing significance testing, the p-value is the only quantity reported. However, the p-value is often misused and misinterpreted. For instance, statistical significance is easily conflated with practical significance; as a result, NLP researchers often run significance tests to show that the performances of two NLP systems are different (i.e., statistical significance), without measuring the degree or the importance of such a difference (i.e., practical significance).
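
To make the distinction concrete, here is a minimal sketch that is not taken from the paper. It simulates per-example scores for two hypothetical systems whose true difference is very small; the sample size, means, standard deviation, and seed are all illustrative assumptions. With enough test examples the paired t-test reports a p-value far below 0.05, while Cohen's d stays well under the conventional "small" threshold of 0.2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)          # fixed seed, illustrative only
n = 100_000                             # hypothetical number of test examples

# Simulated per-example scores; system A is better by only 0.005 on average.
scores_a = rng.normal(loc=0.705, scale=0.1, size=n)
scores_b = rng.normal(loc=0.700, scale=0.1, size=n)

# Statistical significance: paired t-test on the two score lists.
t_stat, p_value = stats.ttest_rel(scores_a, scores_b)

# Practical significance: Cohen's d using the average of the two sample variances.
pooled_sd = np.sqrt((scores_a.var(ddof=1) + scores_b.var(ddof=1)) / 2.0)
cohens_d = (scores_a.mean() - scores_b.mean()) / pooled_sd

print("p-value   = {:.3g}".format(p_value))
print("Cohen's d = {:.3f}".format(cohens_d))
```

The difference is statistically detectable only because n is large; the effect size shows that it is unlikely to matter in practice.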

Usage:

Statistical Significance Test:

python Statistical_significance.py file1 file2 0.05
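
Judging from how the script parses its input (`f.read().splitlines()` followed by `float`), each of the two files is expected to hold one numeric score per line, with line i of both files referring to the same test example. The sketch below writes two such files and reads one back the same way the script does. The score values and the temporary directory are made up for illustration.

```python
import os
import tempfile

# Hypothetical per-example scores for two systems (same examples, same order).
scores_a = [0.81, 0.79, 0.85, 0.90, 0.77]
scores_b = [0.78, 0.80, 0.82, 0.86, 0.75]


def write_scores(path, scores):
    """Write one score per line, the layout the script's parser expects."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(str(s) for s in scores) + "\n")


out_dir = tempfile.mkdtemp()
path_a = os.path.join(out_dir, "file1")
path_b = os.path.join(out_dir, "file2")
write_scores(path_a, scores_a)
write_scores(path_b, scores_b)

# Read back exactly as the script does.
with open(path_a) as f:
    loaded_a = list(map(float, f.read().splitlines()))

print(loaded_a)
```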

import sys
import numpy as np
from scipy import stats


### Normality Check
# H0: data is normally distributed
def normality_check(data_A, data_B, name, alpha):

    if(name=="Shapiro-Wilk"):
        # Shapiro-Wilk: Perform the Shapiro-Wilk test for normality.
        shapiro_results = stats.shapiro([a - b for a, b in zip(data_A, data_B)])
        return shapiro_results[1]

    elif(name=="Anderson-Darling"):
        # Anderson-Darling: Anderson-Darling test for data coming from a particular distribution
        anderson_results = stats.anderson([a - b for a, b in zip(data_A, data_B)], 'norm')
        # critical values are reported for significance levels [15%, 10%, 5%, 2.5%, 1%]
        if(float(alpha) <= 0.01):
            sig_level = 4
        elif(float(alpha) <= 0.025):
            sig_level = 3
        elif(float(alpha) <= 0.05):
            sig_level = 2
        elif(float(alpha) <= 0.1):
            sig_level = 1
        else:
            sig_level = 0

        # This result carries a statistic and critical values rather than a p-value,
        # so emulate one for the caller: 1.0 when the statistic is below the critical
        # value (normality not rejected), otherwise 0.0
        return 1.0 if anderson_results[0] < anderson_results[1][sig_level] else 0.0

    else:
        # Kolmogorov-Smirnov: Perform the Kolmogorov-Smirnov test for goodness of fit.
        ks_results = stats.kstest([a - b for a, b in zip(data_A, data_B)], 'norm')
        return ks_results[1]

## McNemar test
# Build the 2x2 contingency table: r = right (1), w = wrong (0), first letter for A, second for B
def calculateContingency(data_A, data_B, n):
    ABrr = 0
    ABrw = 0
    ABwr = 0
    ABww = 0
    for i in range(0,n):
        # elif chain so that each example is counted in exactly one cell
        if(data_A[i]==1 and data_B[i]==1):
            ABrr = ABrr+1
        elif (data_A[i] == 1 and data_B[i] == 0):
            ABrw = ABrw + 1
        elif (data_A[i] == 0 and data_B[i] == 1):
            ABwr = ABwr + 1
        else:
            ABww = ABww + 1
    return np.array([[ABrr, ABrw], [ABwr, ABww]])

def mcNemar(table):
    statistic = float(np.abs(table[0][1]-table[1][0]))**2/(table[1][0]+table[0][1])
    pval = 1-stats.chi2.cdf(statistic,1)
    return pval


#Permutation-randomization
#Repeat R times: randomly flip each m_i(A),m_i(B) between A and B with probability 0.5, calculate delta(A,B).
# let r be the number of times that delta(A,B)>=orig_delta(A,B)
# significance level: (r+1)/(R+1)
# Assume that larger value (metric) is better
def rand_permutation(data_A, data_B, n, R):
    delta_orig = float(sum([ x - y for x, y in zip(data_A, data_B)]))/n
    r = 0
    for x in range(0, R):
        # copy the lists so that swapping does not modify the original data
        temp_A = list(data_A)
        temp_B = list(data_B)
        samples = [np.random.randint(1, 3) for i in range(n)] #for each pair: 1 = swap, 2 = keep
        swap_ind = [i for i, val in enumerate(samples) if val == 1]
        for ind in swap_ind:
            temp_B[ind], temp_A[ind] = temp_A[ind], temp_B[ind]
        delta = float(sum([ x - y for x, y in zip(temp_A, temp_B)]))/n
        # one-sided test: count permutations at least as extreme as the observed difference
        if(delta>=delta_orig):
            r = r+1
    pval = float(r+1.0)/(R+1.0)
    return pval


#Bootstrap
#Repeat R times: randomly create new samples from the data with repetitions, calculate delta(A,B).
# let r be the number of times that delta(A,B)<2*orig_delta(A,B). significance level: r/R
# This implementation follows the description in Berg-Kirkpatrick et al. (2012),
# "An Empirical Investigation of Statistical Significance in NLP".
def Bootstrap(data_A, data_B, n, R):
    delta_orig = float(sum([x - y for x, y in zip(data_A, data_B)])) / n
    r = 0
    for x in range(0, R):
        temp_A = []
        temp_B = []
        samples = np.random.randint(0,n,n) #which samples to add to the subsample with repetitions
        for samp in samples:
            temp_A.append(data_A[samp])
            temp_B.append(data_B[samp])
        delta = float(sum([x - y for x, y in zip(temp_A, temp_B)])) / n
        if (delta > 2*delta_orig):
            r = r + 1
    pval = float(r)/(R)
    return pval


def main():
    if len(sys.argv) < 4:
        print("You did not give enough arguments\n ")
        sys.exit(1)
    filename_A = sys.argv[1]
    filename_B = sys.argv[2]
    alpha = sys.argv[3]


    with open(filename_A) as f:
        data_A = f.read().splitlines()

    with open(filename_B) as f:
        data_B = f.read().splitlines()

    data_A = list(map(float,data_A))
    data_B = list(map(float,data_B))

    print("\nPossible statistical tests: Shapiro-Wilk, Anderson-Darling, Kolmogorov-Smirnov, t-test, Wilcoxon, McNemar, Permutation, Bootstrap")
    name = input("\nEnter name of statistical test: ")

    ### Normality Check
    if(name=="Shapiro-Wilk" or name=="Anderson-Darling" or name=="Kolmogorov-Smirnov"):
        output = normality_check(data_A, data_B, name, alpha)

        if(float(output)>float(alpha)):
            answer = input("\nNormality was not rejected, would you like to perform a t-test for checking significance of difference between results? (Y\\N) ")
            if(answer=='Y'):
                # two sided t-test
                t_results = stats.ttest_rel(data_A, data_B)
                # correct for one sided test
                pval = t_results[1]/2
                if(float(pval)<=float(alpha)):
                    print("\nTest result is significant with p-value: {}".format(pval))
                    return
                else:
                    print("\nTest result is not significant with p-value: {}".format(pval))
                    return
            else:
                answer2 = input("\nWould you like to perform a different test (Permutation or Bootstrap)? If so enter name of test, otherwise type 'N' ")
                if(answer2=='N'):
                    print("\nbye-bye")
                    return
                else:
                    name = answer2
        else:
            answer = input("\nNormality was rejected, would you like to perform a non-parametric test for checking significance of difference between results? (Y\\N) ")
            if (answer == 'Y'):
                answer2 = input("\nWhich test (Permutation or Bootstrap)? ")
                name = answer2
            else:
                print("\nbye-bye")
                return

    ### Statistical tests

    # Paired Student's t-test: Calculate the T-test on TWO RELATED samples of scores, a and b. for one sided test we multiply p-value by half
    if(name=="t-test"):
        t_results = stats.ttest_rel(data_A, data_B)
        # correct for one sided test
        pval = float(t_results[1]) / 2
        if (float(pval) <= float(alpha)):
            print("\nTest result is significant with p-value: {}".format(pval))
            return
        else:
            print("\nTest result is not significant with p-value: {}".format(pval))
            return

    # Wilcoxon: Calculate the Wilcoxon signed-rank test.
    if(name=="Wilcoxon"):
        wilcoxon_results = stats.wilcoxon(data_A, data_B)
        if (float(wilcoxon_results[1]) <= float(alpha)):
            print("\nTest result is significant with p-value: {}".format(wilcoxon_results[1]))
            return
        else:
            print("\nTest result is not significant with p-value: {}".format(wilcoxon_results[1]))
            return

    if(name=="McNemar"):
        print("\nThis test requires the results to be binary : A[1, 0, 0, 1, ...], B[1, 0, 1, 1, ...] for success or failure on the i-th example.")
        f_obs = calculateContingency(data_A, data_B, len(data_A))
        mcnemar_results = mcNemar(f_obs)
        if (float(mcnemar_results) <= float(alpha)):
            print("\nTest result is significant with p-value: {}".format(mcnemar_results))
            return
        else:
            print("\nTest result is not significant with p-value: {}".format(mcnemar_results))
            return

    if(name=="Permutation"):
        R = max(10000, int(len(data_A) * (1 / float(alpha))))
        pval = rand_permutation(data_A, data_B, len(data_A), R)
        if (float(pval) <= float(alpha)):
            print("\nTest result is significant with p-value: {}".format(pval))
            return
        else:
            print("\nTest result is not significant with p-value: {}".format(pval))
            return


    if(name=="Bootstrap"):
        R = max(10000, int(len(data_A) * (1 / float(alpha))))
        pval = Bootstrap(data_A, data_B, len(data_A), R)
        if (float(pval) <= float(alpha)):
            print("\nTest result is significant with p-value: {}".format(pval))
            return
        else:
            print("\nTest result is not significant with p-value: {}".format(pval))
            return

    else:
        print("\nInvalid name of statistical test")
        sys.exit(1)




if __name__ == "__main__":
    main()
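
For non-interactive use, the paired permutation test can also be written as a small standalone function. The sketch below is an illustrative alternative, not part of the script above. It relies on the fact that swapping the i-th pair of scores is equivalent to flipping the sign of the i-th difference, and it assumes a one-sided alternative (system A scores higher than system B). The function name, default round count, and seed are assumptions made for the example.

```python
import numpy as np


def paired_permutation_pvalue(scores_a, scores_b, num_rounds=10000, seed=0):
    """One-sided p-value for H1: mean(A - B) > 0, using random sign flips."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = diffs.mean()
    # Each row is one permutation: +1 keeps a pair as is, -1 swaps A_i and B_i.
    # Note: this allocates a num_rounds x n array, so it suits moderate n only.
    signs = rng.choice([-1.0, 1.0], size=(num_rounds, diffs.size))
    permuted = (signs * diffs).mean(axis=1)
    # Count permutations at least as extreme as the observed mean difference
    # (small tolerance guards against floating-point round-off on ties).
    count = int(np.sum(permuted >= observed - 1e-12))
    return (count + 1) / (num_rounds + 1)


# Hypothetical scores: A beats B on every example.
a = [0.81, 0.79, 0.85, 0.90, 0.77, 0.88, 0.83, 0.80]
b = [0.78, 0.76, 0.82, 0.86, 0.75, 0.84, 0.80, 0.78]

p_better = paired_permutation_pvalue(a, b)
p_same = paired_permutation_pvalue(a, a)
print("A vs B:", p_better)
print("A vs A:", p_same)
```

Comparing a system with itself yields a p-value of 1, which is a quick sanity check for any implementation of this test.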

Practical Significance Test:

python Practical_significance.py file1 file2

import sys
import numpy as np
from numpy import mean, std, sqrt


def read_data_from_file(file_name):
    with open(file_name, 'r', encoding='utf-8') as reader:
        data_file = []
        try:
            lines = reader.readlines()
            # skip blank lines; every remaining line must hold one number
            data_file = [float(line.strip()) for line in lines if line.strip()]
        except ValueError:
            print('Data format error, please check')
            sys.exit(1)
        if len(data_file) == 0:
            print('Empty file, exit')
            sys.exit(1)
    return data_file


def two_side_data_reader(file1_name, file2_name):
    data_file1 = read_data_from_file(file1_name)
    data_file2 = read_data_from_file(file2_name)

    return data_file1, data_file2


def cal_cohen_d(data1, data2):
    def cohen_d(x, y):
        return (mean(x) - mean(y)) / sqrt((std(x) ** 2 + std(y) ** 2) / 2.0)
    mean1 = np.mean(data1)
    mean2 = np.mean(data2)
    # print(type(mean1))
    std1 = np.std(data1)
    std2 = np.std(data2)

    cohen = cohen_d(data1, data2)

    print('Data1 [mean:%.4f, std:%.4f]' % (mean1, std1))
    print('Data2 [mean:%.4f, std:%.4f]' % (mean2, std2))
    print("cohen's d value = %.4f" % (cohen))
    
    return cohen


if __name__ == '__main__':
    file_1 = sys.argv[1]
    file_2 = sys.argv[2]

    data1, data2 = two_side_data_reader(file_1, file_2)

    res = cal_cohen_d(data1, data2)
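
The script above prints Cohen's d but leaves its interpretation to the reader, and it uses NumPy's default population standard deviation (`ddof=0`). The following sketch is a variant under two stated assumptions: it pools the sample variances (`ddof=1`) weighted by group size, and it labels the result with Cohen's conventional rule-of-thumb thresholds (0.2, 0.5, 0.8). The function names and the example data are hypothetical, and the thresholds are conventions rather than strict cut-offs.

```python
import numpy as np


def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation (ddof=1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)


def describe_effect(d):
    """Map |d| to Cohen's conventional labels (rule of thumb only)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Made-up data: the two groups differ by one unit in mean.
d = cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
label = describe_effect(d)
print("d = {:.4f} ({})".format(d, label))
```

For large samples the `ddof=0` and `ddof=1` versions give nearly identical values; the choice matters mainly for small test sets.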
