Table of contents
- 1 Proportion
- 2 Mean
- 3 Welch's T-test
1 Proportion
Experiment: test the color of a button
- Click-through probability (CTP): N(users who clicked) / N(total users)
- 1000 users in each of the control and treatment groups
Results:
- Control group: 1.1% CTP
- Treatment group: 2.3% CTP
Significance:
- Practical significance boundary: 0.01
- Significance level $\alpha$: 0.05
Make a decision:
- Is the difference significant? Should we launch the feature?
Questions
1. Which hypothesis test to use?
2. What is the null hypothesis?
3. Is the result statistically significant?
4. Is the result practically significant?
Which hypothesis test to use?
- Bernoulli population: each user either clicks or doesn't click
- Control group: n * p = 1000 * 1.1% = 11
- Treatment group: n * p = 1000 * 2.3% = 23
- In both groups np and n(1-p) are larger than 10, so the samples can be treated as large. The test statistic then follows a Z-distribution, so we use a two-sample z-test for proportions.
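The large-sample check above can be sketched in a few lines of Python. This is only an illustration: the function name `large_sample_ok` and the `threshold` parameter are assumptions made for this example, and the threshold of 10 is the rule of thumb quoted in the notes.

```python
def large_sample_ok(clicks, n, threshold=10):
    """Rule-of-thumb check: both n*p and n*(1-p) must exceed the threshold.

    With p = clicks / n, n*p is the number of clicks and n*(1-p) is the
    number of users who did not click, so integer counts are enough.
    """
    return clicks > threshold and (n - clicks) > threshold


# Counts taken from the example: 11 and 23 clicks out of 1000 users each.
control_ok = large_sample_ok(11, 1000)
treatment_ok = large_sample_ok(23, 1000)
print(control_ok, treatment_ok)
```

If either check fails, the normal approximation is questionable and an exact test may be a safer choice.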
What is the difference between a T-test and a Z-test?
- https://zhuanlan.zhihu/p/120181558
Measurements
- Users who clicked: $X_{ct}$, $X_{tr}$
- Total number of users: $n_{ct}$, $n_{tr}$
$P_{ct} = X_{ct} / n_{ct} = 11 / 1000$
$P_{tr} = X_{tr} / n_{tr} = 23 / 1000$
What is the null hypothesis?
We want to measure the difference between $P_{tr}$ and $P_{ct}$.
$d = P_{tr} - P_{ct}$
Null hypothesis:
$H_{0}$: $P_{tr} = P_{ct}$, i.e. $d = 0$
Under $H_{0}$: $d \sim N(0, SE^{2})$
We don’t know the standard deviation of d, so we need to estimate it.
Test statistic:
$TS = (P_{tr} - P_{ct}) / SE$
Estimate the standard error:
- Choose an SE that can represent both groups
- “Pooled” standard error
Compute “pooled” SE
- “Pooled” probability of a click, $P'$
- Total probability across the 2 groups:
$P' = (X_{ct} + X_{tr}) / (n_{ct} + n_{tr}) = (11 + 23) / (1000 + 1000) = 0.017$
- Pooled standard error:
$SE = \sqrt{P'(1 - P')(1/n_{ct} + 1/n_{tr})} = \sqrt{0.017 \times 0.983 \times 0.002} \approx 0.00578$
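As a quick numerical check, the pooled probability and pooled standard error can be computed as follows. This is a minimal sketch that assumes the standard pooled formula for two proportions, SE = sqrt(P'(1 - P')(1/n_ct + 1/n_tr)); the function names are made up for this example.

```python
import math


def pooled_proportion(x_ct, n_ct, x_tr, n_tr):
    """Pooled click probability across control and treatment."""
    return (x_ct + x_tr) / (n_ct + n_tr)


def pooled_standard_error(p_pool, n_ct, n_tr):
    """Standard error of the difference in proportions under H0 (assumed formula)."""
    return math.sqrt(p_pool * (1 - p_pool) * (1 / n_ct + 1 / n_tr))


# Counts from the example: 11 and 23 clicks out of 1000 users each.
p_pool = pooled_proportion(11, 1000, 23, 1000)
se_pool = pooled_standard_error(p_pool, 1000, 1000)
print(round(p_pool, 3), round(se_pool, 5))
```

With the example counts this gives a pooled probability of 0.017 and a standard error of about 0.00578, the value used in the test statistic.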
Test statistic
$TS = (P_{tr} - P_{ct}) / SE = 0.012 / 0.00578 = 2.076$
Is the result statistically significant?
- Critical z-score (two-sided, $\alpha$ = 0.05) = 1.96
- If TS > 1.96 or TS < -1.96, reject the null hypothesis
- In this example TS = 2.076 > 1.96, so the result is statistically significant.
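The comparison against the critical value can be reproduced with Python's standard library. The sketch below assumes a two-sided test and uses `statistics.NormalDist`; the helper name `two_sided_z_test` is an assumption for illustration. The p-value it reports is not part of the original notes.

```python
from statistics import NormalDist


def two_sided_z_test(ts, alpha=0.05):
    """Return (critical z, two-sided p-value, reject H0?) for a z statistic."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(ts)))
    return z_crit, p_value, abs(ts) > z_crit


# Difference in click-through probability and pooled SE from the example.
ts = 0.012 / 0.00578
z_crit, p_value, reject = two_sided_z_test(ts)
print(round(ts, 3), round(z_crit, 2), reject)
```

Rejecting when |TS| exceeds the critical value is the same decision as rejecting when the p-value is below $\alpha$.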
Is the result practically significant?
- Confidence interval of d
- Center of C.I. = 0.012 (this is $P_{tr} - P_{ct}$)
- Width of C.I. (margin of error): $m = Z \times SE = 1.96 \times 0.00578 = 0.0113$, where SE is the pooled standard error
- CI of d: 0.012 ± 0.0113 = [0.0007, 0.0233]
- Best guess: the change is practically significant, because the point estimate 0.012 is above the boundary of 0.01.
- However, the lower bound of the CI (0.0007) is below 0.01, so it is possible that the change is not practically significant.
Make a launch decision:
- We are not confident that the change is practically significant.
- Launching the feature is not recommended.
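The confidence interval and the launch reasoning can be sketched as below. The three-way verdict ("practically significant", "not practically significant", "inconclusive") is a simplification assumed for this example and only handles a positive practical boundary; the function names are hypothetical.

```python
def confidence_interval(d_hat, se, z=1.96):
    """CI for the difference: point estimate plus/minus z * SE."""
    margin = z * se
    return d_hat - margin, d_hat + margin


def practical_verdict(ci_low, ci_high, d_min):
    """Compare the CI with the practical significance boundary d_min (assumed > 0)."""
    if ci_low >= d_min:
        return "practically significant"
    if ci_high < d_min:
        return "not practically significant"
    return "inconclusive"  # the CI straddles the boundary


# Values from the example: d = 0.012, SE = 0.00578, boundary = 0.01.
ci_low, ci_high = confidence_interval(0.012, 0.00578)
verdict = practical_verdict(ci_low, ci_high, d_min=0.01)
print(round(ci_low, 4), round(ci_high, 4), verdict)
```

For the example the interval straddles 0.01, so the verdict is "inconclusive", which matches the recommendation not to launch.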
Checking statistical significance:
- Check whether the CI contains 0: if it does, the result is not statistically significant.
- This is equivalent to comparing TS with the critical value, provided both use the same SE.
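The equivalence of the two checks can be illustrated with a small sketch. It assumes both checks use the same standard error and the same critical value; the function names are made up for this example.

```python
def significant_by_ts(d_hat, se, z=1.96):
    """Reject H0 when the absolute test statistic exceeds the critical value."""
    return abs(d_hat / se) > z


def significant_by_ci(d_hat, se, z=1.96):
    """Reject H0 when the confidence interval does not contain 0."""
    low, high = d_hat - z * se, d_hat + z * se
    return not (low <= 0 <= high)


# The example's values, plus a smaller hypothetical difference for contrast.
for d_hat in (0.012, 0.010):
    print(d_hat, significant_by_ts(d_hat, 0.00578), significant_by_ci(d_hat, 0.00578))
```

Both functions agree because |d| / SE > z is the same inequality as |d| > z * SE.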
2 Mean
Experiment: test whether a new feature changes the average number of posts
Correction: the mean of the treatment group is 1.7
What conclusion can you draw?
- Assume variances are similar.
Significance:
- Practical significance boundary: 0.05
- Significance level $\alpha$: 0.05
Correction: $S_{pool}$ = 1.06
SS: sum of squares
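The margin of error is t-score × $S_{pool}$ × $\sqrt{1/n_{c} + 1/n_{t}}$, which comes to about 0.51. The confidence interval is therefore $\hat{d} \pm 0.51$ with $\hat{d} = 0.6$, roughly [0.09, 1.11]. It does not contain 0, and its lower bound is above the practical significance boundary of 0.05, so the change is both statistically and practically significant.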
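The pooled standard deviation and margin of error for a difference in means can be sketched as follows. The sample sizes and sums of squares from the experiment are not reproduced in these notes, so the numbers below are purely hypothetical, and the critical t-score is passed in as a value looked up from a t-table. The function names are assumptions for this example.

```python
import math


def pooled_std(ss_c, ss_t, n_c, n_t):
    """Pooled standard deviation from the two groups' sums of squares (SS)."""
    return math.sqrt((ss_c + ss_t) / (n_c + n_t - 2))


def margin_of_error(t_crit, s_pool, n_c, n_t):
    """Margin of error for the difference in means, assuming similar variances."""
    return t_crit * s_pool * math.sqrt(1 / n_c + 1 / n_t)


# Hypothetical inputs, NOT the experiment's data: SS = 9 in each group, 10 users each.
s_pool = pooled_std(9.0, 9.0, 10, 10)
# 2.101 is the two-sided 5% critical t-score for 18 degrees of freedom (t-table).
m = margin_of_error(2.101, s_pool, 10, 10)
print(round(s_pool, 3), round(m, 3))
```

The degrees of freedom for the t-score are n_c + n_t - 2, the same quantity that divides the summed SS.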
3 Welch’s T-test
Reference: https://www.youtube/watch?v=6uw0A3aKwMc
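For orientation, here is a minimal sketch of the standard Welch's t-test, which drops the assumption that the two variances are similar. It uses the textbook test statistic and the Welch-Satterthwaite degrees of freedom; it is not taken from the referenced video, and the function name and inputs are hypothetical.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test.

    var1 and var2 are sample variances (computed with n - 1 in the denominator).
    """
    v1, v2 = var1 / n1, var2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Hypothetical summary statistics for two groups of 10 users each.
t_stat, df = welch_t(2.0, 4.0, 10, 1.0, 4.0, 10)
print(round(t_stat, 3), round(df, 1))
```

When the two groups have equal variances and equal sizes, the degrees of freedom reduce to n1 + n2 - 2, the same as in the pooled t-test.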