Machine Learning: A Probabilistic Perspective, Chapter 2 Study Notes | 电子爱好者


Study notes on Machine Learning: A Probabilistic Perspective (machine learning study notes)

- Chit-chat
- 2 Probability
  - 2.2 A brief review of probability theory
    - 2.2.4 Independence and conditional independence
  - 2.3 Some common discrete distributions
    - 2.3.1 The binomial and Bernoulli distributions
    - 2.3.2 The multinomial and multinoulli distributions
  - 2.4 Some common continuous distributions
    - 2.4.1 Gaussian (normal) distribution
    - 2.4.2 Degenerate pdf
  - 2.5 Joint probability distributions
    - 2.5.1 Covariance and correlation
    - 2.5.2 The multivariate Gaussian or multivariate normal (MVN)
  - 2.6 Transformations of random variables
    - 2.6.1 linear transformation
    - 2.6.2 general transformation
    - 2.6.3 central limit theorem
  - 2.7 Monte Carlo approximation
    - 2.7.2 Example: estimating π by Monte Carlo integration
    - 2.7.3 Accuracy of Monte Carlo approximation
  - 2.8 Information theory
    - 2.8.1 Entropy
    - 2.8.2 KL divergence or relative entropy
    - 2.8.3 mutual information

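Chit-chat

1. Why study this book?
I have already studied a fair amount of machine learning and read several books (Machine Learning by Zhou Zhihua; Statistical Learning Methods by Li Hang; Introduction to Machine Learning by Alpaydın). I skimmed through the third book in October and came away feeling that machine learning is closely tied to statistics, so working through this book should help me consolidate my foundations.
2. Why write a blog?
Posting every day keeps me studying; otherwise I would waste good time watching live streams and sleeping.
3. My plan
Write about what I am not familiar with
Write about what is important
Study several books side by side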

2 Probability

2.2 A brief review of probability theory

2.2.4 Independence and conditional independence

Unconditional independence, also called marginal independence:
$X \perp Y \iff p(x, y) = p(x)\,p(y)$
[Figure illustrating this factorization]

Where does conditional independence come from?
“Unfortunately, unconditional independence is rare, because most variables can influence most other variables. However, usually this influence is mediated via other variables rather than being direct.”
$x$ and $y$ are conditionally independent (CI) given $z$ if and only if $p(x, y \mid z) = p(x \mid z)\,p(y \mid z)$.

"Theorem 2.2.1. $X \perp Y \mid Z$ iff there exist functions $g$ and $h$ such that $p(x, y \mid z) = g(x, z)\,h(y, z)$ for all $x, y, z$ such that $p(z) > 0$."
Here is how I understand it. If $X \perp Y \mid Z$, take $g(x, z) = p(x \mid z)$ and $h(y, z) = p(y \mid z)$. Conversely, suppose $p(x, y \mid z) = g(x, z)\,h(y, z)$. Summing over $y$ gives $p(x \mid z) = g(x, z)\sum_{y} h(y, z)$, summing over $x$ gives $p(y \mid z) = h(y, z)\sum_{x} g(x, z)$, and summing over both gives $\sum_{x} g(x, z)\sum_{y} h(y, z) = 1$. Multiplying the first two results, $p(x \mid z)\,p(y \mid z) = g(x, z)\,h(y, z)\left[\sum_{x} g(x, z)\sum_{y} h(y, z)\right] = g(x, z)\,h(y, z) = p(x, y \mid z)$, which is exactly the definition of conditional independence.

2.3 Some common discrete distributions

Common ones include the binomial, Bernoulli, multinomial, multinoulli, Poisson, and empirical distributions; only the first four are covered here.

2.3.1 The binomial and Bernoulli distributions

Suppose we toss a coin $n$ times. Let $X \in \{0, \dots, n\}$ be the number of heads and let $\theta$ be the probability of heads. Then
$X \sim \mathrm{Bin}(n, \theta)$, i.e. $X$ follows a binomial distribution:
$$\mathrm{Bin}(k \mid n, \theta) = \binom{n}{k}\theta^{k}(1 - \theta)^{n-k}$$
mean = $n\theta$, var = $n\theta(1 - \theta)$

Special case: when $n = 1$ this is the Bernoulli distribution,
$$\mathrm{Ber}(x \mid \theta) = \theta^{\mathbb{I}(x=1)}(1 - \theta)^{\mathbb{I}(x=0)}$$
where $\mathbb{I}(x = i)$ is the indicator function; mean = $\theta$, var = $\theta(1 - \theta)$
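As a quick sanity check of the formulas above, here is a minimal sketch that evaluates the binomial pmf directly and confirms that it sums to 1 and has mean $n\theta$ and variance $n\theta(1-\theta)$. The function name `binomial_pmf` and the values `n = 10`, `theta = 0.3` are my own illustrative choices, not from the book.

```python
from math import comb

def binomial_pmf(k: int, n: int, theta: float) -> float:
    """Bin(k | n, theta) = C(n, k) * theta^k * (1 - theta)^(n - k)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

# Illustrative parameters (assumed, not from the book).
n, theta = 10, 0.3

pmf = [binomial_pmf(k, n, theta) for k in range(n + 1)]
total = sum(pmf)                                        # should be 1
mean = sum(k * p for k, p in enumerate(pmf))            # should be n * theta
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))  # n * theta * (1 - theta)

# With n = 1 the pmf reduces to the Bernoulli distribution.
bernoulli_heads = binomial_pmf(1, 1, theta)             # equals theta
```

Setting `n = 1` reproduces $\mathrm{Ber}(x \mid \theta)$, which is the special case noted above.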

2.3.2 The multinomial and multinoulli distributions

Let $x = (x_1, \dots, x_K)$ be a random vector, where $K$ is the number of possible outcomes and $x_i$ is the number of times outcome $i$ occurs. The probability mass function is:
$$\mathrm{Mu}(x \mid n, \theta) = \binom{n}{x_1 \dots x_K}\prod_{i=1}^{K}\theta_i^{x_i}$$
where $\theta_i$ is the probability of outcome $i$ and $n = \sum_{k=1}^{K} x_k$. The multinomial coefficient is

$$\binom{n}{x_1 \dots x_K} = \frac{n!}{x_1!\,x_2!\cdots x_K!}$$

Special case: when $n = 1$ this is the multinoulli distribution.
$x$ is then a one-hot vector, $x = [\mathbb{I}(x = 1), \dots, \mathbb{I}(x = K)]$, and $\mathrm{Mu}(x \mid 1, \theta) = \prod_{i=1}^{K}\theta_i^{\mathbb{I}(x_i = 1)}$

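Summary: the Bernoulli distribution is a special case of both the binomial distribution ($n = 1$) and the multinoulli distribution ($K = 2$).

Side notes:
PDF: probability density function, for continuous random variables
PMF: probability mass function, for discrete random variables
CDF: cumulative distribution function, the integral of the PDF or the running sum of the PMF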

2.4 Some common continuous distributions

Common ones include the Gaussian (normal) distribution, the degenerate pdf, the Laplace distribution, the gamma distribution, the beta distribution, and the Pareto distribution.

2.4.1 Gaussian (normal) distribution

The Gaussian is often parameterized by its precision $\lambda = \frac{1}{\sigma^2}$; the larger $\lambda$ is, the more tightly the distribution concentrates around $\mu$.
The CDF is usually computed via the error function: $\Phi(x; \mu, \sigma) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{z}{\sqrt{2}}\right)\right]$,
where $z = (x - \mu)/\sigma$.
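The erf-based CDF above translates almost line for line into code. This is a minimal sketch using `math.erf` from the Python standard library; the function name `gaussian_cdf` and the test points are my own choices for illustration.

```python
from math import erf, sqrt

def gaussian_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Phi(x; mu, sigma) = 0.5 * (1 + erf(z / sqrt(2))), with z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Half of the probability mass lies below the mean.
p_below_mean = gaussian_cdf(0.0)

# Mass within 1.96 standard deviations of the mean (roughly 95%).
p_within_196 = gaussian_cdf(1.96) - gaussian_cdf(-1.96)

# Shifting and scaling only changes z, so this matches gaussian_cdf(1.0).
p_shifted = gaussian_cdf(7.0, mu=5.0, sigma=2.0)
```

The 1.96 interval reappears in section 2.7.3, where it gives the approximate 95% band for a Monte Carlo estimate.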

2.4.2 Degenerate pdf

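Impulse (Dirac delta) function:

$$\delta(x) = \begin{cases}\infty & \text{if } x = 0\\ 0 & \text{if } x \neq 0\end{cases}, \qquad \int_{-\infty}^{\infty}\delta(x)\,dx = 1$$

We have the sifting property:

$$\int_{-\infty}^{\infty} f(x)\,\delta(x - \mu)\,dx = f(\mu)$$

[Figure: the Gaussian distribution is sensitive to outliers]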

2.5 Joint probability distributions

2.5.1 Covariance and correlation

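Covariance matrix:

$$\mathrm{cov}[x] = E\left[(x - E[x])(x - E[x])^T\right]$$

Correlation coefficient (the entries of the correlation matrix):

$$\mathrm{corr}[X, Y] = \frac{\mathrm{cov}[X, Y]}{\sqrt{\mathrm{var}[X]\,\mathrm{var}[Y]}}$$

Its value lies in $[-1, 1]$.

The diagonal of the correlation matrix is all ones.
Independence implies uncorrelatedness, but uncorrelatedness does not imply independence.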
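A small exact example makes the last statement concrete. The sketch below uses a toy distribution of my own choosing (not from the book): $X$ is uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The covariance comes out as exactly zero, yet $Y$ is a deterministic function of $X$, so the two are clearly dependent.

```python
from fractions import Fraction

# Toy example (assumed for illustration): X uniform on {-1, 0, 1}, Y = X^2.
xs = [-1, 0, 1]
p = Fraction(1, 3)  # probability of each value of X

E_x = sum(p * x for x in xs)          # E[X]
E_y = sum(p * x * x for x in xs)      # E[Y] = E[X^2]
cov_xy = sum(p * (x - E_x) * (x * x - E_y) for x in xs)

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_joint = p            # X = 0 forces Y = 0, so P(X=0, Y=0) = 1/3
p_product = p * p      # P(X=0) = 1/3 and P(Y=0) = P(X=0) = 1/3
```

Here `cov_xy` is zero while `p_joint` (1/3) differs from `p_product` (1/9), so the variables are uncorrelated but not independent.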

2.5.2 The multivariate Gaussian or multivariate normal (MVN)

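The pdf of the MVN in $D$ dimensions is

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x - \mu)\right]$$

where $\mu = E[x] \in \mathbb{R}^D$ is the mean vector and $\Sigma = \mathrm{cov}[x]$ is the $D \times D$ covariance matrix. Because $\Sigma$ is symmetric, it has $D(D+1)/2$ free parameters.

Section 5.4 of Introduction to Machine Learning explains this material well and is worth consulting.
I still need to come back to this; I have not yet understood the underlying principles.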

2.6 Transformations of random variables

2.6.1 linear transformation

Suppose $f$ is a linear function, $y = f(x) = \mathbf{A}x + b$. With $\mu = E[x]$ and $\Sigma = \mathrm{cov}[x]$:
$$E[y] = \mathbf{A}\mu + b$$
$$\mathrm{cov}[y] = \mathbf{A}\Sigma\mathbf{A}^T$$
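These two identities can be checked exactly on a small discrete distribution. The sketch below is only an illustration under my own assumptions: four equally likely points in $\mathbb{R}^2$, a hand-picked matrix `A` and offset `b`, and small helper functions (`mean_and_cov`, `matmul`, `transpose`) written for this example. It uses `fractions.Fraction` so the comparison is exact rather than approximate.

```python
from fractions import Fraction

def mean_and_cov(points):
    """Population mean vector and covariance matrix of equally likely points."""
    n, d = len(points), len(points[0])
    mu = [sum(Fraction(p[i]) for p in points) / n for i in range(d)]
    cov = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n
            for j in range(d)] for i in range(d)]
    return mu, cov

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical transformation and data, chosen only for illustration.
A = [[2, 1], [0, 3]]
b = [1, -1]
xs = [(0, 0), (1, 2), (2, 1), (3, 5)]

# Apply y = A x + b to every point.
ys = [tuple(sum(A[i][k] * x[k] for k in range(2)) + b[i] for i in range(2))
      for x in xs]

mu_x, cov_x = mean_and_cov(xs)
mu_y, cov_y = mean_and_cov(ys)

# Predictions from the formulas E[y] = A mu + b and cov[y] = A Sigma A^T.
predicted_mu_y = [sum(A[i][k] * mu_x[k] for k in range(2)) + b[i] for i in range(2)]
predicted_cov_y = matmul(matmul(A, cov_x), transpose(A))
```

Both `mu_y == predicted_mu_y` and `cov_y == predicted_cov_y` hold exactly; note that the offset `b` shifts the mean but does not affect the covariance.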

2.6.2 general transformation

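Three formulas cover the scalar case. If $X$ is discrete, sum the mass of every $x$ that maps to $y$:

$$p_y(y) = \sum_{x:\, f(x) = y} p_x(x)$$

If $X$ is continuous and $f$ is monotonically increasing (hence invertible), work with the CDF and then differentiate, with $x = f^{-1}(y)$:

$$P_y(y) = P(f(X) \le y) = P(X \le f^{-1}(y)) = P_x(f^{-1}(y))$$

$$p_y(y) = \frac{d}{dy}P_y(y) = \frac{dx}{dy}\,p_x(x)$$

For a map $\mathbb{R}^n \to \mathbb{R}^n$, use the Jacobian matrix:

$$p_y(y) = p_x(x)\left|\det\left(\frac{\partial x}{\partial y}\right)\right|$$

In particular, for scalar $x$ and $y$ this becomes:

$$p_y(y) = p_x(x)\left|\frac{dx}{dy}\right|$$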
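To see the scalar change of variables formula at work, here is a minimal sketch with an example of my own choosing: $X \sim \mathrm{Uniform}(0, 1)$ and $Y = X^2$, so $x = \sqrt{y}$ and $|dx/dy| = 1/(2\sqrt{y})$. It integrates the resulting density numerically (a simple midpoint rule, written here as the helper `integrate`) and compares the result with the probability computed directly from $X$.

```python
from math import sqrt

def p_x(x: float) -> float:
    """Density of X ~ Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def p_y(y: float) -> float:
    """Density of Y = X**2 via p_y(y) = p_x(x) * |dx/dy| with x = sqrt(y)."""
    x = sqrt(y)
    dx_dy = 1.0 / (2.0 * sqrt(y))
    return p_x(x) * abs(dx_dy)

def integrate(f, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

a, b = 0.25, 0.81

# P(a <= Y <= b) from the transformed density.
prob_from_density = integrate(p_y, a, b)

# The same probability computed on the X side: P(sqrt(a) <= X <= sqrt(b)).
prob_from_x = sqrt(b) - sqrt(a)
```

Both routes give a probability of about 0.4, which is what the formula promises: probability mass is preserved, only the density is rescaled by $|dx/dy|$.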

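2.6.3 central limit theorem

Consider $N$ random variables with pdfs $p(x_i)$, each with mean $\mu$ and variance $\sigma^2$, and assume they are independent and identically distributed (iid).
Let $S_N = \sum_{i=1}^{N} X_i$ be the sum of the random variables. As $N$ increases, the distribution of $S_N$ approaches

$$p(S_N = s) = \frac{1}{\sqrt{2\pi N\sigma^2}}\exp\left(-\frac{(s - N\mu)^2}{2N\sigma^2}\right)$$

Equivalently, the standardized quantity $Z_N = \frac{S_N - N\mu}{\sigma\sqrt{N}}$ converges in distribution to the standard normal.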
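A short simulation shows the theorem in action. This is only a sketch under my own assumptions: the summands are Uniform(0, 1) draws (mean $1/2$, variance $1/12$), with $N = 30$ terms per sum, 20,000 repetitions, and a fixed seed for reproducibility. It standardizes each sum as $(S_N - N\mu)/(\sigma\sqrt{N})$ and checks that the result looks like a standard normal.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)             # fixed seed so the run is reproducible
N = 30                             # number of iid terms in each sum
trials = 20_000                    # number of simulated sums
mu, sigma = 0.5, (1 / 12) ** 0.5   # mean and std of Uniform(0, 1)

z_values = []
for _ in range(trials):
    s_n = sum(rng.random() for _ in range(N))            # S_N
    z_values.append((s_n - N * mu) / (sigma * N ** 0.5))  # standardized sum

z_mean = fmean(z_values)      # expected to be near 0
z_var = pvariance(z_values)   # expected to be near 1

# Fraction of standardized sums within +/- 1.96, near 0.95 for a standard normal.
frac_within_196 = sum(abs(z) <= 1.96 for z in z_values) / trials
```

Even though each summand is uniform rather than Gaussian, the standardized sums already behave very much like draws from $\mathcal{N}(0, 1)$ at $N = 30$.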

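2.7 Monte Carlo approximation

Computing the distribution of a function of a random variable with the change of variables formula is often difficult, so we can use a Monte Carlo approximation instead:
First generate $S$ samples $x_1, x_2, \dots, x_S$ (for high-dimensional distributions, Markov chain Monte Carlo, MCMC, can be used); then approximate the distribution of $f(X)$ by the empirical distribution of $\{f(x_s)\}_{s=1}^{S}$.
Monte Carlo integration:

$$E[f(X)] = \int f(x)\,p(x)\,dx \approx \frac{1}{S}\sum_{s=1}^{S} f(x_s)$$

By varying the function $f$, we can approximate many quantities of interest, for example:

$\bar{x} = \frac{1}{S}\sum_{s=1}^{S} x_s \to E[X]$
$\frac{1}{S}\sum_{s=1}^{S}(x_s - \bar{x})^2 \to \mathrm{var}[X]$
$\frac{1}{S}\,\#\{x_s \le c\} \to P(X \le c)$
$\mathrm{median}\{x_1, \dots, x_S\} \to \mathrm{median}(X)$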

2.7.2 Example: estimating π by Monte Carlo integration

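The area of a circle of radius $r$ is $I = \pi r^2$, so $\pi = I/r^2$, where $I = \int_{-r}^{r}\int_{-r}^{r}\mathbb{I}(x^2 + y^2 \le r^2)\,dx\,dy$. Let $f(x, y) = \mathbb{I}(x^2 + y^2 \le r^2)$, and let $p(x)$ and $p(y)$ be uniform distributions on $[-r, r]$, so that $p(x) = p(y) = 1/(2r)$. Then we have

$$I = (2r)(2r)\int\!\!\int f(x, y)\,p(x)\,p(y)\,dx\,dy = 4r^2\int\!\!\int f(x, y)\,p(x)\,p(y)\,dx\,dy \approx 4r^2\,\frac{1}{S}\sum_{s=1}^{S} f(x_s, y_s)$$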
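The estimator is easy to try out. The following is a minimal sketch using Python's standard `random` module; the function name `estimate_pi`, the sample count, and the fixed seed are my own choices. It draws points uniformly from the square $[-r, r]^2$, counts the fraction that land inside the circle, scales by the area of the square $4r^2$ to estimate $I$, and divides by $r^2$.

```python
import random

def estimate_pi(num_samples: int, r: float = 1.0, seed: int = 0) -> float:
    """Monte Carlo estimate of pi from uniform samples on the square [-r, r]^2."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-r, r)
        y = rng.uniform(-r, r)
        if x * x + y * y <= r * r:   # f(x, y) = indicator of the disc
            inside += 1
    area = 4 * r * r * inside / num_samples   # estimate of I
    return area / (r * r)                     # pi = I / r^2

pi_hat = estimate_pi(100_000)
```

With 100,000 samples the standard error of this estimate is roughly 0.005, so `pi_hat` should agree with π to about two decimal places.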

2.7.3 Accuracy of Monte Carlo approximation

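Accuracy improves as the number of samples grows. Let $\mu = E[f(X)]$ be the exact mean and $\hat{\mu}$ the MC approximation. If the samples are independent, then

$$(\hat{\mu} - \mu) \to \mathcal{N}\left(0, \frac{\sigma^2}{S}\right), \qquad \sigma^2 = \mathrm{var}[f(X)] = E[f(X)^2] - E[f(X)]^2$$

$\sigma^2$ is unknown, but it can itself be estimated by MC:

$$\hat{\sigma}^2 = \frac{1}{S}\sum_{s=1}^{S}\left(f(x_s) - \hat{\mu}\right)^2$$

We then have

$$P\left\{\mu - 1.96\frac{\hat{\sigma}}{\sqrt{S}} \le \hat{\mu} \le \mu + 1.96\frac{\hat{\sigma}}{\sqrt{S}}\right\} \approx 0.95$$

where $\sqrt{\frac{\hat{\sigma}^2}{S}}$ is the standard error, a measure of our uncertainty about the estimate of $\mu$.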
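Here is a small sketch of the standard error computation, under assumptions of my own: $X \sim \mathrm{Uniform}(0, 1)$ and $f(x) = x^2$, for which the exact answers are $E[f(X)] = 1/3$ and $\mathrm{var}[f(X)] = 4/45$. The sample size and seed are arbitrary.

```python
import random
from math import sqrt

rng = random.Random(1)   # fixed seed so the run is reproducible
S = 10_000               # number of Monte Carlo samples

# f(x) = x^2 evaluated at samples x ~ Uniform(0, 1).
f_values = [rng.random() ** 2 for _ in range(S)]

mu_hat = sum(f_values) / S                                   # MC estimate of E[f(X)]
sigma2_hat = sum((v - mu_hat) ** 2 for v in f_values) / S    # MC estimate of var[f(X)]
std_error = sqrt(sigma2_hat / S)                             # standard error of mu_hat

# Approximate 95% interval around the estimate.
interval = (mu_hat - 1.96 * std_error, mu_hat + 1.96 * std_error)
```

Since the standard error shrinks like $1/\sqrt{S}$, quadrupling the number of samples only halves the width of the interval.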

2.8 Information theory

2.8.1 Entropy

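Let $X$ be a random variable with distribution $p$. Its entropy is written $\mathbb{H}(p)$ or $\mathbb{H}(X)$. For a discrete variable with $K$ states it is

$$\mathbb{H}(X) = -\sum_{k=1}^{K} p(X = k)\log_2 p(X = k)$$

With $\log_2$ the units are bits; with $\log_e$ the units are nats.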
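The entropy formula is short enough to compute by hand for a few coins. In this minimal sketch the function name `entropy_bits` and the example distributions are my own; terms with zero probability are skipped, following the usual convention $0 \log 0 = 0$.

```python
from math import log2

def entropy_bits(p):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(pk * log2(pk) for pk in p if pk > 0)

h_fair = entropy_bits([0.5, 0.5])      # fair coin: 1 bit
h_biased = entropy_bits([0.9, 0.1])    # biased coin: less than 1 bit
h_certain = entropy_bits([1.0, 0.0])   # deterministic outcome: 0 bits
```

The fair coin attains the maximum of 1 bit for two states, and the entropy drops toward 0 as the outcome becomes more predictable.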

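2.8.2 KL divergence or relative entropy

A way to measure how dissimilar two distributions $p$ and $q$ are:

$$\mathbb{KL}(p \,\|\, q) = \sum_{k=1}^{K} p_k\log\frac{p_k}{q_k}$$

The sum can be replaced by an integral for pdfs. Expanding gives

$$\mathbb{KL}(p \,\|\, q) = \sum_{k} p_k\log p_k - \sum_{k} p_k\log q_k = -\mathbb{H}(p) + \mathbb{H}(p, q)$$

where $\mathbb{H}(p, q)$ is the cross entropy:

$$\mathbb{H}(p, q) = -\sum_{k} p_k\log q_k$$

It is easy to see that
the relative entropy of $p$ and $q$ equals their cross entropy minus the entropy of $p$. The relative entropy can therefore be read as the extra cost of encoding data from $p$ with a code built for $q$ rather than a code built for $p$ itself, which is why the relative entropy is $\ge 0$.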
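The decomposition into cross entropy minus entropy can be verified numerically. The sketch below uses two small distributions of my own choosing and assumes $q_k > 0$ wherever $p_k > 0$; the function names are illustrative.

```python
from math import log2

def entropy(p):
    """H(p) in bits."""
    return -sum(pk * log2(pk) for pk in p if pk > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_k p_k log2 q_k (assumes q_k > 0 wherever p_k > 0)."""
    return -sum(pk * log2(qk) for pk, qk in zip(p, q) if pk > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum_k p_k log2(p_k / q_k) (same assumption on q)."""
    return sum(pk * log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

# Illustrative distributions (assumed, not from the book).
p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
decomposition = cross_entropy(p, q) - entropy(p)   # should equal kl_pq
kl_pp = kl_divergence(p, p)                        # zero when the distributions match
```

Note that `kl_pq` and `kl_qp` differ, so the KL divergence is not symmetric and is not a true distance.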

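Theorem 2.8.1 (the information inequality: $\mathbb{KL}(p \,\|\, q) \ge 0$, with equality iff $p = q$) can be proved with Jensen's inequality, which states that for any convex function $f$

$$f\left(\sum_{i=1}^{n}\lambda_i x_i\right) \le \sum_{i=1}^{n}\lambda_i f(x_i), \qquad \lambda_i \ge 0,\ \sum_{i=1}^{n}\lambda_i = 1$$

Among discrete distributions, the uniform distribution has maximum entropy.
Let $u(x) = 1/|\mathcal{X}|$; then we have

$$0 \le \mathbb{KL}(p \,\|\, u) = \sum_{x} p(x)\log\frac{p(x)}{u(x)} = \sum_{x} p(x)\log p(x) - \sum_{x} p(x)\log u(x) = -\mathbb{H}(X) + \log|\mathcal{X}|$$

so $\mathbb{H}(X) \le \log|\mathcal{X}|$.

If we do not know which distribution is more appropriate, we use the uniform distribution; this is the principle of insufficient reason.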

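2.8.3 mutual information

Mutual information measures how far the joint $p(x, y)$ is from the factorized form $p(x)\,p(y)$. If $x$ and $y$ are independent, then $p(x, y) = p(x)\,p(y)$ and the mutual information is zero; the stronger the dependence, the further $p(x, y)$ departs from $p(x)\,p(y)$ and the larger the mutual information:

$$\mathbb{I}(X; Y) = \mathbb{KL}\left(p(X, Y) \,\|\, p(X)\,p(Y)\right) = \sum_{x}\sum_{y} p(x, y)\log\frac{p(x, y)}{p(x)\,p(y)}$$

$$\mathbb{I}(X; Y) = \mathbb{H}(X) - \mathbb{H}(X \mid Y) = \mathbb{H}(Y) - \mathbb{H}(Y \mid X)$$

where $\mathbb{H}(Y \mid X) = \sum_{x} p(x)\,\mathbb{H}(Y \mid X = x)$ is the conditional entropy.
Pointwise mutual information (PMI), $\mathrm{PMI}(x, y) = \log\frac{p(x, y)}{p(x)\,p(y)}$, is closely related: it also compares $p(x, y)$ with $p(x)\,p(y)$, but for a single pair of values. Mutual information is the expected value of the PMI, i.e. the sum of PMI values weighted by $p(x, y)$.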
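To make the relationship between PMI, mutual information, and conditional entropy concrete, here is a minimal sketch on a 2×2 joint table of my own choosing. The helper names (`marginals`, `pmi`, `mutual_information`, `conditional_entropy_y_given_x`) are illustrative, and logs are base 2.

```python
from math import log2

def marginals(joint):
    """Marginals p(x) and p(y) from a joint table {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def pmi(joint, x, y):
    """Pointwise mutual information log2[p(x, y) / (p(x) p(y))]."""
    px, py = marginals(joint)
    return log2(joint[(x, y)] / (px[x] * py[y]))

def mutual_information(joint):
    """I(X; Y) as the p(x, y)-weighted sum of PMI values."""
    return sum(p * pmi(joint, x, y) for (x, y), p in joint.items() if p > 0)

def conditional_entropy_y_given_x(joint):
    """H(Y | X) = -sum_{x, y} p(x, y) log2 p(y | x)."""
    px, _ = marginals(joint)
    return -sum(p * log2(p / px[x]) for (x, y), p in joint.items() if p > 0)

# Illustrative joint distributions (assumed, not from the book).
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

mi_dependent = mutual_information(dependent)
mi_independent = mutual_information(independent)

# Check I(X; Y) = H(Y) - H(Y | X); here H(Y) = 1 bit because p(y) is uniform.
h_y = 1.0
mi_via_entropy = h_y - conditional_entropy_y_given_x(dependent)
```

For the dependent table the PMI is positive on the diagonal cells and negative off the diagonal, and their weighted sum gives a mutual information of about 0.278 bits; for the independent table every PMI term is zero.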

–2018.11.15–

