Table of Contents
- 13. Gradients
- 14. Activation Functions
- 15. Perceptron
- 16. Chain Rule
- 17. Backpropagation
- 18. 2D Function Optimization Example
- 19. Logistic Regression
- 20. Cross Entropy
- 21. Multi-class Classification
- 22. Fully Connected Layers
- 23. Activation Functions and GPU Acceleration
- 24. Testing
Compiled from Long Liangqu's PyTorch course videos. Video links:
【计算机-AI】PyTorch学这个就够了!
(好课推荐)深度学习与PyTorch入门实战——主讲人龙良曲
13. Gradients
- derivative
- partial derivative
- gradient (a vector of partial derivatives)
How to search for minima?
- $\theta_{t+1} = \theta_t - \alpha_t \nabla f(\theta_t)$
Optimizer performance depends on:
- initialization (e.g., Kaiming / He initialization)
- learning rate (with learning-rate decay)
- momentum
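A minimal sketch of the update rule above, done by hand with autograd (the function f and the learning rate here are arbitrary choices for illustration):
import torch
# f(theta) = (theta - 3)^2, minimized at theta = 3
theta = torch.tensor(0., requires_grad=True)
lr = 0.1  # alpha_t, kept fixed for simplicity
for step in range(100):
    f = (theta - 3) ** 2
    f.backward()  # compute df/dtheta into theta.grad
    with torch.no_grad():
        theta -= lr * theta.grad  # theta_{t+1} = theta_t - alpha_t * grad
    theta.grad.zero_()  # clear the accumulated gradient
print(theta.item())  # ~3.0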
14. Activation Functions
- some activations are continuous but not everywhere differentiable
- Sigmoid / Logistic
$\sigma' = \sigma(1-\sigma)$
torch.sigmoid()
F.sigmoid() (import torch.nn.functional as F; deprecated in recent PyTorch versions, prefer torch.sigmoid)
- Tanh
torch.tanh()
- Relu
torch.relu()
F.relu() (import torch.nn.functional as F)
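A quick sketch comparing the three activations on the same input (values chosen arbitrarily):
import torch
z = torch.linspace(-2, 2, 5)  # tensor([-2., -1.,  0.,  1.,  2.])
print(torch.sigmoid(z))  # squashes into (0, 1)
print(torch.tanh(z))  # squashes into (-1, 1)
print(torch.relu(z))  # zeroes out negative values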
Typical Loss
- Mean Squared Error
$loss = \sum [y-(xw+b)]^2$, i.e. the squared L2-norm: $loss = \|y-(xw+b)\|_2^2$
- Cross Entropy Loss
binary
multi-class
+softmax
(left to the Logistic Regression part)
- Softmax
soft version of max
$S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$
$\frac{\partial p_i}{\partial a_j} = \begin{cases} p_i(1-p_i) & i = j \\ -p_j \cdot p_i & i \neq j \end{cases}$ where $p = \text{softmax}(a)$ and $a_j$ is the $j$-th input logit
Gradient API
- torch.autograd.grad(loss, [w1, w2, ...])
- loss.backward()
import torch
import torch.nn.functional as F
x = torch.ones(1)
w = torch.full([1], 2.)
mse = F.mse_loss(torch.ones(1), x*w)
print(mse) # tensor(1., grad_fn=<MseLossBackward>)
# torch.autograd.grad(mse, [w]) # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
print(w.requires_grad_()) # tensor([2.], requires_grad=True)
# print(torch.autograd.grad(mse, [w])) # still errors: the graph was built before requires_grad_ took effect; RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
mse = F.mse_loss(torch.ones(1), x*w)
# print(torch.autograd.grad(mse, [w])) # (tensor([2.]),
mse.backward()
print(w.grad) # tensor([2.])
a = torch.rand(3, requires_grad=True)
print(a) # tensor([0.0377, 0.4542, 0.1386], requires_grad=True)
p = F.softmax(a, dim=0)
# p.backward() # raises RuntimeError: grad can be implicitly created only for scalar outputs
# retain_graph=True keeps the computation graph so grad/backward can be called again
print(torch.autograd.grad(p[0], [a], retain_graph=True)) # (tensor([ 0.1998, -0.1156, -0.0843]),)
print(torch.autograd.grad(p[1], [a], retain_graph=True)) # (tensor([-0.1156, 0.2434, -0.1278]),)
print(torch.autograd.grad(p[2], [a], retain_graph=True)) # (tensor([-0.0843, -0.1278, 0.2121]),)
15. Perceptron
Derivative of a single-output perceptron:
$\frac{\partial E}{\partial w_{j0}} = (O_0 - t)O_0(1-O_0)x^0_j$
Multi-output perceptron:
$\frac{\partial E}{\partial w_{jk}} = (O_k - t_k)O_k(1-O_k)x^0_j$
import torch
import torch.nn.functional as F
x = torch.randn(1, 10)
# w = torch.randn(1, 10, requires_grad=True) # single-output perceptron
w = torch.randn(2, 10, requires_grad=True) # multi-output perceptron
o = torch.sigmoid(x@w.t())
print(o.shape) # torch.Size([1, 2])
loss = F.mse_loss(torch.ones(1, 1), o) # broadcasting
print(loss.shape) # torch.Size([])
print(loss) # tensor(0.2094, grad_fn=<MseLossBackward>)
loss.backward()
print(w.grad)
"""
tensor([[-2.0498e-01, 2.4619e-02, -8.0208e-04, -1.3723e-01, -1.3014e-01,
-1.4648e-01, -7.5119e-02, 4.9381e-02, 2.7161e-01, 4.8075e-02],
[-4.8705e-03, 5.8495e-04, -1.9058e-05, -3.2607e-03, -3.0922e-03,
-3.4804e-03, -1.7849e-03, 1.1733e-03, 6.4536e-03, 1.1423e-03]])
"""
16. Chain Rule
import torch
import torch.nn.functional as F
x = torch.tensor(1.)
w1 = torch.tensor(2., requires_grad=True)
b1 = torch.tensor(1.)
w2 = torch.tensor(2., requires_grad=True)
b2 = torch.tensor(1.)
y1 = x * w1 + b1
y2 = y1 * w2 + b2
dy2_dy1 = torch.autograd.grad(y2, [y1], retain_graph=True)[0]
dy1_dw1 = torch.autograd.grad(y1, [w1], retain_graph=True)[0]
dy2_dw1 = torch.autograd.grad(y2, [w1], retain_graph=True)[0]
print(dy2_dy1 * dy1_dw1) # tensor(2.)
print(dy2_dw1) # tensor(2.)
17. Backpropagation
For an output-layer node $k \in K$: $\frac{\partial E}{\partial W_{jk}} = O_j\delta_k$
where $\delta_k = O_k(1-O_k)(O_k-t_k)$
For a hidden-layer node $j \in J$: $\frac{\partial E}{\partial W_{ij}} = O_i\delta_j$
where $\delta_j = O_j(1-O_j)\sum_{k \in K}\delta_k W_{jk}$
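A minimal numerical check of these formulas for one sigmoid hidden layer, compared against autograd (the sizes and data below are arbitrary; E uses the 1/2 factor implied by the derivatives):
import torch
torch.manual_seed(0)
x = torch.rand(3)  # input O^0
W1 = torch.rand(4, 3, requires_grad=True)  # hidden-layer weights W_ij
W2 = torch.rand(2, 4, requires_grad=True)  # output-layer weights W_jk
t = torch.rand(2)  # targets
O1 = torch.sigmoid(W1 @ x)  # hidden activations O_j
O2 = torch.sigmoid(W2 @ O1)  # output activations O_k
E = 0.5 * ((O2 - t) ** 2).sum()
E.backward()
delta_k = O2 * (1 - O2) * (O2 - t)  # output-layer delta
delta_j = O1 * (1 - O1) * (W2.t() @ delta_k)  # hidden-layer delta
print(torch.allclose(W2.grad, torch.outer(delta_k, O1)))  # True
print(torch.allclose(W1.grad, torch.outer(delta_j, x)))  # True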
18. 2D Function Optimization Example
import torch
import numpy as np
import matplotlib.pyplot as plt
def himmelblau(x):
return (x[0] ** 2 + x[1] - 11) ** 2 + (x[0] + x[1] ** 2 - 7) ** 2
x = np.arange(-6, 6, 0.1)
y = np.arange(-6, 6, 0.1)
print('x, y range:', x.shape, y.shape)
X, Y = np.meshgrid(x, y)
print('X, Y maps', X.shape, Y.shape)
Z = himmelblau([X, Y])
fig = plt.figure('himmelblau')
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') was deprecated in Matplotlib 3.4 and later removed
ax.plot_surface(X, Y, Z)
ax.view_init(60, -30)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
x = torch.tensor([0., 0.], requires_grad=True)
optimizer = torch.optim.Adam([x], lr=1e-3)
for step in range(20000):
pred = himmelblau(x)
optimizer.zero_grad()
pred.backward()
optimizer.step()
if step % 2000 == 0:
print('step {}: x = {}, f(x) = {}'.format(step, x.tolist(), pred.item()))
"""
x, y range: (120,) (120,)
X, Y maps (120, 120) (120, 120)
step 0: x = [0.0009999999310821295, 0.0009999999310821295], f(x) = 170.0
step 2000: x = [2.3331806659698486, 1.9540694952011108], f(x) = 13.730916023254395
step 4000: x = [2.9820079803466797, 2.0270984172821045], f(x) = 0.014858869835734367
step 6000: x = [2.999983549118042, 2.0000221729278564], f(x) = 1.1074007488787174e-08
step 8000: x = [2.9999938011169434, 2.0000083446502686], f(x) = 1.5572823031106964e-09
step 10000: x = [2.999997854232788, 2.000002861022949], f(x) = 1.8189894035458565e-10
step 12000: x = [2.9999992847442627, 2.0000009536743164], f(x) = 1.6370904631912708e-11
step 14000: x = [2.999999761581421, 2.000000238418579], f(x) = 1.8189894035458565e-12
step 16000: x = [3.0, 2.0], f(x) = 0.0
step 18000: x = [3.0, 2.0], f(x) = 0.0
"""
Different initializations converge to different solutions: Himmelblau's function has four identical minima, and gradient descent finds whichever one's basin the starting point lies in.
19. Logistic Regression
Goal vs. Approach
- For regression
Goal: $pred = y$
Approach: minimize $dist(pred, y)$
- For classification
Goal: maximize a benchmark, e.g. accuracy
Approach 1: minimize $dist(p_\theta(y|x), p_r(y|x))$
Approach 2: minimize $divergence(p_\theta(y|x), p_r(y|x))$
Q1. Why not maximize accuracy directly?
- $acc. = \frac{\sum_i I(pred_i == y_i)}{len(Y)}$
- Issue 1: the gradient is 0 when accuracy is unchanged even though the weights changed
- Issue 2: the gradient is discontinuous, since the number of correct predictions is not continuous
Q2. Why is it called logistic regression?
- it uses the sigmoid (logistic) function
- the name is controversial:
MSE => regression
Cross Entropy => classification
20. Cross Entropy
Entropy
- uncertainty
- measure of surprise
- higher entropy = less info (a more uniform, less predictable distribution)
$Entropy = -\sum_i P(i)\log P(i)$
Lottery example:
import torch
a = torch.full([4], 1/4.)
print(a) # tensor([0.2500, 0.2500, 0.2500, 0.2500])
print(a * torch.log2(a)) # tensor([-0.5000, -0.5000, -0.5000, -0.5000])
print(-(a * torch.log2(a)).sum()) # tensor(2.)
a = torch.tensor([0.1, 0.1, 0.1, 0.7])
print(a) # tensor([0.1000, 0.1000, 0.1000, 0.7000])
print(a * torch.log2(a)) # tensor([-0.3322, -0.3322, -0.3322, -0.3602])
print(-(a * torch.log2(a)).sum()) #tensor(1.3568)
Cross Entropy
- $H(p,q) = -\sum p(x)\log q(x)$
- $H(p,q) = H(p) + D_{KL}(p\|q)$
$D_{KL}$ is the KL divergence (relative entropy).
- When P = Q: cross entropy = entropy
- For one-hot encodings: entropy $= 1\log 1 = 0$, so the cross entropy reduces to $D_{KL}(p\|q)$
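A quick numerical check of the identity $H(p,q) = H(p) + D_{KL}(p\|q)$ (the two distributions below are arbitrary):
import torch
p = torch.tensor([0.1, 0.2, 0.7])
q = torch.tensor([0.3, 0.3, 0.4])
H_p = -(p * p.log()).sum()  # entropy H(p)
H_pq = -(p * q.log()).sum()  # cross entropy H(p, q)
D_kl = (p * (p / q).log()).sum()  # KL divergence D_KL(p||q)
print(torch.allclose(H_pq, H_p + D_kl))  # True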
Binary Classification
$H(P, Q) = -(y\log(p) + (1-y)\log(1-p))$
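A minimal sketch of this binary formula next to PyTorch's built-in (the probability and label are arbitrary):
import torch
import torch.nn.functional as F
p = torch.tensor([0.8])  # predicted probability
y = torch.tensor([1.])  # ground-truth label
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))
builtin = F.binary_cross_entropy(p, y)
print(manual.item(), builtin.item())  # both ~0.2231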
Why not use MSE for classification?
- sigmoid + MSE
gradients vanish, so convergence is slower
- But it is sometimes used,
e.g. in meta-learning
Numerical Stability
import torch
import torch.nn.functional as F
x = torch.randn(1, 784)
w = torch.randn(10, 784)
logits = x @ w.t()
print(logits.size()) # torch.Size([1, 10])
print(F.cross_entropy(logits, torch.tensor([3]))) # tensor(0.1694)
pred = F.softmax(logits, dim=1)
print(pred.size()) # torch.Size([1, 10])
pred_log = torch.log(pred)
print(F.nll_loss(pred_log, torch.tensor([3]))) # tensor(0.1694)
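F.cross_entropy fuses the softmax, log, and NLL steps above; when computing them separately, F.log_softmax is the numerically stable replacement for torch.log(F.softmax(...)):
pred_log = F.log_softmax(logits, dim=1)  # stable even when softmax outputs underflow
print(F.nll_loss(pred_log, torch.tensor([3])))  # same value as F.cross_entropy above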
21. Multi-class Classification
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
# initialize
batch_size = 200
learning_rate = 0.01
epochs = 10
# load data
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307, ), (0.3081, ))  # standard MNIST mean/std
])),
batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=False,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307, ), (0.3081, ))  # standard MNIST mean/std
])),
batch_size=batch_size, shuffle=True
)
# Network Architecture
w1, b1 = torch.randn(200, 784, requires_grad=True), torch.zeros(200, requires_grad=True)
w2, b2 = torch.randn(200, 200, requires_grad=True), torch.zeros(200, requires_grad=True)
w3, b3 = torch.randn(10, 200, requires_grad=True), torch.zeros(10, requires_grad=True)
torch.nn.init.kaiming_normal_(w1)
torch.nn.init.kaiming_normal_(w2)
torch.nn.init.kaiming_normal_(w3)
def forward(x):
x = x @ w1.t() + b1
x = F.relu(x)
x = x @ w2.t() + b2
x = F.relu(x)
x = x @ w3.t() + b3
x = F.relu(x)  # note: ReLU on the final logits is unusual; CrossEntropyLoss expects raw logits
return x
# Train
optimizer = torch.optim.SGD([w1, b1, w2, b2, w3, b3], lr=learning_rate)
criteon = nn.CrossEntropyLoss()
for epoch in range(epochs):
for batch_idx, (data, target) in enumerate(train_loader):
data = data.view(-1, 28 * 28)
logits = forward(data)
loss = criteon(logits, target)
optimizer.zero_grad()
loss.backward()
# print(w1.grad.norm(), w2.grad.norm())
optimizer.step()
if batch_idx % 100 == 0:
print('Train Epoch:{} [{} / {} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()
))
test_loss = 0
correct = 0
for data, target in test_loader:
data = data.view(-1, 28 * 28)
logits = forward(data)
test_loss += criteon(logits, target).item()
pred = logits.data.max(1)[1]
correct += pred.eq(target.data).sum()
test_loss /= len(test_loader.dataset)
print('\nTest set Average loss:{:.4f}, Accuracy: {} / {} ({:.0f}%)\n'.format(
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)
))
"""
Train Epoch:0 [0 / 60000 (0%)] Loss: 2.486867
Train Epoch:0 [20000 / 60000 (33%)] Loss: 0.724861
Train Epoch:0 [40000 / 60000 (67%)] Loss: 0.376032
Test set Average loss:0.0018, Accuracy: 8983 / 10000 (90%)
Train Epoch:1 [0 / 60000 (0%)] Loss: 0.366988
Train Epoch:1 [20000 / 60000 (33%)] Loss: 0.377176
Train Epoch:1 [40000 / 60000 (67%)] Loss: 0.399104
Test set Average loss:0.0014, Accuracy: 9186 / 10000 (92%)
Train Epoch:2 [0 / 60000 (0%)] Loss: 0.252696
Train Epoch:2 [20000 / 60000 (33%)] Loss: 0.302346
Train Epoch:2 [40000 / 60000 (67%)] Loss: 0.266919
Test set Average loss:0.0012, Accuracy: 9284 / 10000 (93%)
Train Epoch:3 [0 / 60000 (0%)] Loss: 0.320602
Train Epoch:3 [20000 / 60000 (33%)] Loss: 0.223881
Train Epoch:3 [40000 / 60000 (67%)] Loss: 0.198832
Test set Average loss:0.0011, Accuracy: 9364 / 10000 (94%)
Train Epoch:4 [0 / 60000 (0%)] Loss: 0.253680
Train Epoch:4 [20000 / 60000 (33%)] Loss: 0.147065
Train Epoch:4 [40000 / 60000 (67%)] Loss: 0.194152
Test set Average loss:0.0010, Accuracy: 9406 / 10000 (94%)
Train Epoch:5 [0 / 60000 (0%)] Loss: 0.163504
Train Epoch:5 [20000 / 60000 (33%)] Loss: 0.216691
Train Epoch:5 [40000 / 60000 (67%)] Loss: 0.166883
Test set Average loss:0.0010, Accuracy: 9460 / 10000 (95%)
Train Epoch:6 [0 / 60000 (0%)] Loss: 0.120956
Train Epoch:6 [20000 / 60000 (33%)] Loss: 0.122348
Train Epoch:6 [40000 / 60000 (67%)] Loss: 0.167381
Test set Average loss:0.0009, Accuracy: 9484 / 10000 (95%)
Train Epoch:7 [0 / 60000 (0%)] Loss: 0.218382
Train Epoch:7 [20000 / 60000 (33%)] Loss: 0.141006
Train Epoch:7 [40000 / 60000 (67%)] Loss: 0.156644
Test set Average loss:0.0009, Accuracy: 9501 / 10000 (95%)
Train Epoch:8 [0 / 60000 (0%)] Loss: 0.152702
Train Epoch:8 [20000 / 60000 (33%)] Loss: 0.167587
Train Epoch:8 [40000 / 60000 (67%)] Loss: 0.182679
Test set Average loss:0.0008, Accuracy: 9528 / 10000 (95%)
Train Epoch:9 [0 / 60000 (0%)] Loss: 0.210252
Train Epoch:9 [20000 / 60000 (33%)] Loss: 0.150022
Train Epoch:9 [40000 / 60000 (67%)] Loss: 0.097077
Test set Average loss:0.0008, Accuracy: 9559 / 10000 (96%)
"""
22. Fully Connected Layers
Writing a network concisely:
- inherit from nn.Module
- initialize layers in __init__
- implement forward()
nn.ReLU vs. F.relu()
- class-style API
- function-style API
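A small sketch of the two styles on the same tensor (shapes are arbitrary):
import torch
import torch.nn as nn
import torch.nn.functional as F
x = torch.randn(4, 10)
act = nn.ReLU()  # class-style: a module, convenient inside nn.Sequential
print(act(x).shape)  # torch.Size([4, 10])
print(F.relu(x).shape)  # function-style: a plain call, convenient inside forward()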
import torch
import torch.nn as nn
from torchvision import datasets, transforms
import torch.optim as optim
from visdom import Visdom
# initialize
batch_size = 200
learning_rate = 0.01
epochs = 10
# load data
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307, ), (0.3081, ))  # standard MNIST mean/std
])),
batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=False,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307, ), (0.3081, ))  # standard MNIST mean/std
])),
batch_size=batch_size, shuffle=True
)
class MLP(nn.Module):
def __init__(self):
super(MLP, self).__init__()
self.model = nn.Sequential(
nn.Linear(784, 200),
nn.LeakyReLU(inplace=True), # in-place computation saves (GPU) memory and avoids repeated allocation/free, but overwrites the input tensor
nn.Linear(200,200),
nn.LeakyReLU(inplace=True),
nn.Linear(200,10),
nn.LeakyReLU(inplace=True),
)
def forward(self, x):
x = self.model(x)
return x
device = torch.device('cuda:0')
net = MLP().to(device)
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss().to(device)
viz = Visdom()
viz.line([0.], [0.], win='train_loss', opts=dict(title='train_loss'))
viz.line([[0.0, 0.0]], [0.], win='test', opts=dict(title='test loss&acc.', legend=['loss', 'acc.']))
global_step = -1
for epoch in range(epochs):
for batch_idx, (data, target) in enumerate(train_loader):
data = data.view(-1, 28 * 28)
data, target = data.to(device), target.to(device)
logits = net(data)
loss = criteon(logits, target)
optimizer.zero_grad()
loss.backward()
# print(w1.grad.norm(), w2.grad.norm())
optimizer.step()
if batch_idx % 100 == 0:
print('Train Epoch:{} [{} / {} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()
))
# lines: single trace
global_step += 1
viz.line([loss.item()], [global_step], win='train_loss', update='append')
test_loss = 0
correct = 0
for data, target in test_loader:
data = data.view(-1, 28 * 28)
data, target = data.to(device), target.to(device)
logits = net(data)
test_loss += criteon(logits, target).item()
# pred = logits.data.max(1)[1]
pred = logits.argmax(dim=1)
correct += pred.eq(target).float().sum().item()
# lines: multi-traces
viz.line([[test_loss, correct / len(test_loader.dataset)]], [global_step], win='test', update='append')
# visual X
viz.images(data.view(-1, 1, 28, 28), win='x')
viz.text(str(pred.detach().cpu().numpy()), win='pred', opts=dict(title='pred'))
test_loss /= len(test_loader.dataset)
print('\nTest set Average loss:{:.4f}, Accuracy: {} / {} ({:.0f}%)\n'.format(
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)
))
"""
Train Epoch:0 [0 / 60000 (0%)] Loss: 2.302269
Train Epoch:0 [20000 / 60000 (33%)] Loss: 1.884911
Train Epoch:0 [40000 / 60000 (67%)] Loss: 1.336271
Test set Average loss:0.0038, Accuracy: 8169 / 10000 (82%)
Train Epoch:1 [0 / 60000 (0%)] Loss: 0.721071
Train Epoch:1 [20000 / 60000 (33%)] Loss: 0.565047
Train Epoch:1 [40000 / 60000 (67%)] Loss: 0.506850
Test set Average loss:0.0021, Accuracy: 8889 / 10000 (89%)
Train Epoch:2 [0 / 60000 (0%)] Loss: 0.368552
Train Epoch:2 [20000 / 60000 (33%)] Loss: 0.301212
Train Epoch:2 [40000 / 60000 (67%)] Loss: 0.406262
Test set Average loss:0.0017, Accuracy: 9061 / 10000 (91%)
Train Epoch:3 [0 / 60000 (0%)] Loss: 0.372895
Train Epoch:3 [20000 / 60000 (33%)] Loss: 0.390528
Train Epoch:3 [40000 / 60000 (67%)] Loss: 0.389583
Test set Average loss:0.0015, Accuracy: 9141 / 10000 (91%)
Train Epoch:4 [0 / 60000 (0%)] Loss: 0.220136
Train Epoch:4 [20000 / 60000 (33%)] Loss: 0.281799
Train Epoch:4 [40000 / 60000 (67%)] Loss: 0.291274
Test set Average loss:0.0014, Accuracy: 9211 / 10000 (92%)
Train Epoch:5 [0 / 60000 (0%)] Loss: 0.280618
Train Epoch:5 [20000 / 60000 (33%)] Loss: 0.305418
Train Epoch:5 [40000 / 60000 (67%)] Loss: 0.334693
Test set Average loss:0.0014, Accuracy: 9226 / 10000 (92%)
Train Epoch:6 [0 / 60000 (0%)] Loss: 0.342200
Train Epoch:6 [20000 / 60000 (33%)] Loss: 0.294665
Train Epoch:6 [40000 / 60000 (67%)] Loss: 0.220197
Test set Average loss:0.0013, Accuracy: 9280 / 10000 (93%)
Train Epoch:7 [0 / 60000 (0%)] Loss: 0.211271
Train Epoch:7 [20000 / 60000 (33%)] Loss: 0.358451
Train Epoch:7 [40000 / 60000 (67%)] Loss: 0.236865
Test set Average loss:0.0012, Accuracy: 9338 / 10000 (93%)
Train Epoch:8 [0 / 60000 (0%)] Loss: 0.226759
Train Epoch:8 [20000 / 60000 (33%)] Loss: 0.288015
Train Epoch:8 [40000 / 60000 (67%)] Loss: 0.263826
Test set Average loss:0.0012, Accuracy: 9349 / 10000 (93%)
Train Epoch:9 [0 / 60000 (0%)] Loss: 0.144617
Train Epoch:9 [20000 / 60000 (33%)] Loss: 0.166465
Train Epoch:9 [40000 / 60000 (67%)] Loss: 0.296551
Test set Average loss:0.0011, Accuracy: 9367 / 10000 (94%)
"""
23. Activation Functions and GPU Acceleration
ReLU is the usual default activation:
- ReLU: $R(z) = \max(0, z)$
- Leaky ReLU
- SELU
- softplus
GPU acceleration
- switch devices with a single call:
device = torch.device('cuda:0')
data.to(device)
- data.cuda() also works but is not recommended; prefer .to(device)
Check GPU utilization in Task Manager (or with nvidia-smi)
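A minimal device-agnostic sketch that falls back to the CPU when no GPU is available (the layer and input shapes are arbitrary):
import torch
import torch.nn as nn
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
net = nn.Linear(784, 10).to(device)  # move the parameters to the device
x = torch.randn(8, 784).to(device)  # move the data to the same device
print(net(x).shape)  # torch.Size([8, 10])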
24. Testing
Loss != Accuracy
When to test
- once every several batches
- once per epoch
- epoch vs. step?
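A minimal evaluation sketch in the style of the loops above (assumes a trained net and a test_loader like those in section 22; net.eval() and torch.no_grad() are standard practice at test time):
import torch
def evaluate(net, test_loader, device):
    net.eval()  # switch layers like dropout/batchnorm to eval mode
    correct = total = 0
    with torch.no_grad():  # no gradients are needed at test time
        for data, target in test_loader:
            data = data.view(-1, 28 * 28).to(device)
            target = target.to(device)
            pred = net(data).argmax(dim=1)
            correct += pred.eq(target).sum().item()
            total += target.size(0)
    return correct / total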