Deep Learning Fundamentals
This article introduces some fundamentals of deep learning, including basic theory, deep learning frameworks, core concepts, and neural network tuning tricks.
Deep Learning Frameworks
TensorFlow
- TensorFlow
- TensorFlow Forum
- TensorFlow Core Tutorials
- TensorFlow Core Guide
- TensorFlow API for Python
PyTorch
JAX
- JAX: High-Performance Array Computing — JAX documentation
- GitHub - google/jax: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Keras
- Keras: Deep Learning for humans
- Developer guides (keras.io)
- Keras API reference
- Code examples (keras.io)
Comparison of Different Frameworks
- Tensorflow vs. PyTorch vs. Keras for Deep Learning
- Pytorch Vs Tensorflow Vs Keras: Here are the Difference You Should Know
Basic Concepts
Structuring Neural Network Model Code
What End-to-End Means
End-to-end learning, in the context of AI and ML, is a technique where the model learns all the steps between the initial input phase and the final output result. It is a deep learning approach in which all of the different parts are trained simultaneously rather than sequentially.
- End to End learning in AI
- [End-to-end learning, the (almost) every purpose ML method](https://towardsdatascience.com/e2e-the-every-purpose-ml-method-5d4f20)
Counting the Parameters of a Neural Network Model
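A minimal sketch, assuming TensorFlow's Keras API and a made-up two-layer model: each Dense layer contributes (inputs + 1) × units parameters, and Keras can report the totals directly.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small fully connected model used only for illustration.
model = keras.Sequential([
    keras.Input(shape=(100,)),
    layers.Dense(64, activation="relu"),     # (100 + 1) * 64 = 6464 parameters
    layers.Dense(10, activation="softmax"),  # (64 + 1) * 10  = 650 parameters
])

model.summary()              # prints a per-layer parameter table
print(model.count_params())  # 7114 parameters in total
```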
Batch and Epoch
The batch size is a hyperparameter of gradient descent that controls the number of training samples to work through before the model’s internal parameters are updated.
The number of epochs is a hyperparameter of gradient descent that controls the number of complete passes through the training dataset.
- Difference Between a Batch and an Epoch in a Neural Network - Machine Learning Mastery
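A minimal sketch of how the two hyperparameters appear in practice, assuming Keras and randomly generated toy data (all sizes below are made up): with 1000 samples, a batch size of 50 gives 20 weight updates per epoch, so 10 epochs perform 200 updates in total.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy binary-classification data: 1000 samples, 20 features (made up).
x = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=(1000,))

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# batch_size: samples consumed per weight update; epochs: full passes over the data.
# 1000 samples / batch_size 50 = 20 updates per epoch; 10 epochs -> 200 updates.
model.fit(x, y, batch_size=50, epochs=10, verbose=0)
```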
Softmax
In mathematics, especially in probability theory and related fields, the softmax function, also known as the normalized exponential function, is a generalization of the logistic function. It "squashes" a K-dimensional vector z of arbitrary real values into a K-dimensional real vector σ(z) whose entries all lie in the range (0, 1) and sum to 1 (the output therefore lies on the standard (K − 1)-dimensional simplex). The function usually takes the following form:
$$
\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \text{for } j = 1, \ldots, K.
$$
For the input vector [1, 2, 3, 4, 1, 2, 3], the softmax function returns [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]. The entry with the largest weight in the output corresponds to the largest value "4" in the input. This illustrates what the function is typically used for: normalizing a vector so that its largest component stands out while components far below the maximum are suppressed.
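A minimal NumPy sketch that reproduces the example above (subtracting the maximum is a standard numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(z):
    # Exponentiate and normalize so the outputs are positive and sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1, 2, 3, 4, 1, 2, 3], dtype=float)
print(np.round(softmax(z), 3))
# [0.024 0.064 0.175 0.475 0.024 0.064 0.175]
```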
Backpropagation
In machine learning, backpropagation (backprop, BP) is a widely used algorithm for training feedforward neural networks. Generalizations of backpropagation exist for other artificial neural networks (ANNs), and for functions generally. These classes of algorithms are all referred to generically as “backpropagation”. In fitting a neural network, backpropagation computes the gradient of the loss function with respect to the weights of the network for a single input–output example, and does so efficiently, unlike a naive direct computation of the gradient with respect to each weight individually. This efficiency makes it feasible to use gradient methods for training multilayer networks, updating weights to minimize loss; gradient descent, or variants such as stochastic gradient descent, are commonly used. The backpropagation algorithm works by computing the gradient of the loss function with respect to each weight by the chain rule, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this is an example of dynamic programming.
The term backpropagation strictly refers only to the algorithm for computing the gradient, not how the gradient is used; however, the term is often used loosely to refer to the entire learning algorithm, including how the gradient is used, such as by stochastic gradient descent. Backpropagation generalizes the gradient computation in the delta rule, which is the single-layer version of backpropagation, and is in turn generalized by automatic differentiation, where backpropagation is a special case of reverse accumulation (or “reverse mode”). The term backpropagation and its general use in neural networks was announced in Rumelhart, Hinton & Williams (1986a), then elaborated and popularized in Rumelhart, Hinton & Williams (1986b), but the technique was independently rediscovered many times, and had many predecessors dating to the 1960s.
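A minimal NumPy sketch of the idea, assuming a made-up one-hidden-layer network with a tanh activation and mean-squared-error loss: the backward pass applies the chain rule one layer at a time, reusing the intermediates from the forward pass, and a finite-difference check confirms one of the gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: x -> W1, tanh -> W2 -> prediction, mean squared error loss.
x = rng.normal(size=(4, 3))   # 4 samples, 3 features (made up)
y = rng.normal(size=(4, 1))
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 1))

# Forward pass, keeping the intermediates needed by the backward pass.
h = np.tanh(x @ W1)           # hidden activations
y_hat = h @ W2                # predictions
loss = np.mean((y_hat - y) ** 2)

# Backward pass: chain rule applied layer by layer, from the output backwards.
d_yhat = 2 * (y_hat - y) / y.size   # dL/dy_hat
dW2 = h.T @ d_yhat                  # dL/dW2
d_h = d_yhat @ W2.T                 # dL/dh
dW1 = x.T @ (d_h * (1 - h ** 2))    # dL/dW1, using d tanh(u)/du = 1 - tanh(u)^2

# Finite-difference check on one entry of W1.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
loss_p = np.mean((np.tanh(x @ W1p) @ W2 - y) ** 2)
print(dW1[0, 0], (loss_p - loss) / eps)   # the two numbers should nearly match
```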
Latent Space
Embedding
What is an Embedding?
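As a hedged sketch (assuming TensorFlow's Keras API), an embedding layer is a trainable lookup table that maps integer token ids to dense vectors; the vocabulary size and dimensions below are made up.

```python
import numpy as np
from tensorflow.keras import layers

# A trainable lookup table: vocabulary of 1000 token ids -> 8-dimensional vectors.
embedding = layers.Embedding(input_dim=1000, output_dim=8)

tokens = np.array([[3, 17, 42]])   # one sequence of three token ids
vectors = embedding(tokens)        # shape (1, 3, 8); the vectors are learned during training
print(vectors.shape)
```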
Overfitting
Neural Network Tuning Tricks
Hyperparameter Tuning
- Hyperparameter tuning. Grid search and random search | Your Data Teacher
- 炼丹宝典 | 整理 Deep Learning 调参 tricks - 山竹小果 - 博客园 (cnblogs.com)
- 深度学习调参有哪些技巧?
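A minimal sketch of a plain grid search, assuming Keras and randomly generated toy data (the hyperparameter grid, data sizes, and epoch count are made up): every combination of hidden units and learning rate is trained briefly and scored on a validation split.

```python
import itertools
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data (made up) just to make the loop runnable.
x = np.random.rand(500, 10)
y = np.random.randint(0, 2, size=(500,))

def build_and_score(units, lr):
    model = keras.Sequential([
        keras.Input(shape=(10,)),
        layers.Dense(units, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    hist = model.fit(x, y, validation_split=0.2, epochs=3, verbose=0)
    return hist.history["val_accuracy"][-1]

# Grid search: evaluate every combination of the two hyperparameters.
grid = {"units": [16, 64], "lr": [1e-3, 1e-2]}
results = {(u, lr): build_and_score(u, lr)
           for u, lr in itertools.product(grid["units"], grid["lr"])}
print(max(results, key=results.get), results)
```

A random search replaces the exhaustive product with a fixed number of randomly sampled combinations, which usually scales better when there are many hyperparameters.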
Learning Rate Adjustment
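One common approach is a step-decay schedule; a hedged Keras sketch (the decay factor and interval are made up) might look like this:

```python
from tensorflow import keras

# Step decay: halve the learning rate every 10 epochs.
def step_decay(epoch, lr):
    return lr * 0.5 if epoch > 0 and epoch % 10 == 0 else lr

lr_callback = keras.callbacks.LearningRateScheduler(step_decay, verbose=1)
# Used during training, e.g. model.fit(x, y, epochs=30, callbacks=[lr_callback])
```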
Gradient Clipping
- 深度炼丹之梯度裁剪
- 【调参 19】如何使用梯度裁剪(Gradient Clipping)避免梯度爆炸_Constant dripping wears the stone-CSDN 博客_keras 梯度裁剪
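A minimal Keras sketch (the thresholds are made up): clipnorm rescales any gradient tensor whose L2 norm exceeds the threshold, while clipvalue clips each gradient element into a fixed range, so a single exploding gradient cannot blow up the weights.

```python
from tensorflow import keras

# clipnorm: rescale a gradient tensor if its L2 norm exceeds 1.0.
opt_by_norm = keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)

# clipvalue: clip every gradient element into [-0.5, 0.5].
opt_by_value = keras.optimizers.Adam(learning_rate=1e-3, clipvalue=0.5)

# Then: model.compile(optimizer=opt_by_norm, loss="mse")
```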
Loss Function Regularization
- 深度学习中的优化
- How to Improve a Neural Network With Regularization
- Regularization in Deep Learning - L1, L2, and Dropout
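A hedged Keras sketch combining an L2 weight penalty with dropout (the penalty strength and dropout rate are made up):

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(100,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # adds 1e-4 * sum(W^2) to the loss
    layers.Dropout(0.5),                                     # zeroes 50% of activations at train time
    layers.Dense(10, activation="softmax"),
])
```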
Neural Network Parameter Sharing
Just save the neural network's parameters and then reload them where they are needed.
- Parameter Sharing in Deep Learning
- Understanding Parameter Sharing (or weights replication) Within Convolutional Neural Networks
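Following the note above, a minimal Keras sketch of saving the weights of one model and loading them into a second model with the same architecture (the file name is made up):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    return keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(32, activation="relu"),
        layers.Dense(1),
    ])

model_a = build_model()
model_a.save_weights("shared.weights.h5")   # save the trained parameters

model_b = build_model()                     # a second model with the same architecture
model_b.load_weights("shared.weights.h5")   # reload them, so both models use the same weights
```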
Improving the Robustness and Stability of a Neural Network
Improving Model Generalization
Saving Neural Network Models and Weights
Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the TensorFlow SavedModel format (by setting save_format="tf") or using save_weights.
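A hedged sketch of the three options mentioned above, assuming TensorFlow 2.x Keras, where save_format="tf" selects the SavedModel format (file and directory names are made up):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(10,)),
    layers.Dense(4, activation="relu"),
    layers.Dense(1),
])

model.save("model.h5")                           # HDF5: Functional or Sequential models only
model.save("saved_model_dir", save_format="tf")  # SavedModel: also works for subclassed models
model.save_weights("model.weights.h5")           # weights only; the architecture must be rebuilt in code
```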