Deep Learning Fundamentals

This post covers some fundamentals of deep learning, including basic theory, deep learning frameworks, core concepts, and practical tricks for training neural networks.

Deep Learning Frameworks

TensorFlow

PyTorch

JAX

Keras

Comparing the Frameworks

Fundamentals

Structuring Neural Network Model Code

What End-to-End Means

End-to-end learning, in the context of AI and ML, is a technique where the model learns all the steps between the initial input and the final output. It is a deep learning approach in which all of the different parts are trained simultaneously rather than sequentially.
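As an illustration (my own sketch, not from the original post), an end-to-end pipeline in PyTorch chains the learned feature extractor and the prediction head into one module and trains every parameter from a single loss on the final output, instead of optimizing each stage against hand-designed intermediate targets:

```python
import torch
import torch.nn as nn

# Hypothetical end-to-end model: the "feature extractor" and the "classifier"
# are trained jointly from one loss on the final prediction.
model = nn.Sequential(
    nn.Flatten(),                  # raw input goes straight in
    nn.Linear(28 * 28, 128),       # learned feature extraction
    nn.ReLU(),
    nn.Linear(128, 10),            # final prediction head
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 1, 28, 28)     # dummy batch of images
y = torch.randint(0, 10, (32,))    # dummy labels

logits = model(x)
loss = loss_fn(logits, y)          # one loss drives every parameter in the pipeline
loss.backward()
optimizer.step()
```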

Counting the Parameters of a Neural Network Model

Batch and Epoch

Softmax

In mathematics, especially in probability theory and related fields, the softmax function, also called the normalized exponential function, is a generalization of the logistic function. It "squashes" a K-dimensional vector z of arbitrary real values into another K-dimensional real vector σ(z) whose elements all lie in the range (0, 1) and sum to 1 (the output can thus be viewed as a point on the (K − 1)-dimensional simplex). The function is usually given in the following form:

$$
\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \text{for } j = 1, \ldots, K.
$$

For the input vector [1, 2, 3, 4, 1, 2, 3], the softmax function yields [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]. The entry with the largest weight in the output corresponds to the largest value "4" in the input. This illustrates the function's usual purpose: it normalizes a vector so that the largest component stands out while components far below the maximum are suppressed.
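As a quick check of the example above, here is a minimal NumPy sketch (my own, not from the original post) that computes the softmax of that vector; subtracting the maximum before exponentiating is a standard trick for numerical stability and does not change the result:

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the ratio is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1, 2, 3, 4, 1, 2, 3], dtype=float)
print(np.round(softmax(z), 3))
# [0.024 0.064 0.175 0.475 0.024 0.064 0.175]
```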

Backpropagation

In machine learning, backpropagation (backprop, BP) is a widely used algorithm for training feedforward neural networks. Generalizations of backpropagation exist for other artificial neural networks (ANNs), and for functions generally. These classes of algorithms are all referred to generically as “backpropagation”. In fitting a neural network, backpropagation computes the gradient of the loss function with respect to the weights of the network for a single input–output example, and does so efficiently, unlike a naive direct computation of the gradient with respect to each weight individually. This efficiency makes it feasible to use gradient methods for training multilayer networks, updating weights to minimize loss; gradient descent, or variants such as stochastic gradient descent, are commonly used. The backpropagation algorithm works by computing the gradient of the loss function with respect to each weight by the chain rule, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this is an example of dynamic programming.

The term backpropagation strictly refers only to the algorithm for computing the gradient, not how the gradient is used; however, the term is often used loosely to refer to the entire learning algorithm, including how the gradient is used, such as by stochastic gradient descent. Backpropagation generalizes the gradient computation in the delta rule, which is the single-layer version of backpropagation, and is in turn generalized by automatic differentiation, where backpropagation is a special case of reverse accumulation (or “reverse mode”). The term backpropagation and its general use in neural networks was announced in Rumelhart, Hinton & Williams (1986a), then elaborated and popularized in Rumelhart, Hinton & Williams (1986b), but the technique was independently rediscovered many times, and had many predecessors dating to the 1960s.
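To make the chain-rule bookkeeping concrete, here is a minimal sketch (my own illustration, not from the original post) of the forward and backward passes for a tiny two-layer network with a squared-error loss; the resulting gradients are exactly what gradient descent or SGD would use to update the weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-layer network: x -> W1 -> tanh -> W2 -> prediction
x = rng.normal(size=(3, 1))            # single input example
y = rng.normal(size=(1, 1))            # target
W1 = rng.normal(size=(4, 3)) * 0.1
W2 = rng.normal(size=(1, 4)) * 0.1

# Forward pass: keep the intermediates needed for the backward pass.
a1 = W1 @ x                            # pre-activation of layer 1
h1 = np.tanh(a1)                       # hidden activation
y_hat = W2 @ h1                        # network output
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: apply the chain rule one layer at a time, output to input.
d_yhat = y_hat - y                     # dL/dy_hat
dW2 = d_yhat @ h1.T                    # dL/dW2
d_h1 = W2.T @ d_yhat                   # dL/dh1
d_a1 = d_h1 * (1 - np.tanh(a1) ** 2)   # dL/da1 (derivative of tanh)
dW1 = d_a1 @ x.T                       # dL/dW1

print(loss, dW1.shape, dW2.shape)
```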

Latent Space

Embedding

What Is an Embedding?

Overfitting

Neural Network Training Tricks

Hyperparameter Tuning

Learning Rate Scheduling

Gradient Clipping

Loss Function Regularization

Neural Network Parameter Sharing

All that is needed is to save the network's parameters and then load them back into another model.
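In PyTorch, for example (a minimal sketch, assuming the two models share the same architecture; the file name is arbitrary), this amounts to saving one model's state_dict and loading it into the other:

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

model_a = make_model()
model_b = make_model()

# Save model A's parameters, then load them into model B.
torch.save(model_a.state_dict(), "shared_params.pt")
model_b.load_state_dict(torch.load("shared_params.pt"))
```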

Improving the Robustness and Stability of Neural Networks

Improving Model Generalization

Saving Neural Network Models and Weights

Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the TensorFlow SavedModel format (by setting save_format="tf") or using save_weights.
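A minimal sketch of the options mentioned above, using the standard tf.keras API (the file names are arbitrary):

```python
import tensorflow as tf

# A Sequential (or Functional) model can be saved in full.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

model.save("model.h5")                       # HDF5: Functional/Sequential models only
model.save("saved_model", save_format="tf")  # TensorFlow SavedModel format

# Alternatively, save only the weights (this also works for subclassed models).
model.save_weights("weights.h5")

restored = tf.keras.models.load_model("model.h5")
```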

Differentiation in Neural Networks

Useful Links
