Deep Learning Fundamentals
This article introduces some fundamentals of deep learning, including basic theory, deep learning frameworks, core concepts, and neural network tuning tricks.
Deep Learning Frameworks
TensorFlow
- TensorFlow
- TensorFlow Forum
- TensorFlow Core Tutorials
- TensorFlow Core Guide
- TensorFlow API for Python
PyTorch
JAX
- JAX: High-Performance Array Computing — JAX documentation
- GitHub - google/jax: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Keras
- Keras: Deep Learning for humans
- Developer guides (keras.io)
- Keras API reference
- Code examples (keras.io)
Comparison of Different Frameworks
- Tensorflow vs. PyTorch vs. Keras for Deep Learning
- Pytorch Vs Tensorflow Vs Keras: Here are the Difference You Should Know
Basic Concepts
Structuring Neural Network Model Code
What End-to-End Means
End-to-end learning, in the context of AI and ML, is a technique where the model learns all the steps between the initial input phase and the final output result. It is a deep learning approach in which all of the different parts are trained simultaneously rather than sequentially.
- End to End learning in AI
- [End-to-end learning, the (almost) every purpose ML method](https://towardsdatascience.com/e2e-the-every-purpose-ml-method-5d4f20)
Counting the Parameters of a Neural Network Model
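A minimal sketch, assuming TensorFlow's Keras API and a made-up two-layer model: each Dense layer contributes (inputs + 1) × units parameters, and Keras can report the totals directly.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small fully connected model used only for illustration.
model = keras.Sequential([
    keras.Input(shape=(100,)),
    layers.Dense(64, activation="relu"),     # (100 + 1) * 64 = 6464 parameters
    layers.Dense(10, activation="softmax"),  # (64 + 1) * 10  = 650 parameters
])

model.summary()              # prints a per-layer parameter table
print(model.count_params())  # 7114 parameters in total
```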
Batch and Epoch
The batch size is a hyperparameter of gradient descent that controls the number of training samples to work through before the model’s internal parameters are updated.
The number of epochs is a hyperparameter of gradient descent that controls the number of complete passes through the training dataset.
- Difference Between a Batch and an Epoch in a Neural Network - Machine Learning Mastery
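A minimal sketch of how the two hyperparameters appear in practice, assuming Keras and randomly generated toy data (all sizes below are made up): with 1000 samples, a batch size of 50 gives 20 weight updates per epoch, so 10 epochs perform 200 updates in total.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy binary-classification data: 1000 samples, 20 features (made up).
x = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=(1000,))

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# batch_size: samples consumed per weight update; epochs: full passes over the data.
# 1000 samples / batch_size 50 = 20 updates per epoch; 10 epochs -> 200 updates.
model.fit(x, y, batch_size=50, epochs=10, verbose=0)
```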
Softmax
In mathematics, especially in probability theory and related fields, the softmax function, also known as the normalized exponential function, is a generalization of the logistic function. It "squashes" a K-dimensional vector z of arbitrary real values into a K-dimensional real vector σ(z) whose entries all lie in the range (0, 1) and sum to 1 (the output therefore lies on the standard (K − 1)-dimensional simplex). The function usually takes the following form:
$$
\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \text{for } j = 1, \ldots, K.
$$
For the input vector [1, 2, 3, 4, 1, 2, 3], the softmax function returns [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]. The entry with the largest weight in the output corresponds to the largest value "4" in the input. This illustrates what the function is typically used for: normalizing a vector so that its largest component stands out while components far below the maximum are suppressed.
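A minimal NumPy sketch that reproduces the example above (subtracting the maximum is a standard numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(z):
    # Exponentiate and normalize so the outputs are positive and sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1, 2, 3, 4, 1, 2, 3], dtype=float)
print(np.round(softmax(z), 3))
# [0.024 0.064 0.175 0.475 0.024 0.064 0.175]
```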
Backpropagation
In machine learning, backpropagation (backprop, BP) is a widely used algorithm for training feedforward neural networks. Generalizations of backpropagation exist for other artificial neural networks (ANNs), and for functions generally. These classes of algorithms are all referred to generically as “backpropagation”. In fitting a neural network, backpropagation computes the gradient of the loss function with respect to the weights of the network for a single input–output example, and does so efficiently, unlike a naive direct computation of the gradient with respect to each weight individually. This efficiency makes it feasible to use gradient methods for training multilayer networks, updating weights to minimize loss; gradient descent, or variants such as stochastic gradient descent, are commonly used. The backpropagation algorithm works by computing the gradient of the loss function with respect to each weight by the chain rule, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this is an example of dynamic programming.
The term backpropagation strictly refers only to the algorithm for computing the gradient, not how the gradient is used; however, the term is often used loosely to refer to the entire learning algorithm, including how the gradient is used, such as by stochastic gradient descent. Backpropagation generalizes the gradient computation in the delta rule, which is the single-layer version of backpropagation, and is in turn generalized by automatic differentiation, where backpropagation is a special case of reverse accumulation (or “reverse mode”). The term backpropagation and its general use in neural networks was announced in Rumelhart, Hinton & Williams (1986a), then elaborated and popularized in Rumelhart, Hinton & Williams (1986b), but the technique was independently rediscovered many times, and had many predecessors dating to the 1960s.
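A minimal NumPy sketch of the idea, assuming a made-up one-hidden-layer network with a tanh activation and mean-squared-error loss: the backward pass applies the chain rule one layer at a time, reusing the intermediates from the forward pass, and a finite-difference check confirms one of the gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: x -> W1, tanh -> W2 -> prediction, mean squared error loss.
x = rng.normal(size=(4, 3))   # 4 samples, 3 features (made up)
y = rng.normal(size=(4, 1))
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 1))

# Forward pass, keeping the intermediates needed by the backward pass.
h = np.tanh(x @ W1)           # hidden activations
y_hat = h @ W2                # predictions
loss = np.mean((y_hat - y) ** 2)

# Backward pass: chain rule applied layer by layer, from the output backwards.
d_yhat = 2 * (y_hat - y) / y.size   # dL/dy_hat
dW2 = h.T @ d_yhat                  # dL/dW2
d_h = d_yhat @ W2.T                 # dL/dh
dW1 = x.T @ (d_h * (1 - h ** 2))    # dL/dW1, using d tanh(u)/du = 1 - tanh(u)^2

# Finite-difference check on one entry of W1.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
loss_p = np.mean((np.tanh(x @ W1p) @ W2 - y) ** 2)
print(dW1[0, 0], (loss_p - loss) / eps)   # the two numbers should nearly match
```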
Latent Space
Embedding
What is an Embedding?
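As a hedged sketch (assuming TensorFlow's Keras API), an embedding layer is a trainable lookup table that maps integer token ids to dense vectors; the vocabulary size and dimensions below are made up.

```python
import numpy as np
from tensorflow.keras import layers

# A trainable lookup table: vocabulary of 1000 token ids -> 8-dimensional vectors.
embedding = layers.Embedding(input_dim=1000, output_dim=8)

tokens = np.array([[3, 17, 42]])   # one sequence of three token ids
vectors = embedding(tokens)        # shape (1, 3, 8); the vectors are learned during training
print(vectors.shape)
```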
Overfitting
Neural Network Tuning Tricks
Hyperparameter Tuning
- Hyperparameter tuning. Grid search and random search | Your Data Teacher
- 炼丹宝典 | 整理 Deep Learning 调参 tricks - 山竹小果 - 博客园 (cnblogs.com)
- 深度学习调参有哪些技巧?
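A minimal sketch of a plain grid search, assuming Keras and randomly generated toy data (the hyperparameter grid, data sizes, and epoch count are made up): every combination of hidden units and learning rate is trained briefly and scored on a validation split.

```python
import itertools
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data (made up) just to make the loop runnable.
x = np.random.rand(500, 10)
y = np.random.randint(0, 2, size=(500,))

def build_and_score(units, lr):
    model = keras.Sequential([
        keras.Input(shape=(10,)),
        layers.Dense(units, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    hist = model.fit(x, y, validation_split=0.2, epochs=3, verbose=0)
    return hist.history["val_accuracy"][-1]

# Grid search: evaluate every combination of the two hyperparameters.
grid = {"units": [16, 64], "lr": [1e-3, 1e-2]}
results = {(u, lr): build_and_score(u, lr)
           for u, lr in itertools.product(grid["units"], grid["lr"])}
print(max(results, key=results.get), results)
```

A random search replaces the exhaustive product with a fixed number of randomly sampled combinations, which usually scales better when there are many hyperparameters.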
Learning Rate Adjustment
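One common approach is a step-decay schedule; a hedged Keras sketch (the decay factor and interval are made up) might look like this:

```python
from tensorflow import keras

# Step decay: halve the learning rate every 10 epochs.
def step_decay(epoch, lr):
    return lr * 0.5 if epoch > 0 and epoch % 10 == 0 else lr

lr_callback = keras.callbacks.LearningRateScheduler(step_decay, verbose=1)
# Used during training, e.g. model.fit(x, y, epochs=30, callbacks=[lr_callback])
```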
Gradient Clipping
- 深度炼丹之梯度裁剪
- 【调参 19】如何使用梯度裁剪(Gradient Clipping)避免梯度爆炸_Constant dripping wears the stone-CSDN 博客_keras 梯度裁剪
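A minimal Keras sketch (the thresholds are made up): clipnorm rescales any gradient tensor whose L2 norm exceeds the threshold, while clipvalue clips each gradient element into a fixed range, so a single exploding gradient cannot blow up the weights.

```python
from tensorflow import keras

# clipnorm: rescale a gradient tensor if its L2 norm exceeds 1.0.
opt_by_norm = keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)

# clipvalue: clip every gradient element into [-0.5, 0.5].
opt_by_value = keras.optimizers.Adam(learning_rate=1e-3, clipvalue=0.5)

# Then: model.compile(optimizer=opt_by_norm, loss="mse")
```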
Loss Function Regularization
- 深度学习中的优化
- How to Improve a Neural Network With Regularization
- Regularization in Deep Learning - L1, L2, and Dropout
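A hedged Keras sketch combining an L2 weight penalty with dropout (the penalty strength and dropout rate are made up):

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(100,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # adds 1e-4 * sum(W^2) to the loss
    layers.Dropout(0.5),                                     # zeroes 50% of activations at train time
    layers.Dense(10, activation="softmax"),
])
```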
Neural Network Parameter Sharing
Just save the neural network's parameters and then reload them where they are needed.
- Parameter Sharing in Deep Learning
- Understanding Parameter Sharing (or weights replication) Within Convolutional Neural Networks
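Following the note above, a minimal Keras sketch of saving the weights of one model and loading them into a second model with the same architecture (the file name is made up):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    return keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(32, activation="relu"),
        layers.Dense(1),
    ])

model_a = build_model()
model_a.save_weights("shared.weights.h5")   # save the trained parameters

model_b = build_model()                     # a second model with the same architecture
model_b.load_weights("shared.weights.h5")   # reload them, so both models use the same weights
```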
Improving the Robustness and Stability of a Neural Network
Improving Model Generalization
Saving Neural Network Models and Weights
Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the TensorFlow SavedModel format (by setting save_format="tf") or using save_weights.
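A hedged sketch of the three options mentioned above, assuming TensorFlow 2.x Keras, where save_format="tf" selects the SavedModel format (file and directory names are made up):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(10,)),
    layers.Dense(4, activation="relu"),
    layers.Dense(1),
])

model.save("model.h5")                           # HDF5: Functional or Sequential models only
model.save("saved_model_dir", save_format="tf")  # SavedModel: also works for subclassed models
model.save_weights("model.weights.h5")           # weights only; the architecture must be rebuilt in code
```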