Neural Network Architectures

This post covers the basics of some common neural network architectures.

Feed-Forward Neural Network (FNN/FFNN)

The feed-forward neural network is the simplest and most basic neural network architecture.

It is worth distinguishing the feed-forward neural network from the multilayer perceptron (Multilayer Perceptron, MLP). In short, an MLP is a fully connected feed-forward network with at least three layers (an input layer, one or more hidden layers, and an output layer), whereas "feed-forward" only means that information flows in one direction, without cycles.
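As a quick illustration, a minimal MLP in PyTorch is just a stack of fully connected layers with a nonlinear activation in between (the layer sizes 784 → 128 → 10 below are illustrative assumptions, not from the text):

```python
import torch.nn as nn

# A minimal MLP sketch: input layer, one hidden layer, output layer.
mlp = nn.Sequential(
    nn.Linear(784, 128),  # input -> hidden
    nn.ReLU(),            # nonlinear activation
    nn.Linear(128, 10),   # hidden -> output
)
```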

Convolutional Neural Network (CNN)

Basic Convolutional Neural Networks

A convolutional neural network (Convolutional Neural Network, CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within a limited receptive field; it performs remarkably well on large-scale image processing.

A convolutional neural network consists of one or more convolutional layers topped by fully connected layers (as in a classic neural network), together with associated weights and pooling layers. This structure allows the network to exploit the two-dimensional structure of the input data. Compared with other deep learning architectures, CNNs give better results on image and speech recognition. The model can also be trained with the backpropagation algorithm. Compared with other deep feed-forward networks, a CNN has far fewer parameters to estimate, which makes it an attractive deep learning architecture.
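A minimal sketch of the conv → pool → fully connected pattern described above, assuming 28×28 single-channel inputs and 10 output classes (all sizes here are illustrative):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer (shared weights)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)  # fully connected top

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

logits = SmallCNN()(torch.randn(8, 1, 28, 28))  # -> shape (8, 10)
```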

Temporal Convolutional Networks (TCN)

Recurrent Neural Network (RNN)

Classic RNN

Recurrent neural network

A recurrent neural network (Recurrent neural network, RNN) is a type of neural network. A plain RNN suffers from exponentially exploding weights or vanishing gradients as the recursion deepens, so it struggles to capture long-range temporal dependencies; combining it with LSTM units solves this problem well.

Recurrent neural networks can describe dynamic temporal behavior: unlike feed-forward neural networks, which accept inputs of a fairly fixed structure, an RNN passes its state back through the network itself, so it can accept a much wider range of time-series inputs. Handwriting recognition was the earliest successful application of RNNs.
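Concretely, a simple (Elman-style) recurrent network keeps a hidden state that is fed back at every time step; one common formulation (the notation here is illustrative) is:

$$
h_t = \tanh(W_x x_t + W_h h_{t-1} + b), \qquad y_t = W_y h_t
$$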

A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.

The term “recurrent neural network” is used indiscriminately to refer to two broad classes of network with a similar general structure, where one is finite impulse and the other is infinite impulse. Both classes of networks exhibit temporal dynamic behavior. A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that cannot be unrolled.

Both finite impulse and infinite impulse recurrent networks can have additional stored states, and the storage can be under direct control by the neural network. The storage can also be replaced by another network or graph if that incorporates time delays or has feedback loops. Such controlled states are referred to as gated state or gated memory, and are part of long short-term memory networks (LSTMs) and gated recurrent units. This is also called a Feedback Neural Network (FNN).

In typical libraries like PyTorch, just-in-time compilation plays an important role in implementing recurrent neural networks efficiently.
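For example (a sketch only; the module names and sizes below are made up for illustration), a hand-written recurrent loop can be compiled with torch.jit.script so that the per-time-step Python overhead is removed:

```python
import torch
import torch.nn as nn

class ElmanCell(nn.Module):
    """One step of a simple RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.i2h = nn.Linear(input_size, hidden_size)
        self.h2h = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.i2h(x) + self.h2h(h))

class LoopRNN(nn.Module):
    """Runs the cell over an input of shape (seq_len, batch, input_size)."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.cell = ElmanCell(input_size, hidden_size)
        self.hidden_size = hidden_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.zeros([x.size(1), self.hidden_size])
        for t in range(x.size(0)):      # the time-step loop that TorchScript compiles
            h = self.cell(x[t], h)
        return h

rnn = torch.jit.script(LoopRNN(8, 16))  # JIT-compile the recurrent loop
out = rnn(torch.randn(5, 3, 8))         # final hidden state: shape (3, 16)
```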

The Unreasonable Effectiveness of Recurrent Neural Networks

Long Short-Term Memory (LSTM)

Long Short-Term Memory

Long short-term memory (Long Short-Term Memory, LSTM) is a recurrent neural network (RNN) architecture first published in 1997. Thanks to its distinctive design, an LSTM is well suited to processing and predicting important events separated by very long intervals and delays in a time series.

LSTMs usually perform better than plain recurrent neural networks and hidden Markov models (HMMs), for example on unsegmented continuous handwriting recognition. In 2009, an artificial neural network built with LSTM won the ICDAR handwriting recognition competition. LSTM is also widely used in automatic speech recognition; in 2013 it set a record 17.7% error rate on the TIMIT natural speech database. As a nonlinear model, LSTM can serve as a complex nonlinear unit for building larger deep neural networks.
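As a minimal usage sketch (all sizes below are illustrative assumptions), PyTorch's built-in nn.LSTM processes a whole batch of sequences and returns both the per-step outputs and the final hidden and cell states:

```python
import torch
import torch.nn as nn

# 2-layer LSTM over a batch of 4 sequences, each 20 steps of 10 features.
lstm = nn.LSTM(input_size=10, hidden_size=32, num_layers=2, batch_first=True)
x = torch.randn(4, 20, 10)       # (batch, seq_len, features)
output, (h_n, c_n) = lstm(x)     # output: (4, 20, 32); h_n, c_n: (2, 4, 32)
```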

Schematic of deep LSTM networks

Gated Recurrent Unit (GRU)

  • Wikipedia

    Gated recurrent units are a gating mechanism in recurrent networks, introduced in 2014 by Kyunghyun Cho et al. The GRU is like a long short-term memory (LSTM) with a forget gate, but with fewer parameters than LSTM, as it lacks an output gate. GRU’s performance on certain tasks of polyphonic music modeling and natural language processing was found to be similar to that of LSTM. GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets.

  • Architecture

There are several variations on the full gated unit, with gating done using the previous hidden state and the bias in various combinations, and a simplified form called minimal gated unit.
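For reference, one common way to write the fully gated GRU update (conventions vary between papers; here $z_t$ is the update gate, $r_t$ the reset gate, and $\odot$ the Hadamard product) is:

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\!\bigl(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\bigr) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
$$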

Residual Network (ResNet)


Residual networks were proposed to address the degradation problem that appears when a deep neural network (DNN) has too many hidden layers. Degradation means that as more hidden layers are added, the network's accuracy saturates and then degrades rapidly, and this degradation is not caused by overfitting.
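A minimal sketch of the idea (the channel count and layer choices are illustrative): the stacked layers learn a residual $F(x)$ and the identity shortcut adds $x$ back, so the block outputs $F(x) + x$ and can fall back to an identity mapping instead of degrading the network:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = ReLU(F(x) + x), where F is two 3x3 convolutions."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)   # identity shortcut (skip connection)

y = ResidualBlock()(torch.randn(1, 64, 32, 32))  # same shape as the input
```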

Autoencoders and Variational Autoencoders (AE & VAE)

Generative Adversarial Networks (GAN)

Understanding Generative Adversarial Networks (GANs) | by Joseph Rocca | Towards Data Science

Generative Adversarial Networks (GANs for short) have had a huge success since they were introduced in 2014 by Ian J. Goodfellow and co-authors in the article Generative Adversarial Nets.

Generative Models

Graph Neural Network (GNN)

Attention and Transformer

Attention (machine learning) - Wikipedia

Attention and Self-Attention

Earlier seq2seq models had difficulty handling long sequences, which is why Attention was proposed.
Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) – Jay Alammar – Visualizing machine learning one concept at a time. (jalammar.github.io)

The purpose of Attention is to let the model focus on the most relevant parts of the input, but one problem with Attention is that its computational cost grows with the square of the input length.
Demystifying efficient self-attention | by Thomas van Dongen | Towards Data Science

Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences.
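Below is a minimal sketch of scaled dot-product attention, the core operation behind these mechanisms (shapes are illustrative, and this is not a full multi-head implementation). The seq_len × seq_len score matrix is also where the quadratic cost in sequence length mentioned above comes from:

```python
import torch

def scaled_dot_product_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """softmax(Q K^T / sqrt(d_k)) V for tensors of shape (..., seq_len, d_k)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (..., seq_len, seq_len)
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 5, 16)                   # self-attention: Q = K = V
out = scaled_dot_product_attention(q, k, v)         # shape (2, 5, 16)
```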

Transformer

Temporal Fusion Transformer (TFT)

FNet: Mixing Tokens with Fourier Transforms

Neural Operator

Fourier Neural Operator (FNO)

Graph Neural Operator (GNO)
