[Audio] Introduction to Deep Learning Pabitra Mitra Indian Institute of Technology Kharagpur [email protected] NSM Workshop on Accelerated Data Science.
[Audio] Deep Learning Based on neural networks Uses deep architectures Very successful in many applications.
[Audio] Perceptron Input values x1, ..., xm with weights w1, ..., wm and a bias b are combined by a summing function into the induced field v; an activation function maps v to the output y.
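As a concrete illustration, here is a minimal NumPy sketch of the perceptron just described; the step threshold at zero and the sample input values are assumptions for illustration only.

```python
import numpy as np

# Minimal sketch of the perceptron slide: weighted sum plus bias, passed
# through a step activation. The threshold at zero and the values below
# are illustrative assumptions.
def perceptron(x, w, b):
    v = np.dot(w, x) + b          # induced field v = sum_i w_i * x_i + b
    y = 1.0 if v >= 0 else 0.0    # step activation on the induced field
    return y

x = np.array([0.5, -1.0, 2.0])    # inputs x1..xm
w = np.array([0.4, 0.3, -0.1])    # weights w1..wm
b = 0.1                           # bias
print(perceptron(x, w, b))
```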
[Audio] Neuron Models The choice of activation function φ determines the neuron model. Examples: step function: φ(v) = a if v < c, and b if v ≥ c; ramp function: φ(v) = a if v < c, b if v > d, and a + (v − c)(b − a)/(d − c) otherwise; sigmoid function with x, y, z parameters: φ(v) = z + 1 / (1 + exp(−x·v + y)); Gaussian function: φ(v) = (1 / (√(2π)·σ)) · exp(−(1/2)·((v − μ)/σ)²).
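The activation functions above can be written out directly; this is a sketch, and the exact parameterisations (a, b, c, d for step/ramp; x, y, z for the sigmoid; μ, σ for the Gaussian) are assumptions where the slide text was garbled.

```python
import numpy as np

# Sketch of the neuron models listed above. Parameter names follow the
# reconstructed formulas and are assumptions where the slide was garbled.
def step(v, a, b, c):
    return np.where(v < c, a, b)

def ramp(v, a, b, c, d):
    return np.where(v < c, a,
                    np.where(v > d, b, a + (v - c) * (b - a) / (d - c)))

def sigmoid(v, x=1.0, y=0.0, z=0.0):
    return z + 1.0 / (1.0 + np.exp(-x * v + y))

def gaussian(v, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

v = np.linspace(-3, 3, 7)
print(step(v, 0, 1, 0), ramp(v, 0, 1, -1, 1), sigmoid(v), gaussian(v), sep="\n")
```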
[Audio] Sigmoid unit Inputs x1, ..., xn (with x0 = 1) and weights w0, w1, ..., wn are summed to net = Σ_{i=0}^{n} w_i x_i, and the output is o = f(net), where f is the sigmoid function f(x) = 1 / (1 + e^(−x)). Its derivative is easily computed from the logistic equation: df(x)/dx = f(x)(1 − f(x)). The sigmoid is used in many applications; other activation functions are possible (e.g. tanh). A single unit is trained with the gradient descent rule; multilayer networks use backpropagation.
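A quick numerical check of the logistic-derivative identity quoted above; the test point x = 0.7 and the finite-difference step size are arbitrary choices for illustration.

```python
import numpy as np

# The logistic derivative identity df/dx = f(x)(1 - f(x)),
# checked against a central finite difference at an arbitrary point.
f = lambda x: 1.0 / (1.0 + np.exp(-x))
x = 0.7
analytic = f(x) * (1.0 - f(x))
numeric = (f(x + 1e-6) - f(x - 1e-6)) / 2e-6
print(analytic, numeric)   # the two values agree closely
```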
[Audio] Multi-layer feed-forward NN (FFNN) The FFNN is a more general network architecture in which there are hidden layers between the input and output layers. Hidden nodes do not directly receive inputs from, nor send outputs to, the external environment. FFNNs overcome the limitation of single-layer NNs: they can handle non-linearly separable learning tasks. The figure shows a 3-4-2 network with an input layer, one hidden layer, and an output layer.
[Audio] Backpropagation Initialize all weights to small random numbers. Repeat: for each training example, 1. Input the training example to the network and compute the network outputs. 2. For each output unit k: δ_k ← o_k (1 − o_k)(t_k − o_k). 3. For each hidden unit h: δ_h ← o_h (1 − o_h) Σ_{k ∈ outputs} w_{k,h} δ_k. 4. Update each network weight w_{j,i}: w_{j,i} ← w_{j,i} + Δw_{j,i}, where Δw_{j,i} = η δ_j x_{j,i}.
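A minimal NumPy sketch of this update rule for the 3-4-2 FFNN of the previous slide; the sigmoid units follow the earlier slide, while the learning rate η = 0.5 and the random training example are illustrative assumptions, not part of the algorithm statement.

```python
import numpy as np

# Backpropagation sketch for a 3-4-2 feed-forward network with sigmoid units.
# One stochastic update per example, following the slide's delta rules.
rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_in, n_hid, n_out, eta = 3, 4, 2, 0.5
W1 = rng.normal(0, 0.1, (n_hid, n_in + 1))   # hidden weights (last column = bias, x0 = 1)
W2 = rng.normal(0, 0.1, (n_out, n_hid + 1))  # output weights (last column = bias)

def train_step(x, t):
    x1 = np.append(x, 1.0)                   # input plus bias term
    o_h = sigmoid(W1 @ x1)                   # hidden activations
    h1 = np.append(o_h, 1.0)
    o_k = sigmoid(W2 @ h1)                   # network outputs
    delta_k = o_k * (1 - o_k) * (t - o_k)                       # output-unit errors
    delta_h = o_h * (1 - o_h) * (W2[:, :n_hid].T @ delta_k)     # hidden-unit errors
    W2 += eta * np.outer(delta_k, h1)        # Delta w_{j,i} = eta * delta_j * x_{j,i}
    W1 += eta * np.outer(delta_h, x1)
    return o_k

x, t = rng.random(n_in), np.array([1.0, 0.0])
for _ in range(100):
    out = train_step(x, t)
print(out)   # outputs move toward the target [1, 0]
```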
[Audio] NN DESIGN ISSUES Data representation Network Topology Network Parameters Training Validation.
[Audio] Expressiveness Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer (Cybenko et al. '89): a hidden layer of sigmoid units and an output layer of linear units. Any function can be approximated to arbitrary accuracy by a network with two hidden layers (Cybenko '88): sigmoid units in both hidden layers and an output layer of linear units.
[Audio] Choice of Architecture For neural networks, the choice of architecture is a trade-off between training-set error and generalization error.
[Audio] Motivation for Depth.
[Audio] Motivation: Mimic the Brain Structure The figure compares the brain, where sensory input passes through coupled layers of neurons performing mid/low-level feature extraction before higher-level learning and decision making, with an end-to-end neural architecture that maps the input signal through stacked layers of neurons to a decision.
[Audio] Motivation Practical success in computer vision, signal processing, and text mining; increase in the volume and complexity of data; availability of GPUs.
[Audio] Convolutional Neural Network: Motivation.
CNN.
Convolutional Layer, illustrated on a VGG-style stack: image, conv-64, conv-64, maxpool, conv-128, conv-128, maxpool, conv-256, conv-256, maxpool, conv-512, conv-512, maxpool, conv-512, conv-512, maxpool, FC-4096, FC-4096, FC-1000, softmax. For a 224x224x3 input and 64 filters, every output neuron is connected to a 3x3x3 array of inputs, and the layer can be implemented efficiently with convolutions.
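A short PyTorch sketch of such a convolutional layer (PyTorch is named later in these slides); the 64 filters of size 3x3 with padding 1 match the 224x224x3 to 224x224x64 mapping described, and the random input tensor is illustrative.

```python
import torch
import torch.nn as nn

# Convolutional layer: 64 filters of size 3x3 over a 3-channel 224x224 image,
# so every output neuron is connected to a 3x3x3 patch of the input.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
x = torch.randn(1, 3, 224, 224)      # a batch of one RGB image
y = conv(x)
print(y.shape)                       # torch.Size([1, 64, 224, 224])
```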
Pooling Layer Downsampling of each activation map, e.g. from 224x224 to 112x112, applied at the maxpool stages of the same VGG-style stack.
Max Pooling Layer Single depth slice x, 2x2 filter, stride 2:
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
max pool
6 8
3 4
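The same pooling operation, reproduced in PyTorch on the 4x4 depth slice above; the 2x2 window with stride 2 is implied by the 4x4 to 2x2 result.

```python
import torch
import torch.nn.functional as F

# Max pooling the single 4x4 depth slice with a 2x2 window and stride 2.
x = torch.tensor([[1., 1., 2., 4.],
                  [5., 6., 7., 8.],
                  [3., 2., 1., 0.],
                  [1., 2., 3., 4.]]).reshape(1, 1, 4, 4)
print(F.max_pool2d(x, kernel_size=2, stride=2).squeeze())
# tensor([[6., 8.],
#         [3., 4.]])
```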
Fully Connected Layer, at the end of the same stack (FC-4096, FC-4096, FC-1000): it maps the [7x7x512] volume to [1x1x4096] "neurons". Every "neuron" in the output: 1. computes a dot product between the input and its weights, plus a bias; 2. thresholds it at zero.
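A PyTorch sketch of this fully connected layer; the flatten-then-linear formulation and the ReLU threshold follow the description above, while the random input is illustrative.

```python
import torch
import torch.nn as nn

# Fully connected layer: flatten the [7x7x512] volume, then each of the
# 4096 output "neurons" computes a dot product plus bias, thresholded at zero.
x = torch.randn(1, 512, 7, 7)
fc = nn.Sequential(nn.Flatten(), nn.Linear(7 * 7 * 512, 4096), nn.ReLU())
print(fc(x).shape)    # torch.Size([1, 4096])
```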
Every layer of a ConvNet has the same API: - Takes a 3D volume of numbers (width x height x depth) - Outputs a 3D volume of numbers - Constraint: the function must be differentiable. The full pipeline maps the input image to output probabilities [1x1x1000].
Example activation maps from a stack of CONV, ReLU, and POOL layers followed by a fully-connected layer; output classes include truck, car, airplane, ship, and horse.
The whole stack of conv, maxpool, FC, and softmax layers is a single differentiable function from the [224x224x3] image to probabilities over 1000 classes, e.g. cat 0.2, dog 0.4, chair 0.09, bagel 0.01, banana 0.3.
Training Loop until tired: 1. Sample a batch of data 2. Forward it through the network to get predictions 3. Backprop the errors 4. Update the weights.
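A minimal PyTorch sketch of this loop; the model, synthetic batch, optimiser, and iteration count are assumptions for illustration, and only the four numbered steps come from the slide.

```python
import torch
import torch.nn as nn

# Minimal training loop: sample, forward, backprop, update.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                       # "loop until tired"
    images = torch.randn(64, 3, 32, 32)       # 1. sample a batch of data
    labels = torch.randint(0, 10, (64,))
    logits = model(images)                    # 2. forward pass -> predictions
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()                           # 3. backprop the errors
    optimizer.step()                          # 4. update the weights
```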
[Audio] ResNet: CNN + skip connections; analogous to pyramidal cells in the cortex.
[Audio] Full ResNet architecture: stack residual blocks. Every residual block has two 3x3 conv layers. Periodically, double the number of filters and downsample spatially using stride 2 (in each dimension). There is an additional conv layer at the beginning, and no FC layers at the end (only an FC-1000 layer to output the classes).
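A PyTorch sketch of a single residual block for the equal-dimension case (no filter doubling or stride-2 downsampling); the batch-norm placement follows the common ResNet recipe and is an assumption here.

```python
import torch
import torch.nn as nn

# One residual block: two 3x3 conv layers plus a skip connection that adds
# the block's input to its output before the final ReLU.
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)            # skip connection

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)             # torch.Size([1, 64, 56, 56])
```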
DenseNet.
[Audio] Challenges of Depth Overfitting – dropout Vanishing gradient – ReLU activation Accelerating training – batch normalization Hyperparameter tuning.
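A sketch of how these remedies typically appear in a PyTorch model definition; the layer sizes and dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

# ReLU against vanishing gradients, batch normalization to accelerate
# training, dropout against overfitting.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),   # batch normalization
    nn.ReLU(),             # ReLU activation
    nn.Dropout(p=0.5),     # dropout
    nn.Linear(256, 10),
)
model.train()              # dropout and batch norm behave differently at eval time
print(model(torch.randn(32, 784)).shape)   # torch.Size([32, 10])
```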
[Audio] Computational Complexity.
[Audio] Types of Deep Architectures RNN, LSTM (sequence learning); stacked autoencoders (representation learning); GAN (classification, distribution learning). Combining architectures – unified backprop works if all layers are differentiable. Frameworks: TensorFlow, PyTorch.
[Audio] References Deep Learning – Ian Goodfellow, Yoshua Bengio, and Aaron Courville; Stanford Deep Learning course.