# Classifying CIFAR-10 data with AlexNet

2021-04-25

Last week we used PaddlePaddle and TensorFlow to implement image classification, identifying the CIFAR-10 dataset with a simple CNN (simple_cnn) handwritten by ourselves and with LeNet-5. In last week's experiments, LeNet-5 reached an accuracy of only about 60% after 200 passes. That result is far from satisfactory; after all, LeNet-5 is a network designed twenty years ago, with a simple structure and very few layers. In this section we look at AlexNet, which shone in the 2012 ImageNet contest, use it to classify the CIFAR-10 data, and compare the result with last week's LeNet-5.

What is AlexNet?

AlexNet achieved a breakthrough top-5 error rate of 15.3% in the ILSVRC-2012 competition (the second-place entry scored 26.2%). It comes from the 2012 paper "ImageNet Classification with Deep Convolutional Neural Networks" by Alex Krizhevsky et al., widely regarded as a milestone and watershed in the booming development of deep learning; together with advances in hardware, it set the stage for deep learning's continued growth.

AlexNet network structure

Due to the hardware limitations of the time, AlexNet did a lot of work at the GPU level. The GTX 580 then had only 3 GB of graphics memory, so to train the model on a large amount of data, the authors ran two GPUs in parallel and split the network structure between them, as follows:

Network structure

Input layer

The input is a 224×224×3 three-channel RGB image. To make the subsequent arithmetic work out, padding is applied as a preprocessing step in the actual implementation, changing the image to 227×227×3.

C1 convolution layer

This layer consists of: Convolution operation + Max Pooling + LRN (more on this later).

Convolution: 96 feature maps, each generated by an 11×11 convolution kernel with stride=4. The output feature map is 55×55×48×2, where 55=(227-11)/4+1, 48 is the number of feature maps on each GPU, and 2 is the number of GPUs.

Activation function: using ReLU;

Max Pooling: using stride=2 and a 3×3 kernel. (The paper's experiments show that the non-overlapping 2×2 scheme is slightly easier to overfit, and its top-1 and top-5 error rates are 0.4% and 0.3% higher, respectively.) The output feature map is 27×27×48×2, where 27=(55-3)/2+1, 48 is the number of feature maps on each GPU, and 2 is the number of GPUs;

LRN: The number of neighbors is set to 5 for normalization.

The final output of this layer is 27×27×48×2.
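All of the spatial sizes in these layer descriptions follow the standard convolution/pooling size formula (in − k + 2p)/s + 1. The small helper below (my own, not part of the original post's code) checks the C1 numbers:

```python
def output_size(in_size, kernel, stride, padding=0):
    """Spatial output size of a conv/pool layer: (in - k + 2p) / s + 1."""
    return (in_size - kernel + 2 * padding) // stride + 1

# C1 convolution: 227x227 input, 11x11 kernel, stride 4 -> 55
print(output_size(227, 11, 4))  # 55
# C1 max pooling: 55x55 input, 3x3 kernel, stride 2 -> 27
print(output_size(55, 3, 2))    # 27
```

The same function reproduces every size calculation quoted in the following layers.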

C2 convolution layer

This layer consists of: Convolution operation + Max Pooling + LRN

Convolution: 256 feature maps, each generated by a 5×5 convolution kernel with stride=1. To keep the convolution output the same size as the input, padding of 2 is required. The output feature map is 27×27×128×2, where 27=(27-5+2×2)/1+1, 128 is the number of feature maps on each GPU, and 2 is the number of GPUs.

Activation function: using ReLU;

Max Pooling: using stride=2 and a 3×3 kernel. The output feature map is 13×13×128×2, where 13=(27-3)/2+1, 128 is the number of feature maps on each GPU, and 2 is the number of GPUs;

LRN: The number of neighbors is set to 5 for normalization.

The final output of this layer is 13×13×128×2.

C3 convolution layer

This layer consists of a convolution operation only (note that there is no pooling layer here; in the original paper, LRN is applied only after the first two convolution layers).

The input here is 13×13×256, because at this layer the two GPUs communicate with each other (the dotted-line crossing in the structure diagram).

Convolution: 384 feature maps, each generated by a 3×3 convolution kernel with stride=1. To keep the convolution output the same size as the input, padding of 1 is required. The output feature map is 13×13×192×2, where 13=(13-3+2×1)/1+1, 192 is the number of feature maps on each GPU, and 2 is the number of GPUs.

Activation function: using ReLU;

The final output of this layer is 13×13×192×2.

C4 convolution layer

This layer consists of a convolution operation only (note that there is no pooling layer here).

Convolution: 384 feature maps, each generated by a 3×3 convolution kernel with stride=1. To keep the convolution output the same size as the input, padding of 1 is required. The output feature map is 13×13×192×2, where 13=(13-3+2×1)/1+1, 192 is the number of feature maps on each GPU, and 2 is the number of GPUs.

Activation function: using ReLU;

The final output of this layer is 13×13×192×2.

C5 convolution layer

This layer consists of: Convolution operation + Max Pooling

Convolution: 256 feature maps, each generated by a 3×3 convolution kernel with stride=1. To keep the convolution output the same size as the input, padding of 1 is required. The output feature map is 13×13×128×2, where 13=(13-3+2×1)/1+1, 128 is the number of feature maps on each GPU, and 2 is the number of GPUs.

Activation function: using ReLU;

Max Pooling: using stride=2 and a 3×3 kernel. The output feature map is 6×6×128×2, where 6=(13-3)/2+1, 128 is the number of feature maps on each GPU, and 2 is the number of GPUs.

The final output of this layer is 6×6×128×2.

F6 fully connected layer

This layer is fully connected + Dropout

Use 4096 nodes;

Activation function: using ReLU;

Dropout with rate 0.5;

The final output data is 4096 neuron nodes.

F7 fully connected layer

This layer is fully connected + Dropout

Use 4096 nodes;

Activation function: using ReLU;

Dropout with rate 0.5;

The final output is 4096 neuron nodes.

Output layer

This layer is fully connected + Softmax

Softmax with 1000 outputs

The final output is 1000 categories.
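As a sanity check on all the dimensions above, the whole convolutional stack can be traced with a short script (a sketch of mine, using the layer parameters listed in the descriptions above):

```python
def conv_out(n, k, s, p):
    """Spatial output size: (n - k + 2p) / s + 1."""
    return (n - k + 2 * p) // s + 1

size = 227                       # padded input, 227x227x3
size = conv_out(size, 11, 4, 0)  # C1 conv  -> 55
size = conv_out(size, 3, 2, 0)   # C1 pool  -> 27
size = conv_out(size, 5, 1, 2)   # C2 conv  -> 27
size = conv_out(size, 3, 2, 0)   # C2 pool  -> 13
size = conv_out(size, 3, 1, 1)   # C3 conv  -> 13
size = conv_out(size, 3, 1, 1)   # C4 conv  -> 13
size = conv_out(size, 3, 1, 1)   # C5 conv  -> 13
size = conv_out(size, 3, 2, 0)   # C5 pool  -> 6
print(size)  # 6
```

The final 6×6×256 volume (6×6×128×2 across the two GPUs) is what gets flattened into the 4096-node F6 layer.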

AlexNet's advantages

1. Use of the ReLU activation function

---- Original ReLU ----

AlexNet introduced the ReLU activation function, whose form follows the more accurate neuron activation model described by the neuroscientists Dayan and Abbott in Theoretical Neuroscience. The original ReLU we are most familiar with (see the Hinton-group paper "Rectified Linear Units Improve Restricted Boltzmann Machines") is simply max(0, x).

This activation function zeroes out all negative activations (simulating the sparseness mentioned above). In practice it preserves the network's non-linearity while speeding up training considerably. However, it also has disadvantages:

ReLU is not differentiable at the origin, which is inconvenient for gradient-based backpropagation, so Charles Dugas et al. proposed Softplus, f(x) = ln(1 + e^x), which can be regarded as a smooth version of ReLU.

In fact its derivative is the logistic (sigmoid) function 1/(1 + e^(-x)).

Dead neurons caused by sparseness

When the learning rate is set badly, a single large gradient update can push a neuron into a region where, after passing through the ReLU unit, it is never activated again; its parameters then stop updating (the "dying ReLU" problem).

---- Leaky ReLU ----

To solve the problem of large numbers of neurons being deactivated by this sparseness, Leaky ReLU was proposed: f(x) = x for x > 0 and f(x) = αx otherwise.

Here α is a small, manually chosen value (e.g., 0.1), so negative activation information is retained to some extent.

There are many other refinements of the ReLU function, such as Parametric ReLU and Randomized ReLU, which will not be discussed here.
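As a quick sanity check, the three activations discussed above can be written in a few lines of NumPy (α for Leaky ReLU chosen as 0.1, matching the example value above):

```python
import numpy as np

def relu(x):
    # max(0, x): clears all negative activations
    return np.maximum(0.0, x)

def softplus(x):
    # smooth approximation of ReLU; its derivative is the sigmoid 1/(1+e^-x)
    return np.log1p(np.exp(x))

def leaky_relu(x, alpha=0.1):
    # keeps a small slope alpha for negative inputs
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))        # [0. 0. 3.]
print(leaky_relu(x))  # [-0.2  0.   3. ]
```

Note how leaky_relu preserves a scaled-down copy of the negative input instead of discarding it.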

2. Local Response Normalization

LRN normalizes each activation by the combined activity of neighboring feature maps, which makes strong responses relatively more significant. Experiments in the paper show that it reduces the error rate. The formula is as follows:

b^i_{x,y} = a^i_{x,y} / ( k + α · Σ_{j=max(0, i−n/2)}^{min(N−1, i+n/2)} (a^j_{x,y})² )^β

where a^i_{x,y} is the ReLU output of kernel i at position (x, y), N is the total number of kernels in the layer, n is the number of neighboring maps included in the sum, and k, α, β are hyper-parameters (the paper uses k = 2, n = 5, α = 10⁻⁴, β = 0.75).

The intuitive explanation of the formula is as follows:

Since a is the output of a ReLU, it is always non-negative.

Take the paper's parameters (k = 2, n = 5, α = 10⁻⁴, β = 0.75) and consider the output as a function of the neighbors' summed squared activity:

When that value is small, i.e., the current node's output does not differ obviously from its neighbors' and none of the outputs are too large, competition between the features is fierce; the function amplifies the originally small differences into significant ones, and its output is not saturated.

When that value is large, the feature itself already differs significantly, but an output that is too large risks overfitting; the function pushes the final output toward 0, easing overfitting and improving the model's generalization.
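The LRN scheme described above translates directly into NumPy. The sketch below (my own illustration, not code from the post) normalizes across the channel axis using the paper's hyper-parameters as defaults:

```python
import numpy as np

def lrn(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Local response normalization across channels.

    a: array of shape (C, H, W); each a[i] is one feature map.
    """
    C = a.shape[0]
    b = np.empty_like(a)
    for i in range(C):
        # sum squared activity over the n neighboring maps, clipped at the edges
        lo = max(0, i - n // 2)
        hi = min(C - 1, i + n // 2)
        s = (a[lo:hi + 1] ** 2).sum(axis=0)
        b[i] = a[i] / (k + alpha * s) ** beta
    return b

a = np.random.rand(8, 4, 4).astype(np.float32)
print(lrn(a).shape)  # (8, 4, 4)
```

With α = 0 and k = 1 the denominator collapses to 1 and the input passes through unchanged, which is a handy way to verify the implementation.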

3. Dropout

Dropout is one of the highlights of the paper. It is a method for improving model generalization, and the operation is simple: during training, each neuron's output is randomly set to 0 with a certain probability, so that it participates in neither the forward nor the backward pass. It can also be viewed as a form of regularization. (I gave a talk on regularization in deep learning at my company earlier this year and will publish it here later.)

From the perspective of model ensembling, a network without dropout computes, for layer l+1:

z_i^(l+1) = w_i^(l+1) · y^(l) + b_i^(l+1),    y_i^(l+1) = f(z_i^(l+1))

A network with dropout first masks the previous layer's activations with a random vector r:

r_j^(l) ~ Bernoulli(1 − p),    ŷ^(l) = r^(l) * y^(l),    z_i^(l+1) = w_i^(l+1) · ŷ^(l) + b_i^(l+1)

Where p is the dropout probability (e.g., p = 0.5, i.e., 50% of neurons are randomly deactivated) and l is the index of the layer.

This is Bagging taken to the extreme: at every training step neurons are randomly disabled with some probability, which is equivalent to training a new, parameter-sharing network structure each time. To keep the loss low, each such sub-model must learn the "essence" of the data: features extracted by more independent neurons that correlate only weakly with other neurons and therefore generalize well. And since training proceeds SGD-style, each iteration samples a different subset of the data, so the whole network behaves like an ensemble of many models trained on different data sets.
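A minimal NumPy sketch of dropout as described above (in the "inverted" form commonly used today, which is my choice of variant): p is the drop probability, and surviving activations are rescaled by 1/(1−p) during training so that the expected output matches inference, where no masking is applied.

```python
import numpy as np

def dropout(y, p=0.5, train=True):
    """Randomly zero each activation with probability p (training only)."""
    if not train or p == 0.0:
        return y  # inference: pass activations through unchanged
    # Bernoulli(1-p) mask: each neuron survives with probability 1-p
    mask = (np.random.rand(*y.shape) >= p).astype(y.dtype)
    # rescale survivors so the expected activation is unchanged
    return y * mask / (1.0 - p)

y = np.ones((4, 4))
out = dropout(y, p=0.5)
# roughly half the entries are zero, the rest are scaled to 2.0
```

Because of the rescaling, the same forward-pass code can be used at test time simply by setting train=False.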

Implement AlexNet with PaddlePaddle

1. Network Structure (alexnet.py)

This time I wrote two AlexNets, one with LRN (local response normalization) and one without, to compare their effect:

```python
#coding:utf-8
'''
Created by huxiaoman 2017.12.5
alexnet.py: alexnet network structure
'''
import paddle.v2 as paddle
import os

with_gpu = os.getenv('WITH_GPU', '0') != '1'


def alexnet_lrn(img):
    # AlexNet variant with local response normalization (img_cmrnorm)
    conv1 = paddle.layer.img_conv(
        input=img, filter_size=11, num_channels=3,
        num_filters=96, stride=4, padding=1)
    cmrnorm1 = paddle.layer.img_cmrnorm(
        input=conv1, size=5, scale=0.0001, power=0.75)
    pool1 = paddle.layer.img_pool(input=cmrnorm1, pool_size=3, stride=2)

    conv2 = paddle.layer.img_conv(
        input=pool1, filter_size=5, num_filters=256,
        stride=1, padding=2, groups=1)
    cmrnorm2 = paddle.layer.img_cmrnorm(
        input=conv2, size=5, scale=0.0001, power=0.75)
    pool2 = paddle.layer.img_pool(input=cmrnorm2, pool_size=3, stride=2)

    pool3 = paddle.networks.img_conv_group(
        input=pool2, pool_size=3, pool_stride=2,
        conv_num_filter=[384, 384, 256], conv_filter_size=3,
        pool_type=paddle.pooling.Max())

    fc1 = paddle.layer.fc(
        input=pool3, size=4096, act=paddle.activation.Relu(),
        layer_attr=paddle.attr.Extra(drop_rate=0.5))
    fc2 = paddle.layer.fc(
        input=fc1, size=4096, act=paddle.activation.Relu(),
        layer_attr=paddle.attr.Extra(drop_rate=0.5))
    return fc2


def alexnet(img):
    # AlexNet variant without LRN
    conv1 = paddle.layer.img_conv(
        input=img, filter_size=11, num_channels=3,
        num_filters=96, stride=4, padding=1)
    pool1 = paddle.layer.img_pool(input=conv1, pool_size=3, stride=2)

    conv2 = paddle.layer.img_conv(
        input=pool1, filter_size=5, num_filters=256,
        stride=1, padding=2, groups=1)
    pool2 = paddle.layer.img_pool(input=conv2, pool_size=3, stride=2)

    pool3 = paddle.networks.img_conv_group(
        input=pool2, pool_size=3, pool_stride=2,
        conv_num_filter=[384, 384, 256], conv_filter_size=3,
        pool_type=paddle.pooling.Max())

    fc1 = paddle.layer.fc(
        input=pool3, size=4096, act=paddle.activation.Relu(),
        layer_attr=paddle.attr.Extra(drop_rate=0.5))
    fc2 = paddle.layer.fc(
        input=fc1, size=4096, act=paddle.activation.Relu(),
        layer_attr=paddle.attr.Extra(drop_rate=0.5))
    return fc2
```

2. Training code (train_alexnet.py)

```python
#coding:utf-8
'''
Created by huxiaoman 2017.12.5
train_alexnet.py: train alexnet to categorize the cifar10 data set
'''
import sys, os
import paddle.v2 as paddle

# alexnet model without LRN
from alexnet import alexnet
# alexnet_lrn: variant with LRN
#from alexnet import alexnet_lrn

with_gpu = os.getenv('WITH_GPU', '0') != '1'


def main():
    datadim = 3 * 32 * 32
    classdim = 10

    # PaddlePaddle init
    paddle.init(use_gpu=with_gpu, trainer_count=7)

    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))

    # Add neural network config
    #net = alexnet_lrn(image)
    net = alexnet(image)
    out = paddle.layer.fc(
        input=net, size=classdim, act=paddle.activation.Softmax())

    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)

    # Create parameters
    parameters = paddle.parameters.create(cost)

    # Create optimizer
    momentum_optimizer = paddle.optimizer.Momentum(
        momentum=0.9,
        regularization=paddle.optimizer.L2Regularization(rate=0.0002 * 128),
        learning_rate=0.1 / 128.0,
        learning_rate_decay_a=0.1,
        learning_rate_decay_b=50000 * 100,
        learning_rate_schedule='discexp')

    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print "Pass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            # save parameters
            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
                parameters.to_tar(f)
            result = trainer.test(
                reader=paddle.batch(
                    paddle.dataset.cifar.test10(), batch_size=128),
                feeding={'image': 0, 'label': 1})
            print "Test with Pass %d, %s" % (event.pass_id, result.metrics)

    # Create trainer
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=momentum_optimizer)

    # Save the inference topology to protobuf.
    inference_topology = paddle.topology.Topology(layers=out)
    with open("inference_topology.pkl", 'wb') as f:
        inference_topology.serialize_for_inference(f)

    trainer.train(
        reader=paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.cifar.train10(), buf_size=50000),
            batch_size=128),
        num_passes=200,
        event_handler=event_handler,
        feeding={'image': 0, 'label': 1})

    # inference
    from PIL import Image
    import numpy as np

    def load_image(file):
        im = Image.open(file)
        im = im.resize((32, 32), Image.ANTIALIAS)
        im = np.array(im).astype(np.float32)
        im = im.transpose((2, 0, 1))  # CHW
        im = im[(2, 1, 0), :, :]      # BGR
        im = im.flatten()
        im = im / 255.0
        return im

    test_data = []
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    test_data.append((load_image(cur_dir + '/image/dog.png'),))

    probs = paddle.infer(
        output_layer=out, parameters=parameters, input=test_data)
    lab = np.argsort(-probs)  # probs and lab are the results of one batch data
    print "Label of image/dog.png is: %d" % lab[0][0]


if __name__ == '__main__':
    main()
```

Implement AlexNet with Tensorflow

1. Network structure

```python
import tensorflow as tf


def print_activations(t):
    # helper: print a tensor's name and output shape
    print(t.op.name, ' ', t.get_shape().as_list())


def inference(images):
    '''
    AlexNet model
    input: tensor of images
    returns: the last convolutional layer of AlexNet and its parameters
    '''
    parameters = []
    # conv1
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.truncated_normal(
            [11, 11, 3, 64], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
        print_activations(conv1)
        parameters += [kernel, biases]

    # lrn1
    with tf.name_scope('lrn1') as scope:
        lrn1 = tf.nn.local_response_normalization(
            conv1, alpha=1e-4, beta=0.75, depth_radius=2, bias=2.0)

    # pool1
    pool1 = tf.nn.max_pool(lrn1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                           padding='VALID', name='pool1')
    print_activations(pool1)

    # conv2
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal(
            [5, 5, 64, 192], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv2)

    # lrn2
    with tf.name_scope('lrn2') as scope:
        lrn2 = tf.nn.local_response_normalization(
            conv2, alpha=1e-4, beta=0.75, depth_radius=2, bias=2.0)

    # pool2
    pool2 = tf.nn.max_pool(lrn2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                           padding='VALID', name='pool2')
    print_activations(pool2)

    # conv3
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal(
            [3, 3, 192, 384], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv3)

    # conv4
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal(
            [3, 3, 384, 256], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv4)

    # conv5
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal(
            [3, 3, 256, 256], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv5)

    # pool5
    pool5 = tf.nn.max_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                           padding='VALID', name='pool5')
    print_activations(pool5)

    return pool5, parameters
```

The full code can be found in alexnet_tf.py.

Comparison of experimental results

After running all three versions, the experimental results compare as shown in the figure:

It can be seen that under the same batch_size, num_epochs, device, and thread count, the PaddlePaddle version of AlexNet with LRN works best, while AlexNet without LRN takes the least time; the TensorFlow version of AlexNet is middling in both time and accuracy. Of course, adding LRN to the TF version in the same way should help it too, so LRN does bring some improvement to the experimental results.

To sum up

AlexNet is an important network in image classification. When learning it, we should not only be able to write out the network structure and know what each layer does, but more importantly understand why it is designed this way: what benefits the design brings, what happens to the results when certain parameters are adjusted, and why those changes occur. In practical applications, if the network structure needs adjusting, how should it be changed so that the network better fits our actual data? These are the things we care about, and they also come up often in interviews. Yesterday I interviewed an algorithm engineer with five years of experience and asked whether the model used in his project was AlexNet; he was not very clear about AlexNet's structure, and did not know how to modify it if the structure had to change. That is not good: simply getting a model to run is only the first step, and a great deal of work remains afterwards, which is part of the value an algorithm engineer brings. The description of AlexNet's structure in this article draws on an article written by my former leaders; if anything is unclear, feel free to leave a comment.
