Convolutional Neural Network

3 things you need to know

What Is a Convolutional Neural Network?

A convolutional neural network (CNN or ConvNet) is one of the most popular algorithms for deep learning, a type of machine learning in which a model learns to perform classification tasks directly from images, video, text, or sound.

CNNs are particularly useful for finding patterns in images to recognize objects, faces, and scenes. They learn directly from image data, using patterns to classify images and eliminating the need for manual feature extraction.

Applications that call for object recognition and computer vision — such as self-driving vehicles and face-recognition applications — rely heavily on CNNs. Depending on your application, you can build a CNN from scratch, or use a pretrained model with your dataset.

What Makes CNNs So Useful?

Using CNNs for deep learning has become increasingly popular due to three important factors:

  • CNNs eliminate the need for manual feature extraction—the features are learned directly by the CNN.
  • CNNs produce state-of-the-art recognition results.
  • CNNs can be retrained for new recognition tasks, enabling you to build on pre-existing networks.

Deep learning workflow. Images are passed to the CNN, which automatically learns features and classifies objects.

CNNs Enable Advances in Object Detection and Object Recognition

CNNs provide an optimal architecture for image recognition and pattern detection. Combined with advances in GPUs and parallel computing, CNNs are a key technology underlying new developments in automated driving and facial recognition.

For example, deep learning applications use CNNs to examine thousands of pathology reports to visually detect cancer cells. CNNs also enable self-driving cars to detect objects and learn to tell the difference between a street sign and a pedestrian.


How CNNs Work

A convolutional neural network can have tens or hundreds of layers that each learn to detect different features of an image. Filters are applied to each training image at different resolutions, and the output of each convolved image is used as the input to the next layer. The filters can start as very simple features, such as brightness and edges, and increase in complexity to features that uniquely define the object.
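The filtering step described above can be sketched in plain Python. This is only an illustration of what a single convolutional filter does (the image and filter values are made up for the example); in practice the filters are learned during training and applied by the deep learning framework.

```python
def convolve2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as used in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A tiny 4x4 "image" with a vertical edge, and a simple hand-made
# edge-detection filter that responds where brightness changes left-to-right
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
kernel = [[1, -1],
          [1, -1]]

feature_map = convolve2d(image, kernel)
```

The strong responses in the resulting feature map line up with the edge in the input, which is exactly the kind of simple feature an early CNN layer learns to detect.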

CNNs perform feature identification and classification of images, text, sound, and video.

Feature Learning, Layers, and Classification

Like other neural networks, a CNN is composed of an input layer, an output layer, and many hidden layers in between.

These layers perform operations that alter the data with the intent of learning features specific to the data. Three of the most common layers are convolution, activation (ReLU), and pooling.

  • Convolution puts the input images through a set of convolutional filters, each of which activates certain features from the images.
  • Rectified linear unit (ReLU) allows for faster and more effective training by mapping negative values to zero and maintaining positive values. This is sometimes referred to as activation, because only the activated features are carried forward into the next layer.
  • Pooling simplifies the output by performing nonlinear downsampling, reducing the number of parameters that the network needs to learn.
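The ReLU and pooling operations in the list above are simple enough to sketch directly. This pure-Python illustration uses made-up values for a single small feature map; a real network applies these operations to whole stacks of feature maps at once.

```python
def relu(feature_map):
    """Map negative values to zero, keep positive values unchanged."""
    return [[max(0, x) for x in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    return [[max(feature_map[i + u][j + v]
                 for u in range(size) for v in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]

# A made-up 4x4 feature map, as might come out of a convolution layer
fm = [[-3, 1, 2, -1],
      [ 4, 0, -2, 5],
      [ 1, -1, 0, 2],
      [ 0, 3, -4, 1]]

activated = relu(fm)          # negatives become 0
pooled = max_pool(activated)  # 4x4 map shrinks to 2x2
```

Note how pooling reduces the 4x4 map to 2x2, which is what cuts down the number of parameters the later layers need to learn.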

These operations are repeated over tens or hundreds of layers, with each layer learning to identify different features.

Example of a network with many convolutional layers. Filters are applied to each training image at different resolutions, and the output of each convolved image is used as the input to the next layer.

Classification Layers

After learning features in many layers, the architecture of a CNN shifts to classification.

The next-to-last layer is a fully connected layer that outputs a vector of K dimensions, where K is the number of classes that the network will be able to predict. This vector contains the raw scores for each class of any image being classified.

The final layer of the CNN architecture applies a softmax function that converts these scores into class probabilities, providing the classification output.
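As a concrete illustration of this last step, here is a pure-Python softmax applied to a made-up K-dimensional score vector (the scores themselves are invented for the example):

```python
import math

def softmax(scores):
    """Convert a vector of class scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]   # hypothetical fully connected output for K = 3 classes
probs = softmax(scores)
```

The class with the highest score receives the highest probability, and the probabilities always sum to 1, which is what makes the output usable as a classification.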

Hardware Acceleration with GPUs

A convolutional neural network is trained on hundreds, thousands, or even millions of images. When working with large amounts of data and complex network architectures, GPUs can significantly speed the processing time to train a model. Once a CNN is trained, it can be used in real-time applications, such as pedestrian detection in advanced driver assistance systems (ADAS).

Using MATLAB with a CNN

Using MATLAB® with Deep Learning Toolbox™ enables you to train your own CNN from scratch or use a pretrained model to perform transfer learning.

Which method you choose depends on your available resources and the type of application you are building.

To train a network from scratch, you must define the number of layers and filters, along with other tunable parameters. Training an accurate model from scratch also requires massive amounts of data, on the order of millions of samples, and can take an immense amount of time.

A common alternative to training a CNN from scratch is to use a pretrained model to automatically extract features from a new data set. This method, called transfer learning, is a convenient way to apply deep learning without a huge dataset and long computation and training time.

Training From Scratch

Creating a network from scratch means you determine the network configuration. This approach gives you the most control over the network and can produce impressive results, but it requires an understanding of the structure of a neural network and the many options for layer types and configuration.

While results can sometimes exceed those of transfer learning (see below), this method tends to require more training images, because the new network needs many examples of the object to learn the variation in its features. Training times are often longer, and there are so many possible combinations of network layers that configuring a network from scratch can be overwhelming. When constructing a network and organizing its layers, it typically helps to reference configurations that researchers have already proven successful.


Using Pretrained Models for Transfer Learning

Fine-tuning a pretrained network with transfer learning is typically much faster and easier than training from scratch. It requires the least amount of data and computational resources. Transfer learning uses knowledge from one type of problem to solve similar problems. You start with a pretrained network and use it to learn a new task.

One advantage of transfer learning is that the pretrained network has already learned a rich set of features. These features can be applied to a wide range of other similar tasks. For example, you can take a network trained on millions of images and retrain it for new object classification using only hundreds of images.
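The core idea of transfer learning can be sketched in a toy pure-Python example: keep a "pretrained" feature extractor frozen and train only a small new classifier on top of its features. Everything here (the stand-in extractor, the tiny dataset, the learning rate) is invented for illustration; in practice you would fine-tune a real pretrained network such as GoogLeNet on your own images.

```python
import math

def pretrained_features(x):
    """Stand-in for a frozen pretrained network: maps a raw input to features."""
    return [x, x * x]  # hypothetical features the "pretrained" model extracts

# Tiny made-up dataset: raw inputs with binary class labels
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

# Train only the new classification head (logistic regression) on the features;
# the feature extractor itself is never updated
w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(200):
    for x, y in data:
        f = pretrained_features(x)        # frozen: no updates flow into here
        z = w[0] * f[0] + w[1] * f[1] + b
        p = 1.0 / (1.0 + math.exp(-z))
        for i in range(2):                # gradient step on the head only
            w[i] -= lr * (p - y) * f[i]
        b -= lr * (p - y)

def predict(x):
    f = pretrained_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
```

Because only the small head is trained, very little data and computation are needed, which is exactly why transfer learning works with hundreds rather than millions of images.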

With Deep Learning Toolbox, you can perform transfer learning with pretrained CNN models (such as GoogLeNet, AlexNet, vgg16, vgg19) and models from Caffe and TensorFlow-Keras.


Applications Using CNNs

Object Detection

Object detection is the process of locating and classifying objects in images and video. Computer Vision Toolbox™ provides training frameworks to create deep learning-based object detectors using R-CNN (regions with CNN), Fast R-CNN, and Faster R-CNN.

You can use machine learning techniques from Statistics and Machine Learning Toolbox™ with Computer Vision Toolbox to create object recognition systems.

Deep Learning Toolbox provides functions for constructing and training CNNs, as well as making predictions with a trained CNN model.

  • This example shows how to train an object detector using deep learning and R-CNN (Regions with Convolutional Neural Networks).
  • This example shows how to train an object detector using a deep learning technique named Faster R-CNN.

How to Learn More About CNNs

Products that support using CNNs for image analysis include MATLAB, Computer Vision Toolbox, Statistics and Machine Learning Toolbox, and Deep Learning Toolbox.

Convolutional neural networks require Deep Learning Toolbox. Training and prediction are supported on a CUDA®-capable GPU with compute capability 3.0 or higher. Use of a GPU is recommended and requires Parallel Computing Toolbox™.
