Build predictive models from known input and response data with machine learning techniques

Supervised learning is the most common type of machine learning. It uses a known dataset (called the training dataset), consisting of input data (called features) paired with known responses, to train an algorithm to make predictions. From this labeled data, the supervised learning algorithm seeks to create a model by discovering relationships between the features and the responses, and the model then predicts the response values for a new dataset.

Prior to applying supervised learning, unsupervised learning is frequently used to discover patterns in the input data that suggest candidate features, and feature engineering transforms those features to be more suitable for supervised learning. In addition to identifying features, the correct category or response must be identified for every observation in the training set, which is a very labor-intensive step. Semi-supervised learning lets you train models with very limited labeled data and thus reduces the labeling effort.

Once the algorithm is trained, a test dataset that has not been used for training is typically used to estimate the performance of the algorithm and validate it. To obtain accurate performance results, it is critical that both the training and test sets are a good representation of “reality” (i.e., data from the production environment) and that the model is validated correctly.
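The idea of holding out a test set can be sketched in a few lines. The following is a minimal illustration in plain Python, not MATLAB toolbox code; the function names (`holdout_split`, `fit_threshold`, `accuracy`), the 25% test fraction, and the one-feature dataset are all assumptions made up for this example.

```python
import random

def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(rows):
    """Toy 'training': place a threshold midway between the two class means."""
    low = [x for x, label in rows if label == "low"]
    high = [x for x, label in rows if label == "high"]
    return (sum(low) / len(low) + sum(high) / len(high)) / 2

def accuracy(threshold, rows):
    """Fraction of rows whose label matches the thresholded prediction."""
    hits = sum(1 for x, label in rows
               if ("high" if x > threshold else "low") == label)
    return hits / len(rows)

# Made-up labeled data: one feature; the label is "high" when it exceeds 5.
data = [(i / 2, "high" if i / 2 > 5 else "low") for i in range(21)]

train_rows, test_rows = holdout_split(data)
threshold = fit_threshold(train_rows)            # uses training rows only
test_accuracy = accuracy(threshold, test_rows)   # uses held-out rows only
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Because the threshold is fitted without ever seeing `test_rows`, the reported accuracy is an estimate of performance on new data rather than a measure of how well the model memorized its training set.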


You can train, validate, and tune predictive supervised learning models in MATLAB® with Deep Learning Toolbox™ and Statistics and Machine Learning Toolbox™.

Supervised Learning Algorithm Categories

Classification: Used for categorical response values, where the data can be separated into specific classes. A binary classification model has two classes, and a multiclass classification model has more than two. You can train classification models with the Classification Learner app in MATLAB.
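As a concrete illustration of a multiclass problem, here is a small k-nearest-neighbor classifier written in plain Python. It is a sketch under stated assumptions, not the Classification Learner workflow: the function name `knn_predict`, the choice of k = 3, and the three-class training set are invented for this example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training set: two features per observation, three classes.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
    ((9.0, 1.0), "C"), ((9.1, 1.2), "C"), ((8.9, 0.9), "C"),
]

# Predict the class of new, unlabeled observations.
for query in [(1.1, 0.9), (5.1, 5.0), (9.0, 1.1)]:
    print(query, "->", knn_predict(train, query))
```

Restricting the labels to two values would turn the same code into a binary classifier; nothing in the voting step depends on the number of classes.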

Common classification algorithms include:

Regression: Used for continuous numerical response values. You can train regression models with the Regression Learner app in MATLAB.
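To make the contrast with classification concrete, the sketch below fits a straight line to a continuous response using the closed-form least-squares formulas. It is plain Python for illustration only; the function name `fit_line` and the five data points are assumptions, not taken from any MATLAB example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: one feature and a continuous response.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)

# Predict the response for a new, unseen feature value.
new_x = 6.0
predicted = slope * new_x + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted:.2f}")
```

Unlike the classifier case, the prediction here is a number on a continuous scale rather than one of a fixed set of labels.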

Common regression algorithms include:

Supervised Learning Applications

Supervised learning is used in financial applications for credit scoring, algorithmic trading, and bond classification; in image and video applications for object classification and tracking; in industrial applications for outlier detection; in predictive maintenance for estimating equipment life; in biological applications for tumor detection and drug discovery; and in energy applications for price and load forecasting.


Let's assume you want to predict housing prices and have historical data on housing sales, with home sizes, locations, and year sold as features and the actual sale price as the known response. That is an excellent use case for supervised regression, and you can try this out yourself in this example. The weights of a linear model shown below make sense: type and size of home, year built, and neighborhood indeed determine home values. The residual plot indicates that the linear model captures the relationship between the variables and price reasonably well.
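A stripped-down version of the housing scenario can be sketched as follows. This is not the linked example and does not use its data: the six sales records are invented, only two features are used (size and year sold, rescaled as noted in the comments), and the helper names `solve` and `fit_linear` are hypothetical. The model is fitted with the normal equations, and the residuals are the differences between actual and predicted prices.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear(rows, targets):
    """Least-squares weights [intercept, w1, w2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)

# Invented sales: (size in hundreds of sq ft, years since 2010) -> price in $1000s.
features = [(10, 0), (15, 1), (20, 2), (12, 3), (18, 4), (25, 5)]
prices = [253.0, 353.0, 461.0, 302.0, 432.0, 574.0]

weights = fit_linear(features, prices)
predictions = [weights[0] + weights[1] * s + weights[2] * y for s, y in features]
residuals = [actual - pred for actual, pred in zip(prices, predictions)]

print("weights (intercept, size, year):", [round(w, 2) for w in weights])
print("residuals:", [round(r, 2) for r in residuals])
```

Small residuals with no obvious pattern suggest the linear form is adequate for this toy data; on real data, a residual plot serves the same diagnostic purpose.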

See also: Statistics and Machine Learning Toolbox, Deep Learning Toolbox, machine learning, unsupervised learning, AdaBoost, linear regression, nonlinear regression, data fitting, data analysis, mathematical modeling, predictive modeling, artificial intelligence, AutoML, regularization
