Applied Machine Learning, Part 3: Hyperparameter Optimization

From the series: Applied Machine Learning

Adam Filion, MathWorks

Machine learning is all about fitting models to data. This process typically involves using an iterative algorithm that minimizes the model error. The parameters that control a machine learning algorithm’s behavior are called hyperparameters. Depending on the values you select for your hyperparameters, you might get a completely different model. So, by changing the values of the hyperparameters, you can find different, and hopefully better, models.    

This video walks through techniques for hyperparameter optimization, including grid search, random search, and Bayesian optimization. It explains why random search and Bayesian optimization are superior to the standard grid search, and it describes how hyperparameters relate to feature engineering in optimizing a model.

Machine learning is all about fitting models to data. The models consist of parameters, and we find the values for those parameters through the fitting process. This process typically involves some type of iterative algorithm that minimizes the model error. That algorithm has parameters that control how it works, and those are what we call hyperparameters.

In deep learning, we also call the parameters that determine the layer characteristics hyperparameters. Today, we’ll be talking about techniques for both.

So, why do we care about hyperparameters?  Well, it turns out that most machine learning problems are non-convex. This means that depending on the values we select for the hyperparameters, we might get a completely different model. By changing the values of the hyperparameters, we can find different, and hopefully better, models.  

Ok, so we know that we have hyperparameters, and we know we want to tweak them, but how do we do that? Some hyperparameters are continuous, some are binary, and others might take on any number of discrete values. This makes for a tough optimization problem. It is almost always impossible to run an exhaustive search of the hyperparameter space, since it takes too long.  
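
The video does not show code at this point, but a minimal sketch of how such a mixed hyperparameter space can be described in MATLAB uses optimizableVariable from Statistics and Machine Learning Toolbox. The specific variable names, ranges, and kernel choices below are illustrative assumptions, not values from the video.

% Describe a mixed hyperparameter space: continuous, integer, and categorical.
% Ranges and names are assumptions for illustration only.
boxCon  = optimizableVariable('BoxConstraint', [1e-3, 1e3], 'Transform', 'log');   % continuous
kScale  = optimizableVariable('KernelScale',   [1e-3, 1e3], 'Transform', 'log');   % continuous
polyOrd = optimizableVariable('PolynomialOrder', [2, 4], 'Type', 'integer');       % discrete
kernel  = optimizableVariable('KernelFunction', {'linear','gaussian','polynomial'}, ...
                              'Type', 'categorical');                              % categorical
vars = [boxCon, kScale, polyOrd, kernel];   % this vector could be passed to bayesopt with an objective function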

So, traditionally, engineers and researchers have used techniques for hyperparameter optimization like grid search and random search. In this example, I’m using a grid search method to vary 2 hyperparameters – Box Constraint and Kernel Scale – for an SVM model.  As you can see, the error of the resulting model is different for different values of the hyperparameters. After 100 trials, the search has found 12.8 and 2.6 to be the most promising values for these hyperparameters.
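As a rough sketch of that kind of grid search (not the exact code from the video), fitcsvm can run the optimization directly; here the ionosphere sample data stands in for the real dataset, and a 10-by-10 grid gives the 100 trials mentioned above.

% Grid search over BoxConstraint and KernelScale for an SVM classifier.
% Uses the ionosphere sample data shipped with Statistics and Machine Learning Toolbox.
load ionosphere   % X: predictors, Y: binary class labels
mdl = fitcsvm(X, Y, ...
    'OptimizeHyperparameters', {'BoxConstraint', 'KernelScale'}, ...
    'HyperparameterOptimizationOptions', struct( ...
        'Optimizer', 'gridsearch', ...
        'NumGridDivisions', 10, ...   % 10 x 10 grid = 100 trials
        'ShowPlots', true));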

Recently, random search has become more popular than grid search. “How could that be?” you may be asking. Wouldn’t grid search do a better job of evenly exploring the hyperparameter space?

Let’s imagine you have 2 hyperparameters, “A” and “B”. Your model is very sensitive to “A,” but not sensitive to “B.”  If we did a 3x3 grid search, we would only ever evaluate 3 different values of “A.” But if we did a random search, we would probably get 9 different values of “A”, even though some may be close together. As a result, we have a much better chance of finding a good value for “A.”  In machine learning, we often have many hyperparameters. Some have a big influence over the results, and some don’t.  So random search is typically a better choice.
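The same fitcsvm call can switch to random search by changing the optimizer; this is a sketch under the same assumptions as the grid search example (ionosphere sample data, 100 evaluations).

% Random search over the same two hyperparameters, with the same evaluation budget.
load ionosphere
mdl = fitcsvm(X, Y, ...
    'OptimizeHyperparameters', {'BoxConstraint', 'KernelScale'}, ...
    'HyperparameterOptimizationOptions', struct( ...
        'Optimizer', 'randomsearch', ...
        'MaxObjectiveEvaluations', 100));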

Grid search and random search are nice because it’s easy to understand what’s going on.  However, they still require many function evaluations. They also don’t take advantage of the fact that, as we evaluate more and more combinations of hyperparameters, we learn how those values affect our results. For that reason, you can use techniques that create a surrogate model – or an approximation of the error as a function of the hyperparameters.

Bayesian optimization is one such technique. Here we see an example of a Bayesian optimization algorithm running, where each dot corresponds to a different combination of hyperparameters. We can also see the algorithm’s surrogate model, shown here as the surface, which it is using to pick the next set of hyperparameters.

One other really cool thing about Bayesian optimization is that it doesn’t just look at how accurate a model is. It can also take into account how long it takes to train.  There could be sets of hyperparameters that cause the training time to increase by factors of 100 or more, and that might not be so great if we’re trying to hit a deadline. You can configure Bayesian optimization in a number of ways, including expected improvement per second, which penalizes hyperparameter values that are expected to take a very long time to train.
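A hedged sketch of Bayesian optimization that also accounts for training time is shown below; the expected-improvement-per-second acquisition function penalizes slow-to-train hyperparameter values, and the evaluation budget here is an illustrative assumption.

% Bayesian optimization with an acquisition function that factors in training time.
load ionosphere
mdl = fitcsvm(X, Y, ...
    'OptimizeHyperparameters', {'BoxConstraint', 'KernelScale'}, ...
    'HyperparameterOptimizationOptions', struct( ...
        'Optimizer', 'bayesopt', ...
        'AcquisitionFunctionName', 'expected-improvement-per-second-plus', ...
        'MaxObjectiveEvaluations', 30));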

Now, the main reason to do hyperparameter optimization is to improve the model. And, although there are other things we could do to improve it, I like to think of hyperparameter optimization as a low-effort, high-compute type of approach. This is in contrast to something like feature engineering, where you have higher effort to create the new features, but you need less computational time. It’s not always obvious which activity is going to have the biggest impact, but the nice thing about hyperparameter optimization is that it lends itself well to “overnight runs,” so you can sleep while your computer works.

That was a quick explanation of hyperparameter optimization. For more information, check out the links in the description.

 

 

 

Related Products

  • Statistics and Machine Learning Toolbox

Learn More

  • Bayesian Optimization Workflow
  • Model Building and Assessment
  • Bayesian Optimization Documentation
  • What Is AutoML?

Related Information

  • MATLAB for Machine Learning

