Cheat Sheet

Which Machine Learning Algorithm Is Right for You?

You have data and an application, but which algorithm should you try first? There are tradeoffs no matter what you choose. Here are some basic principles to get you started.

Size of Your Dataset

Algorithms are very sensitive to the size of your dataset. While there are no absolute rules that dictate which algorithm should be used for datasets under 50 MB or over 1 TB, here are the algorithms you may want to start with given the amount of data you have and assuming your sample dataset is balanced.

  • Decision trees
  • Linear models (including logistic regression and linear discriminant)

  • (Nonlinear) SVM
  • Naïve Bayes
  • Nearest neighbor
  • Neural network (shallow)

  • Deep nets
  • Ensembles
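
The groupings above assume a balanced sample dataset, so it can help to check class balance before picking a starting point. The following is a minimal sketch using only the Python standard library. The function names and the `max_ratio=1.5` threshold are illustrative assumptions, not a standard definition of "balanced".

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the dataset as a fraction between 0 and 1."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def is_roughly_balanced(labels, max_ratio=1.5):
    """True if the most common class is at most max_ratio times the rarest.

    max_ratio=1.5 is an arbitrary illustrative threshold (an assumption).
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / min(counts.values()) <= max_ratio


# Toy labels, made up for illustration.
labels = ["cat", "dog", "cat", "dog", "dog", "cat", "cat", "dog"]
shares = class_balance(labels)
balanced = is_roughly_balanced(labels)
print(shares, balanced)
```

If the check fails, the starting points listed above may not apply as stated, since they assume a balanced sample.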

Training Speed

Training speed is how long a new model takes to build and train on a given computational resource. Factors such as algorithm architecture and complexity, among others, affect how quickly the model trains. Here are algorithms to consider if your project is very sensitive to training speed and you don’t have acceleration hardware.

  • Decision trees
  • Linear models (including logistic regression and linear discriminant)
  • Naïve Bayes

  • Ensembles
  • Nearest neighbor
  • Neural network (shallow)

  • (Nonlinear) SVM

  • Deep nets
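
One way to compare training speed on your own hardware is to time each candidate's training routine directly. The sketch below assumes a hypothetical `fit(X, y)` callable standing in for whatever training function you use. `fit_mean_predictor` is a made-up placeholder, not a real model.

```python
import statistics
import time


def time_training(fit, X, y, repeats=5):
    """Return the median wall-clock seconds taken by fit(X, y) over `repeats` runs."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fit(X, y)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def fit_mean_predictor(X, y):
    """Hypothetical stand-in for a real training routine: returns the mean of y."""
    return sum(y) / len(y)


# Synthetic data, made up for illustration.
X = [[float(i)] for i in range(1000)]
y = [2.0 * i for i in range(1000)]
seconds = time_training(fit_mean_predictor, X, y)
print(f"median training time: {seconds:.6f} s")
```

Taking the median over several runs reduces the effect of one-off delays. Results are only comparable when every candidate is timed on the same data and the same machine.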

Interpretability

Machine learning models can be non-intuitive and difficult to understand. Interpretability refers to how transparent the algorithm’s decision-making process is. However, interpretability often comes at the expense of power and accuracy. Different industries and applications can also have specific requirements around interpretability. To get you started, here are some basic ratings of how easy or difficult the algorithms are to interpret.

  • Decision trees
  • Linear models (including logistic regression and linear discriminant)

  • Nearest neighbor
  • Neural network (shallow)
  • Naïve Bayes

  • (Nonlinear) SVM
  • Ensembles
  • Deep nets
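
To illustrate what a transparent decision-making process can look like, here is a minimal sketch of a one-feature linear model fitted by ordinary least squares in plain Python. The data and the function name `fit_line` are made up for illustration. The point is only that the fitted slope and intercept can be read directly as an explanation of every prediction.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

# The whole model is two numbers, each with a direct reading.
print(f"Each unit increase in x changes the prediction by {slope:+.2f}")
print(f"When x is 0, the prediction is {intercept:.2f}")
```

A deep net fitted to the same data could make similar predictions, but its many weights do not offer this kind of direct reading.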

Tuning

Tuning means optimizing the parameters or hyperparameters of a specific model to get the best result. Some algorithms expose only a few parameters or hyperparameters, which limits how much you can optimize them for your application. After you choose a particular type of model to train, you can automatically vary the parameters that most strongly affect its performance. How much tuning do you want to be able to perform?

  • Linear models (including logistic regression and linear discriminant)
  • Nearest neighbor

  • Decision trees
  • (Nonlinear) SVM
  • Ensembles
  • Naïve Bayes
  • Neural network (shallow)

  • Deep nets
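
As a rough illustration of automatic tuning, the sketch below tries several values of a single hyperparameter (the number of neighbors `k` in a nearest neighbor classifier) and keeps the one with the best accuracy on held-out validation data. The data, the candidate values, and all function names are assumptions made up for this example. It is a simple holdout search, not a recommendation of any particular tuning method.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the majority label among the k training points closest to x."""
    neighbors = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, val_X, val_y, k):
    """Fraction of validation points classified correctly for a given k."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y)
    )
    return correct / len(val_X)


def tune_k(train_X, train_y, val_X, val_y, candidates):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: accuracy(train_X, train_y, val_X, val_y, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # ties go to the earliest candidate
    return best_k, scores


# Toy 1-D data with one deliberately noisy label at x = 0.3.
train_X = [[0.0], [0.2], [0.4], [0.3], [1.0], [1.2], [1.4]]
train_y = ["a", "a", "a", "b", "b", "b", "b"]
val_X = [[0.1], [0.31], [1.1], [1.3]]
val_y = ["a", "a", "b", "b"]

best_k, scores = tune_k(train_X, train_y, val_X, val_y, candidates=[1, 3, 5])
print(best_k, scores)
```

Here `k=1` is misled by the noisy point, while larger values outvote it. Models with more hyperparameters can be searched the same way, but the number of combinations to try grows quickly.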