Random Forest

Random Forest (Machine Learning)

Why use Random Forest?

  • Training can be fast, because each tree is built independently and the trees can be trained in parallel.
  • It predicts outputs with high accuracy and runs efficiently even on large datasets.
  • It can also maintain accuracy when a large proportion of the data is missing.

How does Random Forest work?

  1. Randomly selects k features (columns) out of the m features in the dataset (table), where k << m, and builds a Decision Tree from those k features. (In Breiman's original formulation, a fresh random subset of k features is drawn at every split of the tree rather than once per tree.)
  2. Repeats this n times, so that you have n Decision Trees built from different random combinations of k features. Each tree is normally also trained on a different random sample of the rows, drawn with replacement, called a bootstrap sample.
  3. Passes the new record to each of the n Decision Trees and stores each tree's predicted outcome (target), so that you have a total of n outcomes from the n Decision Trees.
  4. Counts the votes for each predicted target and takes the mode (the most frequently predicted target). In other words, the target with the most votes becomes the final prediction of the random forest.
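The four steps above can be sketched in plain Python with no libraries. This is a minimal, illustrative sketch, not a production implementation: the function names (`fit_forest`, `predict_forest`, and so on) and the toy data are hypothetical, and the tree follows Breiman's per-split variant, drawing k random features at every split and training each tree on a bootstrap sample. With only m = 2 features in the toy data, k = 1 is used.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, k, max_depth, rng):
    """Grow a tree; every split considers only k randomly chosen features."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority  # leaf: predict the most common label
    feature_ids = rng.sample(range(len(X[0])), k)
    split = best_split(X, y, feature_ids)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], k, max_depth - 1, rng),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], k, max_depth - 1, rng),
    }


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node


def fit_forest(X, y, n_trees=25, k=1, max_depth=5, seed=0):
    """Steps 1-2: grow n_trees trees, each on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # rows drawn with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], k, max_depth, rng))
    return forest


def predict_forest(forest, row):
    """Steps 3-4: ask every tree, then return the mode of their votes."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated classes described by m = 2 features.
X = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [6.0, 7.0], [6.5, 6.8], [7.0, 7.5]]
y = ["a", "a", "a", "b", "b", "b"]
forest = fit_forest(X, y)
print(predict_forest(forest, [1.0, 1.8]), predict_forest(forest, [7.0, 7.5]))
```

Each tree sees a slightly different bootstrap sample and a different random feature at each split, so individual trees can disagree; the final answer is simply the most common vote.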
  • For a regression problem, each tree in the forest predicts a value of Y (the output) for a new record, and the final prediction is the average of the values predicted by all the trees. For a classification problem, each tree predicts the category to which the new record belongs, and the record is assigned to the category that wins the majority vote.
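The averaging rule for regression can be checked directly with scikit-learn, assuming it and NumPy are installed. The synthetic data and the new record below are invented for the demo; `RandomForestRegressor` and its fitted `estimators_` list are part of scikit-learn's public API. The sketch compares the mean of the individual trees' predictions with the forest's own prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (an assumption made for the demo): 200 records, 3 features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.1, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# One new record, kept 2-D because predict expects a matrix of records.
x_new = np.array([[5.0, 1.0, 3.0]])
per_tree = np.array([tree.predict(x_new)[0] for tree in model.estimators_])

# The forest's regression output is the average of the individual tree outputs.
print(per_tree.mean(), model.predict(x_new)[0])
```

One caveat for classification: scikit-learn's `RandomForestClassifier` combines its trees by averaging their predicted class probabilities rather than counting hard votes, so its output can occasionally differ from a strict majority vote.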

Advantages of Random Forest

  1. Can be used for both classification and regression problems, and works well when the data has both categorical and numerical features.
  2. Reduced overfitting: averaging many trees significantly lowers the risk of overfitting compared with a single Decision Tree.
  3. Makes a wrong prediction only when more than half of the base classifiers are wrong: Random Forest is very stable. Even if a new data point is introduced into the dataset, the overall algorithm is not affected much, because new data may impact one tree but is very unlikely to impact all the trees.
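The "more than half must be wrong" argument can be made concrete with the binomial distribution. This sketch rests on an idealized assumption: each of n trees errs independently with the same probability e on a binary task, so the forest is wrong with probability P = sum over j > n/2 of C(n, j) * e^j * (1 - e)^(n - j). Real trees are trained on the same data and are correlated, so actual gains are smaller; that is why Random Forest randomizes both rows and features. The function name `majority_vote_error` is hypothetical, and odd tree counts are used to avoid ties.

```python
from math import comb


def majority_vote_error(n_trees, tree_error):
    """P(more than half of n_trees independent trees are wrong), binary case."""
    return sum(
        comb(n_trees, j) * tree_error**j * (1 - tree_error) ** (n_trees - j)
        for j in range(n_trees // 2 + 1, n_trees + 1)
    )


# Error of a majority vote as the number of trees grows (each tree is wrong 30% of the time).
for n in (1, 5, 25, 101):
    print(n, round(majority_vote_error(n, 0.3), 4))
```

For example, five independent trees that are each wrong 30% of the time give a majority-vote error of about 0.163, and the error keeps shrinking as trees are added.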

Disadvantages of Random Forest

  1. Random forests have been observed to overfit some datasets with noisy classification or regression tasks.
  2. They are more complex and computationally expensive than a single Decision Tree.
  3. Because of this complexity, training a large forest can take much longer than training simpler models.

“Time is more precious than anything. Make it accountable.”




A Machine Learning technology researcher starting a new blog series to make ML concepts clear and simple for everyone.

sai krishna
