Algorithms-by-meme-08: Random Forest

Random Forest is a well-known term for an ensemble of decision trees. In a Random Forest, we gather a collection of decision trees (called the Forest). To classify a new object based on its attributes, each tree gives a classification, and we say that each tree "votes" for that class. The forest then chooses the classification with the most votes (over all the trees in the forest).

Each tree is grown and fully developed.


Before understanding random forests, there are a couple of terms that you’ll need to know:

  • Ensemble learning is a method where multiple learning algorithms are used in conjunction. The purpose of doing so is that it allows you to achieve higher predictive performance than if you were to use an individual algorithm by itself.
  • Bootstrap sampling is a resampling method that uses random sampling with replacement. It sounds complicated, but trust me when I say it’s REALLY simple — read more about it here.
  • Bagging is when you use the aggregate of the bootstrapped datasets to make a decision — I dedicated an article to this topic, so feel free to check it out here if this doesn’t make complete sense.
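To make bootstrap sampling concrete, here is a minimal sketch in plain Python (the function name and toy data are my own choices for illustration): it draws a sample the same size as the original dataset, with replacement, so some points may appear more than once while others are left out.

```python
import random

def bootstrap_sample(data):
    """Draw a sample the same size as `data`, with replacement."""
    return [random.choice(data) for _ in data]

random.seed(0)
data = [10, 20, 30, 40, 50]
sample = bootstrap_sample(data)
# Some values may repeat; others may be missing entirely.
print(sample)
```

Running this several times (with different seeds) shows how each bootstrapped dataset differs slightly from the original — which is exactly what gives each tree in the forest its own "view" of the data.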

Now that you understand these terms, let’s dive into it.

Random forests are an ensemble learning technique that builds on decision trees. Random forests involve creating multiple decision trees, each trained on a bootstrapped dataset of the original data, and randomly selecting a subset of variables at each step of the decision tree. The model then selects the mode of all of the predictions of each decision tree (bagging). What’s the point of this? By relying on a “majority wins” model, it reduces the risk of error from an individual tree.
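In practice you rarely build this by hand — here is a minimal sketch using scikit-learn's implementation, assuming it is installed (the dataset sizes and hyperparameter values below are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy classification dataset (sizes chosen only for illustration).
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators = number of trees in the forest;
# max_features="sqrt" = the random subset of variables considered at each split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```

Each tree is fit on its own bootstrapped sample internally, and `predict` returns the majority vote across all 100 trees.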

For example, suppose we built 4 decision trees and only the third predicted 0 while the other three predicted 1. If we relied on that one tree alone, the prediction would be 0. But if we relied on the mode of all 4 decision trees, the predicted value would be 1. This is the power of random forests!
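That "majority wins" step can be sketched in a few lines of plain Python (the function name and toy predictions are my own, chosen to mirror the 4-tree example above):

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class predicted by the most trees ('majority wins')."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from four decision trees for one sample:
# the third tree predicts 0, but the majority says 1.
tree_predictions = [1, 1, 0, 1]
print(majority_vote(tree_predictions))  # -> 1
```

`Counter.most_common(1)` returns the single most frequent class, which is exactly the mode that bagging takes over the trees' votes.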
