Many a time, we are overwhelmed by the size of a problem and left confounded. A common and effective problem-solving strategy is to “break it down” into smaller components. Breaking a problem down lets us manage a set of small chunks instead of one monolithic monster. Decision trees apply a similar technique to prediction problems.

**What is a decision tree?**

Decision trees are supervised learning algorithms that employ a **“divide & conquer”** strategy to solve prediction problems. They get their name from the tree-like structures they use to predict outcomes. The tree begins with a “root node” representing the complete data set (all observations), and various strategies are used to split the root (parent) node into branches leading to child nodes (child nodes that are split further are also called internal nodes).

Here is a sample decision tree for deciding whether to accept a new job offer:

Source: https://dataaspirant.com
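To make the root, child, and leaf terminology concrete, here is a minimal Python sketch that encodes a job-offer tree by hand and walks it from the root to a leaf. It assumes the tree tests the three conditions used in the business rule later in this post (salary, commute, free coffee) and, as a simplification, sends every failed test to a "decline" leaf; the tree in the figure may branch differently. The `Node` class, `leaf`, and `predict` are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """One node of a decision tree: internal nodes test, leaf nodes decide."""
    question: str = ""
    test: Optional[Callable[[dict], bool]] = None
    yes: Optional["Node"] = None   # child followed when the test passes
    no: Optional["Node"] = None    # child followed when the test fails
    label: Optional[str] = None    # set only on terminal (leaf) nodes


def leaf(label: str) -> Node:
    return Node(label=label)


def predict(node: Node, offer: dict) -> str:
    """Start at the root and answer one question per level until a leaf is reached."""
    while node.label is None:
        node = node.yes if node.test(offer) else node.no
    return node.label


# Hypothetical encoding of the job-offer tree: every failed test ends in "decline".
job_tree = Node(
    question="Salary > $50,000?",
    test=lambda o: o["salary"] > 50_000,
    yes=Node(
        question="Commute < 1 hour?",
        test=lambda o: o["commute_hours"] < 1,
        yes=Node(
            question="Coffee = Free?",
            test=lambda o: o["free_coffee"],
            yes=leaf("accept offer"),
            no=leaf("decline offer"),
        ),
        no=leaf("decline offer"),
    ),
    no=leaf("decline offer"),
)

print(predict(job_tree, {"salary": 60_000, "commute_hours": 0.5, "free_coffee": True}))
# accept offer
print(predict(job_tree, {"salary": 60_000, "commute_hours": 1.5, "free_coffee": True}))
# decline offer
```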

In the example above, the output is discrete (accept or decline the offer). Decision trees with discrete output are used for classification problems and are called **classification trees**. When the output is continuous, decision trees are referred to as **regression trees**.
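The difference between the two kinds of tree shows up mainly in what a leaf predicts. A common convention, sketched below in Python with made-up values, is that a classification leaf returns the most frequent class among its observations, while a regression leaf returns their mean.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Classification tree: a leaf predicts the most common class among its observations."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Regression tree: a leaf predicts the average target value of its observations."""
    return mean(values)


# Made-up observations that ended up in the same leaf.
print(classification_leaf(["accept", "decline", "accept"]))  # accept
print(regression_leaf([52_000.0, 61_000.0, 58_000.0]))      # 57000.0
```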

**Generating decision trees**

A variety of algorithms are used for growing (splitting) decision trees, CHAID and CART being two of the most popular. In general, the following steps are used to generate a decision tree:
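As an illustration of one splitting criterion, the sketch below follows the CART approach for classification: it scores every candidate threshold on a single numeric feature by the size-weighted Gini impurity of the two child nodes and keeps the lowest. CHAID instead chooses splits using chi-square tests, which this sketch does not cover. Thresholds are taken at observed values for simplicity (implementations often use midpoints), and the salary data is made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, labels):
    """Try every threshold on one numeric feature; return (threshold, weighted Gini)
    for the split whose two child nodes are, on average, the purest."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right child is non-empty
        left = [y for x, y in zip(xs, labels) if x <= threshold]
        right = [y for x, y in zip(xs, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


salaries = [30_000, 45_000, 55_000, 70_000, 80_000]
decisions = ["decline", "decline", "accept", "accept", "accept"]
print(best_split(salaries, decisions))  # (45000, 0.0): "salary <= 45000" gives two pure nodes
```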

- **Root node:** Start with the root (parent) node, which represents the entire data set (100% of the available observations).
- **Splitting the nodes:** Based on certain criteria (the splitting criteria), the root node is split into two or more child (internal) nodes.
- **Stopping criteria:** The nodes keep splitting until a stopping criterion is met (for example, a maximum tree depth or a minimum number of observations per node).
- **Terminal nodes:** The final set of nodes, once the stopping criteria are met, are called terminal (or leaf) nodes. These nodes are used to generate business rules. From the example above, one business rule for predicting job offer acceptance could be: *If* “Salary > $50,000” *and* “Commute < 1 hour” *and* “Coffee = Free”, *then* “accept the job”.
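The four steps can be combined into a small recursive tree builder. The Python sketch below is a simplified, CART-like illustration under these assumptions: numeric features only, binary `<=`/`>` splits chosen by weighted Gini impurity, and stopping criteria of a pure node, fewer than `min_samples` observations, or reaching `max_depth`. The `rules` helper then turns each root-to-leaf path into an "If ... then ..." rule like the one above. All function names and the job-offer data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 0.0 when every label is the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def grow(rows, labels, features, depth=0, max_depth=3, min_samples=2):
    """Grow a tree recursively. Internal nodes are dicts; leaves are class labels."""
    # Stopping criteria: the node is pure, too small to split, or deep enough.
    if len(set(labels)) == 1 or len(labels) < min_samples or depth >= max_depth:
        return majority(labels)  # terminal (leaf) node
    best = None
    for f in features:
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            # Size-weighted Gini impurity of the two children (lower is better).
            score = sum(len(s) * gini([labels[i] for i in s]) for s in (left, right))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every feature is constant here, so no split is possible
        return majority(labels)
    _, feature, threshold, left, right = best

    def child(idx):
        return grow([rows[i] for i in idx], [labels[i] for i in idx],
                    features, depth + 1, max_depth, min_samples)

    return {"feature": feature, "threshold": threshold,
            "le": child(left), "gt": child(right)}


def predict(node, row):
    while isinstance(node, dict):
        node = node["le"] if row[node["feature"]] <= node["threshold"] else node["gt"]
    return node


def rules(node, conditions=()):
    """Turn every root-to-leaf path into an 'If ... then ...' business rule."""
    if not isinstance(node, dict):
        yield ("If " + " and ".join(conditions) + f", then {node}") if conditions else f"Always {node}"
        return
    feat, thr = node["feature"], node["threshold"]
    yield from rules(node["le"], conditions + (f"{feat} <= {thr}",))
    yield from rules(node["gt"], conditions + (f"{feat} > {thr}",))


offers = [  # made-up job offers: salary in dollars, commute in hours
    {"salary": 40_000, "commute": 0.5}, {"salary": 45_000, "commute": 0.25},
    {"salary": 60_000, "commute": 0.5}, {"salary": 70_000, "commute": 0.25},
    {"salary": 80_000, "commute": 0.75}, {"salary": 65_000, "commute": 1.5},
    {"salary": 90_000, "commute": 2.0},
]
decisions = ["decline", "decline", "accept", "accept", "accept", "decline", "decline"]

tree = grow(offers, decisions, ["salary", "commute"])
for rule in rules(tree):
    print(rule)
# If salary <= 45000, then decline
# If salary > 45000 and commute <= 0.75, then accept
# If salary > 45000 and commute > 0.75, then decline
```

Setting `max_depth=0` stops at the root, so the whole tree collapses to a single leaf predicting the majority class.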

**Random Forests – An ensemble method**

Now that you have an idea of how decision trees work, let’s move on to random forests. But first, ensemble methods.

In my previous blog post, I wrote about how a model’s output can vary with the quantity and quality of the available data. So, in certain situations, it isn’t wise to rely on the predictions of a single model alone. This is where ensemble methods come in.

An ensemble method generates several models using different sampling strategies and combines them to produce the result. For classification, each model’s prediction is given a weight, and the final prediction is decided by (weighted) majority vote.
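The weighted majority vote can be sketched in a few lines of Python. The three model predictions and their weights are hypothetical; with equal weights this reduces to a plain majority vote, while unequal weights can change the outcome.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the class whose supporting models have the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


votes = ["accept", "decline", "decline"]  # predictions from three hypothetical models
print(weighted_vote(votes, [1, 1, 1]))          # decline (2 votes vs 1)
print(weighted_vote(votes, [0.6, 0.25, 0.25]))  # accept (0.6 vs 0.5)
```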

Random forest is a popular ensemble method. A random forest grows several decision trees (hence the name forest) on different samples of the data, and the result is obtained by combining their votes. A frequently used sampling strategy is Bootstrap Aggregating (bagging), in which each tree is trained on a sample drawn with replacement from the original data.
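Below is a deliberately tiny random-forest sketch in Python. Each tree is trained on a bootstrap sample (drawn with replacement), and the trees’ votes are aggregated, which is what Bootstrap Aggregating means. Random forests also restrict each split to a random subset of the features (a common choice is about the square root of the number of features), which makes the trees less alike; that step is included too. To keep the code short, each "tree" is a depth-1 stump, so the feature subset is drawn once per tree. The names, parameters (`n_trees`, `max_features`, `seed`), and data are illustrative assumptions, not a library API.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """A depth-1 decision tree: the single threshold split with the lowest weighted Gini."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split in this sample: always predict the majority class
        label = majority(labels)
        return lambda row: label
    _, feature, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label


def random_forest(rows, labels, features, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: len(rows) draws *with replacement* from the training data.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random feature subset; for a stump the only split is also the whole tree.
        subset = rng.sample(features, max_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], subset))
    return forest


def forest_predict(forest, row):
    # Aggregate: every tree gets one (equally weighted) vote.
    return majority([tree(row) for tree in forest])


offers = [  # made-up job offers: salary in dollars, commute in hours
    {"salary": 40_000, "commute": 1.5}, {"salary": 45_000, "commute": 2.0},
    {"salary": 50_000, "commute": 1.25}, {"salary": 60_000, "commute": 0.5},
    {"salary": 70_000, "commute": 0.25}, {"salary": 80_000, "commute": 0.75},
]
decisions = ["decline"] * 3 + ["accept"] * 3

forest = random_forest(offers, decisions, ["salary", "commute"])
print(forest_predict(forest, {"salary": 90_000, "commute": 0.2}))
```

In practice, libraries such as scikit-learn provide full implementations (for example `RandomForestClassifier`) with deeper trees and many more options.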

**Final blog**

We are nearing the end of this blog series on Analytics Translator, Machine Learning, and Analytics. In the next post, which will be the final one, I will cover commonly used tools for Machine Learning and Analytics.
