Before looking at what a Decision Tree algorithm is, it helps to refresh the basics of machine learning, in particular the types of problems it handles.
There are three basic categories of problems that you will encounter in machine learning: classification, regression and clustering.
Classification problems: In classification problems you have to choose between discrete outcomes such as “Yes” or “No”, “good” or “bad”, “accepted” or “rejected”, “true” or “false”.
Regression problems: You know the stock value of a product today, and you want to predict its stock value two weeks from now. In other words, the value being predicted is continuous in nature.
Clustering problems: Here the data has to be grouped into clusters. Examples include the most visited discount shelves in a supermarket, or an online shop where people who buy ties also buy cuff links and tie pins. The products are organized by a specific pattern, such as purchase orders or the choice preferences of customers.
The four main techniques used to solve classification problems in machine learning are:
- Naive Bayes
- Logistic Regression
- Decision Tree
- Random Forest
The first two are typically employed for simpler data sets, while the other two are used for more complex data sets.
In this post we shall focus primarily on Decision Tree algorithms, starting with the definition.
What is a Decision Tree Algorithm
A Decision Tree is a tree-shaped diagram used to help decide on a course of action. Each branch in a decision tree represents a possible decision or outcome.
Types of Problems that Decision Tree can solve
Decision Trees can solve Classification problems and Regression problems.
The classification decision tree: This model determines the outcome of If-Then conditions. For example, if you work hard, then you will pass your exam. Another example is determining the best race car based on 1 km race timings.
The regression decision tree: This model is used when the target variable is continuous or numerical in nature.
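To make the two kinds of tree concrete, here is a minimal sketch in Python. The trees are written by hand rather than learned from data, and every feature name, threshold and leaf value in it (`hours_studied`, `engine_size_l`, the 5-hour cutoff, the lap times) is a made-up assumption for illustration only.

```python
# A hand-built tree: internal nodes ask an If-Then question, leaves hold the answer.
# All feature names, thresholds, and leaf values below are hypothetical.

def predict(node, sample):
    """Walk from the root to a leaf and return the value stored there."""
    while "leaf" not in node:
        # If the condition holds, follow the "yes" branch, otherwise the "no" branch.
        if sample[node["feature"]] >= node["threshold"]:
            node = node["yes"]
        else:
            node = node["no"]
    return node["leaf"]

# Classification tree: the leaves are categories.
exam_tree = {
    "feature": "hours_studied", "threshold": 5,
    "yes": {"leaf": "pass"},
    "no": {"leaf": "fail"},
}

# Regression tree: the leaves are numbers (a predicted 1 km time in seconds).
lap_time_tree = {
    "feature": "engine_size_l", "threshold": 3.0,
    "yes": {"leaf": 21.5},
    "no": {
        "feature": "engine_size_l", "threshold": 2.0,
        "yes": {"leaf": 24.0},
        "no": {"leaf": 27.5},
    },
}

print(predict(exam_tree, {"hours_studied": 8}))        # pass
print(predict(lap_time_tree, {"engine_size_l": 2.4}))  # 24.0
```

The only difference between the two trees is what the leaves hold: a label for classification, a number for regression. A real algorithm would learn the questions and thresholds from data instead of having them typed in.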
Below is a simple example of a Decision Tree algorithm. You intend to start a business and have two proposals: one is to sell ladies' handbags, and the other is to sell ladies' shoes. If you were to sell handbags, the amount of money made would be $1000. If you were to sell shoes, the amount of money made would be $900.
Which of these models would you choose? Obviously, selling handbags. Why? Because the returns are higher. But is this the right decision?
The figure above illustrates just the basics. What if selling handbags has a 50% chance of success and a 50% chance of failure, and selling shoes likewise has a 50% chance of success and a 50% chance of failure? What would the Decision Tree look like then?
Which one of these businesses would you choose now?
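The decision is based on the following formula: Expected value = (probability of success × payoff on success) + (probability of failure × payoff on failure)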
Now which of the business models would you decide to pursue?
Obviously, it would be selling shoes.
The value that this formula produces is called the Expected Value.
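The comparison can be sketched in a few lines of Python. The success payoffs ($1000 and $900) and the 50/50 odds come from the example above. The failure payoffs are not given in the text, so the figures used here are assumptions: a $100 loss for shoes is the value that reproduces the $400 expected value quoted for that business, and the $300 loss for handbags is purely hypothetical.

```python
def expected_value(outcomes):
    """Sum of probability * payoff over every possible outcome."""
    return sum(p * payoff for p, payoff in outcomes)

# Each entry is (probability, payoff in dollars).
# The failure payoffs are ASSUMED for illustration; the text does not state them.
handbags = [(0.5, 1000), (0.5, -300)]  # -300 is a hypothetical loss
shoes = [(0.5, 900), (0.5, -100)]      # -100 makes the result match $400

ev = {"handbags": expected_value(handbags), "shoes": expected_value(shoes)}
best = max(ev, key=ev.get)

print(ev)    # {'handbags': 350.0, 'shoes': 400.0}
print(best)  # shoes
```

With these assumed losses, the option with the larger headline return is no longer the better choice once the chance of failure is taken into account.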
Interpretation of the Expected Value
The Expected Value does not mean that you will make a profit of $400 every time in the shoe-selling business. It only means that if you ran the identical shoe-selling business very many times, your average earnings would probably be about $400 per run. Note the word “probably”.
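A small simulation makes this interpretation easier to see. The sketch below repeats the shoe business many times and averages the result. The $900 gain and 50% odds are from the example; the $100 loss on failure is an assumption, chosen because it is consistent with the $400 figure.

```python
import random

def average_profit(trials, seed=0):
    """Run the shoe business `trials` times and return the mean profit."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    for _ in range(trials):
        # 50% chance of success (+900); the -100 loss on failure is an assumption.
        total += 900 if rng.random() < 0.5 else -100
    return total / trials

# The average drifts toward the expected value as the number of runs grows.
for n in (10, 1_000, 100_000):
    print(n, average_profit(n))
```

No single run ever pays exactly $400: each run pays either $900 or a loss of $100. Only the long-run average settles near the expected value, and with few runs it can be far from it.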
Pros of using Decision Tree Algorithm
They are simple to use.
They give a clear understanding of complex routines.
The model is visual, so it engages both the learner and the implementer.
They don’t require complex data preparation.
Categorical data and numerical data are both handled with ease.
Even data that doesn’t fit neatly can still be used to make the prediction.
Cons of using Decision Tree Algorithm
The focus is just on one particular situation instead of a
The Decision Tree model can become unstable due to small changes in the data. The balance will be lost, and this in turn will affect the decision arrived at.
This would impair the decision trees because of their inability to work with new incoming data.
Terms required for Decision Tree
Entropy: This is the measure that defines the unpredictability in the data set.
Information gain: This is the measure that defines the decrease in unpredictability after the data set is split.
Leaf node: This carries the decision.
Root node: The topmost decision node is known as the root node.
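The two measures described above, unpredictability in the data and the decrease in unpredictability after a split, are commonly called entropy and information gain. The sketch below shows one common way to compute them, assuming Shannon entropy measured in bits (log base 2). The post itself does not give a formula, so treat this as an illustration, and note that the labels in the sample data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits: 0 for a pure set, 1 for a 50/50 two-class set."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(c) / total * entropy(c) for c in children)
    return entropy(parent) - weighted

# Hypothetical labels: did the business succeed?
parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

print(entropy(parent))                          # 1.0
print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A split that separates the classes completely removes all the unpredictability, while a split that leaves each subset as mixed as the parent gains nothing. A tree-building algorithm would typically prefer the question with the higher gain, starting at the root node.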