Decision Trees

Posted by dve83 
May 05, 2012 11:17AM
Well, truth be told, I'm very worried about decision trees, not even to mention the exam in general.

Looking at the solutions to Assignment 3, the answer to the decision tree question runs to roughly 12 pages. I'm worried about the time it takes to do this and the chances of making mistakes along the way.

My summary of the concept (for those who need another perspective, or who would like to comment and correct me):


1. Start with the whole example set and define S = [number for each category, e.g. 5l, 3m, 6h]
i.e. we have a total of 14 values, of which 5 are low, 3 are medium and 6 are high

2. Calculate the entropy for the complete set defined as S (log is actually log base 2):
E(S) = -(5/14 log 5/14) - (3/14 log 3/14) - (6/14 log 6/14) = 1.531
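As a sanity check, the entropy formula above can be coded in a few lines of Python (the function name and the count-list representation are just my own sketch):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of category counts,
    e.g. S = [5l, 3m, 6h] becomes [5, 3, 6]."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total)
                for c in counts if c > 0)  # 0 log 0 is taken as 0

print(round(entropy([5, 3, 6]), 3))  # 1.531
```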

Now we determine the gain of each of the remaining attributes.

3. For each attribute (e.g. here it will be Credit History):
a) Define it in terms of the possible values for S, where S = [5l, 3m, 6h] as defined above.
Thus we have 3 possible values for Credit History (bad, good, unknown)

- In terms of S's possible values, for bad Credit History we will have 0 low, 1 medium and 3 high.
Thus for bad: Sb = [0l, 1m, 3h]. Notice that it contains only 4 out of the 14 values (4/14).
Also, Sg = [3l, 1m, 1h] (5/14 values)
Su = [2l, 1m, 2h] (5/14 values)

We calculate the entropy for each of these. Why? Because we want to subtract it from the entropy of the set S to get the information GAIN.
Entropy(Sb) = -(0/4 log 0/4) - (1/4 log 1/4) - (3/4 log 3/4) = 0.811 (taking 0 log 0 as 0)
E(Sg) = 1.371
E(Su) = 1.522

Now Gain = E(S) - 4/14(E(Sb)) - 5/14(E(Sg)) - 5/14(E(Su)) = 0.266
This is the information gain for the first attribute. Now we do the same for the other three and pick the attribute with the highest value. This is our first node (Income).
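Steps 2 and 3a can be checked with a short sketch: subtract the weighted subset entropies from the parent entropy (the helper names here are my own illustration):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of category counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain(parent, subsets):
    """Information gain: parent entropy minus the weighted
    entropy of the subsets induced by the attribute."""
    total = sum(parent)
    return entropy(parent) - sum(sum(s) / total * entropy(s)
                                 for s in subsets)

# Credit History splits S = [5l, 3m, 6h] into Sb, Sg and Su:
print(round(gain([5, 3, 6], [[0, 1, 3], [3, 1, 1], [2, 1, 2]]), 3))  # 0.266
```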

b) Now notice that Income in turn has three possible values. Each of these will form a branch to the next level in your tree, so for each of them you will have to decide the next node. Thus for each of the 3 possible values, filter out all lines that aren't relevant (i.e. start with the value <R15k and work with those lines only; there are 4).
Now you again define S. S = [4h]. Because we only have 1 type of value, this will be the result of this branch.

Second, take the value R15k - R35k.
Filter out all irrelevant lines.
We have two value types for our goal node (i.e. 2x high and 2x medium).
Define S = [2m, 2h]
Entropy(S) = -(2/4 log 2/4) - (2/4 log 2/4) = 1
Now, for each of the remaining attributes (Credit History, Debt, Collateral), find its entropy and information gain. Pick the best one and use that as the next node in your tree.

eg.
Credit History: For unknown we have 1 high and 1 medium goal value: Su = [1h, 1m], fraction of total S = 2/4
For good we only have 1 medium goal value: Sg = [0h, 1m], fraction 1/4
For bad we have 1 high goal value: Sb = [1h, 0m], fraction 1/4

Work out each one's entropy, e.g. for unknown: -(1/2 log 1/2) - (1/2 log 1/2) = 1.
Then subtract these from the entropy of S (weighting each attribute value by its fraction).
Pick the highest gain and insert that node in your tree.
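For this branch the numbers are small enough to verify by hand, or with a quick sketch (the entropy helper is my own, not from the study material):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of category counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

# S = [2m, 2h]; Credit History splits it into
# unknown = [1h, 1m] (2/4), good = [1m] (1/4), bad = [1h] (1/4)
g = entropy([2, 2]) - (2/4) * entropy([1, 1]) \
    - (1/4) * entropy([1]) - (1/4) * entropy([1])
print(g)  # 0.5
```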

The moment your example list gets filtered down to a single goal value (also note that if you end up with a single line, you will always have 1 goal value), just insert that goal value as the node in your tree. The moment you have more than 1 option for a goal value, you need to do the calculation above.
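The whole procedure, including the stopping rule above, can be sketched as a recursive function. This is only my own illustration (the dict-based rows and names like 'credit' and 'risk' are assumptions, not from the assignment):

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (base 2) of a list of category counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain_for(rows, attr, target):
    """Information gain of splitting `rows` on attribute `attr`."""
    base = entropy(Counter(r[target] for r in rows).values())
    weighted = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        weighted += len(subset) / len(rows) * entropy(
            Counter(r[target] for r in subset).values())
    return base - weighted

def build_tree(rows, attributes, target):
    goals = [r[target] for r in rows]
    if len(set(goals)) == 1:           # one goal value left: leaf
        return goals[0]
    if not attributes:                 # no attributes left: majority leaf
        return Counter(goals).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain_for(rows, a, target))
    return {best: {v: build_tree([r for r in rows if r[best] == v],
                                 [a for a in attributes if a != best],
                                 target)
                   for v in {r[best] for r in rows}}}
```

Each recursion level picks the highest-gain attribute, exactly as in steps 2 and 3 above, and stops as soon as a branch is pure.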

Dan

Danie van Eeden
------------------------
Re: Decision Trees
May 21, 2012 04:10PM
If you look at the last paper, they do not expect you to build the whole tree using IG. They will either ask "Work out the IG for X" or "Build a tree" (probably without IG), I think.