Welcome! Log In Create A New Profile

Decision Trees

Posted by dve83

Announcements		Last Post
	SoC Curricula	09/30/2017 01:08PM
	Demarcation or scoping of examinations and assessment	02/13/2017 07:59AM
	School of Computing Short Learning Programmes	11/24/2014 08:37AM
	Unisa contact information	07/28/2011 01:28PM

Forums Messages New

dve83

Decision Trees
May 05, 2012 11:17AM

Registered: 18 years ago
Posts: 129
Rating: 1

Well, truth be told. I'm very worried about decision trees..
not even mentioning the exam in general.

Looking at the solutions to Assignment 3 we have roughly 12 pages as an answer to the decision tree question. Worried about the time it takes to do this and the chances of making mistakes through the process.

My summary on the concept (for those who need another perspective / who would like to comment and correct me)

1. Start with the whole example set and define S = [number for each category eg. 5l,3m,6h]
ie. we have a total of 14 values, of which 5 are low, 3 are medium and 6 are high

2. Calculate the Entropy for the Complete set defined as S (log is actually log base 2)
- (4/14 log 5/14) - (3/14 log 3/14) - (6/14 log 6/14) = 1.531

Now we determine the gain of each of the other attributes still left

3. For each attribute (eg here will be Credit History)
a) Define it in terms of the possible values for S where S = [5l,3m,6h] as defined above
Thus we have: 3 possible values for Credit History (bad, good, unknown)

- in terms of S's possible values, for Bad Credit History we will have 0 Low, 1 Medium and 3 High.
Thus for bad: Sb = [0l, 1m, 3h] Notice that it contains only 4 out of the 14 values (4/14)
Also, Sg = [3l,1m,1h] (5/15 values)
Su = [2l,1m,2h] (5/14 values)

We calculate Entropy for each of these - why? cause we want to deduct it from entropy of the set S to get the information GAIN.
Entrypy(Sb) = -(0/4 log 0/4) - (1/4 log 1/4) - (3/4 log 3/4) = 0.811
E(Sg) = 1.371
E(Su) = 1.522

Now Gain = E(S) - 4/14(E(Sb)) - 5/15(E(Sg)) - 5/15(S(Su)) = 0.266
This is the information gain for the first attribute. NOw we do it for the other three and pick the highest valued attribute.This is our first node. (InCOME)

b) Now notice that INcome in turn has three possible values. Each of these will form a branch to the next level in your tree. Thus for each of these
you will have to decide the next node. Thus for each of the 3 possible values, filter out all lines that aren't relevent (ie.start with value <R15k and work with those lines only - there are 4)
Now you again define S. S = [4H]. Because we only have 1 type of value, this will be the result of this branch.

second take the value R15k - R35K
Filter out all irrelevant lines.
We have to value types of our goal node (ie. 2x High and 2x Medium).
Define S = [2m, 2H]
Get Entropy (S) = -(2/4log2/4) - (2/4 log 2/4)
Now for each of the left over attributes (Credit History, Debt, Collateral), find its entropy and information gain. Pick the best 1 and use that as the next node in your tree

eg.
Credit History: For unknown we have 1 High and 1 Medium goal value Su = [1h, 1m] Fraction of total S = 2/4
For Good we only have 1 Medium goal value Sg =[0h, 1m] Fraction 1/4
For Bad we have 1 High goal value Sb = [1h, 0m] Fraction 1/4

Work out eaches entropy eg(unknown = - (1/2 log 1/2) - (1/2 log 1/2))
Then deduct it from entropy of S (with each attribute as a fraction)
Pick the highest gain and insert the node in your tree.

The moment your example list gets filtered and ends up with a single goal value (also not that if you end up with a single line - you will always have 1 goal value), then just insert the goal value as the node in your tree. The moment you have more than 1 optio for a goal value, you need to do the calc above.

Dan

Danie van Eeden
------------------------

Reply Quote

murdock619

Re: Decision Trees
May 21, 2012 04:10PM

Registered: 14 years ago
Posts: 43
Rating: 0

If you look at the last paper they do not expect you to do the whole tree using IG. They will either ask "Work out the IG for X" or "Build an tree(probably without IG)" I think