Decision Tree Inductive Learning

Homework 4: Decision Tree Inductive Learning

Due by 10 am on April 12

The file “dt_train.csv” contains 601 lines (one header line followed by 600 examples) with 10 variables. The first line contains the column headers, which are interpreted as follows:

id: observation identifier.

t1: measurement on test 1; t2: measurement on test 2.

t3: measurement on test 3; t4: measurement on test 4.

t5: measurement on test 5; t6: measurement on test 6.

t7: measurement on test 7; t8: measurement on test 8.

d: binary output variable, set to 1 if the product is defective and 0 otherwise.

Each subsequent line contains one example, giving its values for the variables above.
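As a minimal sketch of how a file with this layout could be read, the snippet below parses the header and the examples using only the Python standard library. It assumes the file is comma-delimited and that every value is an integer, which matches the sample rows but is not stated in the assignment. The names `load_examples`, `features`, and `labels` are hypothetical helpers introduced for illustration. A blank `d` value is read as `None`.

```python
import csv
import io


def load_examples(source):
    """Read examples from an open text file (or file-like object).

    Returns a list of dicts keyed by column header. Values are parsed
    as int (assumption: all measurements are integers); a blank cell
    becomes None.
    """
    reader = csv.DictReader(source)
    examples = []
    for row in reader:
        example = {}
        for name, value in row.items():
            if name is None:
                # Extra cells beyond the header are ignored.
                continue
            value = (value or "").strip()
            example[name] = int(value) if value else None
        examples.append(example)
    return examples


# Small in-memory sample with the same header as dt_train.csv.
sample = io.StringIO(
    "id,t1,t2,t3,t4,t5,t6,t7,t8,d\n"
    "1,17,3,31,54,66,54,45,84,1\n"
    "2,2,15,6,5,82,54,59,87,1\n"
)
examples = load_examples(sample)

# Split each example into the 8 test measurements and the output d.
features = [[ex[f"t{i}"] for i in range(1, 9)] for ex in examples]
labels = [ex["d"] for ex in examples]
print(len(examples), features[0], labels)
```

To read the actual file, the same function can be called as `load_examples(open("dt_train.csv", newline=""))`. If the file has 601 lines as stated, this should yield 600 examples.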

The table below reproduces the first 2 observations.

id  t1  t2  t3  t4  t5  t6  t7  t8  d
 1  17   3  31  54  66  54  45  84  1
 2   2  15   6   5  82  54  59  87  1

new_case  t1  t2  t3  t4  t5  t6  t7  t8  d
       1   8  86  55  53  36  12  82  19
       2  22  36  80  69  90  33  22   6
       3  74  26  32  26  38  52  63  12
       4  66  71  71  52  42  88  89  70
       5  55  72  61  41  91  39  50  96
       6  34  58  22  84  84  61  95  57
       7  23  70  39  65  16  71  96  78
       8   9  19  67  43   2  20  92   3
       9   6  71  20   6  27  58   6  22
      10  68  40  86  82  82  44  61  48