AI Assignment: CS3024 Decision Trees

Summary

This assignment involves writing a program that builds a decision tree from a set of training data. You will use information gain to build the best decision tree for training data provided in a file. You can work with a partner on this assignment. I will post my usual Moodle thread asking for partners.

The Assignment

Your program should do the following:

  • Prompt the user to enter a file name.
  • Read the information from the file (see below).
  • Build the best decision tree that results from the training data.
  • Display or print the decision tree.

The printing of the tree should be readable. See the end of this document for an example of what your output might look like.

If you cannot implement information gain properly, a program that produces any correct tree will still earn most of the points.
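
As a rough illustration of the information gain calculation, the sketch below computes the entropy of a set of 0/1 classifications and the gain from splitting on one attribute. The function names and the row layout (a list of 0/1 values with the classification last) are assumptions made for this example, not part of the assignment.

```python
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of 0/1 class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for value in (0, 1):
        p = labels.count(value) / total
        if p > 0:
            result -= p * log2(p)
    return result


def information_gain(rows, attr_index):
    """Entropy reduction from splitting rows on the attribute at attr_index.

    Each row is a list of 0/1 values; the last entry is the classification.
    """
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in (0, 1):
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder
```

An attribute that separates the two classes perfectly has a gain equal to the entropy of the whole set, while an attribute that leaves both halves as mixed as before has a gain of zero.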

Assumptions

You may make the following assumptions for the decision trees produced by your program:

  • The classification will be one of two values - TRUE and FALSE, for example.
  • The value for each attribute will be one of two values - TRUE and FALSE for example.
  • These values will not be named, and will be represented by 0 and 1 (see below).
  • There will be no pieces of training data with partial data.
  • There may be contradictory training data - you should assign values based on the majority in that case.
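
One possible way to handle contradictory training data is a simple majority vote over the classification column, sketched below. The name `majority_label` is hypothetical, and the tie-breaking rule (favoring 1) is an assumption, since the assignment does not say what to do when the counts are equal.

```python
def majority_label(rows):
    """Return the most common classification (last column) among rows.

    Ties go to 1. The assignment does not specify a tie-breaking rule,
    so this choice is an assumption.
    """
    ones = sum(row[-1] for row in rows)
    zeros = len(rows) - ones
    return 1 if ones >= zeros else 0
```

A leaf would call this when the remaining rows disagree but no attributes are left to split on.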

The File Structure

For this assignment, you will be reading in a file containing training information. The file will consist of some number n of attribute names, one per line, with the classification name as the last of these. Then, there will be a blank line. Then, there will be some number m of training instances, each consisting of n values that are 0 or 1, separated by commas. These represent the values of the attributes (and the classification) for that piece of training data. For example, a file might look like this:

Brown
Wrinkled
Smelly
Spongy
POISON

1,0,1,0,1
0,0,1,1,0
0,1,0,0,1
0,0,0,1,0
1,1,0,1,1
1,0,1,1,0
1,1,1,0,1
0,0,0,0,0

This file details a set of training data with five attributes: Brown, Wrinkled, Smelly, and Spongy, followed by the final attribute, POISON, which is the classification.

Then there are eight pieces of training data. The first is 1,0,1,0,1: the first value, for Brown, is true (1); the second, Wrinkled, is false (0); the third, Smelly, is true (1); the fourth, Spongy, is false (0). The classification (POISON) is true (1). The rest of the training data works in a similar way.
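
A minimal sketch of reading this format might look like the following. The function names are hypothetical, and the sketch assumes that the first blank line separates the attribute names from the training instances.

```python
def parse_training_file(text):
    """Split file contents into attribute names and integer training rows."""
    lines = [line.strip() for line in text.splitlines()]
    blank = lines.index("")  # the first blank line separates the two parts
    names = lines[:blank]
    rows = [
        [int(value) for value in line.split(",")]
        for line in lines[blank + 1:]
        if line  # skip any trailing blank lines
    ]
    for row in rows:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(row)}")
    return names, rows


def load_training_file(path):
    """Read the file at path and parse it."""
    with open(path) as handle:
        return parse_training_file(handle.read())
```

The file name could come from a prompt such as `input("File name: ")` and be passed to `load_training_file`. The last entry of `names` is the classification name, and the last value in each row is its classification.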

The best tree for this data would look like this if we printed it out:

Wrinkled
  yes:
    POISON = TRUE
  no:
    Brown
      yes:
        Spongy
          yes:
            POISON = FALSE
          no:
            POISON = TRUE
      no:
        POISON = FALSE
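
To show how the pieces could fit together, here is a compact sketch that builds a tree by repeatedly choosing the attribute with the highest information gain and then renders it as indented text. It is one possible approach under stated assumptions, not a reference solution: the tuple representation of the tree, all function names, the two-space indentation, and the tie-breaking rules (the first-listed attribute wins ties on gain, and 1 wins ties on majority) are choices made for this example.

```python
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of 0/1 labels."""
    total = len(labels)
    result = 0.0
    for value in (0, 1):
        p = labels.count(value) / total if total else 0.0
        if p > 0:
            result -= p * log2(p)
    return result


def gain(rows, attr):
    """Information gain from splitting rows on attribute index attr."""
    remainder = 0.0
    for value in (0, 1):
        subset = [r[-1] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


def majority(rows):
    """Most common classification; ties go to 1 (an assumption)."""
    ones = sum(r[-1] for r in rows)
    return 1 if 2 * ones >= len(rows) else 0


def build(rows, attrs, default=0):
    """Return a leaf (0 or 1) or a tuple (attr_index, yes_subtree, no_subtree)."""
    if not rows:
        return default  # no examples here: fall back to the parent's majority
    if len({r[-1] for r in rows}) == 1 or not attrs:
        return majority(rows)
    best = max(attrs, key=lambda a: gain(rows, a))  # first maximum wins ties
    rest = [a for a in attrs if a != best]
    fallback = majority(rows)
    yes = build([r for r in rows if r[best] == 1], rest, fallback)
    no = build([r for r in rows if r[best] == 0], rest, fallback)
    return (best, yes, no)


def render(tree, names, depth=0):
    """Return the tree as a list of indented text lines."""
    pad = "  " * depth
    if not isinstance(tree, tuple):
        return [f"{pad}{names[-1]} = {'TRUE' if tree else 'FALSE'}"]
    attr, yes, no = tree
    lines = [f"{pad}{names[attr]}", f"{pad}  yes:"]
    lines += render(yes, names, depth + 2)
    lines.append(f"{pad}  no:")
    lines += render(no, names, depth + 2)
    return lines
```

With `names` and `rows` holding the example data, `print("\n".join(render(build(rows, list(range(len(names) - 1))), names)))` prints the tree. On that data, Brown and Spongy tie on gain in the branch where Wrinkled is false, so the order in which attributes are listed decides which one appears first.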