Java ghostwriting: ECS629 ID3

Ghostwritten Java implementation of the ID3 algorithm from AI, which is used to construct decision trees.

Requirement

The aim of this assignment is to implement the ID3 algorithm in Java to perform decision tree learning and classification for objects with discrete (String-valued) attributes. You will be given two input files: one containing labelled training examples, and the other containing unlabelled examples that your programme has to classify. In each file, the examples are described by a list of attribute values, one example per line, with the values separated by commas (comma-separated values, or CSV, format). You may assume that no attribute name or value contains a comma or other punctuation. The first line of each input file contains the names of the attributes. The last attribute on each line of the training file is the class that the example belongs to. Apart from the class, which appears only in the training data, the attributes are in the same order in both files.
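The skeleton already handles this parsing, so the following is only an illustration of the layout just described. It is a minimal sketch, assuming the header line has already been read separately, that splits one training line into its attribute values and its class label (the last field). The `CsvRow` class and its `parseTraining` method are hypothetical and not part of the skeleton.

```java
import java.util.Arrays;

/** Hypothetical helper: one parsed line of the training file. */
final class CsvRow {
    final String[] attributes; // attribute values, in header order
    final String label;        // class label: the last field on the line

    CsvRow(String[] attributes, String label) {
        this.attributes = attributes;
        this.label = label;
    }

    /** Splits a training line on commas; no quote handling is needed
     *  because names and values are guaranteed to contain no commas. */
    static CsvRow parseTraining(String line) {
        // trim() also removes a trailing '\r' from Windows line endings
        String[] fields = line.trim().split(",");
        String[] attrs = Arrays.copyOf(fields, fields.length - 1);
        return new CsvRow(attrs, fields[fields.length - 1]);
    }
}
```

For example, `CsvRow.parseTraining("sunny,hot,high,no")` yields the attributes `sunny`, `hot`, `high` and the label `no`. A line from the test file would be split the same way, but without separating off a label, since it has no class field.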

You are provided with a skeleton programme to get you started. The skeleton takes care of the text processing: reading and parsing the CSV files, and finding the set of values that each attribute can take. You may assume that every attribute value in the testing data also occurs in the training data. The skeleton also contains most of the data structures you need to write your methods. You need to complete the methods classify() and train(). Do NOT change their specification or any of the predefined data structures, or you may fail the automatic marking. You may add new methods and variables to the ID3 class as needed.
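ID3 builds the tree top-down, at each node splitting on the attribute with the highest information gain, i.e. the largest reduction in the entropy of the class labels. The following is a minimal sketch of those two quantities only, not of train() itself. It assumes examples are held as `String[]` rows with a parallel list of class labels; the `InfoGain` class and its method names are hypothetical and do not come from the skeleton.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: the two quantities ID3 uses to choose a split. */
final class InfoGain {
    /** Entropy in bits of a list of class labels: H = -sum_c p_c * log2(p_c). */
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.size();
            h -= p * Math.log(p) / Math.log(2); // convert natural log to base 2
        }
        return h;
    }

    /** Information gain of splitting on column attr:
     *  rows.get(i)[attr] is example i's value, labels.get(i) its class. */
    static double gain(List<String[]> rows, List<String> labels, int attr) {
        // Group the class labels by the value of the chosen attribute
        Map<String, List<String>> partitions = new HashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            partitions.computeIfAbsent(rows.get(i)[attr], k -> new ArrayList<>())
                      .add(labels.get(i));
        }
        // Remainder: entropy of each subset, weighted by its share of the examples
        double remainder = 0.0;
        for (List<String> subset : partitions.values()) {
            remainder += (double) subset.size() / labels.size() * entropy(subset);
        }
        return entropy(labels) - remainder;
    }
}
```

Because `entropy(labels)` is the same for every candidate attribute at a given node, picking the largest gain is equivalent to picking the smallest weighted remainder; computing the full gain simply keeps the quantity recognisable when checking against hand-worked tutorial answers.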

You are also provided with two sets of test files, each containing the two input files and the correct output file. Test your programme with the following commands, replacing the file names as appropriate:

java ID3 TrainFile.csv TestFile.csv > MyOutput.csv
diff MyOutput.csv OutputFile.csv

If your code is correct, the diff command produces no output. Note that the supplied test files are very simple: the data is taken from the questions in Tutorial 5, so you can check your code step by step. You will need to design your own tests to make sure your code works correctly under all legal input conditions (e.g. a single class, three or more classes, and training sets that cannot be perfectly classified).
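When the remaining training examples cannot be separated (every attribute has been used, yet the classes are still mixed), standard ID3 labels the leaf with the most frequent class; the same rule also covers the trivial single-class case. The following is a minimal sketch of that step. The `Majority` class is hypothetical, and the tie-breaking rule (the label seen first wins) is an assumption: the skeleton or the marking scheme may expect a different rule, so check the supplied output files before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: the most frequent class label in a list. */
final class Majority {
    /** Returns the most common label, or null if the list is empty.
     *  Ties go to the label that appears first (an assumed rule). */
    static String majorityLabel(List<String> labels) {
        // LinkedHashMap preserves first-seen order, which fixes the tie-break
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) { // strict '>' keeps the earlier label on ties
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

A small hand-built training file in which two examples share every attribute value but differ in class is a quick way to exercise this path.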

Marks will be allocated as follows (subject to the usual late penalties):

  • 50%: correct functioning of train() method
  • 20%: correct functioning of classify() method
  • 30%: code and a report describing how your programme works. The report must be at most 1 page, in PDF format ONLY. Originality, design decisions, code quality and ability to follow these instructions will be assessed.

Using the link on your landing page, submit your assignment as a single zip file containing the following 2 files only:

  • source code (Java source code, filename ID3.java)
  • report (PDF format, filename report.pdf)

Do NOT:

  • put any directory structure in the zip file
  • submit more than one pdf file
  • submit a Word document
  • change the specification of the classify() and train() methods or the Tree data structure
  • add any statements that print anything to standard output or standard error (if you use print statements for debugging, remove them, comment them out, or disable them before submitting your code)