Creates a classifier based on the given sample data.

Version 1.1
Bundle tools
Categories Classification
Authors Marko Laakso (Marko.Laakso@Helsinki.FI), Sirkku Karinen (sirkku.karinen@helsinki.fi)
Usage Example with default values


Name Type Mandatory Description
data TextFile Mandatory Sample data for the supervised learning.
testdata TextFile Optional Validation data to estimate accuracy of the new classifier.
classifydata TextFile Optional Data for which classes are predicted. NOTE: This data is not used in training or in validation. Weka requires a class column for this data set as well, so add a column named after the value of the parameter 'classColumn'. A useful trick is to give the ID column the name set in 'classColumn'; the ID column is then also carried over to the 'predictedClasses' data set.
inClassifier BinaryFile Optional A classifier object that is used instead of building a new classifier from the training data. NOTE: If this input is set, the parameter 'methodClass' and the input 'data' are not used, but you should still provide them (empty values).
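
The class column requirement for 'classifydata' can be met by preprocessing the file before it is passed in. The following is a minimal sketch and not part of the component: the function names, the tab delimiter, and the "?" placeholder are assumptions chosen for illustration, and the placeholder that suits a given classifier may differ.

```python
import csv
import io


def add_class_column(rows, class_column, placeholder="?"):
    """Return copies of the rows with a placeholder class column added.

    Rows that already contain the column keep their existing value.
    """
    result = []
    for row in rows:
        new_row = dict(row)
        new_row.setdefault(class_column, placeholder)
        result.append(new_row)
    return result


def add_class_column_to_csv(text, class_column, delimiter="\t", placeholder="?"):
    """Read delimited text, add the class column, and return the new text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    fieldnames = list(reader.fieldnames or [])
    rows = add_class_column(list(reader), class_column, placeholder)
    if class_column not in fieldnames:
        fieldnames.append(class_column)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical data set with an ID column and two features.
    sample = "id\tf1\tf2\ns1\t0.5\t1.2\ns2\t0.1\t3.4\n"
    print(add_class_column_to_csv(sample, "class"))
```

The value passed as `class_column` should match the 'classColumn' parameter given to the component.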


Name Type Description
outClassifier BinaryFile A new classifier that has been produced.
report Latex Textual description for the classifier and its performance. The exact content of this report depends on the method selection.
confusion Matrix Confusion matrix with the class prediction frequencies as columns.
evaluation CSV Evaluation
predictedClasses CSV If input 'classifydata' is provided, classes are predicted for the data and results are in this output. Otherwise this output is an empty file.
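
To show what the 'confusion' output represents, the sketch below builds a confusion matrix from reference and predicted labels. It is an illustration only, not the component's code: it assumes that rows are reference classes and columns are predicted classes, and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(reference, predicted):
    """Count (reference, predicted) label pairs.

    Returns the sorted label list and a matrix where matrix[i][j] is the
    number of samples with reference class labels[i] that were predicted
    as labels[j].
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    labels = sorted(set(reference) | set(predicted))
    counts = Counter(zip(reference, predicted))
    matrix = [[counts[(r, p)] for p in labels] for r in labels]
    return labels, matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    ref = ["a", "a", "b", "b", "b"]
    pred = ["a", "b", "b", "b", "a"]
    labels, matrix = confusion_matrix(ref, pred)
    print(labels)            # ['a', 'b']
    print(matrix)            # [[1, 1], [1, 2]]
    print(accuracy(matrix))  # 0.6
```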


Name Type Default Description
classColumn string (no default) Column name for the column that contains the reference class.
columnsToRemove string "" Comma-separated list of names of columns not to be used in classification. Useful if you want to ignore some attributes in the data while training the classifier.
crossValidation int 500 Number of folds for the cross-validation
dataType string "CSV" Name for data file type (CSV/arff)
methodClass string (no default) A fully qualified Java class name for the implementation of Weka Classifier.
processMissing boolean false Convert NA values into a form suitable for Weka.
randomSeed int 1 Seed value provided to the pseudo-random generator.
runInternalTests boolean false Flag for running the internal tests of the classifier. These tests give information about how the classifier functions, and their results are printed at the end of the LaTeX report.
sectionTitle string "" Title for the LaTeX section.
wekaParameters string "" A space-separated list of parameters passed to the classification method. See the Weka API for possible values.
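
The interplay of 'crossValidation' and 'randomSeed' can be illustrated with a small fold-assignment sketch. This is a simplified, unstratified scheme written for illustration; it is not Weka's procedure, and the function names are hypothetical. It shows that a fixed seed gives a reproducible partition and that, in this sketch, the number of folds cannot exceed the number of samples.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=1):
    """Shuffle sample indices with a fixed seed and deal them into folds."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    return [indices[i::n_folds] for i in range(n_folds)]


def train_test_splits(n_samples, n_folds, seed=1):
    """Yield (train, test) index lists, one pair per fold."""
    folds = k_fold_indices(n_samples, n_folds, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


if __name__ == "__main__":
    for train, test in train_test_splits(n_samples=6, n_folds=3, seed=1):
        print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so every sample is used for validation once and for training in the remaining folds.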

Test cases

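Test case Parameters IN (data, testdata, classifydata, inClassifier) OUT (outClassifier, report, confusion, evaluation, predictedClasses)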
case1 properties data (missing) classifydata inClassifier outClassifier report confusion evaluation predictedClasses

classColumn = class,
methodClass = weka.classifiers.trees.J48,
crossValidation = 1000,
randomSeed = 1,

case2_NA properties data (missing) (missing) (missing) (missing) (missing) (missing) (missing) (missing)

classColumn = class,
methodClass = weka.classifiers.trees.J48,
crossValidation = 2,
randomSeed = 1,
processMissing = true,

