edu.harvard.seas.iis.abilities.classify
Class ClassifierEvaluator

java.lang.Object
  extended by edu.harvard.seas.iis.abilities.classify.ClassifierEvaluator

public class ClassifierEvaluator
extends java.lang.Object


Constructor Summary
ClassifierEvaluator()
           
 
Method Summary
 DataSet annotateDataSet(DataSet dataSet, weka.classifiers.Classifier c, InstanceFilter filter)
          Fills in the "Prediction probability" and "Predicted class" values in the dataSet using classifier c.
static double[] compareFittsLawModels(double[] baseline, double[] comparison)
           
static double[] computeFittsLawCoefficients(DataSet dataSet)
           
 java.lang.String crossvalidateOverUsers(weka.classifiers.Classifier c, DataSet dataSet, java.lang.String[] users, boolean generateAnnotatedDataSet)
           
 double[] evaluate(weka.classifiers.Classifier c, DataSet testData)
          Evaluates a trained classifier c on the given test data set.
 java.lang.String evaluate(java.lang.String[] features, java.io.File dataFile, boolean generateAnnotatedDataSet)
          Creates a positive-and-unlabeled classifier based on the globally set baseClassifier and evaluates it using cross-validation. Only the specified features are used for classification.
 void evaluateDataRequirements(java.lang.String[] users, java.lang.String[] features, java.io.File cleanDataDirectory)
          A draft of a method for testing how the classifier's accuracy changes depending on the amount of training data available
 DataSet getDeliberateInstances(weka.classifiers.Classifier c, DataSet d)
          Returns a copy of the data set that contains only the instances positively classified by c.
 void getPerUserStDevs(java.io.File dataFile, weka.classifiers.Classifier baseClassifier, java.lang.String featureToComputeStdevsFor, java.lang.String[] featureList, java.lang.String[] users)
          For a given feature, computes the per-user standard deviations on the experimental, natural, and filtered natural data sets.
static void main(java.lang.String[] args)
           
 double[] runStatisticalTests(DataSet baseLine, DataSet testSet, java.lang.String[] users, java.lang.String attributeForTesting, int minNumberRequiredForTesting)
          Runs pairwise statistical tests to look for statistically significant differences across users on several metrics.
 void runStatisticalTests(java.io.File dataFile, weka.classifiers.Classifier baseClassifier, java.lang.String[] featureList, java.lang.String[] users)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ClassifierEvaluator

public ClassifierEvaluator()
Method Detail

computeFittsLawCoefficients

public static double[] computeFittsLawCoefficients(DataSet dataSet)
Parameters:
dataSet -
Returns:
[intercept, slope]
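
The documentation does not say how the coefficients are estimated. Fitts's law models movement time as a linear function of the index of difficulty (MT = a + b * ID), so a minimal sketch of one plausible approach is an ordinary least-squares fit that returns [intercept, slope] in the same order as above. The class FittsLawFitSketch, its method names, and the choice of the Shannon formulation of ID are assumptions made for illustration; they are not part of ClassifierEvaluator.

```java
/** Hypothetical helper for illustration; not part of ClassifierEvaluator. */
final class FittsLawFitSketch {

    /** Index of difficulty, Shannon formulation (an assumption): log2(distance / width + 1). */
    static double indexOfDifficulty(double distance, double width) {
        return Math.log(distance / width + 1.0) / Math.log(2.0);
    }

    /** Least-squares fit of time = intercept + slope * id. Returns [intercept, slope]. */
    static double[] fit(double[] id, double[] time) {
        if (id.length != time.length || id.length < 2) {
            throw new IllegalArgumentException("need at least two paired observations");
        }
        int n = id.length;
        double meanId = 0.0;
        double meanTime = 0.0;
        for (int i = 0; i < n; i++) {
            meanId += id[i];
            meanTime += time[i];
        }
        meanId /= n;
        meanTime /= n;

        double sxy = 0.0; // sum of cross-deviations
        double sxx = 0.0; // sum of squared deviations of id
        for (int i = 0; i < n; i++) {
            double dx = id[i] - meanId;
            sxy += dx * (time[i] - meanTime);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all index-of-difficulty values are identical");
        }
        double slope = sxy / sxx;
        double intercept = meanTime - slope * meanId;
        return new double[] {intercept, slope};
    }
}
```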

compareFittsLawModels

public static double[] compareFittsLawModels(double[] baseline,
                                             double[] comparison)
Parameters:
baseline - params of the baseline model [intercept, slope]
comparison - params of the comparison model [intercept, slope]
Returns:
mean fraction by which the predictions of the comparison model differ from those of the baseline model
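
The layout of the returned array and the range of inputs over which the difference is averaged are not documented. As a rough sketch under stated assumptions, the mean fractional difference between two [intercept, slope] models could be computed over a caller-supplied set of index-of-difficulty values. FittsLawCompareSketch, the ids parameter, and the use of a signed (rather than absolute) difference are all hypothetical.

```java
/** Hypothetical helper for illustration; not part of ClassifierEvaluator. */
final class FittsLawCompareSketch {

    /** Predicted movement time for a model given as [intercept, slope]. */
    static double predict(double[] model, double id) {
        return model[0] + model[1] * id;
    }

    /**
     * Mean of (comparison - baseline) / baseline over the given ID values.
     * Assumes the baseline prediction is nonzero at every ID value.
     */
    static double meanFractionalDifference(double[] baseline, double[] comparison, double[] ids) {
        if (ids.length == 0) {
            throw new IllegalArgumentException("need at least one ID value");
        }
        double sum = 0.0;
        for (double id : ids) {
            double base = predict(baseline, id);
            sum += (predict(comparison, id) - base) / base;
        }
        return sum / ids.length;
    }
}
```

A positive result means the comparison model predicts longer times on average than the baseline over the chosen ID values.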

getDeliberateInstances

public DataSet getDeliberateInstances(weka.classifiers.Classifier c,
                                      DataSet d)
                               throws java.lang.Exception
Returns a copy of the data set that contains only the instances positively classified by c.

Parameters:
c - the classifier used to select instances
d - the data set to copy instances from
Returns:
Throws:
java.lang.Exception

evaluate

public double[] evaluate(weka.classifiers.Classifier c,
                         DataSet testData)
                  throws java.lang.Exception
Evaluates a trained classifier c on the given test data set.

Parameters:
c - the trained classifier to evaluate
testData - the test data set
Returns:
Throws:
java.lang.Exception

annotateDataSet

public DataSet annotateDataSet(DataSet dataSet,
                               weka.classifiers.Classifier c,
                               InstanceFilter filter)
                        throws java.lang.Exception
Fills in the "Prediction probability" and "Predicted class" values in the dataSet using classifier c. If filter is not null, only instances that pass the filter are annotated; if filter is null, all instances are annotated.

Parameters:
dataSet - the data set to annotate
c - the classifier used to produce the predictions
filter - optional filter selecting the instances to annotate; may be null
Returns:
Throws:
java.lang.Exception
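
The null-filter behaviour described above can be sketched as follows. This is an illustration only: Row stands in for an instance of DataSet, the scoring function stands in for the Weka classifier, and the 0.5 threshold and class labels are assumptions, none of which come from the documented API.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper for illustration; not part of ClassifierEvaluator. */
final class AnnotateSketch {

    /** Hypothetical stand-in for one instance of a data set. */
    static final class Row {
        final double feature;
        Double predictionProbability; // null until annotated
        String predictedClass;        // null until annotated

        Row(double feature) {
            this.feature = feature;
        }
    }

    /**
     * Annotates every row that passes the filter. A null filter means
     * "annotate all rows", mirroring the behaviour described above.
     */
    static void annotate(List<Row> rows,
                         ToDoubleFunction<Row> positiveProbability,
                         Predicate<Row> filter) {
        for (Row row : rows) {
            if (filter != null && !filter.test(row)) {
                continue; // row is skipped and left unannotated
            }
            double p = positiveProbability.applyAsDouble(row);
            row.predictionProbability = p;
            // The 0.5 cut-off is an assumption made for this sketch.
            row.predictedClass = p >= 0.5 ? "positive" : "negative";
        }
    }
}
```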

crossvalidateOverUsers

public java.lang.String crossvalidateOverUsers(weka.classifiers.Classifier c,
                                               DataSet dataSet,
                                               java.lang.String[] users,
                                               boolean generateAnnotatedDataSet)
                                        throws java.lang.Exception
Throws:
java.lang.Exception
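
This method has no description. Its name suggests cross-validation in which each fold holds out the data of one user; assuming that reading, the fold construction might look like the sketch below. UserFoldsSketch and the representation of instances as an array of user names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of ClassifierEvaluator. */
final class UserFoldsSketch {

    /**
     * Maps each listed user to the indices of that user's instances,
     * i.e. the test set of the fold that holds the user out.
     * Instances of users not in the list are ignored.
     */
    static Map<String, List<Integer>> testIndicesByUser(String[] instanceUsers, String[] users) {
        Map<String, List<Integer>> folds = new LinkedHashMap<>();
        for (String user : users) {
            folds.put(user, new ArrayList<>());
        }
        for (int i = 0; i < instanceUsers.length; i++) {
            List<Integer> fold = folds.get(instanceUsers[i]);
            if (fold != null) {
                fold.add(i);
            }
        }
        return folds;
    }

    /** Indices of all instances that do not belong to the held-out user. */
    static List<Integer> trainIndices(String[] instanceUsers, String heldOutUser) {
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < instanceUsers.length; i++) {
            if (!instanceUsers[i].equals(heldOutUser)) {
                train.add(i);
            }
        }
        return train;
    }
}
```

Splitting by user rather than by instance keeps one person's data from appearing in both the training and test sets of the same fold.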

evaluate

public java.lang.String evaluate(java.lang.String[] features,
                                 java.io.File dataFile,
                                 boolean generateAnnotatedDataSet)
                          throws java.lang.Exception
Creates a positive-and-unlabeled classifier based on the globally set baseClassifier and evaluates it using cross-validation. Only the specified features are used for classification.

Parameters:
features - the features to use for classification
dataFile - file containing the data to use for training and validation in the cross-validation procedure
generateAnnotatedDataSet - if set to true, this method will produce a copy of the data set read from dataFile that has the classifier prediction fields filled in for each instance; the annotated data set is stored in the global variable annotatedDataSet
Returns:
Throws:
java.lang.Exception

evaluateDataRequirements

public void evaluateDataRequirements(java.lang.String[] users,
                                     java.lang.String[] features,
                                     java.io.File cleanDataDirectory)
                              throws java.lang.Exception
A draft of a method for testing how the classifier's accuracy changes depending on the amount of training data available

Parameters:
users -
features -
cleanDataDirectory -
Throws:
java.lang.Exception

runStatisticalTests

public double[] runStatisticalTests(DataSet baseLine,
                                    DataSet testSet,
                                    java.lang.String[] users,
                                    java.lang.String attributeForTesting,
                                    int minNumberRequiredForTesting)
                             throws java.lang.IllegalArgumentException,
                                    org.apache.commons.math.MathException
Runs pairwise statistical tests to look for statistically significant differences across users on several metrics.

Parameters:
baseLine -
testSet -
users -
attributeForTesting -
minNumberRequiredForTesting -
Returns:
Throws:
java.lang.IllegalArgumentException
org.apache.commons.math.MathException
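
Which statistical test the method runs is not documented. Purely as an assumption, the sketch below computes a paired t statistic from one value per user in the baseline set and one in the test set, which is a common way to look for a systematic per-user difference. PairedTSketch and its inputs are hypothetical, and the sketch deliberately uses no external library.

```java
/** Hypothetical helper for illustration; not part of ClassifierEvaluator. */
final class PairedTSketch {

    /**
     * Paired t statistic for test[i] - baseline[i], where index i refers
     * to the same user in both arrays. The result has n - 1 degrees of
     * freedom. If all differences are equal, the standard error is zero
     * and the result is infinite or NaN.
     */
    static double pairedT(double[] baseline, double[] test) {
        if (baseline.length != test.length || baseline.length < 2) {
            throw new IllegalArgumentException("need at least two paired values");
        }
        int n = baseline.length;
        double meanDiff = 0.0;
        for (int i = 0; i < n; i++) {
            meanDiff += test[i] - baseline[i];
        }
        meanDiff /= n;

        double sumSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double d = test[i] - baseline[i] - meanDiff;
            sumSquares += d * d;
        }
        // Sample standard deviation of the differences, divided by sqrt(n).
        double standardError = Math.sqrt(sumSquares / (n - 1)) / Math.sqrt(n);
        return meanDiff / standardError;
    }
}
```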

runStatisticalTests

public void runStatisticalTests(java.io.File dataFile,
                                weka.classifiers.Classifier baseClassifier,
                                java.lang.String[] featureList,
                                java.lang.String[] users)
                         throws java.lang.Exception
Throws:
java.lang.Exception

getPerUserStDevs

public void getPerUserStDevs(java.io.File dataFile,
                             weka.classifiers.Classifier baseClassifier,
                             java.lang.String featureToComputeStdevsFor,
                             java.lang.String[] featureList,
                             java.lang.String[] users)
                      throws java.lang.Exception
For a given feature, computes the per-user standard deviations on the experimental, natural, and filtered natural data sets.

Parameters:
dataFile -
baseClassifier -
featureToComputeStdevsFor -
featureList -
users -
Throws:
java.lang.Exception
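
The per-user computation described above amounts to grouping the values of one feature by user and taking the standard deviation within each group. A minimal sketch, assuming the sample (n - 1) form of the standard deviation and a flat array representation of the data; PerUserStdevSketch is hypothetical and says nothing about how the experimental, natural, and filtered natural data sets are obtained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of ClassifierEvaluator. */
final class PerUserStdevSketch {

    /** Sample standard deviation (n - 1 denominator); NaN for fewer than two values. */
    static double sampleStdev(List<Double> values) {
        int n = values.size();
        if (n < 2) {
            return Double.NaN;
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= n;
        double sumSquares = 0.0;
        for (double v : values) {
            sumSquares += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumSquares / (n - 1));
    }

    /** Groups values[i] by users[i] and returns one standard deviation per user. */
    static Map<String, Double> perUserStdev(String[] users, double[] values) {
        if (users.length != values.length) {
            throw new IllegalArgumentException("users and values must have equal length");
        }
        Map<String, List<Double>> grouped = new LinkedHashMap<>();
        for (int i = 0; i < users.length; i++) {
            grouped.computeIfAbsent(users[i], k -> new ArrayList<>()).add(values[i]);
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Double>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), sampleStdev(entry.getValue()));
        }
        return result;
    }
}
```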

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Parameters:
args -
Throws:
java.lang.Exception