edu.harvard.seas.iis.abilities.classify
Class Transform

java.lang.Object
  extended by edu.harvard.seas.iis.abilities.classify.Transform

public class Transform
extends java.lang.Object

This class contains the tool chain for transforming raw parsed data into a form usable for machine learning (computing additional features, normalizing, etc.).

Author:
kgajos, Charles Herrmann

Field Summary
static java.lang.String[] FEATURES_TO_ADD_FOR_LATER_USE
           
static java.lang.String[] FEATURES_TO_DIVIDE_BY_A
           
static java.lang.String[] FEATURES_TO_DIVIDE_BY_ID
           
static java.lang.String[] FEATURES_TO_DIVIDE_BY_logA
           
static java.lang.String[] FEATURES_TO_DIVIDE_BY_W
           
static java.lang.String[] FEATURES_TO_LOG_TRANSFORM
           
static java.lang.String[] FEATURES_TO_PRESERVE_DURING_NORMALIZATION
           
 
Constructor Summary
Transform()
           
 
Method Summary
static DataSet combineDataSets(java.io.File inputDirectory, java.lang.String[] userNames, java.io.File outputFile)
          Combine data sets for several users into a single file
static void combineGloballyAndIndividuallyNormalizedData(java.io.File globalFile, java.io.File individualFile, java.lang.String[] featuresToCombine, java.lang.String[] usersToInclude, java.lang.String resultFileName)
          Combines features from two files
static DataSet computeAdditonalFeatures(DataSet dataSet)
           
static void computeAdditonalFeatures(java.io.File inputDirectory, java.io.File outputDirectory)
           
static DataSet computeParticipantCodes(DataSet dataSet)
           
static NormalizationConstants createGloballyNormalizedFile(java.io.File cleanDataDir, java.io.File normalizedDataDir, java.lang.String[] usersToInclude, java.lang.String[] usersToUseForComputingNormalizationConstants, java.lang.String globallyNormalizedCombinedDataFileName, java.lang.String[] featuresToNormalize)
           
static java.lang.String getParticipantCode(java.lang.String user)
          Translates a study code (e.g., "lemur") into a participant code that can be used in the paper (e.g., "P13")
static void main(java.lang.String[] args)
           
static DataSet normalize(DataSet oldData, java.lang.String[] attributesToNormalize, java.lang.String[] usersToUseForComputingNormalizationConstants, NormalizationConstants normalizationConstants)
          Normalizes the listed features to zero mean and unit standard deviation. Unlike the other normalize() method, it does not separate the data by user; to normalize per user, pass it a separate data set for each user
static void normalize(java.io.File inputDirectory, java.io.File outputDirectory, java.lang.String[] featuresToNormalize, java.lang.String[] usersToInclude)
          Normalizes the data so that the values of each feature in each file have zero mean and unit standard deviation (note: the current implementation actually normalizes by the natural data; see DataSetTransform.normalize() for details)
static void normalizeAttribute(DataSet data, int attIndex, double mean, double var)
           
static DataSet normalizeUsingNormalizationConstants(DataSet dataSet, java.lang.String[] attributesToNormalize, NormalizationConstants normalizationConstants)
          Normalizes the listed attributes using the constants provided in normalizationConstants
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

FEATURES_TO_DIVIDE_BY_ID

public static java.lang.String[] FEATURES_TO_DIVIDE_BY_ID

FEATURES_TO_DIVIDE_BY_A

public static java.lang.String[] FEATURES_TO_DIVIDE_BY_A

FEATURES_TO_DIVIDE_BY_logA

public static java.lang.String[] FEATURES_TO_DIVIDE_BY_logA

FEATURES_TO_DIVIDE_BY_W

public static java.lang.String[] FEATURES_TO_DIVIDE_BY_W

FEATURES_TO_ADD_FOR_LATER_USE

public static java.lang.String[] FEATURES_TO_ADD_FOR_LATER_USE

FEATURES_TO_PRESERVE_DURING_NORMALIZATION

public static java.lang.String[] FEATURES_TO_PRESERVE_DURING_NORMALIZATION

FEATURES_TO_LOG_TRANSFORM

public static java.lang.String[] FEATURES_TO_LOG_TRANSFORM
Constructor Detail

Transform

public Transform()
Method Detail

getParticipantCode

public static java.lang.String getParticipantCode(java.lang.String user)
Translates a study code (e.g., "lemur") into a participant code that can be used in the paper (e.g., "P13")

Parameters:
user - the study code (e.g., "lemur")
Returns:
the participant code (e.g., "P13")
computeAdditonalFeatures

public static DataSet computeAdditonalFeatures(DataSet dataSet)
                                        throws java.lang.Exception
Throws:
java.lang.Exception

computeParticipantCodes

public static DataSet computeParticipantCodes(DataSet dataSet)
                                       throws java.lang.Exception
Throws:
java.lang.Exception

computeAdditonalFeatures

public static void computeAdditonalFeatures(java.io.File inputDirectory,
                                            java.io.File outputDirectory)
                                     throws java.lang.Exception
Throws:
java.lang.Exception

combineDataSets

public static DataSet combineDataSets(java.io.File inputDirectory,
                                      java.lang.String[] userNames,
                                      java.io.File outputFile)
                               throws java.io.IOException
Combine data sets for several users into a single file

Parameters:
inputDirectory -
userNames -
outputFile - where the combined data should be saved (if set to null, the combined data set does not get saved to a file)
Throws:
java.io.IOException
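
The null-outputFile behavior described above can be sketched as follows. This is an illustration under stated assumptions: the real method reads each user's data from inputDirectory and returns a DataSet, whereas the hypothetical CombineSketch below takes rows already held in memory (one list of text lines per user) so that it stays self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CombineSketch {
    /**
     * Concatenates the rows of the named users, in the order given by userNames.
     * The result is written to outputFile only when outputFile is not null.
     */
    public static List<String> combine(Map<String, List<String>> rowsByUser,
                                       String[] userNames,
                                       Path outputFile) throws IOException {
        List<String> combined = new ArrayList<>();
        for (String user : userNames) {
            List<String> rows = rowsByUser.get(user);
            if (rows == null) {
                // Assumption: a missing user is treated as an error.
                throw new IllegalArgumentException("No data for user: " + user);
            }
            combined.addAll(rows);
        }
        if (outputFile != null) {
            Files.write(outputFile, combined); // one row per line, UTF-8
        }
        return combined;
    }
}
```

Passing null as the last argument returns the combined rows without touching the file system, which mirrors the documented contract for outputFile.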

normalize

public static void normalize(java.io.File inputDirectory,
                             java.io.File outputDirectory,
                             java.lang.String[] featuresToNormalize,
                             java.lang.String[] usersToInclude)
                      throws java.lang.Exception
Normalizes the data so that the values of each feature in each file have zero mean and unit standard deviation (note: the current implementation actually normalizes by the natural data; see DataSetTransform.normalize() for details)

Parameters:
inputDirectory -
outputDirectory -
featuresToNormalize -
usersToInclude -
Throws:
java.lang.Exception
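
For reference, zero-mean, unit-stdev normalization of a single feature can be sketched as below. The class ZScoreSketch is hypothetical and works on a plain array rather than on files or a DataSet. As the description notes, the actual implementation currently deviates from this scheme, so this shows the textbook transformation only. Using the population standard deviation (dividing by n) and mapping a constant feature to 0 are assumptions.

```java
public class ZScoreSketch {
    /** Returns a copy of values rescaled to zero mean and unit standard deviation. */
    public static double[] zScore(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;

        double squaredDeviations = 0.0;
        for (double v : values) {
            squaredDeviations += (v - mean) * (v - mean);
        }
        // Assumption: population standard deviation (divide by n, not n - 1).
        double stdev = Math.sqrt(squaredDeviations / values.length);

        double[] normalized = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // Assumption: a constant feature is mapped to 0 instead of dividing by zero.
            normalized[i] = (stdev == 0.0) ? 0.0 : (values[i] - mean) / stdev;
        }
        return normalized;
    }
}
```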

createGloballyNormalizedFile

public static NormalizationConstants createGloballyNormalizedFile(java.io.File cleanDataDir,
                                                                  java.io.File normalizedDataDir,
                                                                  java.lang.String[] usersToInclude,
                                                                  java.lang.String[] usersToUseForComputingNormalizationConstants,
                                                                  java.lang.String globallyNormalizedCombinedDataFileName,
                                                                  java.lang.String[] featuresToNormalize)
                                                           throws java.io.IOException
Throws:
java.io.IOException

combineGloballyAndIndividuallyNormalizedData

public static void combineGloballyAndIndividuallyNormalizedData(java.io.File globalFile,
                                                                java.io.File individualFile,
                                                                java.lang.String[] featuresToCombine,
                                                                java.lang.String[] usersToInclude,
                                                                java.lang.String resultFileName)
                                                         throws java.lang.Exception
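Combines features from two files: a globally normalized one (globalFile) and an individually normalized one (individualFile)

Parameters:
globalFile -
individualFile -
featuresToCombine -
usersToInclude -
resultFileName -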
Throws:
java.lang.Exception

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Parameters:
args -
Throws:
java.lang.Exception

normalize

public static DataSet normalize(DataSet oldData,
                                java.lang.String[] attributesToNormalize,
                                java.lang.String[] usersToUseForComputingNormalizationConstants,
                                NormalizationConstants normalizationConstants)
                         throws java.io.IOException
Normalizes the listed features to zero mean and unit standard deviation. Unlike the other normalize() method, it does not separate the data by user; to normalize per user, pass it a separate data set for each user

Parameters:
oldData -
attributesToNormalize -
usersToUseForComputingNormalizationConstants - will only use data from these users to compute normalization constants; if set to null, will use all users represented in the data set
normalizationConstants - the contents of this object will be updated with the constants computed during the run of this method
Returns:
Throws:
java.io.IOException
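
The roles of usersToUseForComputingNormalizationConstants and normalizationConstants can be illustrated with the following sketch. It is not the actual implementation: SubsetNormalizationSketch is a hypothetical class, a two-element array stands in for the NormalizationConstants object, and a single feature is represented by parallel users/values arrays instead of a DataSet. The constants are computed only from the selected users' rows (all rows when the selection is null) and then applied to every row.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SubsetNormalizationSketch {
    /**
     * Normalizes values using the mean and stdev of the rows belonging to usersForConstants
     * (all rows if usersForConstants is null). On return, constantsOut[0] holds the mean and
     * constantsOut[1] the stdev, mimicking the documented out-parameter.
     */
    public static double[] normalize(String[] users, double[] values,
                                     String[] usersForConstants, double[] constantsOut) {
        Set<String> selected = (usersForConstants == null)
                ? null
                : new HashSet<>(Arrays.asList(usersForConstants));

        double sum = 0.0;
        int n = 0;
        for (int i = 0; i < values.length; i++) {
            if (selected == null || selected.contains(users[i])) {
                sum += values[i];
                n++;
            }
        }
        if (n == 0) {
            throw new IllegalArgumentException("No rows belong to the selected users");
        }
        double mean = sum / n;

        double squaredDeviations = 0.0;
        for (int i = 0; i < values.length; i++) {
            if (selected == null || selected.contains(users[i])) {
                squaredDeviations += (values[i] - mean) * (values[i] - mean);
            }
        }
        // Assumption: population standard deviation over the selected rows.
        double stdev = Math.sqrt(squaredDeviations / n);
        if (stdev == 0.0) {
            throw new IllegalArgumentException("Feature is constant for the selected users");
        }

        constantsOut[0] = mean;
        constantsOut[1] = stdev;

        // The constants are applied to every row, including rows of users not selected above.
        double[] normalized = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            normalized[i] = (values[i] - mean) / stdev;
        }
        return normalized;
    }
}
```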

normalizeAttribute

public static void normalizeAttribute(DataSet data,
                                      int attIndex,
                                      double mean,
                                      double var)

normalizeUsingNormalizationConstants

public static DataSet normalizeUsingNormalizationConstants(DataSet dataSet,
                                                           java.lang.String[] attributesToNormalize,
                                                           NormalizationConstants normalizationConstants)
Normalizes the listed attributes using the constants provided in normalizationConstants

Parameters:
dataSet -
attributesToNormalize -
normalizationConstants -
Returns:
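
A sketch of applying previously computed constants, for example to data that was not used when the constants were computed. Everything here is assumed for illustration: the class ApplyConstantsSketch is hypothetical, a map from attribute name to {mean, stdev} stands in for NormalizationConstants, and a double[][] with a parallel array of attribute names stands in for DataSet.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ApplyConstantsSketch {
    /**
     * Returns a copy of rows in which each attribute listed in attributesToNormalize has been
     * rescaled as (value - mean) / stdev, using the stored constants. Other columns are copied unchanged.
     */
    public static double[][] apply(String[] attributeNames, double[][] rows,
                                   String[] attributesToNormalize,
                                   Map<String, double[]> constants) {
        Set<String> toNormalize = new HashSet<>(Arrays.asList(attributesToNormalize));
        double[][] result = new double[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            result[r] = rows[r].clone();
            for (int c = 0; c < attributeNames.length; c++) {
                if (!toNormalize.contains(attributeNames[c])) {
                    continue; // column is not normalized
                }
                double[] meanAndStdev = constants.get(attributeNames[c]);
                if (meanAndStdev == null) {
                    // Assumption: a missing constant is treated as an error.
                    throw new IllegalArgumentException("No constants for " + attributeNames[c]);
                }
                result[r][c] = (rows[r][c] - meanAndStdev[0]) / meanAndStdev[1];
            }
        }
        return result;
    }
}
```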