java.lang.Object
  edu.harvard.seas.iis.abilities.classify.Transform
public class Transform
This class contains the tool chain for transforming raw parsed data into a form usable for machine learning (computing additional features, normalizing, etc.).
Field Summary | |
---|---|
static java.lang.String[] | FEATURES_TO_ADD_FOR_LATER_USE |
static java.lang.String[] | FEATURES_TO_DIVIDE_BY_A |
static java.lang.String[] | FEATURES_TO_DIVIDE_BY_ID |
static java.lang.String[] | FEATURES_TO_DIVIDE_BY_logA |
static java.lang.String[] | FEATURES_TO_DIVIDE_BY_W |
static java.lang.String[] | FEATURES_TO_LOG_TRANSFORM |
static java.lang.String[] | FEATURES_TO_PRESERVE_DURING_NORMALIZATION |
Constructor Summary | |
---|---|
Transform() | |
Method Summary | | |
---|---|---|
static DataSet | combineDataSets(java.io.File inputDirectory, java.lang.String[] userNames, java.io.File outputFile) | Combine data sets for several users into a single file |
static void | combineGloballyAndIndividuallyNormalizedData(java.io.File globalFile, java.io.File individualFile, java.lang.String[] featuresToCombine, java.lang.String[] usersToInclude, java.lang.String resultFileName) | Combines features from two files |
static DataSet | computeAdditonalFeatures(DataSet dataSet) | |
static void | computeAdditonalFeatures(java.io.File inputDirectory, java.io.File outputDirectory) | |
static DataSet | computeParticipantCodes(DataSet dataSet) | |
static NormalizationConstants | createGloballyNormalizedFile(java.io.File cleanDataDir, java.io.File normalizedDataDir, java.lang.String[] usersToInclude, java.lang.String[] usersToUseForComputingNormalizationConstants, java.lang.String globallyNormalizedCombinedDataFileName, java.lang.String[] featuresToNormalize) | |
static java.lang.String | getParticipantCode(java.lang.String user) | Translates study code (e.g., "lemur") into a participant code that can be used in the paper (e.g., "P13") |
static void | main(java.lang.String[] args) | |
static DataSet | normalize(DataSet oldData, java.lang.String[] attributesToNormalize, java.lang.String[] usersToUseForComputingNormalizationConstants, NormalizationConstants normalizationConstants) | Normalizes listed features (zero mean and unit stdev); unlike the other normalize() method, it does not attempt to separate the users -- if you want normalization per user, feed it separate data sets for each user |
static void | normalize(java.io.File inputDirectory, java.io.File outputDirectory, java.lang.String[] featuresToNormalize, java.lang.String[] usersToInclude) | Normalize the data such that values for each feature in each file have a zero mean and a unit stdev (actually, we currently normalize by the natural data --- see DataSetTransform.normalize() for more) |
static void | normalizeAttribute(DataSet data, int attIndex, double mean, double var) | |
static DataSet | normalizeUsingNormalizationConstants(DataSet dataSet, java.lang.String[] attributesToNormalize, NormalizationConstants normalizationConstants) | This method uses the constants provided in the normalizationConstants to perform the normalization |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static java.lang.String[] FEATURES_TO_DIVIDE_BY_ID
public static java.lang.String[] FEATURES_TO_DIVIDE_BY_A
public static java.lang.String[] FEATURES_TO_DIVIDE_BY_logA
public static java.lang.String[] FEATURES_TO_DIVIDE_BY_W
public static java.lang.String[] FEATURES_TO_ADD_FOR_LATER_USE
public static java.lang.String[] FEATURES_TO_PRESERVE_DURING_NORMALIZATION
public static java.lang.String[] FEATURES_TO_LOG_TRANSFORM
Constructor Detail |
---|
public Transform()
Method Detail |
---|
public static java.lang.String getParticipantCode(java.lang.String user)
Parameters:
user - the study code to translate (e.g., "lemur")
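The translation described for getParticipantCode can be pictured as a simple lookup table. The following is a minimal sketch under stated assumptions, not the actual implementation: the class name ParticipantCodeSketch, the map, and the "otter" entry are hypothetical, and only the "lemur" to "P13" pair comes from the method description. Returning null for an unknown study code is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of a study-code lookup; not the real Transform code. */
final class ParticipantCodeSketch {
    // Only "lemur" -> "P13" is taken from the documentation; "otter" is invented.
    private static final Map<String, String> CODES = new LinkedHashMap<>();
    static {
        CODES.put("lemur", "P13");
        CODES.put("otter", "P14"); // hypothetical entry for illustration
    }

    /** Returns the participant code for a study code, or null if it is unknown (assumed behavior). */
    static String getParticipantCode(String user) {
        return CODES.get(user);
    }
}
```

Keeping the mapping in one place means the study codes never need to appear in published tables or figures.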
public static DataSet computeAdditonalFeatures(DataSet dataSet) throws java.lang.Exception
Throws:
java.lang.Exception
public static DataSet computeParticipantCodes(DataSet dataSet) throws java.lang.Exception
Throws:
java.lang.Exception
public static void computeAdditonalFeatures(java.io.File inputDirectory, java.io.File outputDirectory) throws java.lang.Exception
Throws:
java.lang.Exception
public static DataSet combineDataSets(java.io.File inputDirectory, java.lang.String[] userNames, java.io.File outputFile) throws java.io.IOException
Parameters:
inputDirectory -
userNames -
outputFile - where the combined data should be saved (if set to null, the combined data set does not get saved to a file)
Throws:
java.io.IOException
public static void normalize(java.io.File inputDirectory, java.io.File outputDirectory, java.lang.String[] featuresToNormalize, java.lang.String[] usersToInclude) throws java.lang.Exception
Parameters:
inputDirectory -
outputDirectory -
featuresToNormalize -
usersToInclude -
Throws:
java.lang.Exception
public static NormalizationConstants createGloballyNormalizedFile(java.io.File cleanDataDir, java.io.File normalizedDataDir, java.lang.String[] usersToInclude, java.lang.String[] usersToUseForComputingNormalizationConstants, java.lang.String globallyNormalizedCombinedDataFileName, java.lang.String[] featuresToNormalize) throws java.io.IOException
Throws:
java.io.IOException
public static void combineGloballyAndIndividuallyNormalizedData(java.io.File globalFile, java.io.File individualFile, java.lang.String[] featuresToCombine, java.lang.String[] usersToInclude, java.lang.String resultFileName) throws java.lang.Exception
Parameters:
globalFile -
individualFile -
featuresToCombine -
resultFileName -
Throws:
java.lang.Exception
public static void main(java.lang.String[] args) throws java.lang.Exception
Parameters:
args -
Throws:
java.lang.Exception
public static DataSet normalize(DataSet oldData, java.lang.String[] attributesToNormalize, java.lang.String[] usersToUseForComputingNormalizationConstants, NormalizationConstants normalizationConstants) throws java.io.IOException
Parameters:
oldData -
attributesToNormalize -
usersToUseForComputingNormalizationConstants - will only use data from these users to compute normalization constants; if set to null, will use all users represented in the data set
normalizationConstants - the contents of this object will be updated with the constants computed during the run of this method
Throws:
java.io.IOException
public static void normalizeAttribute(DataSet data, int attIndex, double mean, double var)
public static DataSet normalizeUsingNormalizationConstants(DataSet dataSet, java.lang.String[] attributesToNormalize, NormalizationConstants normalizationConstants)
Parameters:
dataSet -
attributesToNormalize -
normalizationConstants -
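Applying previously computed constants, as this method is described as doing, amounts to shifting and scaling each value. The sketch below illustrates that step for one feature, assuming a plain double array in place of a DataSet and a mean/stdev pair in place of NormalizationConstants; the class and method names are hypothetical. Note that normalizeAttribute takes a variance (var) rather than a standard deviation, and how it uses that value is not documented, so the division by stdev here is an assumption based on the "zero mean and unit stdev" wording.

```java
/** Hypothetical sketch of applying stored normalization constants to one feature. */
final class ApplyConstantsSketch {
    /**
     * Returns a new array with each value mapped to (x - mean) / stdev.
     * If stdev is zero the values are only centered, to avoid dividing by zero.
     */
    static double[] apply(double[] values, double mean, double stdev) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            double centered = values[i] - mean;
            out[i] = stdev == 0.0 ? centered : centered / stdev;
        }
        return out;
    }
}
```

Reusing constants this way lets data from users who were excluded from the constant computation be placed on the same scale as the rest.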