geneticWedge.gp.dataAnalysis
Class DataAcquirer

java.lang.Object
  extended by geneticWedge.gp.dataAnalysis.DataAcquirer

public class DataAcquirer
extends java.lang.Object

DataAcquirer performs various statistical operations on a dataset. Quantities that may be calculated include the mean, median, variance, standard deviation and Pearson's R. Information may also be extracted from a results file or from a Population object, including the average tree length and the number of times a Function or Input is used.


Nested Class Summary
static class DataAcquirer.ScalarData
          ScalarData refer to scalar properties of an Individual.
static class DataAcquirer.VectorData
          VectorData refer to the balance of components within each individual.
 
Constructor Summary
DataAcquirer(Population population)
          Constructor takes a Population object as an argument, allowing the extraction of information from a particular Population.
 
Method Summary
static int getBestProgramLengthFromFile(java.lang.String filename, java.lang.String operatorString, boolean maximiseFitness)
          This method returns the length of the best Individual within a Population, as represented within a results file.
 Component[] getComponents()
          Returns the Components available to members of the Population.
 Constant[] getConstants()
          Returns the Constants available to members of the Population.
static double getCorrelation(double[][] data)
          Returns the correlation coefficient (R^2) for 2 variables.
static double[] getDiversitiesFromFile(java.lang.String filename)
          This method returns the tree diversity and the fitness diversity for a population.
static java.util.Hashtable<java.lang.String,java.lang.Double> getFractionalInputUseFromFile(java.lang.String filename)
          Returns the Input use within a population, as represented in a results file.
 Function[] getFunctions()
          Returns the Functions available to members of the Population.
static java.util.Hashtable<java.lang.String,java.lang.Double> getFunctionUseFromFile(java.lang.String filename)
          Returns the Function use within a population, as represented in a results file.
 Input[] getInputs()
          Returns the Inputs available to members of the Population.
static java.util.Hashtable<java.lang.String,java.lang.Double> getInputUseFromFile(java.lang.String filename)
          Returns the Input use within a population, as represented in a results file.
static double getMean(double[] data)
          Returns the mean value of a set of data.
static double getMedian(double[] data)
          Returns the median value of a set of data.
static double[] getMinMax(double[] data)
          Returns the minimum and maximum values of a dataset.
static double[] getPoissonDistribution(double lambda, int min, int max)
          Returns a Poisson distribution as a set of probabilities of values between min and max.
static int[] getProgramLengthsFromFile(java.lang.String filename, java.lang.String operatorString)
          This method returns the lengths of all Individuals within a Population, as represented within a results file.
 double[] getScalarData(DataAcquirer.ScalarData dataType)
          Returns the required ScalarData for all Individuals within the Population.
 double[][] getScalarData(DataAcquirer.ScalarData[] dataType, int orderByIndex)
          Returns the required ScalarData for all Individuals within the Population.
static double getStandardDeviation(double[] data)
          Returns the standard deviation of a set of data.
static java.util.Hashtable<java.lang.String,java.lang.Integer> getTerminalSubtreesFromFile(java.lang.String filename)
          Returns the terminal subtree use within a population, as represented in a results file.
static double getVariance(double[] data)
          Returns the variance of a set of data.
 double[][] getVectorData(DataAcquirer.VectorData dataType)
          Returns the requested VectorData for all Individuals, in the same order as in the population.
 double[][] getVectorData(DataAcquirer.VectorData dataType, DataAcquirer.ScalarData sortBy)
          Returns the requested VectorData for all Individuals within the Population.
static java.lang.Object[] partitionData(double[] data, double width)
          This method partitions the data into partitions of equal width.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DataAcquirer

public DataAcquirer(Population population)
Constructor takes a Population object as an argument, allowing the extraction of information from a particular Population.

Method Detail

getMinMax

public static double[] getMinMax(double[] data)
Returns the minimum and maximum values of a dataset.


partitionData

public static java.lang.Object[] partitionData(double[] data,
                                               double width)
This method partitions the data into partitions of equal width. The lower end of each partition is open (does not contain the boundary value), but the upper end is closed (contains the boundary value).

Returns:
an Object array containing a double array of the boundary values and an int array of the population count of each partition. Note that the second array has one fewer element than the first.
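
The partitioning rule described above can be illustrated with a small standalone sketch. It is not the DataAcquirer implementation: the class name PartitionSketch, the method name partition and, in particular, the placement of the boundaries on multiples of width are assumptions made for illustration. Only the open lower end, the closed upper end and the shape of the returned Object array follow the description.

```java
import java.util.Arrays;

class PartitionSketch {

    // Hypothetical stand-in for partitionData; boundary placement is an assumption.
    static Object[] partition(double[] data, double width) {
        if (data.length == 0 || width <= 0) {
            throw new IllegalArgumentException("need non-empty data and a positive width");
        }
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();

        // Assumption: boundaries are multiples of width. The lowest boundary lies
        // strictly below the minimum, because the lower end of each partition is open.
        long lowIndex = (long) Math.ceil(min / width) - 1;
        long highIndex = (long) Math.ceil(max / width);
        int bins = (int) (highIndex - lowIndex);

        double[] boundaries = new double[bins + 1];
        for (int i = 0; i <= bins; i++) {
            boundaries[i] = (lowIndex + i) * width;
        }

        int[] counts = new int[bins];
        for (double v : data) {
            // v falls in the partition (boundaries[i], boundaries[i + 1]]
            int i = (int) ((long) Math.ceil(v / width) - 1 - lowIndex);
            counts[i]++;
        }
        return new Object[] { boundaries, counts };
    }

    public static void main(String[] args) {
        Object[] result = partition(new double[] {1.0, 1.5, 2.0, 2.5, 4.0}, 1.0);
        System.out.println(Arrays.toString((double[]) result[0])); // [0.0, 1.0, 2.0, 3.0, 4.0]
        System.out.println(Arrays.toString((int[]) result[1]));    // [1, 2, 1, 1]
    }
}
```

In this sketch the value 2.0 is counted in the partition (1.0, 2.0] rather than (2.0, 3.0], which is the consequence of the closed upper end.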

getMean

public static double getMean(double[] data)
Returns the mean value of a set of data.


getMedian

public static double getMedian(double[] data)
Returns the median value of a set of data.


getVariance

public static double getVariance(double[] data)
Returns the variance of a set of data.


getStandardDeviation

public static double getStandardDeviation(double[] data)
Returns the standard deviation of a set of data.
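
For reference, the four summary statistics above can be sketched as follows. This is a standalone illustration, not the DataAcquirer source: the class name StatsSketch is hypothetical, and the variance is assumed to be the population variance (division by n). The documentation does not say whether getVariance divides by n or by n - 1.

```java
import java.util.Arrays;

class StatsSketch {

    // All methods assume a non-empty array.
    static double mean(double[] data) {
        double sum = 0.0;
        for (double v : data) {
            sum += v;
        }
        return sum / data.length;
    }

    static double median(double[] data) {
        double[] sorted = data.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Assumption: population variance (divide by n, not n - 1).
    static double variance(double[] data) {
        double m = mean(data);
        double sumSquares = 0.0;
        for (double v : data) {
            sumSquares += (v - m) * (v - m);
        }
        return sumSquares / data.length;
    }

    static double standardDeviation(double[] data) {
        return Math.sqrt(variance(data));
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(mean(data));              // 5.0
        System.out.println(median(data));            // 4.5
        System.out.println(variance(data));          // 4.0
        System.out.println(standardDeviation(data)); // 2.0
    }
}
```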


getInputUseFromFile

public static java.util.Hashtable<java.lang.String,java.lang.Double> getInputUseFromFile(java.lang.String filename)
                                                                                  throws java.io.IOException
Returns the Input use within a population, as represented in a results file. Each key within the Hashtable is an Input name. The corresponding element is the average number of times that Input is used per Individual.

Throws:
java.io.IOException

getFractionalInputUseFromFile

public static java.util.Hashtable<java.lang.String,java.lang.Double> getFractionalInputUseFromFile(java.lang.String filename)
                                                                                            throws java.io.IOException
Returns the Input use within a population, as represented in a results file. Each key within the Hashtable is an Input name. The corresponding value is the fraction of all Input occurrences in the population that are occurrences of that Input.

Throws:
java.io.IOException
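
The difference between the two Input-use measures (average use per Individual and fractional use) can be shown with a small sketch. It works from total counts that are already known rather than from a results file, because the file format is not described here. The class name InputUseSketch, its method names and the Input names "x" and "y" are hypothetical.

```java
import java.util.Hashtable;
import java.util.Map;

class InputUseSketch {

    // Average number of occurrences of each Input per Individual.
    static Hashtable<String, Double> averageUse(Map<String, Integer> totals, int individuals) {
        Hashtable<String, Double> result = new Hashtable<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) individuals);
        }
        return result;
    }

    // Fraction of all Input occurrences accounted for by each Input.
    // The values sum to 1 when at least one Input occurs.
    static Hashtable<String, Double> fractionalUse(Map<String, Integer> totals) {
        int all = 0;
        for (int count : totals.values()) {
            all += count;
        }
        Hashtable<String, Double> result = new Hashtable<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) all);
        }
        return result;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> totals = new Hashtable<>();
        totals.put("x", 30); // "x" occurs 30 times across the population
        totals.put("y", 10);
        System.out.println(averageUse(totals, 20).get("x")); // 1.5
        System.out.println(fractionalUse(totals).get("x"));  // 0.75
    }
}
```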

getTerminalSubtreesFromFile

public static java.util.Hashtable<java.lang.String,java.lang.Integer> getTerminalSubtreesFromFile(java.lang.String filename)
                                                                                           throws java.io.IOException
Returns the terminal subtree use within a population, as represented in a results file. Each key within the Hashtable is a subtree name. The corresponding element is the average number of times that subtree is used per Individual. At the moment this method is severely limited: only binary (3-element) subtrees containing a +, -, / or * function are identified.

Throws:
java.io.IOException

getFunctionUseFromFile

public static java.util.Hashtable<java.lang.String,java.lang.Double> getFunctionUseFromFile(java.lang.String filename)
                                                                                     throws java.io.IOException
Returns the Function use within a population, as represented in a results file. Each key within the Hashtable is a Function name. The corresponding element is the average number of times that Function is used per Individual.

Throws:
java.io.IOException

getProgramLengthsFromFile

public static int[] getProgramLengthsFromFile(java.lang.String filename,
                                              java.lang.String operatorString)
                                       throws java.io.IOException
This method returns the lengths of all Individuals within a Population, as represented within a results file. An example operator string is "+*-/". It is required so that the splits between nodes can be identified.

Throws:
java.io.IOException

getBestProgramLengthFromFile

public static int getBestProgramLengthFromFile(java.lang.String filename,
                                               java.lang.String operatorString,
                                               boolean maximiseFitness)
                                        throws java.io.IOException
This method returns the length of the best Individual within a Population, as represented within a results file. An example operator string is "+*-/". It is required so that the splits between nodes can be identified.

Throws:
java.io.IOException

getDiversitiesFromFile

public static double[] getDiversitiesFromFile(java.lang.String filename)
                                       throws java.io.IOException
This method returns the tree diversity and the fitness diversity for a population. Both values are in the range [0,1].

Throws:
java.io.IOException

getCorrelation

public static double getCorrelation(double[][] data)
Returns the correlation coefficient (R^2) for 2 variables.
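
As a reminder of the underlying calculation, the sketch below computes Pearson's r for two variables; squaring the result gives R^2. It is a standalone illustration with a hypothetical class name (CorrelationSketch). It takes the two variables as separate arrays because the layout expected in the double[][] argument of getCorrelation is not documented here.

```java
class CorrelationSketch {

    // Pearson's r for paired samples x[i], y[i].
    static double pearsonR(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of products of deviations
        double sxx = 0.0; // sum of squared deviations of x
        double syy = 0.0; // sum of squared deviations of y
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        // The result is NaN if either variable is constant (zero variance).
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double r = pearsonR(new double[] {1, 2, 3, 4}, new double[] {2, 4, 6, 8});
        System.out.println("r = " + r + ", R^2 = " + (r * r));
    }
}
```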


getScalarData

public double[] getScalarData(DataAcquirer.ScalarData dataType)
Returns the required ScalarData for all Individuals within the Population.


getScalarData

public double[][] getScalarData(DataAcquirer.ScalarData[] dataType,
                                int orderByIndex)
Returns the required ScalarData for all Individuals within the Population. orderByIndex is the index into dataType of the ScalarData by which the individuals are sorted before the data are obtained. If it is less than 0, or greater than or equal to the length of dataType, the individuals are left in their original order.
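
The effect of orderByIndex can be sketched on a plain array. This is an illustration only, with several assumptions: the class name OrderBySketch is hypothetical, each row is taken to hold the values for one individual (one column per requested ScalarData), and the sort is taken to be ascending. The actual layout and sort direction of the array returned by getScalarData are not stated here.

```java
import java.util.Arrays;
import java.util.Comparator;

class OrderBySketch {

    // Returns the rows ordered by the given column, or in their original
    // order when orderByIndex is out of range.
    static double[][] order(double[][] rows, int orderByIndex) {
        double[][] result = rows.clone(); // shallow copy: the input order is preserved
        int columns = rows.length == 0 ? 0 : rows[0].length;
        if (orderByIndex < 0 || orderByIndex >= columns) {
            return result; // out of range: leave the original order
        }
        // Assumption: ascending order.
        Arrays.sort(result, Comparator.comparingDouble(row -> row[orderByIndex]));
        return result;
    }

    public static void main(String[] args) {
        double[][] rows = { {3.0, 10.0}, {1.0, 20.0}, {2.0, 30.0} };
        System.out.println(Arrays.deepToString(order(rows, 0)));  // sorted by column 0
        System.out.println(Arrays.deepToString(order(rows, -1))); // original order
    }
}
```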


getFunctions

public Function[] getFunctions()
Returns the Functions available to members of the Population.


getInputs

public Input[] getInputs()
Returns the Inputs available to members of the Population.


getConstants

public Constant[] getConstants()
Returns the Constants available to members of the Population.


getComponents

public Component[] getComponents()
Returns the Components available to members of the Population.


getVectorData

public double[][] getVectorData(DataAcquirer.VectorData dataType,
                                DataAcquirer.ScalarData sortBy)
Returns the requested VectorData for all Individuals within the Population. The ScalarData sortBy determines the order in which this VectorData will be returned. If sortBy is null, the data will be returned in the same order as the individuals in the population.


getVectorData

public double[][] getVectorData(DataAcquirer.VectorData dataType)
Returns the requested VectorData for all Individuals, in the same order as in the population.

See Also:
getVectorData(VectorData, ScalarData)

getPoissonDistribution

public static double[] getPoissonDistribution(double lambda,
                                              int min,
                                              int max)
Returns a Poisson distribution as a set of probabilities of values between min and max.

Parameters:
lambda - The expected value
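
The probabilities of a Poisson distribution follow P(k) = exp(-lambda) * lambda^k / k!. The sketch below evaluates them for k from min to max. It is not the DataAcquirer implementation: the class name PoissonSketch is hypothetical, and treating both min and max as inclusive is an assumption.

```java
class PoissonSketch {

    // Returns P(k) for k = min, min + 1, ..., max (both ends assumed inclusive).
    static double[] poisson(double lambda, int min, int max) {
        if (lambda <= 0 || min < 0 || max < min) {
            throw new IllegalArgumentException("need lambda > 0 and 0 <= min <= max");
        }
        double[] probabilities = new double[max - min + 1];
        double p = Math.exp(-lambda); // P(0)
        for (int k = 0; k <= max; k++) {
            if (k > 0) {
                p *= lambda / k; // P(k) = P(k - 1) * lambda / k
            }
            if (k >= min) {
                probabilities[k - min] = p;
            }
        }
        return probabilities;
    }

    public static void main(String[] args) {
        double[] probabilities = poisson(2.0, 0, 5);
        for (int k = 0; k < probabilities.length; k++) {
            System.out.println("P(" + k + ") = " + probabilities[k]);
        }
    }
}
```

Building each probability from the previous one avoids computing large factorials directly.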