org.hisee.core
Class Dataset

java.lang.Object
  |
  +--org.hisee.core.Dataset

public class Dataset
extends java.lang.Object

Dataset represents a set of n-dimensional points. Both the low- and high-dimensional data of the current Projector are instances of this class. Dataset provides methods for working with such sets (e.g. opening a dataset, adding points, checking their integrity, finding the nearest neighbors of a point, and calculating interpoint distances). It is assumed that all points in a dataset have the same dimensionality.


Constructor Summary
Dataset()
           
Dataset(java.util.ArrayList data)
           
Dataset(int ndims, int npoints)
           
 
Method Summary
 void addPoint(double[] row)
          Add a datapoint without checking whether it is unique
 boolean addPoint(double[] row, double tolerance)
          Add a new datapoint to the dataset
 void calculateDistances()
          Calculate interpoint distances
 boolean checkConsistentDimensions()
          Check that all the vectors in the dataset have the same dimension
 void clear()
          Clear all data, high and low dimensional
 double getClosestDistance(double[] point)
          Returns the distance from a given point to the closest point in the dataset
 int getClosestIndex(double[] point)
          Returns the index of the closest point
 double getComponent(int datapoint_number, int dimension)
          Get a specific coordinate of a specific datapoint.
 double getCovariance(int i, int j)
          Returns the covariance of the ith component of the dataset with respect to the jth component
 Jama.Matrix getCovarianceMatrix()
          Returns a covariance matrix for the dataset
 java.util.ArrayList getDataset()
           
 int getDimensions()
           
 double getDistance(double[] point1, double[] point2)
          Returns the Euclidean distance between two points
 double getDistance(int index_1, int index_2)
          Get the distance between two points
 double[][] getDistances()
          Returns a matrix of interpoint distances, between the points in the dataset.
 double[][] getDoubles()
          Returns a matrix of doubles, one row for each datapoint, representing the dataset.
 java.lang.String[][] getDoubleStrings()
          Returns a matrix of strings, one row for each datapoint, representing the dataset.
 int getKthNearestNeighbor(int k, double[] point)
          Returns the index of the k'th nearest neighbor of a given point.
 int getKthVariantDimension(int k)
          Returns the k'th most variant dimension.
 double getMaximumDistance()
          Get the maximum interpoint distance between points in the dataset.
 double getMean(int d)
          Returns the mean of the dataset on a given dimension
 double getMinimumDistance()
          Get the minimum interpoint distance between points in the dataset.
 int getNumPoints()
           
 double[] getPoint(int i)
          Get a specified point in the dataset
 double getSumDistances()
           
 void init()
          Initialize the dataset, setting the main variables to the property values.
 void init(int dims, int numpoints)
          Re-initialize a dataset to a specific number of dimensions and number of points.
 boolean isUniquePoint(double[] point, double tolerance)
          Check that a given point is "new", that is, that it is not already in the dataset.
 void perturbOverlappingPoints(double factor)
          Find repeated points and perturb them slightly so they don't overlap
 void printDataset()
          Print out all points in the dataset. Useful for debugging.
 void randomize(int upperBound)
          Fill the dataset with random values between 0 and upperBound
 void readData(java.io.File file)
          Read in a stored dataset file
 void results_to_maple()
          Print out low dimensional points so Maple can plot them. Only handles a low dimension of 2.
 void saveData(java.io.File theFile)
          Save the current dataset to a stored file
 void setComponent(int datapoint_number, int dimension, double new_value)
          Set a specific coordinate of a specific datapoint.
 void setDataset(java.util.ArrayList list)
           
 void setPoint(int i, double[] point)
          Set a specified point in the dataset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Dataset

public Dataset()

Dataset

public Dataset(java.util.ArrayList data)

Dataset

public Dataset(int ndims,
               int npoints)
Method Detail

init

public void init()
Initialize the dataset, setting the main variables to the property values. Assumes the dataset already exists, but that it has changed.


init

public void init(int dims,
                 int numpoints)
Re-initialize a dataset to a specific number of dimensions and number of points. Populates the dataset with stubs.

Parameters:
dims - Dimensions of the dataset
numpoints - Number of datapoints in the dataset

clear

public void clear()
Clear all data, high and low dimensional


checkConsistentDimensions

public boolean checkConsistentDimensions()
Check that all the vectors in the dataset have the same dimension
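A minimal standalone sketch of this check, assuming the points are available as a java.util.List<double[]> (the class keeps them in a raw ArrayList whose element type is not documented here). Treating an empty dataset as consistent is also an assumption, and ConsistencySketch is a hypothetical name, not part of HiSee.

```java
import java.util.List;

// Hypothetical sketch, not the HiSee implementation.
final class ConsistencySketch {
    // Returns true when every row has the same length as the first one.
    // Assumption: an empty dataset counts as consistent.
    static boolean checkConsistentDimensions(List<double[]> rows) {
        if (rows.isEmpty()) {
            return true;
        }
        int dims = rows.get(0).length;
        for (double[] row : rows) {
            if (row.length != dims) {
                return false;
            }
        }
        return true;
    }
}
```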


randomize

public void randomize(int upperBound)
Fill the dataset with random values between 0 and upperBound
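A minimal sketch of one reading of this description: every component of every point is overwritten with a uniformly distributed value in [0, upperBound). The uniform distribution, the half-open interval, the double[][] storage, and the explicit java.util.Random parameter (added so runs are reproducible) are all assumptions; RandomizeSketch is a hypothetical name.

```java
import java.util.Random;

// Hypothetical sketch, not the HiSee implementation.
final class RandomizeSketch {
    // Overwrites each component with a uniform value in [0, upperBound).
    static void randomize(double[][] data, int upperBound, Random rng) {
        for (double[] row : data) {
            for (int d = 0; d < row.length; d++) {
                row[d] = rng.nextDouble() * upperBound;
            }
        }
    }
}
```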


calculateDistances

public void calculateDistances()
Calculate interpoint distances


getMinimumDistance

public double getMinimumDistance()
Get the minimum interpoint distance between points in the dataset.

Returns:
minimum distance between any two points in the dataset

getMaximumDistance

public double getMaximumDistance()
Get the maximum interpoint distance between points in the dataset.

Returns:
maximum distance between any two points in the dataset
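Given a symmetric interpoint distance matrix (such as the one getDistances() is documented to return), the minimum and maximum can be found by scanning only the upper triangle. Skipping the diagonal matters: its zero self-distances would otherwise always be reported as the minimum. This is a hypothetical standalone sketch; the double[][] input and the name DistanceExtremesSketch are assumptions.

```java
// Hypothetical sketch, not the HiSee implementation.
final class DistanceExtremesSketch {
    // Returns {min, max} over all pairs i < j of a symmetric distance matrix.
    static double[] minMaxDistance(double[][] dist) {
        if (dist.length < 2) {
            throw new IllegalArgumentException("need at least two points");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < dist.length; i++) {
            for (int j = i + 1; j < dist.length; j++) { // upper triangle only
                min = Math.min(min, dist[i][j]);
                max = Math.max(max, dist[i][j]);
            }
        }
        return new double[] {min, max};
    }
}
```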

readData

public void readData(java.io.File file)
Read in a stored dataset file


saveData

public void saveData(java.io.File theFile)
Save the current dataset to a stored file

Parameters:
theFile - the file where data should be saved

perturbOverlappingPoints

public void perturbOverlappingPoints(double factor)
Find repeated points and perturb them slightly so they don't overlap
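The documentation does not say how factor is applied. A minimal sketch under one assumption: every point that exactly repeats an earlier point gets each component shifted by an independent uniform offset in [-factor, factor). The exact-equality test, the offset distribution, and the Random parameter are assumptions, and PerturbSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch, not the HiSee implementation.
final class PerturbSketch {
    // Jitters every point that exactly equals an earlier point; the first copy stays put.
    static void perturbOverlappingPoints(double[][] points, double factor, Random rng) {
        for (int j = 1; j < points.length; j++) {
            for (int i = 0; i < j; i++) {
                if (Arrays.equals(points[i], points[j])) {
                    for (int d = 0; d < points[j].length; d++) {
                        // Offset drawn uniformly from [-factor, factor).
                        points[j][d] += (2.0 * rng.nextDouble() - 1.0) * factor;
                    }
                    break; // point j has moved; go on to the next point
                }
            }
        }
    }
}
```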


results_to_maple

public void results_to_maple()
Print out low dimensional points so Maple can plot them. Only handles a low dimension of 2.


getPoint

public double[] getPoint(int i)
Get a specified point in the dataset

Parameters:
i - index of the point to get
Returns:
the n-dimensional datapoint

setPoint

public void setPoint(int i,
                     double[] point)
Set a specified point in the dataset

Parameters:
i - index of the point to set
point - the new n-dimensional point

getComponent

public double getComponent(int datapoint_number,
                           int dimension)
Get a specific coordinate of a specific datapoint. For example, the second component of the third datapoint in a 5-dimensional dataset with 50 points.

Parameters:
datapoint_number - index of the point to get
dimension - dimension of the desired component
Returns:
the value of the specified component of the specified datapoint

setComponent

public void setComponent(int datapoint_number,
                         int dimension,
                         double new_value)
Set a specific coordinate of a specific datapoint. For example, the second component of the third datapoint in a 5-dimensional dataset with 50 points.

Parameters:
datapoint_number - index of the point to set
dimension - dimension of the desired component
new_value - the new value of the specified component of the specified datapoint

addPoint

public boolean addPoint(double[] row,
                        double tolerance)
Add a new datapoint to the dataset

Parameters:
row - A point in the high dimensional space
tolerance - forwarded to isUniquePoint; if -1 then add point regardless of whether it is unique or not
Returns:
true if point added, false otherwise

addPoint

public void addPoint(double[] row)
Add a datapoint without checking whether it is unique

Parameters:
row - point to be added

isUniquePoint

public boolean isUniquePoint(double[] point,
                             double tolerance)
Check that a given point is "new", that is, that it is not already in the dataset.

Parameters:
point - the point to check
tolerance - distance within which a point is considered old, and outside of which it is considered new
Returns:
true if the point is new, false otherwise
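A minimal standalone sketch of the uniqueness test and of how addPoint(double[], double) is documented to use it, including the -1 convention for skipping the check. Assumptions: a point counts as "old" when some stored point lies at Euclidean distance <= tolerance (the boundary case is not documented), points are kept in a java.util.ArrayList<double[]>, added rows are copied, and UniquePointSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the HiSee implementation.
final class UniquePointSketch {
    private final List<double[]> points = new ArrayList<>();

    // Assumption: "old" means some stored point is within distance <= tolerance.
    boolean isUniquePoint(double[] point, double tolerance) {
        for (double[] p : points) {
            if (distance(p, point) <= tolerance) {
                return false;
            }
        }
        return true;
    }

    // A tolerance of -1 adds the point regardless of uniqueness, as documented.
    boolean addPoint(double[] row, double tolerance) {
        if (tolerance != -1 && !isUniquePoint(row, tolerance)) {
            return false;
        }
        points.add(row.clone()); // copy so later caller edits do not leak in (assumption)
        return true;
    }

    int size() {
        return points.size();
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```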

getClosestDistance

public double getClosestDistance(double[] point)
Returns the distance from a given point to the closest point in the dataset

Parameters:
point - the point to check
Returns:
the distance between this point and the closest other point in the dataset

getClosestIndex

public int getClosestIndex(double[] point)
Returns the index of the closest point

Parameters:
point - the point to check
Returns:
the index of the point closest to this one in the dataset
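A linear-scan sketch of getClosestIndex and getClosestDistance. Assumptions: Euclidean distance, points held in a double[][], -1 (index) and positive infinity (distance) for an empty dataset, and no special handling when the query point is itself in the dataset (it would then be its own closest point, at distance 0). ClosestPointSketch is a hypothetical name.

```java
// Hypothetical sketch, not the HiSee implementation.
final class ClosestPointSketch {
    // Index of the point nearest to query, or -1 if points is empty.
    static int getClosestIndex(double[][] points, double[] query) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < points.length; i++) {
            double d = distance(points[i], query);
            if (d < bestDist) { // strict comparison: ties keep the earlier index
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    // Distance to the nearest point, or +infinity if points is empty.
    static double getClosestDistance(double[][] points, double[] query) {
        int i = getClosestIndex(points, query);
        return i < 0 ? Double.POSITIVE_INFINITY : distance(points[i], query);
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```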

getKthNearestNeighbor

public int getKthNearestNeighbor(int k,
                                 double[] point)
Returns the index of the k'th nearest neighbor of a given point.

Parameters:
k - which nearest neighbor (first, second, etc.) to find
point - the point whose neighbors are to be found
Returns:
index of the k'th nearest neighbor
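A minimal sketch that sorts point indices by their distance to the query and picks the k'th entry. Assumptions: k is 1-based (k = 1 is the nearest point, matching "first, second, etc."), distances are Euclidean, ties keep their original order, and the query point is not excluded if it is itself in the dataset. Sorting costs O(n log n); for large datasets a partial selection would be cheaper. KthNeighborSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch, not the HiSee implementation.
final class KthNeighborSketch {
    // Returns the index of the k'th nearest point to query (k is 1-based).
    static int getKthNearestNeighbor(double[][] points, int k, double[] query) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be in 1..number of points");
        }
        double[] dist = new double[points.length];
        Integer[] order = new Integer[points.length];
        for (int i = 0; i < points.length; i++) {
            double sum = 0.0;
            for (int d = 0; d < query.length; d++) {
                double diff = points[i][d] - query[d];
                sum += diff * diff;
            }
            dist[i] = Math.sqrt(sum);
            order[i] = i;
        }
        // Arrays.sort on objects is stable, so equal distances keep index order.
        Arrays.sort(order, Comparator.comparingDouble(i -> dist[i]));
        return order[k - 1];
    }
}
```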

getDistance

public double getDistance(int index_1,
                          int index_2)
Get the distance between two points

Parameters:
index_1 - index of point 1
index_2 - index of point 2
Returns:
distance between points 1 and 2

getDistance

public double getDistance(double[] point1,
                          double[] point2)
Returns the Euclidean distance between two points

Parameters:
point1 - the first point
point2 - the second point
Returns:
the Euclidean distance between points 1 and 2
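The Euclidean distance is the square root of the summed squared component differences, d(p, q) = sqrt((p_1 - q_1)^2 + ... + (p_n - q_n)^2). A minimal standalone sketch; rejecting points of different dimensionality is an assumption (the class documentation only says all points are assumed to share one dimensionality), and EuclideanSketch is a hypothetical name.

```java
// Hypothetical sketch, not the HiSee implementation.
final class EuclideanSketch {
    // Square root of the sum of squared component differences.
    static double getDistance(double[] p1, double[] p2) {
        if (p1.length != p2.length) {
            throw new IllegalArgumentException("points must have the same dimensionality");
        }
        double sum = 0.0;
        for (int d = 0; d < p1.length; d++) {
            double diff = p1[d] - p2[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

For example, getDistance(new double[]{0, 0}, new double[]{3, 4}) returns 5.0.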

getDimensions

public int getDimensions()
Returns:
the dimensionality of the points in the dataset

getDistances

public double[][] getDistances()
Returns a matrix of interpoint distances between the points in the dataset. The matrix is symmetric: the lower triangle duplicates the upper triangle.

Returns:
a matrix of interpoint distances
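A sketch of how such a matrix can be filled: each unordered pair is computed once and mirrored, so the result is symmetric and its diagonal is zero. Whether calculateDistances() works this way is not documented; the double[][] storage and the name DistanceMatrixSketch are assumptions.

```java
// Hypothetical sketch, not the HiSee implementation.
final class DistanceMatrixSketch {
    // Symmetric n x n matrix of Euclidean distances; the diagonal stays 0.0.
    static double[][] getDistances(double[][] points) {
        int n = points.length;
        double[][] dist = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double sum = 0.0;
                for (int d = 0; d < points[i].length; d++) {
                    double diff = points[i][d] - points[j][d];
                    sum += diff * diff;
                }
                dist[i][j] = Math.sqrt(sum);
                dist[j][i] = dist[i][j]; // mirror: lower triangle duplicates upper
            }
        }
        return dist;
    }
}
```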

getNumPoints

public int getNumPoints()
Returns:
the number of points in the dataset

getSumDistances

public double getSumDistances()
Returns:
the sum of the distances between points in the dataset

getMean

public double getMean(int d)
Returns the mean of the dataset on a given dimension

Parameters:
d - index of the dimension whose mean to get
Returns:
mean of dataset on dimension d
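The mean on dimension d is the arithmetic average of component d over all n points. A minimal sketch assuming the points are held in a double[][] (MeanSketch is a hypothetical name); returning NaN for an empty dataset follows from the plain 0.0 / 0 division and is not documented behavior.

```java
// Hypothetical sketch, not the HiSee implementation.
final class MeanSketch {
    // Arithmetic mean of component d over all points (NaN if there are none).
    static double getMean(double[][] points, int d) {
        double sum = 0.0;
        for (double[] p : points) {
            sum += p[d];
        }
        return sum / points.length;
    }
}
```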

getCovariance

public double getCovariance(int i,
                            int j)
Returns the covariance of the ith component of the dataset with respect to the jth component

Parameters:
i - first dimension
j - second dimension
Returns:
covariance of i with respect to j

getCovarianceMatrix

public Jama.Matrix getCovarianceMatrix()
Returns a covariance matrix for the dataset

Returns:
covariance matrix which describes how the data covary along each dimension
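The covariance of dimensions i and j can be written as cov(i, j) = (1 / (n - 1)) * sum over points k of (x_ki - mean_i) * (x_kj - mean_j), and the covariance matrix collects cov(i, j) for every pair of dimensions. The documentation does not say whether HiSee divides by n - 1 (sample covariance) or by n (population covariance); this sketch assumes n - 1. It returns a plain double[][] so it runs without Jama; Jama's Matrix(double[][]) constructor could wrap the result. CovarianceSketch is a hypothetical name.

```java
// Hypothetical sketch, not the HiSee implementation.
final class CovarianceSketch {
    // Sample covariance of dimensions i and j (divides by n - 1; assumption).
    static double getCovariance(double[][] points, int i, int j) {
        int n = points.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two points");
        }
        double meanI = 0.0;
        double meanJ = 0.0;
        for (double[] p : points) {
            meanI += p[i];
            meanJ += p[j];
        }
        meanI /= n;
        meanJ /= n;
        double sum = 0.0;
        for (double[] p : points) {
            sum += (p[i] - meanI) * (p[j] - meanJ);
        }
        return sum / (n - 1);
    }

    // Symmetric matrix whose (i, j) entry is getCovariance(points, i, j).
    static double[][] getCovarianceMatrix(double[][] points) {
        if (points.length < 2) {
            throw new IllegalArgumentException("need at least two points");
        }
        int dims = points[0].length;
        double[][] cov = new double[dims][dims];
        for (int i = 0; i < dims; i++) {
            for (int j = i; j < dims; j++) {
                cov[i][j] = getCovariance(points, i, j);
                cov[j][i] = cov[i][j]; // covariance is symmetric in i and j
            }
        }
        return cov;
    }
}
```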

getKthVariantDimension

public int getKthVariantDimension(int k)
Returns the k'th most variant dimension. For example, the most variant dimension (k=1), or the least variant dimension (k=num_dimensions).

Parameters:
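k - rank of the dimension by variance, from 1 (most variant) to the number of dimensions (least variant)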
Returns:
the k'th most variant dimension
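A minimal sketch: compute the variance of each dimension, order the dimensions by decreasing variance, and return the k'th. Assumptions: k is 1-based, as the description's examples indicate (k = 1 is the most variant dimension), ties keep dimension order, and VariantDimensionSketch is a hypothetical name. Dividing by n or by n - 1 does not change the ranking, so the sketch divides by n.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch, not the HiSee implementation.
final class VariantDimensionSketch {
    // Returns the dimension with the k'th largest variance (k = 1 is the largest).
    static int getKthVariantDimension(double[][] points, int k) {
        int dims = points[0].length;
        if (k < 1 || k > dims) {
            throw new IllegalArgumentException("k must be in 1..number of dimensions");
        }
        double[] variance = new double[dims];
        Integer[] order = new Integer[dims];
        for (int d = 0; d < dims; d++) {
            double mean = 0.0;
            for (double[] p : points) {
                mean += p[d];
            }
            mean /= points.length;
            double ss = 0.0;
            for (double[] p : points) {
                double dev = p[d] - mean;
                ss += dev * dev;
            }
            variance[d] = ss / points.length; // normalization does not affect the ranking
            order[d] = d;
        }
        // Stable sort by decreasing variance.
        Arrays.sort(order, Comparator.comparingDouble((Integer dim) -> variance[dim]).reversed());
        return order[k - 1];
    }
}
```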

getDataset

public java.util.ArrayList getDataset()
Returns:
a reference to the dataset

setDataset

public void setDataset(java.util.ArrayList list)
Parameters:
list - the dataset

printDataset

public void printDataset()
Print out all points in the dataset. Useful for debugging.


getDoubleStrings

public java.lang.String[][] getDoubleStrings()
Returns a matrix of strings, one row for each datapoint, representing the dataset.

Returns:
a matrix of strings representing the dataset

getDoubles

public double[][] getDoubles()
Returns a matrix of doubles, one row for each datapoint, representing the dataset.

Returns:
a matrix of doubles representing the dataset


for more information see hisee.sourceforge.net