Multi-challenge data set: or data that lies in 10 dimensions.

multi

Format

The data

key: The name of the subset
index: The row index within each subset
X1-X10: The values of each dimension from 1 to 10

Source

http://ifs.tuwien.ac.at/dm/dataSets.html

Details

The data has 1000 observations, consisting of five subsets of 200 observations each. Each subset has a different structure in high-dimensional space:

Subset A: A Gaussian cluster consisting of three sub-clusters in 3 dimensions.
Subset B: Overlapping Gaussian clusters in 3 dimensions. The number of points is skewed, as the first cluster has twice as many points as the second.
Subset C: Two well-separated Gaussian clusters in 10 dimensions.
Subset D: Intertwined rings in 3 dimensions.
Subset E: Four piecewise lines produced by sampling along a curve in 4 dimensions. Each line segment is parallel to an axis in 4-d. The sampling noise increases as the points get closer to the ends of the curve.

All subsets are normalised to have mean 0 and variance 1.
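
The subset sizes and the normalisation described above can be checked with a short base R sketch. It assumes that `multi` is already available in the session and has the `key` and `X1` to `X10` columns listed under Format. The helper name `summarise_subsets` is hypothetical and is not part of any package. How dimensions that a subset does not use are stored is not stated here, so missing values are simply ignored, and whether the normalisation holds column by column within each subset is also an assumption.

```r
# Hypothetical helper (not part of any package): per-subset counts, means
# and variances for a data frame with a `key` column and columns X1..X10.
summarise_subsets <- function(data) {
  dims <- paste0("X", 1:10)

  # Observations per subset; the Details section gives 200 for each.
  counts <- table(data$key)

  # Column means and variances within each subset, ignoring missing values.
  means <- aggregate(data[dims], by = list(key = data$key),
                     FUN = mean, na.rm = TRUE)
  vars <- aggregate(data[dims], by = list(key = data$key),
                    FUN = var, na.rm = TRUE)

  list(counts = counts, means = means, vars = vars)
}

# Only runs when `multi` is available; this sketch does not load it.
if (exists("multi")) {
  str(summarise_subsets(multi))
}
```

If the normalisation is per column within each subset, the means should be close to 0 and the variances close to 1 for the dimensions a subset actually uses.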

For more detail see the source.
