OPTIMAL PARTITIONS OF DATA IN HIGHER DIMENSIONS

Shared by Elizabeth Foughty, updated on Oct 13, 2010

Summary

Abstract

OPTIMAL PARTITIONS OF DATA IN HIGHER DIMENSIONS

BRADLEY W. JACKSON, JEFFREY D. SCARGLE, AND CHRIS CUSANZA, DAVID BARNES, DENNIS
KANYGIN, RUSSELL SARMIENTO, SOWMYA SUBRAMANIAM, TZU-WANG CHUANG

Abstract. Consider piece-wise constant approximations to a function of several parameters, and
the problem of finding the best such approximation from measurements at a set of points in the
parameter space. We find good approximate solutions to this problem in two steps: (1) partition
the parameter space into cells, one for each of the N data points, and (2) collect these cells into
blocks, such that within each block the function is constant to within measurement uncertainty.
We describe a branch-and-bound algorithm for finding the optimal partition into connected blocks,
as well as an O(N2) dynamic programming algorithm that finds the exact global optimum over this
exponentially large search space, in a data space of any dimension. This second solution relaxes
the connectivity constraint, and requires additivity and convexity conditions on the block fitness
function, but in practice none of these items cause problems. From the wide variety of intelligent
data understanding applications (including cluster analysis, classification, and anomaly detection)
we demonstrate two: partitioning of the State of California (2D) and the Universe (3D).

show more info
Publication Name
N/A
Publication Location
N/A
Year Published
N/A

Files

Paper 8 .pdf
OPTIMAL PARTITIONS OF DATA IN HIGHER DIMENSIONS
575.6 KB 491 downloads

Discussions

Add New Comment