## DISC-Distances: Analyzing the Distribution of Distances Between Galaxies

**Contact**: Eugene Fink, Julio López, Garth Gibson

We are working on algorithms for computing correlation functions, which is a cosmological application for analyzing the distribution of matter in the universe. We are also addressing the related problem of determining the fractal dimension of the universe. This project is part of the work on astronomy applications of data-intensive super computing (Astro-DISC).

### PROBLEM

The computation of* two-point correlation functions* involves
representing galaxies as points in three-dimensional space; counting
the pairs of galaxies within different distance ranges from each
other; and building a histogram that shows the dependency of the
number of pairs on the distance. Cosmologists use such histograms as
one of their tools for analyzing large-scale properties of the
universe and the history of its evolution. A straightforward algorithm
for counting galaxy pairs takes quadratic time, which is impractically
slow for massive astronomical surveys and cosmological simulations. We
are working on distributed approximation algorithms for fast accurate
estimation of correlation functions.

### RESULTS

We have developed and evaluated several alternative algorithms,
including the exact computation with O(n^{5/3}) average running time; an
approximate linear-time computation; and another approximate algorithm
with sub-linear time, based on sampling of a dataset and finding
correlations functions for the sample. We have also designed a
top-level control module, which selects an appropriate combination of exact and approximation algorithms for a given dataset size and
accuracy requirements. The resulting system can process trillions of
galaxies and compute correlation functions with 1% precision.

**More details**: Summary of
the algorithms and empirical results

### CHALLENGES

We are working on a distributed version, which will allow control of
the tradeoff among the approximation accuracy, running time, and
number of used cores. We are also developing algorithms for* three-point correlation functions*, which compute the
distribution of galaxy triples by distances among them. A longer-term
challenge is to integrate the developed techniques with other
astronomical applictions and build a general-purpose architecture for
the analysis of massive astronomical data.

### PEOPLE

**FACULTY**

Eugene Fink

Garth Gibson

Julio López

Christos Faloutsos

**GRADUATE STUDENTS**

Bin Fu

**EXTERNAL COLLABORATORS**

Tiziana Di Matteo (Physics, Carnegie Mellon University)

Rupert Croft (Physics, Carnegie Mellon University)