DISC-Distances: Analyzing the Distribution of Distances Between Galaxies
We are working on algorithms for computing correlation functions, which is a cosmological application for analyzing the distribution of matter in the universe. We are also addressing the related problem of determining the fractal dimension of the universe. This project is part of the work on astronomy applications of data-intensive super computing (Astro-DISC).
The computation of two-point correlation functions involves representing galaxies as points in three-dimensional space; counting the pairs of galaxies within different distance ranges from each other; and building a histogram that shows the dependency of the number of pairs on the distance. Cosmologists use such histograms as one of their tools for analyzing large-scale properties of the universe and the history of its evolution. A straightforward algorithm for counting galaxy pairs takes quadratic time, which is impractically slow for massive astronomical surveys and cosmological simulations. We are working on distributed approximation algorithms for fast accurate estimation of correlation functions.
We have developed and evaluated several alternative algorithms, including the exact computation with O(n5/3) average running time; an approximate linear-time computation; and another approximate algorithm with sub-linear time, based on sampling of a dataset and finding correlations functions for the sample. We have also designed a top-level control module, which selects an appropriate combination of exact and approximation algorithms for a given dataset size and accuracy requirements. The resulting system can process trillions of galaxies and compute correlation functions with 1% precision.
More details: Summary of the algorithms and empirical results
We are working on a distributed version, which will allow control of
the tradeoff among the approximation accuracy, running time, and
number of used cores. We are also developing algorithms for three-point correlation functions, which compute the
distribution of galaxy triples by distances among them. A longer-term
challenge is to integrate the developed techniques with other
astronomical applictions and build a general-purpose architecture for
the analysis of massive astronomical data.