PARALLEL DATA LAB

DISC-DISTANCES CODE DISTRIBUTION:

Analyzing the Distribution of Distances Between Galaxies

Sequential and distributed implementations of exact and approximate computation of two-point correlation functions.

The DISC-Distances system computes two-point correlation functions for a given set of astronomical objects, such as galaxies. That is, it builds a histogram of the pairwise distances between objects. From this histogram, one can determine how the probability that two randomly selected objects lie within a given distance of each other depends on that distance.
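As an illustration of the idea, the following is a minimal Python sketch of an exact pairwise-distance histogram. It is not taken from the DISC-Distances code: the function names, the fixed-width binning, and the use of Euclidean distance on 3-D coordinates are all assumptions made for this example.

```python
import math
from itertools import combinations

def distance_histogram(points, bin_width, num_bins):
    """Count pairwise Euclidean distances into fixed-width bins.

    Hypothetical helper: bin b covers [b * bin_width, (b + 1) * bin_width).
    Distances beyond the last bin are ignored.
    """
    counts = [0] * num_bins
    for p, q in combinations(points, 2):  # every unordered pair once
        d = math.dist(p, q)
        b = int(d // bin_width)
        if b < num_bins:
            counts[b] += 1
    return counts

def cumulative_fraction(counts, total_pairs):
    """Fraction of all pairs whose distance falls in bin b or any earlier bin."""
    fractions = []
    running = 0
    for c in counts:
        running += c
        fractions.append(running / total_pairs)
    return fractions

if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
    hist = distance_histogram(pts, bin_width=1.0, num_bins=4)
    n = len(pts)
    print(hist)
    print(cumulative_fraction(hist, n * (n - 1) // 2))
```

The loop visits every pair, so the cost grows quadratically with the number of objects. That growth is why an exact computation becomes expensive for large catalogs.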


ZIP ARCHIVE: Sequential Version

ZIP ARCHIVE: Distributed Version (Hadoop)

ZIP ARCHIVE: Parallel Version (OpenMP)


MAIN FEATURES

  • Fast distributed computation on a "shared nothing" cluster, where compute nodes do not have shared memory or disk space.
  • Support for both exact and approximate computation, with control of the trade-off between speed and accuracy.
  • Automated selection of appropriate algorithms and parameters depending on data properties and target accuracy.

The exact computation scales to datasets with hundreds of thousands of astronomical objects. The approximate computation scales to datasets with billions of objects while keeping the error under 0.1%.
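This page does not describe how the approximate computation works. To illustrate how a speed/accuracy trade-off can arise in general, the sketch below estimates the histogram from a random sample of pairs instead of all pairs. Random pair sampling is a technique assumed for this example only, not necessarily the method used by DISC-Distances, and all names and parameters are hypothetical.

```python
import math
import random

def sampled_distance_histogram(points, bin_width, num_bins, num_samples, seed=0):
    """Estimate the pairwise-distance histogram from randomly sampled pairs.

    Assumed technique (uniform pair sampling), not the DISC-Distances algorithm.
    More samples cost more time but reduce the sampling error.
    """
    rng = random.Random(seed)
    n = len(points)
    total_pairs = n * (n - 1) // 2
    counts = [0] * num_bins
    for _ in range(num_samples):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        d = math.dist(points[i], points[j])
        b = int(d // bin_width)
        if b < num_bins:
            counts[b] += 1
    # Scale sample counts up to the full number of pairs.
    scale = total_pairs / num_samples
    return [c * scale for c in counts]

if __name__ == "__main__":
    pts = [(float(i), 0.0, 0.0) for i in range(10)]
    print(sampled_distance_histogram(pts, 1.0, 10, num_samples=1000))
```

In this sketch, `num_samples` is the knob that trades speed for accuracy: the running time is proportional to the number of sampled pairs rather than to the total number of pairs.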

LIMITATIONS

  • The current system does not support computation of three-point and four-point correlation functions. We aim to implement the related algorithms by the end of Fall 2011.

Please review our software license.