PARALLEL DATA LAB

DISC-FINDER CODE DISTRIBUTION:

Identifying Clusters of Astronomical Objects

A distributed implementation of the Friends-of-Friends algorithm under Hadoop.

The DISC-Finder system is a fast parallel version of the classical Friends-of-Friends algorithm, which identifies clusters of astronomical objects, such as galaxies, based on their three-dimensional Cartesian coordinates.
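For readers new to the algorithm, here is a minimal serial sketch of the Friends-of-Friends criterion. It is an illustration, not DISC-Finder's code. Two objects are "friends" if they lie within the linking distance of each other, and a cluster is every object reachable through a chain of friends. The function name `friends_of_friends` and the brute-force O(n²) pair loop are illustrative choices. A union-find (disjoint-set) structure merges friends into groups.

```python
import math
from typing import List, Sequence

Point = Sequence[float]  # (x, y, z) Cartesian coordinates


def _find(parent: List[int], i: int) -> int:
    """Return the root of i's set, halving the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def _union(parent: List[int], i: int, j: int) -> None:
    """Merge the sets containing i and j."""
    ri, rj = _find(parent, i), _find(parent, j)
    if ri != rj:
        parent[rj] = ri


def friends_of_friends(points: Sequence[Point], linking_distance: float) -> List[int]:
    """Label each point with a cluster ID (0, 1, ... in order of first appearance)."""
    n = len(points)
    parent = list(range(n))
    # Brute-force O(n^2) pass: link every pair within the linking distance.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= linking_distance:
                _union(parent, i, j)
    # Relabel set roots as consecutive cluster IDs.
    labels: dict = {}
    return [labels.setdefault(_find(parent, i), len(labels)) for i in range(n)]


pts = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (10, 0.5, 0)]
print(friends_of_friends(pts, 1.0))  # [0, 0, 0, 1, 1]
```

The first and third points are 2 units apart, which exceeds the linking distance. They still share a cluster because each is within 1 unit of the second point. This transitive chaining is what distinguishes Friends-of-Friends from a simple distance threshold.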

The code is available for download as a ZIP archive.

MAIN FEATURES

  • Fast distributed computation on a "shared nothing" cluster, where compute nodes share neither memory nor disk space.
  • Scales to datasets containing tens of billions of astronomical objects.
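A shared-nothing computation of the same result could be organized as in the sketch below. This is a hypothetical slab partitioning and not necessarily the scheme DISC-Finder uses. Space is cut into slabs along x. Each slab also receives "ghost" objects within the linking distance `b` of its boundaries, so every linked pair lands together in at least one partition. Each partition then runs a local Friends-of-Friends pass independently. A final merge step unions local groups that share an object. The names `local_fof` and `partitioned_fof` are hypothetical. The partitions run in a plain loop here, whereas a cluster would place them on separate nodes.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Point = Sequence[float]  # (x, y, z) Cartesian coordinates


def _find(parent, i):
    """Union-find root lookup with path halving (parent may be a list or dict)."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def _union(parent, i, j):
    ri, rj = _find(parent, i), _find(parent, j)
    if ri != rj:
        parent[rj] = ri


def local_fof(ids: List[int], points: Sequence[Point], b: float) -> List[List[int]]:
    """Hypothetical per-partition step: brute-force FoF over the given global IDs."""
    parent: Dict[int, int] = {i: i for i in ids}
    for a, i in enumerate(ids):
        for j in ids[a + 1:]:
            if math.dist(points[i], points[j]) <= b:
                _union(parent, i, j)
    groups: Dict[int, List[int]] = defaultdict(list)
    for i in ids:
        groups[_find(parent, i)].append(i)
    return list(groups.values())


def partitioned_fof(points: Sequence[Point], b: float, num_parts: int) -> List[int]:
    """Hypothetical slab-partitioned FoF; yields the same clusters as a serial pass."""
    if not points:
        return []
    xs = [p[0] for p in points]
    x_min, x_max = min(xs), max(xs)
    width = (x_max - x_min) / num_parts or 1.0
    local_groups: List[List[int]] = []
    # Independent per-partition work (on a cluster, one task per slab).
    for k in range(num_parts):
        lo = x_min + k * width
        hi = x_min + (k + 1) * width if k < num_parts - 1 else math.inf
        # Owned objects plus "ghost" objects within b of either slab boundary.
        ids = [i for i, x in enumerate(xs) if lo - b <= x < hi + b]
        local_groups.extend(local_fof(ids, points, b))
    # Merge step: local groups that share any object belong to one global cluster.
    parent = list(range(len(points)))
    for group in local_groups:
        for i in group[1:]:
            _union(parent, group[0], i)
    labels: Dict[int, int] = {}
    return [labels.setdefault(_find(parent, i), len(labels)) for i in range(len(points))]


line = [(0.8 * k, 0.0, 0.0) for k in range(6)]
print(partitioned_fof(line, 1.0, 3))  # [0, 0, 0, 0, 0, 0]
```

The ghost margin of `b` is what makes the merge exact. If two objects are within `b` of each other, their x-coordinates differ by at most `b`. The slab that owns either object therefore also contains the other. In the usage example, the chain of objects crosses both slab boundaries and is still reported as a single cluster.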

LIMITATIONS

The current implementation is the basic Friends-of-Friends algorithm, which takes into account only the locations of astronomical objects and the given "linking distance", that is, the maximal distance of gravitational interaction between two objects. It does not account for object masses or velocities. We plan to implement a more general algorithm in the next version of the system.
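The basic criterion can be stated formally using notation introduced here for illustration: $\mathbf{x}_i$ is the position of object $i$ and $b$ is the linking distance. Two objects are linked directly when

$$
i \sim j \iff \lVert \mathbf{x}_i - \mathbf{x}_j \rVert \le b ,
$$

and objects $i$ and $k$ belong to the same cluster whenever a chain $i = j_0 \sim j_1 \sim \dots \sim j_m = k$ exists. In other words, clusters are the equivalence classes of the transitive closure of $\sim$. Only positions and $b$ enter the definition; no mass or velocity term appears.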

Please review our software license.