Data-Intensive Super Computing (DISC)


    Contact: Julio López, Garth Gibson

    The leading Internet search providers have created a new class of large-scale computer systems to support their businesses. We are formulating a plan for a research project that extends the type of computing systems used for Internet search to a broader range of applications. We refer to such systems as "Data-Intensive Super Computing" (DISC) systems. DISC systems differ from conventional supercomputers in their focus on data: they acquire and maintain continually changing data sets, in addition to performing large-scale computations over the data. With the massive amounts of data arising from such diverse sources as telescope imagery, numerical simulations, medical records, online transaction records, and web pages, DISC systems have the potential to achieve major advances in science, health care, business efficiencies, and information access. DISC opens up many important research topics in system design, resource management, programming models, parallel algorithms, and applications. By engaging the academic research community in these issues, we can explore fundamental aspects of this societally important style of computing more systematically and in a more open forum.
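
    To make the programming-model aspect concrete, the sketch below shows the kind of data-parallel, MapReduce-style computation popularized by the Internet search providers and commonly associated with DISC-class systems. It is a minimal, illustrative word-count example in Python; the function names and structure are our own and do not represent any particular DISC framework or API.

        # Minimal sketch of a MapReduce-style data-parallel computation
        # (illustrative only; not part of any DISC system's software).
        from collections import defaultdict
        from itertools import chain

        def map_phase(document):
            """Map: emit a (word, 1) pair for every word in one document."""
            return [(word, 1) for word in document.split()]

        def shuffle(pairs):
            """Shuffle: group intermediate values by key."""
            groups = defaultdict(list)
            for key, value in pairs:
                groups[key].append(value)
            return groups

        def reduce_phase(groups):
            """Reduce: combine the values for each key (here, sum the counts)."""
            return {key: sum(values) for key, values in groups.items()}

        if __name__ == "__main__":
            documents = [
                "data intensive super computing",
                "computing over continually changing data",
            ]
            intermediate = chain.from_iterable(map_phase(d) for d in documents)
            counts = reduce_phase(shuffle(intermediate))
            print(counts)  # e.g. {'data': 2, 'computing': 2, ...}

    In a real DISC system the map and reduce steps run in parallel across thousands of nodes, with the runtime handling data partitioning, scheduling, and recovery from component failures.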

    Applications

    • Web search without language barriers
    • Inferring biological function from genomic sequences
    • Predicting and modeling the effects of earthquakes
    • Discovering new astronomical phenomena from telescope imagery data
    • Synthesizing realistic graphic animations
    • Understanding the spatial and temporal patterns of brain behavior based on MRI data

    Research Areas

    • Programming models for DISC systems
    • Methodologies and tools for supporting software development in DISC systems
    • Runtime software support for DISC systems
    • Resource management and sharing
    • Hardware and processor design for DISC systems

    Challenges

    • How should the processors be designed for use in cluster machines?
    • How can we effectively support different scientific communities in their data management and applications?
    • Can we radically reduce the energy requirements for large-scale systems?
    • How do we build large-scale computing systems with an appropriate balance of performance and cost?
    • How can very large systems be constructed given the realities of component failures and repair times?
    • Can we support a mix of long-running data-intensive jobs with ones requiring interactive response?
    • How do we control access to the system while enabling sharing?
    • Can we deal with bad or unavailable data in a systematic way?
    • Can high-performance systems be built from heterogeneous components?

    News

    Yahoo! press releases:

    Associated Projects

    People

    FACULTY

    GRADUATE STUDENTS

    EXTERNAL COLLABORATORS

    • Steve Schlosser (Intel)
    • Gary Grider (LANL)
    • James Nunez (LANL)
    • Jay Kistler (Yahoo!)
    • Chris Olston (Yahoo!)

    Publications and Presentations

    • Data-Intensive Supercomputing: The Case for DISC. Randal E. Bryant. Carnegie Mellon University School of Computer Science Technical Report CMU-CS-07-128, May 10, 2007. (PDF)
    • Data-Intensive Supercomputing. Presentation to the 2007 Federated Computing Research Conference (FCRC). (Original and revised versions available.)
