THE DCO

The Data Center Observatory (DCO) is a centerpiece of Carnegie Mellon’s attack on ever-growing data center operational costs. As a data center, it will provide a computation and storage utility for resource-hungry research activities such as data mining, design simulation, network intrusion detection, and visualization. As an observatory, it will provide invaluable real-world data to systems researchers seeking to understand the sources of operational costs and to evaluate novel solutions. Combining the two builds on Carnegie Mellon’s tradition of actively using and showcasing new computing approaches even as we invent them, keeping us at the forefront of technology.

WHY IS THE DCO NEEDED?

Operational costs are spiraling out of control. Power and cooling now cost as much as computer acquisition, and human administration costs are 4–7X higher than capital expenditures. Worse, both of these costs grow with every generation of computing equipment; for example, the power demand of a computer grows with its speed. Worse still, many of the costs are poorly understood. In particular, it remains something of a mystery where all of that human administration time goes, which makes it difficult to target with new technologies.

HOW DOES THE DCO HELP?

The DCO enables us to target and attack the real challenges of data centers. First, it provides a live environment to analyze, exposing where the time goes and where the costs are. The DCO has been designed from the beginning with detailed instrumentation at every level, from power and cooling to the software systems to human administration time. Researchers will be able to create detailed cost breakdowns, both over the long term and at particular points in time (e.g., during failure events), and to drive research on automated problem detection and diagnosis. Second, it provides a real environment in which to test reasonably mature new technologies and measure how well they work. Without the DCO, researchers have little evidence beyond argumentation to corroborate new ideas, which leads many to work on other problems. The DCO allows us to tackle one of the largest IT challenges of our time: data center operational costs.

EARLY RESEARCH ACTIVITIES

The DCO enables a broad range of research activities beyond “simply” measuring and understanding operational costs. A few initial thrusts include: (1) Adaptive power management. Carnegie Mellon teamed with APC to take advantage of APC’s novel hot-air containment approach, achieving energy savings from the beginning. We are also teaming with APC to explore new approaches to dynamically controlling which computers are on and off based on application demands, saving energy at every level of the system. (2) Automated storage management. Carnegie Mellon’s Parallel Data Lab (PDL) has been developing new architectures and tools for mitigating storage administration costs, and the Self-* Storage system that it is building will provide the DCO’s storage. (3) Finger pointing in large-scale systems. Diagnosing problems is notoriously difficult in data center environments, and we are combining the DCO’s detailed instrumentation with online and offline machine learning tools to automate the identification of the root causes of problems that arise.

SOME DETAILS

PARTICIPANTS

The DCO is a huge, long-term collaborative endeavor that includes many organizations across Carnegie Mellon and elsewhere. Carnegie Mellon participants include the Carnegie Institute of Technology College of Engineering, the Department of Electrical & Computer Engineering, CyLab, the School of Computer Science, Facilities Management Services, and the Parallel Data Lab. Industry partners include APC, EMC, EqualLogic, HP, Hitachi, IBM, Intel, Microsoft, Network Appliance, Oracle, Panasas, Seagate, Sun, and Symantec. Government sponsors include the Air Force Office of Scientific Research, the Army Research Office, the Defense Advanced Research Projects Agency, and the National Science Foundation.

PUBLICATIONS

InteMon: Continuous Mining of Sensor Data in Large-scale Self-* Infrastructures. Evan Hoke, Jimeng Sun, John D. Strunk, Gregory R. Ganger, and Christos Faloutsos. ACM SIGOPS Operating Systems Review, Vol. 40, Issue 3, July 2006. ACM Press.

CONTACTS

, PDL Director
(412) 268-1297

, PDL Executive Director
(412) 268-5485

, PDL Business Administrator
(412) 268-6716

Mailing Address:
Carnegie Mellon University
5000 Forbes Avenue
CIC Building, Second Floor
Pittsburgh, PA 15213-3891

©2008 Parallel Data Lab | Updated 13-Aug-2008