Recent Publications
- An Analysis of Traces from a Production MapReduce Cluster. Soila Kavulya, Jiaqi Tan, Rajeev Gandhi and Priya Narasimhan. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-107. December 2009.
Abstract / PDF [961K]
- ...And eat it too: High read performance in write-optimized HPC I/O middleware file formats.
Milo Polte, Jay Lofstead, John Bent, Garth Gibson, Scott A. Klasky, Qing Liu, Manish Parashar, Norbert Podhorszki, Karsten Schwan, Meghan Wingate, Matthew Wolf. 4th Petascale Data Storage Workshop held in conjunction with Supercomputing '09, November 15, 2009. Portland, Oregon. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-111, November 2009.
Abstract / PDF [388K]
- PLFS: A Checkpoint Filesystem for Parallel Applications. John Bent, Garth Gibson, Gary Grider, Ben McClelland,
Paul Nowoczynski, James Nunez, Milo Polte, Meghan Wingate. Supercomputing '09, November 15, 2009. Portland, Oregon.
Abstract / PDF [388K]
- DiskReduce: RAID for Data-Intensive Scalable Computing.
Bin Fan, Wittawat Tantisiriroj, Lin Xiao, Garth Gibson.
4th Petascale Data Storage Workshop held in conjunction with Supercomputing '09, November 15, 2009. Portland, Oregon. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-112, November 2009.
Abstract / PDF [304K]
- Access Control for Home Data Sharing: Attitudes, Needs and Practices.
Michelle L. Mazurek, J.P. Arsenault, Joanna Bresee, Nitin Gupta, Iulia Ion, Christina Johns, Daniel Lee, Yuan Liang, Jenny Olsen, Brandon Salmon, Richard Shay, Kami Vaniea, Lujo Bauer, Lorrie Faith Cranor, Gregory R. Ganger, Michael K. Reiter. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-110, October 2009.
Abstract / PDF [250K]
- Understanding and Maturing the Data-Intensive Scalable Computing Storage Substrate. Garth Gibson, Bin Fan, Swapnil Patil, Milo Polte, Wittawat Tantisiriroj, Lin Xiao.
Microsoft Research eScience Workshop 2009, Pittsburgh, PA, October 16-17, 2009.
Abstract / PDF [520K]
- FAWN: A Fast Array of Wimpy Nodes.
David Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan. Proc. 22nd ACM Symposium on Operating Systems Principles (SOSP 2009), Big Sky, MT. October 2009. BEST PAPER AWARD!
Abstract / PDF [332K]
- Co-scheduling of Disk Head Time in Cluster-based Storage.
Matthew Wachs, Gregory R. Ganger. 28th International Symposium on Reliable Distributed Systems, September 27-30, 2009. Niagara Falls, New York, U.S.A. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-113, October 2008.
Abstract / PDF [245K]
- Delayed Instantiation Bulk Operations for Management of Distributed, Object-based Storage Systems.
Andrew J. Klosterman. Ph.D. Dissertation. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-108, August 2009.
Abstract / PDF [2M]
- Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication. Vijay Vasudevan, Amar Phanishayee, Hiral Shah, Elie Krevat, David G. Andersen, Gregory R. Ganger, Garth A. Gibson, Brian Mueller. SIGCOMM’09, August 17–21, 2009, Barcelona, Spain.
Abstract / PDF [755K]
- Efficient Byzantine Fault Tolerance for Scalable Storage and Services. James Hendricks. Carnegie Mellon School of Computer Science Ph.D. Dissertation CMU-CS-09-146. July 2009.
Abstract / PDF [1.1M]
- No Downtime for Data Conversions: Rethinking Hot Upgrades. Tudor Dumitraş, Priya Narasimhan. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-106. July 2009.
Abstract / PDF [855K]
- In Search of an API for Scalable File Systems: Under the table or above it? Swapnil Patil, Garth A. Gibson, Gregory R. Ganger, Julio Lopez, Milo Polte, Wittawat Tantisiriroj, and Lin Xiao. USENIX HotCloud Workshop 2009. June 2009, San Diego, CA.
Abstract / PDF [260K]
- Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop. Jiaqi Tan, Xinghao Pan, Soila Kavulya, Rajeev Gandhi, Priya Narasimhan. Workshop on Hot Topics in Cloud Computing (HotCloud '09), San Diego, CA, June 15, 2009. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-103, May 2009.
Abstract / PDF [373K]
- System-Call Based Problem Diagnosis for PVFS. Michael P. Kasick, Keith A. Bare, Eugene E. Marinelli III, Jiaqi Tan, Rajeev Gandhi, Priya Narasimhan. Proceedings of the 5th Workshop on Hot Topics in System Dependability (HotDep '09). Lisbon, Portugal. June 2009.
Abstract / PDF [117K]
- Tashi: Location-aware Cluster Management. Michael A. Kozuch, Michael P. Ryan, Richard Gass, Steven W. Schlosser, David O’Hallaron, James Cipar, Elie Krevat, Julio López, Michael Stroucken, Gregory R. Ganger.
First Workshop on Automated Control for Datacenters and Clouds (ACDC'09), Barcelona, Spain, June 2009.
Abstract / PDF [160K]
- Directions for Shingled-Write and Two-Dimensional Magnetic Recording System Architectures: Synergies with Solid-State Disks. Garth Gibson, Milo Polte. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-104. May 2009.
Abstract / PDF [70K]
- Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop. Jiaqi Tan, Xinghao Pan, Soila Kavulya, Rajeev Gandhi, Priya Narasimhan. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-103. May 2009.
Abstract / PDF [1.2M]
- File System Virtual Appliances: Portable File System Implementations.
Michael Abd-El-Malek, Matthew Wachs, James Cipar, Karan Sanghi, Gregory R. Ganger, Garth A. Gibson, Michael K. Reiter. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-102. May 2009.
Abstract / PDF [486K]
- FAWNdamentally Power-efficient Clusters. Vijay Vasudevan, Jason Franklin, David Andersen,
Amar Phanishayee, Lawrence Tan, Michael Kaminsky, Iulian Moraru.
12th Workshop on Hot Topics in Operating Systems (HotOS XII). May 2009.
Abstract / PDF [236K]
- Perspective: Semantic Data Management for the Home. Brandon Salmon, Steven W. Schlosser, Lorrie Faith Cranor, Gregory R. Ganger. 7th USENIX Conference on File and Storage Technologies (FAST '09). Feb. 24-27, 2009. San Francisco, CA. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-105, May 2008.
Abstract / PDF [275K]
- Relative Fitness Modeling.
Michael P. Mesnier, Matthew Wachs, Raja R. Sambasivan, Alice X. Zheng, and Gregory R. Ganger.
Communications of the ACM, Vol. 52 No. 4, April 2009.
Abstract / PDF [775K]
- Solving TCP Incast in Cluster Storage Systems. Vijay Vasudevan, Hiral Shah, Amar Phanishayee, Elie Krevat, David Andersen, Greg Ganger, Garth Gibson. FAST 2009 Work in Progress Report. 7th USENIX Conference on File and Storage Technologies. Feb 24-27, 2009, San Francisco, CA.
PDF [70K]
- A (In)Cast of Thousands: Scaling Datacenter TCP to Kiloservers and Gigabits. Vijay Vasudevan, Amar Phanishayee, Hiral Shah, Elie Krevat,
David G. Andersen, Gregory R. Ganger, Garth A. Gibson.
Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-101, Feb. 2009.
Abstract / PDF [317K]
- A Fault Model for Upgrades in Distributed Systems. Tudor Dumitraş, Soila Kavulya, Priya Narasimhan. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-115, December 2008.
Abstract / PDF [275K]