PDL PROJECTS

Hadoop Workload Analysis

Contact: Kai Ren, Garth Gibson

We have analyzed Hadoop workloads from three different research clusters from a user-centric perspective. The goal is to better understand data scientists' use of the system and how well the use of the system matches its design.

Overall, our analysis suggests that Hadoop usage is still in its adolescence. We do see good use of Hadoop: all workloads are dominated by data transformations that Hadoop handles well; users leverage Hadoop's ability to process massive-scale datasets; and customizations appear in a noticeable fraction of jobs, for either correctness or performance reasons. However, we also find uses that go beyond what Hadoop has been designed to handle:

In summary, we find that users today make good use of their Hadoop clusters, but there is also significant room for improvement in how users interact with them:

OpenCloud Log Statistics (Total Users: 78)

Distribution of Job Structures and Application Frameworks in OpenCloud Logs

People

FACULTY

Garth Gibson

GRAD STUDENTS

Kai Ren

Publications

Acknowledgements

We thank N. Balasubramanian and M. Schmitz for helpful comments and discussions. We also thank the owners of the logs from the three Hadoop clusters for graciously sharing these logs with us. This research is supported in part by the Gordon and Betty Moore Foundation; the National Science Foundation under awards SCI-0430781 and CCF-1019104; the Qatar National Research Foundation 09-1116-1-172; DOE/Los Alamos National Laboratory under contract number DE-AC52-06NA25396/161465-1; and Intel as part of ISTC-CC.

We thank the members and companies of the PDL Consortium: Actifio, American Power Conversion, EMC Corporation, Facebook, Google, Hewlett-Packard Labs, Hitachi, Huawei Technologies Co., Intel Corporation, Microsoft Research, NEC Laboratories, NetApp, Inc., Oracle Corporation, Panasas, Samsung Information Systems America, Seagate Technology, Symantec Corporation, VMware, Inc., and Western Digital for their interest, insights, feedback, and support.

© 2014. Last updated 3 September, 2013