Mochi: Composing Data Services for High-Performance Computing Environments

Journal of Computer Science and Technology 35(1): 121–144 Jan. 2020.

Robert B. Ross¹, George Amvrosiadis², Philip Carns¹, Charles D. Cranor², Matthieu Dorier¹, Kevin Harms¹, Gregory R. Ganger², Garth A. Gibson³, Samuel K. Gutierrez⁴, Robert Latham¹, Bob Robey⁴, Dana Robinson⁵, Bradley Settlemyer⁴, Galen Shipman⁴, Shane Snyder¹, Jerome Soumagne⁵, Qing Zheng²

¹ Argonne National Laboratory, Lemont, IL 60439, U.S.A.
² Parallel Data Laboratory, Carnegie Mellon University, Pittsburgh, PA 15213, U.S.A.
³ Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
⁴ Los Alamos National Laboratory, Los Alamos, NM, U.S.A.
⁵ The HDF Group, Champaign, IL, U.S.A.

http://www.pdl.cmu.edu/

Technology enhancements and the growing breadth of application workflows running on high-performance computing (HPC) platforms drive the development of new data services that provide high performance on these new platforms, offer capable and productive interfaces and abstractions for a variety of applications, and are readily adapted when new technologies are deployed. The Mochi framework enables composition of specialized distributed data services from a collection of connectable modules and subservices. Rather than forcing all applications to use a one-size-fits-all data staging and I/O software configuration, Mochi allows each application to use a data service specialized to its needs and access patterns. This paper introduces the Mochi framework and methodology, describes the Mochi core components and microservices, and details how the methodology was applied to develop four specialized services. It then evaluates the performance of a Mochi core component, a Mochi microservice, and a composed service providing an object model. The paper concludes by positioning Mochi relative to related work in the HPC space and indicating directions for future work.

KEYWORDS: storage and I/O, data-intensive computing, distributed services, high-performance computing