Speaker: Khalil Amiri, Carnegie Mellon University
Date: February 24, 2000
Dynamic function placement for data-intensive cluster computing
The dropping cost of magnetic storage, now cheaper than paper and film, together with the explosion of Internet data servers and electronic commerce, continues to fuel rapid growth in the size of on-line data sets and in the number of important data-intensive applications. These applications require resources beyond a single node and are often distributed across many nodes. Since remote communication is much more expensive than local communication, the performance of such distributed applications is sensitive not only to load imbalance across processors but also to the amount of remote data access they perform. Remote data access is in turn dictated by the partitioning, or placement, of their functions across the network. Optimal function placement for such applications is hard, however, because it depends on several dynamically varying workload and system characteristics, such as the amount of data moved between functions and storage servers, node load,
In this talk, I describe a dynamic approach to function placement in which "mobile" functions automatically gravitate to the source and sink storage servers they are accessing, or to other functions they are closely communicating with. Preliminary experiments demonstrate that dynamic and automatic function placement is feasible, results in dramatic benefits, ranging from 2X to 10X, for data-intensive and storage management applications, and can resolve dynamic competition over shared server
Khalil is a fifth-year ECE Ph.D. student. His interests include distributed systems, filesystems and databases, and high-performance servers. He plans to graduate sometime this summer. This is the first iteration of his job talk.