TIP: Transparent Informed Prefetching and Caching
The I/O Bottleneck
Disk arrays eliminate the I/O bottleneck, right? Wrong!
Many applications have serial I/O workloads that don't benefit from a disk array any more than single-threaded applications benefit from a parallel processor. Read latency dominates I/O performance for such serial I/O workloads, and disk arrays don't reduce latency. How can we help applications leverage disk array parallelism for low access latency?
In a larger context, the growth of distributed file systems, wide-area networks, and, yes, the Web has moved users farther from their data and added latency to data accesses. How can we help applications take full advantage of the available network bandwidth to minimize latency?
We propose that applications should issue hints which disclose their future I/O accesses. Prefetching aggressively based on application disclosures could do more harm than good if it caused valuable pages to be prematurely evicted from the cache. Therefore, we need to determine when cache buffers should be used to hold prefetched data instead of data for reuse. To address this issue, we developed a framework for resource management based on cost-benefit analysis. It uses a system performance model to estimate the benefit of using a buffer for prefetching and the cost of taking a buffer from the cache. We implemented a system that computes these estimates dynamically and reallocates a buffer from the cache for prefetching when the benefit is greater than the cost. Look here for more information about TIP, our informed prefetching and caching system.
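The reallocation rule described above can be illustrated with a minimal sketch. It is not TIP's actual performance model: the names `Buffer`, `reuse_probability`, `time_until_needed_ms`, and `t_disk_ms`, and the two estimate formulas, are all hypothetical simplifications chosen only to show benefit and cost being compared in a common unit (milliseconds of stall time).

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    block: str
    # Hypothetical estimate of the chance this cached block is read again soon.
    reuse_probability: float


def prefetch_benefit(t_disk_ms: float, time_until_needed_ms: float) -> float:
    """Assumed benefit: stall time avoided by starting the fetch now.

    Without prefetching, the application stalls for t_disk_ms when it reaches
    the block. With prefetching, it stalls only for whatever part of the fetch
    has not finished by then.
    """
    stall_with_prefetch = max(0.0, t_disk_ms - time_until_needed_ms)
    return t_disk_ms - stall_with_prefetch


def eviction_cost(victim: Buffer, t_disk_ms: float) -> float:
    """Assumed cost: expected stall if the evicted block must be re-fetched."""
    return victim.reuse_probability * t_disk_ms


def should_reallocate(victim: Buffer, t_disk_ms: float,
                      time_until_needed_ms: float) -> bool:
    """Take the buffer for prefetching only when benefit exceeds cost."""
    benefit = prefetch_benefit(t_disk_ms, time_until_needed_ms)
    return benefit > eviction_cost(victim, t_disk_ms)


if __name__ == "__main__":
    cold = Buffer("rarely-reused", reuse_probability=0.2)
    hot = Buffer("often-reused", reuse_probability=0.9)
    print(should_reallocate(cold, t_disk_ms=10.0, time_until_needed_ms=5.0))
    print(should_reallocate(hot, t_disk_ms=10.0, time_until_needed_ms=5.0))
```

In this toy model a rarely reused buffer is given up for prefetching, while a frequently reused one is kept in the cache, which is the trade-off the cost-benefit framework is meant to arbitrate.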
Integrating Disk Management into the Cost-Benefit Analysis
The cost-benefit analysis depends on accurate estimators of the benefit of initiating an I/O, and the cost of evicting data from a buffer. We have developed a set of estimators that take into account the layout of data on the disks, the current state of the buffer cache, and the per-process upcoming I/O load (determined by hints if available, or by recent activity levels otherwise). These new estimators prefetch and cache more aggressively for disks that will be overloaded in the future, and more conservatively for disks whose bandwidth is sufficient to meet all demands. The resulting algorithm is called TIPTOE: TIP with Temporal Overload Estimators.
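As a rough illustration of the idea of temporal overload estimation, the sketch below projects per-disk load from hinted accesses and picks a deeper prefetch horizon for disks that are expected to be overloaded. It is not the TIPTOE estimator itself: the function names, the fixed per-access service time, the time window, and the depth rule are all assumptions made for the example.

```python
import math
from collections import Counter


def projected_utilization(hinted_disks, service_time_ms, window_ms):
    """Map each disk id to its projected load over the coming window.

    hinted_disks lists the disk that each hinted future access will hit.
    A value above 1.0 means the disk cannot serve its hinted accesses
    within the window (assumed definition of "overloaded").
    """
    counts = Counter(hinted_disks)
    return {disk: n * service_time_ms / window_ms for disk, n in counts.items()}


def prefetch_depth(utilization, base_depth=2, max_depth=16):
    """Hypothetical rule: stay conservative unless overload is projected."""
    if utilization <= 1.0:
        return base_depth
    return min(max_depth, math.ceil(base_depth * utilization))


if __name__ == "__main__":
    # Four hinted accesses land on disk 0 and one on disk 1.
    load = projected_utilization([0, 0, 0, 0, 1], service_time_ms=10.0,
                                 window_ms=20.0)
    for disk, u in sorted(load.items()):
        print(disk, u, prefetch_depth(u))
```

Here disk 0 is projected to receive twice the work it can complete in the window, so it gets a deeper prefetch horizon, while disk 1 keeps the conservative default.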
TIP evolved out of a desire to reduce read latency. When storage is behind a network interface (either a traditional networked file system or a NASD), there is even more latency for TIP to hide. We are investigating several variants of remote TIP: a client-only version that treats remote storage as if it were a disk with higher and potentially variable latency; a mostly-server version that runs the TIP system at the storage and attempts to ensure that all fetches from the client hit in the storage's cache; and a cooperative version that exploits intelligence at both client and server.
Automatic Hint Generation through Speculative Execution
The other half of the problem is figuring out how to modify applications so that they generate hints disclosing their future I/O accesses. To demonstrate the effectiveness of our system for informed resource management, we manually modified a suite of I/O-intensive applications to issue hints. Manual modification is not ideal, however, because it requires source code, and can require significant programming effort to ensure that hints are issued in a timely manner. Instead, we propose that a wide range of disk-bound applications could dynamically discover their own future data needs by opportunistically exploiting any unused processing cycles to perform speculative execution, an eager pre-execution of application code using the available, incomplete data state. Look here for more information about our speculative execution approach.
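A toy sketch may help make speculative pre-execution concrete. It is only an illustration under stated assumptions, not the project's mechanism: the `app` function, the `speculate` helper, and the rule that unavailable data is replaced by an empty placeholder are all hypothetical. The application logic is run ahead against whatever data is already cached, and every read that would miss is recorded as a hint.

```python
def app(read):
    """Toy application: read an index block, then each data block it names."""
    index = read("index")
    total = 0
    for name in index:
        total += sum(read(name))
    return total


def speculate(application, cache):
    """Pre-execute the application on incomplete data and collect hints.

    Reads that hit in the cache return real data. Reads that miss are
    recorded as hints and return an empty placeholder, so later decisions
    made by the speculative run may be incomplete.
    """
    hints = []

    def speculative_read(name):
        if name in cache:
            return cache[name]
        hints.append(name)
        return []  # placeholder standing in for data not yet available

    try:
        application(speculative_read)
    except Exception:
        pass  # a speculative run on placeholder data may fail; discard it
    return hints


if __name__ == "__main__":
    # Nothing cached: speculation can only discover the index read.
    print(speculate(app, {}))
    # Index cached: speculation now discovers the data blocks it names.
    print(speculate(app, {"index": ["a", "b"]}))
```

The two runs show why the data state matters: with nothing cached, the speculative run can only disclose the first read, and it discovers the dependent reads once the index is available.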
We thank the members and companies of the PDL Consortium: Amazon, Google, Hewlett Packard Enterprise, Hitachi Ltd., Intel Corporation, IBM, Meta, Microsoft Research, NetApp, Inc., Oracle Corporation, Pure Storage, Salesforce, Samsung Semiconductor Inc., Seagate Technology, Two Sigma, and Western Digital for their interest, insights, feedback, and support.