Intelligent Storage Systems Exploiting Prefetching

Contacts: Prof. G.A. Gibson, Prof. M. Satyanarayanan, R. Hugo Patterson

Goals. The design of storage system architecture is central to this thrust area's global goal of efficient integration of storage systems and high-performance computing environments. It is also the project that has experienced the most striking evolution over the last three years. Our earliest specification for the goals of this project said that we sought hardware and software architectures for well-integrated and efficient storage systems. In particular, we saw a need for a model or mechanism for representing low-level system characteristics at higher levels, or for passing high-level information to lower levels. As our computer systems architects interacted with our storage technologists, we learned that our initial approach, focusing specifically on disk controller functionality, was not likely to greatly improve storage system performance because controllers frequently have no work to do. We realized that a broader strategy was needed to expose parallelism at much higher levels of software so that storage devices can be used more efficiently and application programs' actual needs can be better met. With this in mind we sought a new model for the interaction of storage and processors while still focusing on mechanisms for exporting information from high to low levels of the system. The result is our paradigm for optimizing overall system performance without defeating the important concepts of layering and modularity. In this paradigm, all levels of a system should have an interface to "disclose" their future, dynamic behavior to lower levels, which are then free to optimize for a "known" future.

The problem with a controller-centric approach to storage system efficiency is its very indirect effect on key problems such as the growing gap between the access times of primary and secondary storage technologies. While today's technologies improve the capacity of both storage media at a furious pace, they are not as successful at reducing access times. Instead, the speed of computations using only primary memory is increasing five to ten times faster than the speed with which blocks on magnetic disk can be accessed [Katz89, Patterson88]. To overcome this relative deterioration of secondary storage performance, Input/Output systems research has turned to highly parallel disk arrays [Gibson92, Patterson88], high-bandwidth networks such as Nectar, and caching file servers such as AFS and Coda. Disk arrays increase secondary storage throughput by spreading data over many disks so that large accesses are striped for parallel operation and many small accesses can operate concurrently. Fast networks preserve the throughput improvements derived from disk arrays on file servers as data is delivered to increasingly distant client systems. Secondary storage latency, on the other hand, is primarily addressed by large client and server file caches. Whenever data to be read is found in the cache, or data to be written can be delayed in the cache, latency is as short as if secondary storage were constructed with semiconductor memory. Unfortunately, read accesses into a cache all too frequently fail to find the requested data already there [Baker91]. The key to overcoming cache misses and the long read latencies that result is aggressive, correct prefetching of data needed in the immediate future.
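To see why cache misses dominate read latency, consider a simple effective-latency model. The sketch below computes t_eff = h * t_mem + (1 - h) * t_disk over a range of hit ratios h; the memory and disk times are illustrative assumptions, not measurements from this project.

```c
#include <stdio.h>

/* Back-of-the-envelope model: effective read latency under hit
 * ratio h.  Both device times below are assumed for illustration. */
int main(void)
{
    const double t_mem  = 0.25e-6;  /* assumed memory hit time: 0.25 us */
    const double t_disk = 20e-3;    /* assumed disk access time: 20 ms  */

    for (double h = 0.80; h <= 1.0001; h += 0.05) {
        double t_eff = h * t_mem + (1.0 - h) * t_disk;
        printf("hit ratio %.2f -> effective latency %9.4f ms\n",
               h, t_eff * 1e3);
    }
    return 0;
}
```

Even at a 95% hit ratio the disk term leaves an effective latency near a millisecond, thousands of times slower than a hit; prefetching attacks exactly this term by converting would-be misses into hits.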
Transparent Informed Prefetching (TIP)

To be most successful, prefetching must be based on knowledge of future I/O accesses, not inferences. Such knowledge is often available at high levels of the system. For example, the file access patterns of many programs are determined by simple, single-flow control structures well understood by their programmers. These programmers can give hints about their programs' accesses to the file system. Thus informed, the file system can transparently prefetch needed data and optimize resource utilization. We call this Transparent Informed Prefetching (TIP).

TIP is a powerful and flexible mechanism for reducing application execution time in two ways.

1. By exposing concurrency in the I/O workload, TIP can a) service multiple I/O requests concurrently, thereby converting the high throughput of disk arrays and other new peripheral technologies into low read latency, b) overlap I/O with computation or user think time, and c) optimize I/O accesses over the larger number of outstanding requests.

2. With knowledge of future I/O requests, TIP can make informed cache management decisions that lower cache miss rates.

In the short term, we are developing a TIP prototype with access to a disk array. Such a platform will be used to experiment with the success of our approach across a wide range of applications. Over the long term, our work targets the following knowledge-based advances:

o appropriate abstractions that facilitate the integration of hint optimizations into increasingly layered and standardized systems,
o matching, execution, and retirement of hints in a timely and efficient manner,
o robust reactions to imprecise and incorrect hints,
o memory resource management given hints that call for very large amounts of data in future accesses, and
o peripheral interface and disk controller designs that fully and efficiently exploit anticipatory prefetching.

We also anticipate the following technology-based advances to arise from this project:

o a Unix implementation of an intelligent prefetching system, and
o hint extraction tools, including compiler-based hint extraction and hint-generating enhancements to important applications.

Accomplishments. The evolution of this project's goals is itself its most important achievement. Transparent Informed Prefetching is a powerful model for the interaction between high, application levels of system software and lower levels of system software or storage system hardware [Gibson92]. It provides more than our goal of passing information from high to low levels of the system; it provides a paradigm for system-wide optimization that remains consistent with sound software engineering principles such as modularity and layering. The implications of this vertical optimizing paradigm go far beyond our storage systems perspective. Synergistic projects in distributed file systems, operating systems, and compilation techniques for extracting hints have already been initiated. A key aspect of this paradigm is that the "hints" that cross system interfaces should disclose knowledge of higher-level behavior rather than advise lower levels how to make policy decisions. Such disclosing hints pass information rather than speculation in a manner that allows lower levels to respond dynamically and appropriately.
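As a concrete illustration, the sketch below shows what a disclosing-hint interface might look like. The names (struct tip_hint, tip_disclose) are hypothetical, invented here for illustration; they are not the actual TIP interface.

```c
#include <sys/types.h>   /* off_t */
#include <stddef.h>      /* size_t */

/* A hint discloses future accesses in the same terms the application
 * will later use to demand them: pathnames, offsets, and lengths --
 * not cache-replacement or disk-scheduling advice. */
struct tip_hint {
    const char *path;    /* file the application will read       */
    off_t       offset;  /* starting byte offset of the access   */
    size_t      length;  /* bytes to be read; 0 = the whole file,
                            sequentially from the start          */
};

/* Disclose an ordered batch of future accesses.  The file system
 * remains free to decide whether, when, and how far to prefetch. */
int tip_disclose(const struct tip_hint *hints, int nhints);

/* For example, a shell about to run "grep pattern f1 f2" could
 * disclose grep's argument files before starting the search: */
void hint_grep_args(void)
{
    struct tip_hint h[] = {
        { "f1", 0, 0 },
        { "f2", 0, 0 },
    };
    tip_disclose(h, 2);
}
```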
A simple rule of thumb distinguishes between good and bad hints: good hints are specified using the same abstractions and semantics that an application later uses to demand access to its files, whereas bad hints are stated in terms meaningful only to the system's implementation. We believe this disclosing-hints approach to optimization in modern computer systems has great potential.

Turning to our specific progress investigating Transparent Informed Prefetching, we have performed preliminary experiments intended to demonstrate potential benefits and obstacles [Patterson93]. In these local-disk and network file server experiments, a separate user-level process prefetched files while the program under test ran unmodified in its own process. These experiments show a 13% reduction in the execution time of the compilation-orchestrating program "make" applied to an X windows application with ideal hints accessing a single local disk, and a 20% reduction in the same program's execution time when accessing data remotely in the Coda distributed file system [Satyanarayanan90]. A second example, a "grep" text search for a simple pattern in 58 files stored remotely in Coda, achieves a 30% reduction in its execution time when the shell issues the command arguments as a hint in parallel with initiating the search. These results were obtained on systems with only a single disk! Because the primary advantage of this prefetching approach is to convert the high throughput of highly parallel disk arrays into low latency for application programs, we look forward to testing TIP on the Nectar-attached disk array under development.

Among the performance pitfalls these preliminary tests exposed were high software overhead, terrible disk scheduling, and runaway prefetching that overran the cache and starved the target application programs. To avoid such pitfalls, TIP must be able to track consumption to throttle prefetching and reuse buffers, and to prefetch efficiently without delaying the user. That is, these tests demonstrated that we must integrate TIP with low-level resource management. To this end we have begun implementing TIP in our Mach/Unix kernel research environment and instrumenting important applications, including the ones already tested, to provide dynamic hints. At this time, this implementation allows us to request the complete and sequential prefetching of one sequence of files. Currently, a fixed, static pool of buffers is used for prefetching by the cache manager, and new prefetch requests are issued whenever buffers containing previously prefetched data are read by an application or aged out of the cache.
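The following compilable sketch captures this buffer-recycling policy; the names and the pool size are hypothetical illustrations, not our kernel code.

```c
#include <stdbool.h>

#define TIP_NBUFS 16                  /* fixed, static prefetch pool */

struct tip_buf { char data[8192]; };  /* one cache-block buffer */

/* Stubs standing in for the real cache manager and disk driver. */
extern bool tip_more_hints(void);          /* hinted blocks remain? */
extern long tip_next_hinted_block(void);   /* next block to fetch   */
extern void tip_start_read(long blk, struct tip_buf *b);
extern void tip_return_to_cache(struct tip_buf *b);

/* Invoked when an application read consumes a prefetched buffer, or
 * when the cache ages one out.  Reusing the freed buffer for the next
 * hinted block throttles prefetching to the pool size, so it cannot
 * overrun the cache and starve the application. */
void tip_buf_released(struct tip_buf *b)
{
    if (tip_more_hints())
        tip_start_read(tip_next_hinted_block(), b);
    else
        tip_return_to_cache(b);
}
```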
Plans. First on our list of plans is to complete this implementation in our local Mach/Coda Unix environment, then instrument a number of applications that exploit different hint mechanisms and are likely to benefit substantially. The first target applications are the search tool "grep", the compilation tool "make", and a large scientific program (to be determined). Hints from "make" are about the files accessed by other programs, primarily compilers and loaders (and sed and awk). Hints from large scientific programs, on the other hand, are given by the programmer or compiler and apply to a few large files, possibly accessed many times. These applications will provide an early test of the opportunity and utility of our ideas. We expect to have developed substantial experience with this implementation by the end of this year. As our experience grows, we expect to apply our ideas to a much wider domain of systems and applications.

We plan to modify compilers to automatically extract access patterns and issue hints from application code (a sketch of what such compiler-inserted hints might look like appears at the end of this section). Eventually, we plan to integrate the extraction and exploitation of hints into an efficient, hierarchical interface for optimizing secondary storage performance. In the long term, we expect to require and develop new interfaces to, and functions in, storage control units. We have already identified preemption with later resumption as a candidate for inclusion. We will evaluate this and other additions after our system is instantiated.
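To make the compiler-extraction idea concrete, the sketch below shows the hints a compiler might insert ahead of a file-processing loop whose access pattern it can determine statically. It reuses the hypothetical tip_hint/tip_disclose interface sketched earlier; everything here is illustrative, not an existing tool.

```c
#include <stdio.h>
#include <sys/types.h>
#include <stddef.h>

/* Hypothetical disclosure interface from the earlier sketch. */
struct tip_hint { const char *path; off_t offset; size_t length; };
extern int tip_disclose(const struct tip_hint *hints, int nhints);

void process_files(const char *files[], int n)
{
    /* Compiler-inserted: the loop below demands files[0..n-1] in
     * order, each read sequentially, so disclose that up front. */
    struct tip_hint hints[64];
    int m = (n < 64) ? n : 64;
    for (int i = 0; i < m; i++)
        hints[i] = (struct tip_hint){ files[i], 0, 0 };
    tip_disclose(hints, m);

    /* Original application code, unchanged. */
    for (int i = 0; i < n; i++) {
        FILE *f = fopen(files[i], "r");
        if (f == NULL)
            continue;
        /* ... read and process files[i] sequentially ... */
        fclose(f);
    }
}
```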