Intelligent Storage Systems Exploiting Prefetching

Contacts: Prof. G.A. Gibson, Prof. M. Satyanarayanan, R. Hugo Patterson

Goals. The design of storage system architecture is central to this thrust area's global goal of efficient integration of storage systems and high-performance computing environments. It is also the project that has experienced the most striking evolution over the last three years. Our earliest specification for the goals of this project said that we sought hardware and software architectures for well-integrated and efficient storage systems. In particular, we saw a need for a model or mechanism for representing low-level system characteristics at higher levels, or for passing high-level information to lower levels. As our computer systems architects interacted with our storage technologists, we learned that our initial approach, focusing specifically on disk controller functionality, was not likely to greatly improve storage system performance because controllers frequently have no work to do. We realized that a broader strategy was needed to expose parallelism at much higher levels of software so that storage devices can be used more efficiently and application programs' actual needs can be better met. With this in mind we sought a new model for the interaction of storage and processors while still focusing on mechanisms for exporting information from high to low levels of the system. The result is our paradigm for optimizing overall system performance without defeating the important concepts of layering and modularity. In this paradigm, all levels of a system should have an interface to "disclose" their future, dynamic behavior to lower levels, which are then free to optimize for a "known" future.

The problem with a controller-centric approach to storage system efficiency is its very indirect effect on key problems such as the growing gap between the access times of primary and secondary storage technologies. While today's technologies improve the capacity of both storage media at a furious pace, they are not as successful at reducing access times. Instead, the speed of computations using only primary memory is increasing five to ten times faster than the speed with which blocks on magnetic disk can be accessed [Katz89, Patterson88]. To overcome this relative deterioration of secondary storage performance, Input/Output systems research has turned to highly parallel disk arrays [Gibson92, Patterson88], high-bandwidth networks such as Nectar, and caching file servers such as AFS and Coda. Disk arrays increase secondary storage throughput by spreading data over many disks so that large accesses are striped for parallel operation and many small accesses can operate concurrently. Fast networks preserve the throughput improvements derived from disk arrays on file servers as data is delivered to increasingly distant client systems. Secondary storage latency, on the other hand, is primarily addressed by large client and server file caches. Whenever data to be read is found in the cache, or data to be written can be delayed in the cache, latency is as short as if secondary storage were constructed with semiconductor memory. Unfortunately, read accesses into a cache all too frequently fail to find the requested data already there [Baker91]. The key to overcoming cache misses and the long read latencies that result is aggressive, correct prefetching of data needed in the immediate future.
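To see why cache misses dominate read latency, consider a simple effective-latency model. The sketch below computes t_eff = h * t_mem + (1 - h) * t_disk over a range of hit ratios h; the memory and disk times are illustrative assumptions, not measurements from this project.

```c
#include <stdio.h>

/* Back-of-the-envelope model: effective read latency under hit
 * ratio h.  Both device times below are assumed for illustration. */
int main(void)
{
    const double t_mem  = 0.25e-6;  /* assumed memory hit time: 0.25 us */
    const double t_disk = 20e-3;    /* assumed disk access time: 20 ms  */

    for (double h = 0.80; h <= 1.0001; h += 0.05) {
        double t_eff = h * t_mem + (1.0 - h) * t_disk;
        printf("hit ratio %.2f -> effective latency %9.4f ms\n",
               h, t_eff * 1e3);
    }
    return 0;
}
```

Even at a 95% hit ratio the disk term leaves an effective latency near a millisecond, thousands of times slower than a hit; prefetching attacks exactly this term by converting would-be misses into hits.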
Transparent Informed Prefetching (TIP)

To be most successful, prefetching must be based on knowledge of future I/O accesses, not inferences. Such knowledge is often available at high levels of the system. For example, the file access patterns of many programs are determined by simple, single-flow control structures well understood by their programmers. These programmers can give hints about their programs' accesses to the file system. Thus informed, the file system can transparently prefetch needed data and optimize resource utilization. We call this Transparent Informed Prefetching (TIP).

TIP is a powerful and flexible mechanism for reducing application execution time in two ways.

1. By exposing concurrency in the I/O workload, TIP can a) service multiple I/O requests concurrently, thereby converting the high throughput of disk arrays and other new peripheral technologies into low read latency, b) overlap I/O with computation or user think time, and c) optimize I/O accesses over the larger number of outstanding requests.

2. With knowledge of future I/O requests, TIP can make informed cache management decisions that lower cache miss rates.

In the short term, we are developing a TIP prototype with access to a disk array. Such a platform will be used to experiment with the success of our approach across a wide range of applications. Over the long term, our work targets the following knowledge-based advances:

o appropriate abstractions that facilitate the integration of hint optimizations into increasingly layered and standardized systems,
o matching, execution, and retirement of hints in a timely and efficient manner,
o robust reactions to imprecise and incorrect hints,
o memory resource management given hints that call for very large amounts of data in future accesses, and
o peripheral interface and disk controller designs that fully and efficiently exploit anticipatory prefetching.

We also anticipate the following technology-based advances to arise from this project:

o a Unix implementation of an intelligent prefetching system, and
o hint extraction tools, including compiler-based hint extraction and hint-generating enhancements to important applications.

Accomplishments. The evolution of this project's goals is itself its most important achievement. Transparent Informed Prefetching is a powerful model for the interaction between high, application levels of system software and lower levels of system software or storage system hardware [Gibson92]. It provides more than our goal of passing information from high to low levels of the system; it provides a paradigm for system-wide optimization that remains consistent with sound software engineering principles such as modularity and layering. The implications of this vertical optimizing paradigm go far beyond our storage systems perspective. Synergistic projects in distributed file systems, operating systems, and compilation techniques for extracting hints have already been initiated. A key aspect of this paradigm is that the "hints" that cross system interfaces should disclose knowledge of higher-level behavior rather than advise lower levels how to make policy decisions. Such disclosing hints pass information rather than speculation in a manner that allows lower levels to respond dynamically and appropriately.
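As a concrete illustration, the sketch below shows what a disclosing-hint interface might look like. The names (struct tip_hint, tip_disclose) are hypothetical, invented here for illustration; they are not the actual TIP interface.

```c
#include <sys/types.h>   /* off_t */
#include <stddef.h>      /* size_t */

/* A hint discloses future accesses in the same terms the application
 * will later use to demand them: pathnames, offsets, and lengths --
 * not cache-replacement or disk-scheduling advice. */
struct tip_hint {
    const char *path;    /* file the application will read       */
    off_t       offset;  /* starting byte offset of the access   */
    size_t      length;  /* bytes to be read; 0 = the whole file,
                            sequentially from the start          */
};

/* Disclose an ordered batch of future accesses.  The file system
 * remains free to decide whether, when, and how far to prefetch. */
int tip_disclose(const struct tip_hint *hints, int nhints);

/* For example, a shell about to run "grep pattern f1 f2" could
 * disclose grep's argument files before starting the search: */
void hint_grep_args(void)
{
    struct tip_hint h[] = {
        { "f1", 0, 0 },
        { "f2", 0, 0 },
    };
    tip_disclose(h, 2);
}
```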
A simple rule of thumb distinguishes between good and bad hints: good hints are specified using the same abstractions and semantics that an application later uses to demand access to its files, whereas bad hints are stated in terms meaningful only to the system's implementation. We believe this disclosing-hints approach to optimization in modern computer systems has great potential.

Turning to our specific progress investigating Transparent Informed Prefetching, we have performed preliminary experiments intended to demonstrate potential benefits and obstacles [Patterson93]. In these local-disk and network file server experiments, a separate user-level process prefetched files while the program under test ran unmodified in its own process. These experiments show a 13% reduction in the execution time of the compilation-orchestrating program "make" applied to an X windows application with ideal hints accessing a single local disk, and a 20% reduction in the same program's execution time when accessing data remotely in the Coda distributed file system [Satyanarayanan90]. A second example, a "grep" text search for a simple pattern in 58 files stored remotely in Coda, achieves a 30% reduction in its execution time when the shell issues the command arguments as a hint in parallel with initiating the search. These results were obtained on systems with only a single disk! Because the primary advantage of this prefetching approach is to convert the high throughput of highly parallel disk arrays into low latency for application programs, we look forward to testing TIP on the Nectar-attached disk array under development.

Among the performance pitfalls these preliminary tests exposed were high software overhead, terrible disk scheduling, and runaway prefetching that overran the cache and starved the target application programs. To avoid such pitfalls, TIP must be able to track consumption to throttle prefetching and reuse buffers, and to prefetch efficiently without delaying the user. That is, these tests demonstrated that we must integrate TIP with low-level resource management. To this end we have begun implementing TIP in our Mach/Unix kernel research environment and instrumenting important applications, including the ones already tested, to provide dynamic hints. At this time, this implementation allows us to request the complete and sequential prefetching of one sequence of files. Currently, a fixed, static pool of buffers is used for prefetching by the cache manager, and new prefetch requests are issued whenever buffers containing previously prefetched data are read by an application or aged out of the cache.
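The following compilable sketch captures this buffer-recycling policy; the names and the pool size are hypothetical illustrations, not our kernel code.

```c
#include <stdbool.h>

#define TIP_NBUFS 16                  /* fixed, static prefetch pool */

struct tip_buf { char data[8192]; };  /* one cache-block buffer */

/* Stubs standing in for the real cache manager and disk driver. */
extern bool tip_more_hints(void);          /* hinted blocks remain? */
extern long tip_next_hinted_block(void);   /* next block to fetch   */
extern void tip_start_read(long blk, struct tip_buf *b);
extern void tip_return_to_cache(struct tip_buf *b);

/* Invoked when an application read consumes a prefetched buffer, or
 * when the cache ages one out.  Reusing the freed buffer for the next
 * hinted block throttles prefetching to the pool size, so it cannot
 * overrun the cache and starve the application. */
void tip_buf_released(struct tip_buf *b)
{
    if (tip_more_hints())
        tip_start_read(tip_next_hinted_block(), b);
    else
        tip_return_to_cache(b);
}
```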
Plans. First on our list of plans is to complete this implementation in our local Mach/Coda Unix environment, then instrument a number of applications that exploit different hint mechanisms and are likely to benefit substantially. The first target applications are the search tool "grep", the compilation tool "make", and a large scientific program (to be determined). Hints from "make" are about the files accessed by other programs, primarily compilers and loaders (and sed and awk). Hints from large scientific programs, on the other hand, are given by the programmer or compiler and apply to a few large files, possibly accessed many times. These applications will provide an early test of the opportunity and utility of our ideas. We expect to have developed substantial experience with this implementation by the end of this year. As our experience grows, we expect to apply our ideas to a much wider domain of systems and applications.

We plan to modify compilers to automatically extract access patterns and issue hints from application code (a sketch of what such compiler-inserted hints might look like appears at the end of this section). Eventually, we plan to integrate the extraction and exploitation of hints into an efficient, hierarchical interface for optimizing secondary storage performance. In the long term, we expect to require and develop new interfaces to, and functions in, storage control units. We have already identified preemption with later resumption as a candidate for inclusion. We will evaluate this and other additions after our system is instantiated.
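To make the compiler-extraction idea concrete, the sketch below shows the hints a compiler might insert ahead of a file-processing loop whose access pattern it can determine statically. It reuses the hypothetical tip_hint/tip_disclose interface sketched earlier; everything here is illustrative, not an existing tool.

```c
#include <stdio.h>
#include <sys/types.h>
#include <stddef.h>

/* Hypothetical disclosure interface from the earlier sketch. */
struct tip_hint { const char *path; off_t offset; size_t length; };
extern int tip_disclose(const struct tip_hint *hints, int nhints);

void process_files(const char *files[], int n)
{
    /* Compiler-inserted: the loop below demands files[0..n-1] in
     * order, each read sequentially, so disclose that up front. */
    struct tip_hint hints[64];
    int m = (n < 64) ? n : 64;
    for (int i = 0; i < m; i++)
        hints[i] = (struct tip_hint){ files[i], 0, 0 };
    tip_disclose(hints, m);

    /* Original application code, unchanged. */
    for (int i = 0; i < n; i++) {
        FILE *f = fopen(files[i], "r");
        if (f == NULL)
            continue;
        /* ... read and process files[i] sequentially ... */
        fclose(f);
    }
}
```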