Objectives of Thrust Area for Storage and Computer Systems Integration

The main objective of this area is to explore the efficient integration of storage systems into high-performance computing environments. The raw performance of storage devices is only one component affecting the overall performance of data access in a computer system. To achieve high performance at the system level, one must consider the performance and interaction of the multiple layers of hardware and software above the storage medium. Far too frequently, storage performance is squandered by poor interfaces and unintelligent software. Our emphasis, therefore, is on understanding how computer systems use storage, on developing new mechanisms that allow systems to exploit storage, and on integrating storage into high-bandwidth networked environments.

Our view of a system encompasses more than a disk subsystem attached to a host computer; we focus on distributed systems in which multiple computers and storage systems of various types are interconnected by a computer network. We believe distributed systems will be the norm in future computing environments. For data access to be efficient in such an environment, many components of the system, including storage systems, file systems, and computer networks, must be well integrated.

As much as possible, we intend to take an experimental approach, basing our analysis on measurements of actual systems and constructing testbeds and prototypes to experiment with new ideas. We believe the DSSC provides an excellent opportunity to bring together researchers specializing in various levels of the system hierarchy, including drive, network, and file system designers, so that they can influence one another toward designing well-integrated systems.

This thrust area has evolved considerably in its first three years. Unlike the other thrust areas, it was not well staffed and active before the DSSC was formed.
While some of this evolution can be attributed to changes in research faculty, most of it reflects our openness and creativity in defining the tractable problem areas with the greatest potential for improving future systems. We began by pursuing three projects: measurement and analysis of data access patterns; disk controller design for more efficient, well-integrated storage; and the integration of storage systems and high-speed networks. After examining existing controller-based solutions, we chose a more aggressive approach: expand our scope to operating system and application interfaces, and find ways to extract knowledge of future storage accesses so that those accesses can be initiated early. We also recognized that disk arrays would be a key architecture for future computing systems, and responded by starting two projects: highly available disk arrays, and interfaces and architectures for small disks. Finally, in this third year we have recognized that disk arrays are a common focus of many of our research efforts, and we have merged these efforts into a highly interactive, multifaceted project on the design and evaluation of highly parallel, reliable, and available disk arrays.

In our first and most complete effort, we have captured and characterized gigabytes of data access patterns from a distributed environment. This data has been invaluable for the design and evaluation of storage system architectures and their integration into a distributed environment. The project is largely complete; our remaining effort is focused on constructing a synthetic trace generator capable of compactly and flexibly creating traces whose characteristics match those we measured. Beyond characterizing our local environment, we are seeking assistance from our corporate partners in compiling diverse data access patterns, particularly workloads featuring I/O-intensive scientific computing and database processing.
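To make the idea of a synthetic trace generator concrete, the following is a minimal sketch of one possible approach: draw each request's attributes from empirical distributions measured in a real trace. The specific distributions, field names, and sampling scheme here are hypothetical placeholders, not the DSSC design.

```python
import random

def empirical_sampler(values, weights):
    """Return a function that samples from a measured (value, weight) distribution."""
    def sample():
        return random.choices(values, weights=weights, k=1)[0]
    return sample

def generate_trace(n_requests, seed=0):
    """Synthesize a trace of (timestamp_ms, operation, size_kb) records."""
    random.seed(seed)
    # Hypothetical measured distributions; a real generator would load
    # these compact summaries from characterization of captured traces.
    size_kb = empirical_sampler([4, 8, 64], [0.6, 0.3, 0.1])
    gap_ms  = empirical_sampler([1, 10, 100], [0.5, 0.4, 0.1])
    op      = empirical_sampler(["read", "write"], [0.7, 0.3])
    t = 0.0
    trace = []
    for _ in range(n_requests):
        t += gap_ms()  # interarrival gap drawn from the measured distribution
        trace.append((t, op(), size_kb()))
    return trace
```

The appeal of this style of generator is compactness: a few summary distributions stand in for gigabytes of raw trace data, and parameters can be varied to explore workloads beyond those actually captured.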
To date, we have arranged to obtain traces of HP Labs' research environment and of some of NASA Ames' supercomputing applications.

As already mentioned, the second thrust of our research has undergone substantial change. We believe that an important approach to efficiently integrating storage systems into high-performance computing is to develop mechanisms that allow and encourage high-level software to more fully exploit the capabilities of the underlying storage systems. Toward this goal, we have begun to examine high-level tools for prefetching application data based on widely collected hints about data access patterns. Exploiting these hints (for example, a non-sequential stride access pattern, or the complete list of files needed to compile an application system) gives a storage system a good opportunity to improve efficiency, because it will frequently see deep queues of large, prioritized transfers without negative effects on response time. This research has been explored in a preliminary experiment, is currently being implemented in our UNIX/Mach operating system, and we expect it to lead to new storage system interface designs in a few years.

Our third research area focuses on today's most promising storage architecture: disk arrays. This area began with a project to develop a storage server efficiently interfaced to a high-performance network. Separate projects for highly available disk arrays and for small disk architectures and interfaces were added last year. This year we have added a new, very promising project focused on removing performance bottlenecks in disk arrays for on-line transaction processing. Because of this proliferation of disk array projects, we have formed a single project to shepherd our various approaches to parallel disk systems.
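The hint-based prefetching idea described above can be sketched in a few lines: the application discloses an ordered list of blocks it expects to read, and the storage layer keeps a bounded queue of prefetches running ahead of the demand stream. The class, its names, and the `depth` parameter are illustrative assumptions, not the interface under development in our UNIX/Mach implementation.

```python
from collections import deque

class HintedPrefetcher:
    """Sketch of a prefetcher driven by application-supplied access hints."""

    def __init__(self, read_block, depth=4):
        self.read_block = read_block  # underlying (blocking) storage read
        self.depth = depth            # max blocks prefetched but not yet consumed
        self.cache = {}               # prefetched blocks awaiting demand reads
        self.pending = deque()        # hinted block ids not yet prefetched

    def give_hint(self, block_ids):
        """Application discloses the ordered list of blocks it will read."""
        self.pending.extend(block_ids)

    def _fill(self):
        # Issue prefetches early, up to the configured depth.
        while self.pending and len(self.cache) < self.depth:
            b = self.pending.popleft()
            if b not in self.cache:
                self.cache[b] = self.read_block(b)

    def read(self, block_id):
        """Demand read: served from the prefetch cache when the hint was right."""
        self._fill()
        data = self.cache.pop(block_id, None)
        if data is None:
            data = self.read_block(block_id)  # hint missed; fall back to demand read
        self._fill()  # keep the prefetch queue running ahead
        return data
```

Because hinted blocks are queued before they are demanded, the storage system sees the deep, early queue of transfers the text describes, and can schedule them for efficiency while demand reads still complete promptly from the cache.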