PARALLEL DATA LAB

Active Storage Networks

From The PDL Packet, Newsletter on Parallel Data Systems, Fall 1998.

Seeking to leverage the synergy that Network-attached Secure Disks (NASD) create between storage and networking technologies, researchers in the Parallel Data Lab are participating in a DARPA-funded research program called Active Storage Networks. The project's goal is to enable flexible construction of sophisticated storage and file-system functionality that can migrate to the most appropriate location in the system (e.g., client, router, or NASD).

For the last several years, the PDL's work on NASD has examined how to exploit computational cycles in storage devices. The results have shown that NASD's high-level storage interface and self-management capability create highly scalable storage systems.

More recently, work at MIT, the University of Pennsylvania and other universities has focused on exploiting excess cycles in the network to create Active Nets. This preliminary work, which emphasizes both specialized hardware-based functions in switches and routers and software-based functionality, has shown that Active Nets can enable the rapid deployment of new networking technologies and increase the scalability of networks through sophisticated traffic management and caching policies.

We believe Active Storage Networks can play a significant role in the deployment of NASD through the integration of the two technologies. For example, video places significant real-time constraints on data movement through the network and within storage. An Active Network that understands the scheduling requirements should be able to ensure timely data delivery. However, if the data source (i.e., storage) is not ready to transmit or receive the data, then the network's scheduling will fail. Likewise, NASD addresses its side of the problem by integrating real-time scheduling into storage, but it too will fail if the network is unable to deliver the data. An integrated Active Storage Network allows the NASDs and the network to cooperate to provide end-to-end delivery guarantees.
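The cooperation described above can be illustrated with a minimal sketch of end-to-end admission control, in which a stream is accepted only if every resource on its path, storage and network alike, can carry it. The `Resource` class, the `admit_stream` function, and the bandwidth figures are hypothetical names and values chosen for illustration; they are not part of NASD or of any Active Network implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    """Hypothetical schedulable resource: a NASD drive or a network link."""
    name: str
    capacity_mbps: float
    reserved_mbps: float = 0.0

    def can_reserve(self, rate_mbps: float) -> bool:
        # True if the extra rate fits within the remaining capacity.
        return self.reserved_mbps + rate_mbps <= self.capacity_mbps

    def reserve(self, rate_mbps: float) -> None:
        self.reserved_mbps += rate_mbps


def admit_stream(path: List[Resource], rate_mbps: float) -> bool:
    """Admit a stream only if every resource on the path can carry it.

    The reservation is all-or-nothing: if any one resource lacks
    capacity, nothing is reserved anywhere on the path.
    """
    if all(r.can_reserve(rate_mbps) for r in path):
        for r in path:
            r.reserve(rate_mbps)
        return True
    return False


if __name__ == "__main__":
    disk = Resource("nasd0", capacity_mbps=10.0)
    link = Resource("link0", capacity_mbps=6.0)
    print(admit_stream([disk, link], 4.0))  # both have room
    print(admit_stream([disk, link], 4.0))  # the link is now the bottleneck
```

In this sketch the second request is refused even though the disk still has spare bandwidth, which is the point of the paragraph above: a guarantee made by storage alone, or by the network alone, does not hold end to end.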

Further, if Active Networks can understand the NASD object model, the network will be able to make intelligent policy decisions based on an entire object (or set of objects) and not just on individual packets. This will allow current network-based caching technologies to cache complete NASD objects, or to cache enough of an object to hide initial fetch latencies. In addition, integrating NASD's security model with the network will allow caching nodes to provide the same degree of security as storage, creating a complete end-to-end security model while enabling network-based caching.
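As a rough illustration of caching "enough of an object to hide initial fetch latencies," the sketch below estimates how large a cached prefix would need to be: the bytes consumed at the playback rate during the time it takes to fetch the rest of the object from storage. The function name `prefix_bytes`, the 8 KB block size, and the example rate and latency are assumptions made for this example, not figures from the NASD work.

```python
import math


def prefix_bytes(playback_rate_bps: float,
                 fetch_latency_s: float,
                 block_size: int = 8192) -> int:
    """Bytes of an object's prefix to cache so playback can start at once.

    The prefix must cover the data consumed while the remainder is
    being fetched: (rate in bytes/s) * (fetch latency in seconds),
    rounded up to a whole number of blocks (block size is assumed).
    """
    needed = playback_rate_bps / 8 * fetch_latency_s
    return math.ceil(needed / block_size) * block_size


if __name__ == "__main__":
    # Assumed example: a 4 Mbit/s video stream and a 0.5 s initial fetch latency.
    print(prefix_bytes(4_000_000, 0.5))  # 253952 bytes, i.e. 31 blocks of 8 KB
```

A caching node that understands object boundaries could apply such a calculation per object, which a cache that sees only individual packets cannot do.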

Active Storage Networks can also enable SAN-, LAN-, and WAN-based communication between client and storage. Instead of relying on the NASD to provide both a highly optimized SAN protocol and a highly robust wide-area protocol (e.g., TCP/IP), an active network component can serve as a protocol converter. Relying on information about physical network characteristics and traffic patterns, the client and protocol converter will adapt the protocol to minimize the load on the network node while enabling direct client-storage communication.

Active Storage Networks leverage a close integration between the network and NASD devices to enable storage-based functionality that can migrate to the most appropriate location, whether that is the client, the storage, or a network component.

Acknowledgements

We thank the members and companies of the PDL Consortium: Amazon, Google, Hitachi Ltd., Honda, Intel Corporation, IBM, Meta, Microsoft Research, Oracle Corporation, Pure Storage, Salesforce, Samsung Semiconductor Inc., Two Sigma, and Western Digital for their interest, insights, feedback, and support.