Internet DRAFT - draft-gibson-pnfs-reqs

Network Working Group                                          G. Gibson
Internet-Draft                            Panasas Inc. & Carnegie Mellon
Expires: April 18, 2005                                         B. Welch
                                                            Panasas Inc.
                                                              G. Goodson
                                                              P. Corbett
                                                  Network Appliance Inc.
                                                        October 18, 2004

          Parallel NFS Requirements and Design Considerations

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of section 3 of RFC 3667.  By submitting this Internet-Draft, each
   author represents that any applicable patent or other IPR claims of
   which he or she is aware have been or will be disclosed, and any of
   which he or she become aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on April 18, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2004).


Abstract

   This draft specifies the requirements that should be satisfied in the
   definition of a parallel NFS protocol and the considerations
   recommended for its design.  It responds to the scalable bandwidth

Gibson, et al.           Expires April 18, 2005                 [Page 1]
Internet-Draft    pNFS Requirements and Design Considerations  October 2004

   problem described in the pNFS Problem Statement,
   draft-gibson-pnfs-problem-statement-01.txt.  In the interest of a
   timely adoption of scalable bandwidth file service, parallel NFS is
   proposed to be an NFSv4 minor extension for communicating file layout
   available through existing and future storage subsystem protocols
   such as other NFSv4 file servers (NFS), block-based SCSI subsystems
   (SBC), and object-based SCSI (OSD) subsystems.

Table of Contents

   1.  Introduction
   2.  NFSv4 Minor Extension
   3.  Scalability
     3.1   Scalable Bandwidth
     3.2   Scalable Capacity
   4.  Interoperability
     4.1   NFSv4 Interoperability
     4.2   Storage Protocol Interoperability
     4.3   Separability of Storage Protocols
   5.  Concurrent Sharing
     5.1   Shared Direct Access to Storage
     5.2   Attribute Updates
     5.3   Client caching
   6.  Recovery
   7.  Security Considerations
     7.1   File Storage Access Protocols
     7.2   Object Storage Access Protocols
     7.3   Block Storage Access Protocols
   8.  IANA Considerations
   9.  Acknowledgements
   10.   References
       Authors' Addresses
       Intellectual Property and Copyright Statements

1.  Introduction

   In many application areas, single system servers are rapidly being
   replaced by clusters of inexpensive commodity computers.  As
   clustering technology has improved, the barriers to running
   application codes on very large clusters have been lowered.  Examples
   of application areas that are seeing the rapid adoption of scalable
   client clusters are data intensive applications such as genomics,
   seismic processing, data mining, content and video distribution, and
   high performance computing.  The aggregate storage I/O requirements
   of a cluster can scale proportionally to the number of computers in
   the cluster.  It is not unusual for clusters today to make bandwidth
   demands that far outstrip the capabilities of traditional file
   servers.  A natural solution to this problem is to enable file
   service to scale as well, by increasing the number of server nodes
   that are able to service a single file system to a cluster of
   clients.

   Scalable bandwidth can be claimed by simply adding multiple
   independent servers to the network.  Unfortunately, this leaves to
   file system users the task of spreading data across these independent
   servers.  Because the data processed by a given data-intensive
   application is usually logically associated, users routinely
   co-locate this data in a single file system, directory or even a
   single file.  The NFSv4 protocol currently requires that all the data
   in a single file system be accessible through a single exported
   network endpoint, constraining access to be through a single NFS
   server.

   A better way of increasing the bandwidth to a single file system is
   to enable access to be provided through multiple endpoints in a
   coordinated or coherent fashion.  Separation of control and data
   flows provides a straightforward framework to accomplish this, by
   allowing transfers of data to proceed in parallel from many clients
   to many data storage endpoints.  Control and file management
   operations, inherently more difficult to parallelize, can remain the
   province of a single NFS server, inheriting the simple management of
   today's NFS file service, while offloading data transfer operations
   allows bandwidth scalability.  Data transfer may be done using NFS or
   other protocols, such as iSCSI, under the control of an NFSv4 server
   with parallel NFS extensions.  Such an approach protects the
   industry's large investment in NFS, since the bandwidth bottleneck no
   longer needs to drive users to adopt a proprietary alternative
   solution, and leverages SAN storage infrastructures, all within a
   common architectural framework.

   This document sets requirements for extensions to the NFSv4 protocol,
   the parallel NFS extensions, to enable the extended NFSv4 server to
   manage clients that are enabled to directly access storage.

2.  NFSv4 Minor Extension

   This document includes the definition of the requirements for
   protocol extensions to implement Parallel NFS.

   It is believed that this extension can fit within the
   minor-versioning of the NFSv4 protocol framework presented in RFC
   3530.  NFSv4's minor-versioning requirement specifies that no changes
   are to be made to an existing operation's arguments or results (with
   the exception of GETATTR4).  Also, new operations may only be added
   to the COMPOUND and CB_COMPOUND procedures.

   Minor-versioning also requires that the Parallel NFS extension is
   compatible with all preceding NFSv4 minor versions.  Accordingly,
   until a minor extension is accepted, its requirements may be impacted
   by the approval of another minor extension, although an impact like
   this by one minor extension on another is typically to be avoided.

3.  Scalability

3.1  Scalable Bandwidth

   A principal purpose for parallel NFS is to enable clients of an NFS
   service to achieve individual and aggregate file and file system
   bandwidths that can scale with storage device, storage networking and
   client resources.  The core point in the parallel NFS problem
   statement [1] is that bandwidth scaling is not provided by the
   existing NFS approach of forwarding all data through a single network
   endpoint associated with the NFS file server.

   Parallel NFS must enable high bandwidth access by single clients and
   aggregates of clients, especially clusters of clients, into one file
   system, into possibly small and arbitrary collections of files, and
   into just one file.

   Moreover, a parallel NFS solution for scalable bandwidth must enable
   an NFS client to directly and in parallel access a file, a possibly
   small and arbitrary collection of files, or a file system that is
   spread over multiple distinct network endpoints.  That is, it must be
   possible for single files and collections of related files to be
   "striped" over physically different storage subsystems each with its
   own network endpoint.
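
   As an illustration only, the sketch below shows one way a client
   might map a byte range of a striped file onto storage endpoints.  It
   assumes simple round-robin striping; the stripe unit, the endpoint
   names, and the function locate_stripe_units are hypothetical and are
   not defined by this document, which does not specify a layout format.

```python
from typing import List, Tuple


def locate_stripe_units(offset: int, length: int, stripe_unit: int,
                        endpoints: List[str]) -> List[Tuple[str, int, int]]:
    """Split a file byte range into (endpoint, device_offset, length) pieces.

    Assumption (not from the draft): round-robin striping, where stripe
    unit k of the file lives on endpoints[k % N] at device offset
    (k // N) * stripe_unit.
    """
    if offset < 0 or length < 0 or stripe_unit <= 0 or not endpoints:
        raise ValueError("invalid striping parameters")
    pieces = []
    n = len(endpoints)
    end = offset + length
    while offset < end:
        unit = offset // stripe_unit          # index of the stripe unit in the file
        within = offset % stripe_unit         # position inside that stripe unit
        chunk = min(stripe_unit - within, end - offset)
        device_offset = (unit // n) * stripe_unit + within
        pieces.append((endpoints[unit % n], device_offset, chunk))
        offset += chunk
    return pieces
```

   Each returned piece can be issued to its endpoint independently,
   which is what allows the transfers to proceed in parallel.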

3.2  Scalable Capacity

   Parallel NFS must enable the capacity of a single file, a possibly
   small and arbitrary collection of files and a single file system to
   grow in proportion to the available storage resources.

   This reflects a recognition that when bandwidth scales, the size of
   the file(s) accessed should be expected to grow proportionately, and
   that striping over network endpoints is not required to be effective
   with arbitrarily small amounts of data residing at a single network
   endpoint.

   This requirement does not supersede file and file system limitations
   on the size of an individual file or file system.

4.  Interoperability

4.1  NFSv4 Interoperability

   Parallel NFS is an optional minor extension of NFSv4.  Accordingly,
   any client capable of using the parallel NFS extensions must also be
   able to interoperate with an NFSv4 server that is not capable of
   using the parallel NFS extensions, and any NFSv4 server that is
   capable of using the parallel NFS extensions must also be able to
   provide full service for an NFSv4 client that is not capable of using
   the parallel NFSv4 extensions.
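
   The fallback behaviour implied by this requirement can be sketched as
   follows.  This is a minimal illustration under stated assumptions:
   the protocol names, the FALLBACK label, and the function
   choose_data_path are hypothetical, and no negotiation mechanism is
   defined by this document.  An empty list stands for a peer that does
   not implement the parallel NFS extensions.

```python
from typing import Iterable, Sequence

# Hypothetical label for the ordinary path of sending data through the NFSv4 server.
FALLBACK = "nfsv4-through-server"


def choose_data_path(client_protocols: Iterable[str],
                     server_protocols: Sequence[str]) -> str:
    """Pick a storage access protocol both sides support, else fall back.

    The server's list is treated as its order of preference (an assumption).
    """
    client_set = set(client_protocols)
    for protocol in server_protocols:
        if protocol in client_set:
            return protocol
    # Either side lacks the extension, or there is no common protocol:
    # plain NFSv4 service must still work.
    return FALLBACK
```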

4.2  Storage Protocol Interoperability

   The protocols used by parallel NFS capable clients to directly access
   storage must be well defined, standards-based storage protocols.

   In the interest of wider applicability of parallel NFS, the
   extensions to NFSv4 that enable and manage a client's opportunity to
   directly access storage subsystems must be agnostic to the actual
   storage protocol employed, and it must be possible for new storage
   protocols to be added to the set that a parallel NFS server supports.

   It is anticipated that parallel NFS storage protocols will be defined
   using (possibly) non-parallel NFSv4 as a storage protocol, using
   block-based SCSI (SBC) as a storage protocol and using object-based
   SCSI (OSD) as a storage protocol.  SBC and OSD SCSI storage
   protocols, in at least some implementations, are anticipated to
   employ an iSCSI storage transport protocol.

4.3  Separability of Storage Protocols

   The interpretation of a layout, the bits a parallel NFS server gives
   to a parallel NFS client to enable the client to know how and where
   to directly access a file or file system striped over multiple
   storage network endpoints, is not needed for correct execution of the
   parallel NFS extension operations.

   At least one instance of a parallel NFS layout format and storage
   access protocol must be fully specified and multiply implemented.
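
   One way to picture this separation is a layout carried as opaque
   bytes tagged with the name of its storage access protocol.  The
   sketch below is illustrative only: the Layout type, the decoder
   registry, and the JSON encoding used in the usage note are
   assumptions made for the example, not formats defined here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Layout:
    protocol: str  # name of the storage access protocol
    body: bytes    # opaque to the generic extension code


# Hypothetical registry of protocol-specific decoders, filled in by layout drivers.
DECODERS: Dict[str, Callable[[bytes], object]] = {}


def register_decoder(protocol: str, decoder: Callable[[bytes], object]) -> None:
    DECODERS[protocol] = decoder


def forward_layout(layout: Layout) -> Layout:
    """Generic path: copies the layout without ever inspecting its body."""
    return Layout(layout.protocol, layout.body)


def interpret_layout(layout: Layout) -> object:
    """Protocol-specific path: only a matching decoder looks inside the body."""
    decoder = DECODERS.get(layout.protocol)
    if decoder is None:
        raise LookupError("no decoder for protocol " + layout.protocol)
    return decoder(layout.body)
```

   For example, a driver for a made-up "example-file" protocol could
   call register_decoder("example-file", lambda b: json.loads(b)) after
   importing json, while the generic code continues to handle the layout
   only through forward_layout.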

5.  Concurrent Sharing

5.1  Shared Direct Access to Storage

   The parallel NFS extension should support shared access to storage by
   many clients.  This includes access to the same storage devices by
   multiple clients, as well as access to the same files stored on one
   or more storage devices.  The result extends the basic shared file
   system abstraction provided by NFS giving clients direct access to
   storage devices under the overall control of an NFS server
   responsible for authorizing such direct access and delimiting its
   scope and duration.

   The parallel NFS extension should allow clients to specify points in
   time at which updates must be made visible to other clients.  This
   requirement is more conducive to optimizations that can lead to high
   performance.  It also complements the programming model used by
   parallel applications.

   In this model, individual clients compute independently, generate
   results, and then synchronize with the overall computation.  When
   storing results to shared storage, it may be necessary to communicate
   with the NFS server to ensure that updates are visible to other
   clients.  When making these updates visible, it is important for
   efficiency to limit the need for separate interactions with the
   server to those points that are truly required by the demands of the

5.2  Attribute Updates

   File updates include changes to associated attributes that include
   the file size (i.e., end-of-file position), file modify time, file
   access time, and file change time.  The parallel NFS extension allows
   updates to these attributes to follow the same model as data
   updates, where updates are only guaranteed to be visible to other
   clients in response to explicit operations performed by the modifying
   client.  The values of these attributes at other times may not be
   strictly defined.
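
   The visibility model described above can be sketched as a toy model
   in which direct writes accumulate pending attribute changes and a
   single explicit operation publishes them.  The class and method names
   (MetadataServer, Client, write_direct, make_visible) are hypothetical;
   the draft does not name the operation that makes updates visible.

```python
class MetadataServer:
    """Holds the attribute values that other clients are guaranteed to see."""

    def __init__(self) -> None:
        self.visible_size = 0
        self.visible_mtime = 0.0

    def commit(self, size: int, mtime: float) -> None:
        self.visible_size = max(self.visible_size, size)
        self.visible_mtime = max(self.visible_mtime, mtime)


class Client:
    def __init__(self, server: MetadataServer) -> None:
        self.server = server
        self.pending_size = 0
        self.pending_mtime = 0.0
        self.server_calls = 0

    def write_direct(self, offset: int, data: bytes, now: float) -> None:
        # The data itself would go straight to storage (not modeled here);
        # only the attribute changes are tracked locally.
        self.pending_size = max(self.pending_size, offset + len(data))
        self.pending_mtime = max(self.pending_mtime, now)

    def make_visible(self) -> None:
        # One server interaction publishes all updates made so far.
        self.server.commit(self.pending_size, self.pending_mtime)
        self.server_calls += 1
```

   Between calls to make_visible, the size and modify time seen through
   the server may lag behind what the writing client has stored.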

   The parallel NFS extension acknowledges that some implementations may
   provide looser semantics for file access time.  As well, the
   extension does not mandate strict implementation of the file access
   time attribute.

5.3  Client caching

   The parallel NFS extension does not address issues around client
   caching and the coherency of data stored in different client caches.

   The extension assumes that the existing mechanisms that NFS clients
   use to manage their cached data apply equally when they use parallel
   NFS.  Likewise, this extension should not prevent the
   implementation of a richer/stronger set of caching and coherency

6.  Recovery

   Error recovery is often the aspect of a protocol in which
   interoperability is hardest to achieve.  For this reason these
   requirements place the most stringent demands on parallel NFS
   servers.  But in the interests of performance and scalability, these
   requirements leave it open for client implementations to more fully
   participate in error recovery.

   Specifically, it should be possible for client implementations using
   parallel NFS extensions to have very simple recovery actions, albeit
   probably lowered performance, when coping with errors on the storage
   access protocols.

   Simple clients are envisioned to respond to storage access protocol
   errors by immediately notifying the managing parallel NFS server.
   Upon completion of the NFS server's recovery, simple clients
   should be able to complete the action causing the error by
   re-execution.  To make this especially simple, it must be possible
   for a simple parallel NFS client to re-execute using only NFSv4

   As a consequence of this recovery model, an operation, composed of
   one or more component actions, applied by parallel NFS clients
   directly on storage must be idempotent at the client level.  This is
   not a requirement for atomicity or transactions of the storage access
   protocol, only that it be possible to re-execute the client-level
   operation that experienced error, possibly using different component
   operations directly on storage or through the parallel NFS server,
   and achieve the same transformation on stored information.
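
   A minimal sketch of this simple-client recovery model follows.  The
   callables direct_write, report_error, and nfsv4_write, and the
   StorageError exception, are hypothetical stand-ins; report_error is
   assumed to return only after the server has completed its recovery.

```python
from typing import Callable


class StorageError(Exception):
    """Hypothetical error raised by a storage access protocol."""


def write_with_simple_recovery(direct_write: Callable[[int, bytes], None],
                               report_error: Callable[[Exception], None],
                               nfsv4_write: Callable[[int, bytes], None],
                               offset: int, data: bytes) -> str:
    """Try the direct path; on error, notify the server and re-execute.

    Writing the same bytes at the same offset is idempotent, so repeating
    the whole client-level operation through the server yields the same
    stored result even if the direct attempt was partially applied.
    """
    try:
        direct_write(offset, data)
        return "direct"
    except StorageError as err:
        report_error(err)          # assumed to block until server recovery is done
        nfsv4_write(offset, data)  # re-execute through the NFSv4 server
        return "nfsv4"
```

   No atomicity is assumed of the direct path: the failed attempt may
   leave partial data behind, and the re-execution simply overwrites it.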

7.  Security Considerations

   The parallel NFS extension must provide a level of security that is
   comparable to that defined in the NFSv4 specification.  NFSv4
   mandates end to end mutual authentication.  All existing NFSv4
   security mechanisms apply to the operations introduced by the
   parallel NFS extension.  In all cases, this extension allows use of
   the direct NFSv4 path of sending both metadata and data requests
   through the metadata server.

   The security model provided by all specified parallel NFS storage
   access protocols must be well documented.  Various storage access
   protocols will have different security mechanisms that protect
   against different types of attacks.  Access protocols that rely on
   trusted environments should not be foreclosed.  However, protocols
   that provide strong security guarantees will be available.

7.1  File Storage Access Protocols

   A file storage access protocol may have the same security mechanism
   between the client and metadata server as between the client and data
   server.  ACLs set at the metadata server are effective at the data
   servers and need not be visible (via getattr) at the data servers.

7.2  Object Storage Access Protocols

   An object storage access protocol may rely on a cryptographically
   secure capability to control accesses at the data servers.  These
   capabilities can be generated by the metadata server after it checks
   access control for a client.  They are returned to the client and
   passed to the object storage device, which verifies that the
   capability allows the requested operation.
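
   The capability flow described above can be sketched with a keyed
   hash.  This is an assumption-laden illustration, not the capability
   format of any object storage standard: the field layout, the "|"
   separator, and the function names are hypothetical, and the metadata
   server and storage device are assumed to share a secret key.

```python
import hashlib
import hmac


def issue_capability(secret: bytes, object_id: str, rights: str,
                     expires: int) -> bytes:
    """Metadata server side: sign (object, rights, expiry) after its access check."""
    message = f"{object_id}|{rights}|{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify_capability(secret: bytes, object_id: str, rights: str, expires: int,
                      tag: bytes, now: int, requested_op: str) -> bool:
    """Storage device side: check expiry, the requested operation, and the tag."""
    if now >= expires or requested_op not in rights.split(","):
        return False
    expected = issue_capability(secret, object_id, rights, expires)
    # Constant-time comparison avoids leaking how many tag bytes matched.
    return hmac.compare_digest(expected, tag)
```

   Because the tag covers the rights and the expiry time, a client
   cannot widen its own access by editing those fields before presenting
   the capability to the storage device.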

7.3  Block Storage Access Protocols

   A block storage access protocol would rely on SAN-based security, and
   the trust that clients will only access the blocks they have been
   directed to use.  There are LUN masking/unmapping and zone-based
   security schemes that can be manipulated to fence clients from each
   other's data.  Block storage access protocols may provide no
   guarantee of data integrity, since any client can modify any data
   block to which it has physical access.

8.  IANA Considerations

   The parallel NFS protocol extension provides for the naming of the
   specific storage access protocol.  The storage access protocol's name
   is used by the client to interpret the layout information it receives
   from the metadata server.  As well, the name specifies the storage
   access protocol to be used for accessing the data servers.

   The namespace is separated into (at least) three ranges.  First, a
   range of names reserved for future standards-based storage protocol
   specifications (e.g., a block, file, and object storage protocol
   standard).  Second, a range of names reserved for vendor proprietary
   protocols.  Third, a range of names that are reserved for
   non-approved protocols (e.g., custom in-house protocols or for

   Similar to NFSv4 named attributes, the parallel NFS protocol does not
   define the specific assignment of names to storage access protocols
   (nor does it define any specific storage access protocols).  However,
   an IANA registry should be created for the registration of names in
   order to prevent collisions within the namespace.  Along with the
   name, the format of the data layout and the storage access protocol
   should be well defined.  The goal is to promote the interoperability
   of parallel NFS clients and servers.
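
   The role of such a registry can be sketched as follows.  The sketch
   assumes, purely for illustration, that names are numeric identifiers
   and that the three ranges have the bounds shown; neither the
   representation nor the bounds are specified by this document.

```python
from typing import Dict

# Hypothetical bounds; the draft only says the namespace has (at least) three ranges.
RANGES = {
    "standards": range(0x0000, 0x4000),      # standards-based storage protocols
    "vendor": range(0x4000, 0x8000),         # vendor proprietary protocols
    "unregistered": range(0x8000, 0x10000),  # non-approved, e.g. in-house protocols
}


def classify(identifier: int) -> str:
    """Return the name of the range an identifier falls in."""
    for name, bounds in RANGES.items():
        if identifier in bounds:
            return name
    raise ValueError("identifier outside the namespace")


class LayoutTypeRegistry:
    """Toy registry that refuses duplicate assignments."""

    def __init__(self) -> None:
        self._entries: Dict[int, str] = {}

    def register(self, identifier: int, description: str) -> str:
        kind = classify(identifier)
        if kind == "unregistered":
            raise ValueError("identifiers in this range are not recorded")
        if identifier in self._entries:
            raise ValueError("identifier already assigned")
        self._entries[identifier] = description
        return kind
```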

9.  Acknowledgements

   Many members of the pNFS informal working group have helped
   considerably.  The authors would like to thank Andy Adamson, David
   Black, Gary Grider, Benny Halevy, Dean Hildebrand, Peter Honeyman,
   Dave Noveck, Julian Satran, and Tom Talpey.

10.  References

   [1]  Gibson et al., "pNFS Problem Statement", July 2004,

Authors' Addresses

   Garth Gibson
   Panasas Inc. & Carnegie Mellon
   1501 Reedsdale Street
   Pittsburgh, PA  15233

   Phone: +1 412 323 3500

   Brent Welch
   Panasas Inc.
   6520 Kaiser Drive
   Fremont, CA  94555

   Phone: +1 510 608 7770

   Garth Goodson
   Network Appliance Inc.
   495 East Java Drive
   Sunnyvale, CA  94089

   Phone: +1 408 822 6847

   Peter Corbett
   Network Appliance Inc.
   375 Totten Pond Road
   Waltham, MA  02451

   Phone: +1 781 768 5343

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at

Disclaimer of Validity

   This document and the information contained herein are provided on an

Copyright Statement

   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.
