Re:iSCSI/iWARP drafts and flow control

To: ips@ece.cmu.edu
Subject: Re:iSCSI/iWARP drafts and flow control
From: Caitlin Bestler <cait@asomi.com>
Date: Sat, 26 Jul 2003 00:49:55 -0500
In-Reply-To: <009e01c3530c$42803fc0$2f8b170f@rose.hp.com>
Sender: owner-ips@ece.cmu.edu

The proposed mapping of iSCSI onto iWARP offers an
inadequate solution to the problem of flow control.

iWARP shifts responsibility for flow control to the ULP. In
doing so, it allows ULP-specific pacing based upon number
of requests-in-flight rather than relying on the bottleneck of
transport buffering to flow control the application. The
session is no longer throttled by the availability of
buffers suitable for any message. This topic is covered in
section 4.5 of the RDMAP/DDP Applicability statement
(http://www.ietf.org/draft-ietf-rddp-applicability-00.txt)

There are two excellent examples of ULP solutions to pacing
untagged messages: DAFS and the mapping of RPC over iWARP
for NFS. The latter offers the following section on flow
control:
   
3.3.  Flow Control

    It is critical to provide flow control for an RDMA
    connection.  RDMA receive operations will fail if a
    pre-posted receive buffer is not available to accept
    an incoming RDMA Send.  Such errors are fatal to the
    connection. This is a departure from conventional
    TCP/IP networking where buffers are allocated
    dynamically on an as-needed basis, and pre-posting is
    not required.
    
    It is not practical to provide for fixed credit limits
    at the RPC server.  Fixed limits scale poorly, since
    posted buffers are dedicated to the associated
    connection until consumed by receive operations. 
    Additionally for protocol correctness, the server must
    be able to reply whether or not a new buffer can be
    posted to accept future receives.
    
    Flow control is implemented as a simple request/grant
    protocol in the transport header associated with each
    RPC message.  The transport header for RPC CALL
    messages contains a requested credit value for the
    server, which may be dynamically adjusted by the
    caller to match its expected needs.  The transport
    header for the RPC REPLY messages provide the granted
    result, which may have any value except it may not be
    zero when no in-progress operations are present at the
    server, since such a value would result in deadlock. 
    The value may be adjusted up or down at each
    opportunity to match the server's needs or policies.
    
    While RPC CALLs may complete in any order, the current
    flow control limit at the RPC server is known to the
    RPC client from the Send ordering properties.  It is
    always the most recent server granted credits minus
    the number of requests in flight.
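
As a rough illustration of the accounting described in that last
paragraph, here is a minimal sketch of the client's view of the
server's limit. The class name, the dictionary fields and the
error handling are all hypothetical; the draft defines none of
them, and a real transport header is a binary structure rather
than a Python dict.

```python
class RpcCreditView:
    """Client-side view of the server's flow control limit (illustrative only)."""

    def __init__(self, initial_grant):
        self.granted = initial_grant  # most recent credit value granted by the server
        self.in_flight = 0            # CALLs sent whose REPLY has not yet arrived

    def available(self):
        # "the most recent server granted credits minus the number of
        # requests in flight"
        return self.granted - self.in_flight

    def send_call(self, requested):
        # Refuse to send when the server has no pre-posted buffer left for us.
        if self.available() <= 0:
            raise RuntimeError("no credits: a Send now could overrun the server")
        self.in_flight += 1
        # Hypothetical header: the CALL carries the credit value we would like.
        return {"type": "CALL", "requested_credits": requested}

    def on_reply(self, reply):
        # Each REPLY retires one request and carries the server's latest grant,
        # which may be higher or lower than the previous one.
        self.in_flight -= 1
        self.granted = reply["granted_credits"]
```

With an initial grant of 2, two CALLs can be outstanding; a REPLY
granting 4 then leaves room for three more.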
   
   
   
   
I believe this is quite a contrast with the iSCSI/iWARP proposal:

10.1 Flow Control for RDMA Send Message Types 

    RDMAP Send Message Types are used by the iSER Layer to
    transfer iSCSI control-type PDUs.  Each RDMAP Send
    Message Type consumes an Untagged Buffer at the Data
    Sink.  However, neither the RDMAP layer nor the iSER
    Layer provides an explicit flow control mechanism for
    the RDMAP Send Message Types.  Therefore, the iSER
    Layer SHOULD provision enough Untagged buffers for
    handling incoming RDMAP Send Message Types to prevent
    a buffer underrun condition at the RDMAP layer. If a
    buffer underrun happens, it may result in the
    termination of the connection.  An implementation may
    choose to satisfy this requirement by using a common
    buffer pool shared across multiple connections, with
    usage limits on a per connection basis and usage
    limits on the buffer pool itself.  In such an
    implementation, exceeding the buffer usage limit for a
    connection or the buffer pool itself may trigger
    interventions from the iSER Layer to replenish the
    buffer pool and/or to isolate the connection causing
    the problem.
    
    
Stating that the iSER Layer "SHOULD" provision enough
Untagged buffers is an interesting use of the IETF
"SHOULD". Implementations are *guaranteed* to have a
valid reason to break the "SHOULD": they do not have
enough information to comply, because the Upper Layer
Protocol has failed to provide it.

How is the target supposed to estimate how many
untagged messages the initiator will presume it is
capable of handling? Or vice versa? How? Provision
enough buffers to match your physical line rate under
the worst case scenarios? Even if you're an economy
model? Guess? Keep a table by model number? Limit
yourself to one untagged message in flight? Even if
you are supposed to be a high performance model?
Keep trying until you crash the connection?

True interoperability is not based upon tweaking or
fine-tuning to match the peers. Peers work together
because the protocol has enabled any peer to work
with any other compliant peer. Period. Guesstimating
has nothing to do with it.

Fortunately, establishing a credit protocol that is
compatible with normal iSCSI interactions is easily
done. Generically, an RDMA-capable ULP flow control
strategy requires three things:

1) An initial credit level. This can be established
   during connection/stream establishment just as
   is proposed for RDMA Read Credits.
   
2) A credit is consumed for each untagged message
   sent, exactly as sending each RDMA Read Request
   consumes an RDMA Read credit.
   
3) The ULP reply restores credits. With RDMA Reads
   this is a simple one-to-one process. DAFS also
   has each reply replenish the credit that the
   request it is responding to drained. The
   NFS/RPC protocol allows the RPC layer to
   explicitly vary the number of credits
   restored in each untagged message.
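
The three requirements above can be sketched as a simple counter
kept by the sender. This is only an illustration under my own
assumptions: the class and method names are hypothetical, and how
the initial credit level is actually negotiated is left out.

```python
class UntaggedCredits:
    """Sender-side credit counter for untagged messages (illustrative only)."""

    def __init__(self, initial):
        # 1) Initial credit level, assumed to have been agreed during
        #    connection/stream establishment.
        self.available = initial

    def consume(self):
        # 2) Each untagged message sent consumes exactly one credit.
        if self.available == 0:
            raise RuntimeError("no credits left: the peer has no buffer for this message")
        self.available -= 1

    def restore(self, count=1):
        # 3) A ULP reply restores credits. count=1 is the one-to-one
        #    DAFS-style replenishment; a larger or smaller count models a
        #    protocol that varies the number restored per reply.
        self.available += count
```

A sender would call consume() before every untagged message and
restore() whenever a reply arrives; a failed consume() means "wait
for a reply", not "send anyway and hope".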
   
The only special requirement that I can see is that
there may be a sequence of untagged messages that are
not individually acknowledged. That can be taken care
of by the following rules:

-- A ULP response to a ULP request implies that all
   prior ULP requests have been processed, even if
   they did not warrant an explicit response.
   
-- A ULP response restores credits for itself and
   for any other "phantom" responses that it implies.
   
-- If a ULP needs to send a sequence of untagged
   messages that will not be acknowledged and that
   would drain the credits, it needs to insert an
   untagged message that will be acknowledged. Any
   form of echoed NOP or Ping could be used.
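
A minimal sketch of how a sender might apply these three rules
follows. Everything named here is hypothetical: the per-message
sequence numbers are my own assumption for identifying which
request a response answers, and needs_nop() is just one possible
policy for deciding when the echoed NOP has to be inserted.

```python
class CreditedSender:
    """Credit tracking with implied ("phantom") responses (illustrative only)."""

    def __init__(self, initial_credits):
        self.credits = initial_credits
        self.next_seq = 0
        self.outstanding = []  # (seq, expects_response) pairs, in send order

    def needs_nop(self):
        # True when only one credit is left and nothing already sent will
        # ever be answered, so no response is on its way to restore credits.
        awaiting = any(expects for _, expects in self.outstanding)
        return self.credits == 1 and not awaiting

    def send(self, expects_response):
        if self.credits == 0:
            raise RuntimeError("out of credits")
        if not expects_response and self.needs_nop():
            # Third rule: the last credit must go to a message that will be
            # acknowledged, for example an echoed NOP.
            raise RuntimeError("insert an acknowledged message (NOP) first")
        seq = self.next_seq
        self.next_seq += 1
        self.credits -= 1
        self.outstanding.append((seq, expects_response))
        return seq

    def on_response(self, seq):
        # First and second rules: a response to request `seq` implies every
        # earlier request was processed, so it restores a credit for itself
        # and for each "phantom" response it implies.
        covered = [s for s, _ in self.outstanding if s <= seq]
        self.outstanding = [(s, e) for s, e in self.outstanding if s > seq]
        self.credits += len(covered)
        return len(covered)
```

With three initial credits, two unacknowledged messages leave one
credit; the sender must then spend it on a NOP, and the NOP's echo
restores all three credits at once.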
   

   

   
   


Caitlin Bestler - cait@asomi.com - http://asomi.com/

References:
- draft-chadalapaka-iwarp-da-00.txt,.pdf
  - From: "Mallikarjun C." <cbm@rose.hp.com>
