Re:iSCSI/iWARP drafts and flow control

To: Caitlin Bestler <cait@asomi.com>
Subject: Re:iSCSI/iWARP drafts and flow control
From: Mike Ko <mako@almaden.ibm.com>
Date: Sat, 26 Jul 2003 13:57:40 -0700
Cc: ips@ece.cmu.edu

In iSER, we expect the flow control to be regulated by the Command 
Numbering mechanism in iSCSI.  In other words, since the queuing capacity 
of the receiving iSCSI layer is MaxCmdSN - ExpCmdSN + 1, the receiving 
iSER layer can use this information to determine the minimum number of 
untagged buffers.  In addition, it needs to provision a sufficient number 
of untagged buffers to allow enough time for the iSER layer to respond to 
incoming immediate commands, asynchronous messages, etc., and replenish 
the buffers.  The use of a buffer pool shared across multiple connections 
will allow the iSER layer to replenish the buffers on a statistical basis.
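
As a rough sketch of the arithmetic above (Python; the function names
and the "headroom" parameter are illustrative assumptions, not anything
defined by iSER, and the counters are assumed to be 32-bit values that
wrap around):

```python
SN_MODULUS = 1 << 32  # assumption: CmdSN counters are 32-bit and wrap around


def command_window(exp_cmd_sn: int, max_cmd_sn: int) -> int:
    """Queuing capacity of the receiving iSCSI layer: MaxCmdSN - ExpCmdSN + 1."""
    return (max_cmd_sn - exp_cmd_sn + 1) % SN_MODULUS


def min_untagged_buffers(exp_cmd_sn: int, max_cmd_sn: int, headroom: int) -> int:
    """Command window plus extra buffers (hypothetical 'headroom') reserved
    for immediate commands, asynchronous messages, and replenish latency."""
    return command_window(exp_cmd_sn, max_cmd_sn) + headroom


# Example: a window of 32 commands plus 8 buffers of headroom.
buffers = min_untagged_buffers(exp_cmd_sn=10, max_cmd_sn=41, headroom=8)
```

How large the headroom should be is exactly the part the window does not
determine; it depends on how quickly the iSER layer can replenish buffers.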

Mike Ko
IBM Almaden Research
San Jose, CA 95120

Sent by:        owner-ips@ece.cmu.edu
To:     ips@ece.cmu.edu
cc: 
Subject:        Re:iSCSI/iWARP drafts and flow control



The proposed mapping of iSCSI onto iWARP offers an
inadequate solution to the problem of flow control.

iWARP shifts responsibility for flow control to the ULP. In
doing so, it allows ULP-specific pacing based upon number
of requests-in-flight rather than relying on the bottleneck of
transport buffering to flow control the application. The
session is no longer throttled by the availability of
buffers suitable for any message. This topic is covered in
section 4.5 of the RDMAP/DDP Applicability statement
(http://www.ietf.org/draft-ietf-rddp-applicability-00.txt).

There are two excellent examples of ULP solutions to pacing
untagged messages: DAFS and the mapping of RPC over iWARP
for NFS. The latter offers the following section on flow
control:

3.3.  Flow Control

It is critical to provide flow control for an RDMA
connection.  RDMA receive operations will fail if a
pre-posted receive buffer is not available to accept
an incoming RDMA Send.  Such errors are fatal to the
connection. This is a departure from conventional
TCP/IP networking where buffers are allocated
dynamically on an as-needed basis, and pre-posting is
not required.

It is not practical to provide for fixed credit limits
at the RPC server.  Fixed limits scale poorly, since
posted buffers are dedicated to the associated
connection until consumed by receive operations.
Additionally for protocol correctness, the server must
be able to reply whether or not a new buffer can be
posted to accept future receives.

Flow control is implemented as a simple request/grant
protocol in the transport header associated with each
RPC message.  The transport header for RPC CALL
messages contains a requested credit value for the
server, which may be dynamically adjusted by the
caller to match its expected needs.  The transport
header for the RPC REPLY messages provide the granted
result, which may have any value except it may not be
zero when no in-progress operations are present at the
server, since such a value would result in deadlock.
The value may be adjusted up or down at each
opportunity to match the server's needs or policies.

While RPC CALLs may complete in any order, the current
flow control limit at the RPC server is known to the
RPC client from the Send ordering properties.  It is
always the most recent server granted credits minus
the number of requests in flight.




I believe this is quite a contrast with the iSCSI/iWARP proposal:

10.1 Flow Control for RDMA Send Message Types

RDMAP Send Message Types are used by the iSER Layer to
transfer iSCSI control-type PDUs.  Each RDMAP Send
Message Type consumes an Untagged Buffer at the Data
Sink.  However, neither the RDMAP layer nor the iSER
Layer provides an explicit flow control mechanism for
the RDMAP Send Message Types.  Therefore, the iSER
Layer SHOULD provision enough Untagged buffers for
handling incoming RDMAP Send Message Types to prevent
a buffer underrun condition at the RDMAP layer. If a
buffer underrun happens, it may result in the
termination of the connection.  An implementation may
choose to satisfy this requirement by using a common
buffer pool shared across multiple connections, with
usage limits on a per connection basis and usage
limits on the buffer pool itself.  In such an
implementation, exceeding the buffer usage limit for a
connection or the buffer pool itself may trigger
interventions from the iSER Layer to replenish the
buffer pool and/or to isolate the connection causing
the problem.


Stating that the iSER Layer "SHOULD" provision enough
Untagged buffers is an interesting use of the IETF
"SHOULD". Implementations are *guaranteed* to have a
valid reason to break the "SHOULD": they do not have
enough information to comply. The Upper Layer Protocol
has failed to provide it.

How is the target supposed to estimate how many
untagged messages the initiator will presume it is
capable of handling? Or vice versa? How? Provision
enough buffers to match your physical line rate under
the worst case scenarios? Even if you're an economy
model? Guess? Keep a table by model number? Limit
yourself to one untagged message in flight? Even if
you are supposed to be a high performance model?
Keep trying until you crash the connection?

True interoperability is not based upon tweaking or
fine-tuning to match the peers. Peers work together
because the protocol has enabled any peer to work
with any other compliant peer. Period. Guesstimating
has nothing to do with it.

Fortunately, establishing a credit protocol that is
compatible with normal iSCSI interactions is easily
done. Generically, an RDMA-capable ULP flow control
strategy requires three things:

1) An initial credit level. This can be established
during connection/stream establishment just as
is proposed for RDMA Read Credits.

2) A credit is consumed for each untagged message
sent, exactly as sending each RDMA Read Request
consumes an RDMA Read credit.

3) The ULP reply restores credits. With RDMA Reads
this is a simple one-to-one process. DAFS also
has each reply replenish the credit that
the request it is responding to drained. The
NFS/RPC protocol allows the RPC layer to
explicitly vary the number of credits
restored in each untagged message.
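
The three requirements above can be sketched as a sender-side counter
(Python; the class and method names are hypothetical, and how the
initial credit level is actually negotiated is left out):

```python
class CreditedSender:
    """Sender-side count of untagged messages the peer can still accept."""

    def __init__(self, initial_credits: int):
        # 1) Initial credit level, assumed to be agreed at connection setup.
        if initial_credits < 1:
            raise ValueError("initial credit level must be at least 1")
        self.credits = initial_credits

    def can_send(self) -> bool:
        return self.credits > 0

    def on_send(self) -> None:
        # 2) Each untagged message sent consumes one credit.
        if self.credits == 0:
            raise RuntimeError("no credits left; peer may have no buffer posted")
        self.credits -= 1

    def on_reply(self, restored: int = 1) -> None:
        # 3) A ULP reply restores credits: one-to-one by default (the
        #    RDMA Read / DAFS style), or a variable amount chosen by the
        #    peer (the NFS/RPC style).
        self.credits += restored
```

A sender that checks can_send() before every untagged message never
outruns the buffers the peer has said it is willing to post.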

The only special requirement that I can see is that
there may be a sequence of untagged messages that are
not individually acknowledged. That can be taken care
of by the following rules:

-- A ULP response to a ULP request implies that all
prior ULP requests have been processed, even if
they did not warrant an explicit response.

-- A ULP response restores credits for itself and
for any other "phantom" responses that it implies.

-- If a ULP needs to send a sequence of untagged
messages that will not be acknowledged and that
would drain the credits, it needs to insert an
untagged message that will be acknowledged. Any
form of echoed NOP or Ping could be used.
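
One way these rules could fit together, as a minimal sketch (Python; the
class, the per-message sequence numbers, and the exact "probe" policy
are assumptions made for illustration, not part of any draft):

```python
from collections import deque


class ImpliedAckCredits:
    """Credit tracking where one response also covers earlier messages."""

    def __init__(self, initial_credits: int):
        self.credits = initial_credits
        self.next_seq = 0
        self.in_flight = deque()     # (seq, expects_response), oldest first
        self.awaiting_response = 0   # in-flight messages that will be answered

    def must_probe(self) -> bool:
        # Spending the last credit on a message that draws no response,
        # with nothing acknowledged in flight, would strand the sender.
        return self.credits == 1 and self.awaiting_response == 0

    def send(self, expects_response: bool) -> int:
        if self.credits == 0:
            raise RuntimeError("no credits left")
        if not expects_response and self.must_probe():
            raise RuntimeError("send an acknowledged probe (e.g. echoed NOP) first")
        seq = self.next_seq
        self.next_seq += 1
        self.credits -= 1
        self.in_flight.append((seq, expects_response))
        if expects_response:
            self.awaiting_response += 1
        return seq

    def on_response(self, seq: int) -> int:
        # A response to `seq` implies every earlier message was processed,
        # so it restores its own credit plus the "phantom" ones.
        restored = 0
        while self.in_flight and self.in_flight[0][0] <= seq:
            _, expected = self.in_flight.popleft()
            if expected:
                self.awaiting_response -= 1
            restored += 1
        self.credits += restored
        return restored
```

For example, after two unacknowledged messages followed by one
acknowledged message, the single response restores all three credits.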








Caitlin Bestler - cait@asomi.com - http://asomi.com/
