Re: iSCSI/iWARP drafts and flow control

To: "Mallikarjun C." <cbm@rose.hp.com>
Subject: Re: iSCSI/iWARP drafts and flow control
From: Caitlin Bestler <cait@asomi.com>
Date: Wed, 30 Jul 2003 11:07:22 -0500
Cc: <ips@ece.cmu.edu>, rddp@ietf.org
In-Reply-To: <016301c3563e$0bf013b0$18a4f40f@rose.hp.com>
Sender: owner-ips@ece.cmu.edu

On Tuesday, July 29, 2003, at 08:58 PM, Mallikarjun C. wrote:

> As Mike points out, the CmdSN-based flow control
> in iSCSI is relevant here.  Let me note that the design
> team behind the current iSER draft considered this topic
> in great detail, but I can now clearly see that the draft
> unfortunately does not capture the design rationale very well.
>

It could be clearer. But even if it were clearer, it would
not change the fact that it fails to provide ULP-level
flow control for untagged messages.

The requirement here is that the ULP provide *flow control*
for untagged messages. Control means that the Data Source
either has permission to send an untagged message, or it
does not. There is an identical flow control issue for
RDMA Reads. You either are allowed to send one or you
are not.

If you are allowed to send an untagged message, you have
an expectation that the other side has the resources to
handle it. Bugs, under-provisioning and hardware faults
are all facts of life. So robust applications are prepared
to deal with faults. But faults reflect a *failure*.

If an untagged message is allowed, then the Data Source
has every reason to expect that the Data Sink will handle
it properly. Failure to do so is a fault on the Data Sink.

If an untagged message is not allowed, then the Data Source
had no right to send it. It cannot complain if the Data
Sink terminates the stream. In this case the Data Source
is the one committing the fault.
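
The two cases above can be written down as a small decision function.
This is only an illustrative sketch: the names `Fault` and
`attribute_fault` are hypothetical and appear in no draft, and treating
an unpermitted message as a Data Source fault even when the Data Sink
happens to cope with it is one reading of the argument.

```python
from enum import Enum


class Fault(Enum):
    NONE = "none"
    DATA_SOURCE = "data source"
    DATA_SINK = "data sink"


def attribute_fault(permitted: bool, handled: bool) -> Fault:
    """Decide which side, if any, is at fault for one untagged message.

    permitted: the ULP rules allowed the Data Source to send it.
    handled:   the Data Sink processed it properly.
    """
    if not permitted:
        # The Data Source had no right to send; the Data Sink may
        # terminate the stream without being at fault.
        return Fault.DATA_SOURCE
    if not handled:
        # The message was allowed, so failing to handle it is a
        # failure on the Data Sink's side.
        return Fault.DATA_SINK
    return Fault.NONE
```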

> iSCSI does not provide a PDU-level positive flow control
> but instead relies on the CmdSN feature, from which most
> of the iSCSI (what DA/iSER call as the) "control-type" PDU traffic
> can be precisely estimated (note that only control-type PDUs
> are candidates for Send Messages and thus relevant to this
> discussion).  However, it turns out that there are certain
> opcode types that are used very rarely that are not governed
> by the CmdSN-based flow control - immediate commands,
> SNACK, unsolicited NOP-In, Reject, and Async Messages.

There is no requirement that there be an explicit wire-level
protocol, merely that the ULP establish a mechanism by
which the sender knows whether it can send a given untagged
message.

iSCSI CmdSN flow control already provides this flow control
for most iSER packets. So the only issue is establishing
rules for the remaining packets.
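
For reference, the CmdSN check can be sketched roughly as follows. It
assumes the usual iSCSI window, in which a non-immediate command is
acceptable when its CmdSN lies between ExpCmdSN and MaxCmdSN inclusive,
compared in 32-bit serial number arithmetic. The function names are
hypothetical, and this is a simplification rather than the full rule
set.

```python
MOD = 1 << 32    # CmdSN values are 32-bit and wrap around
HALF = 1 << 31


def serial_lte(a: int, b: int) -> bool:
    """True if a <= b in 32-bit serial number arithmetic."""
    return (b - a) % MOD < HALF


def may_send_command(cmd_sn: int, exp_cmd_sn: int, max_cmd_sn: int) -> bool:
    """The sender has permission only while CmdSN is inside the window.

    When MaxCmdSN equals ExpCmdSN - 1 the window is closed and nothing
    governed by CmdSN may be sent.
    """
    return serial_lte(exp_cmd_sn, cmd_sn) and serial_lte(cmd_sn, max_cmd_sn)
```

The point of the sketch is that the answer is always a plain yes or no
that the sender can compute from state it already holds.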

Controlling the flow of *most* packets is *not* flow control.
It is somewhat akin to having a strictly balanced budget
except for these three funds which are unrestricted.

>
>
> Note that the above does not include the unsolicited Data-out
> PDUs since the worst case number of these is precisely known from
> CmdSN, but the worst case buffer provisioning for these would
> be both unnecessary and extremely expensive in reality.
>

Under-provisioning of buffers is a local issue, with the caveat
that doing it improperly is a fault on the Data Sink's part. There
can also be faults from exhaustion of CPU power, hardware faults
and plain old software errors. The server is obviously expected to
keep these to a minimum.

The key distinction that must be made is between granting credits,
providing buffers and matching buffers.

The classic simple ordered Receive Queue is the one interface that
I believe everyone agrees must be supported. With it the Data Sink
ULP posts a receive buffer, and thereby grants a credit and pre-assigns
a buffer to the QN/MSN.

The Shared Receive Queue (proposed in draft-hilland) shares both
buffers and credits across a pool. Buffers are assigned to the
QN/MSN on an as needed basis. The implementation has an option of
filling in buffers for the gap when a high MSN is received, otherwise
the buffer is allocated when a portion of it is first received.
Credits are consumed when buffers are allocated.

Note that Shared Receive Queues only apply to the RDMA Send queue.
The RDMA Read queue is not documented, but given that a fixed limit
is configured, it would presumably be a simple ordered Receive Queue.

Shared Buffer Pools place buffers in a pool, but assign credits on
a per stream basis. If an MSN exceeds the range implied by the credits
it is rejected as invalid whether there is a buffer available or not.
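
A minimal sketch of that last model, with buffers pooled but credits
kept per stream, might look like this. The class and method names are
hypothetical, buffers are reduced to a counter, and MSNs are assumed to
start at 1 with `credits` giving the highest MSN currently allowed.

```python
class SharedBufferPool:
    """Buffers shared by every stream; credits are NOT tracked here."""

    def __init__(self, buffer_count: int):
        self.free = buffer_count


class Stream:
    """One stream: its own credit limit, buffers drawn from the pool."""

    def __init__(self, pool: SharedBufferPool, credits: int):
        self.pool = pool
        self.max_msn = credits   # highest MSN the credits allow
        self.bound = set()       # MSNs that already have a buffer

    def on_untagged(self, msn: int) -> str:
        # The credit check comes first: an MSN beyond the credited
        # range is invalid even if the pool has spare buffers.
        if not 1 <= msn <= self.max_msn:
            return "invalid"
        if msn not in self.bound:
            if self.pool.free == 0:
                # Within credits but no buffer: a local provisioning
                # problem on the Data Sink, not a sender error.
                return "no-buffer"
            self.pool.free -= 1
            self.bound.add(msn)
        return "ok"
```

Pooling credits across the streams of a session, as iSER seems to want,
would move `max_msn` out of `Stream` into a per-session object, without
changing which streams share the buffer pool.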

iSER seems to call for the ability to pool credits across all streams
in a session. But it would not necessarily be the same set of streams
that you would want to share buffers over. There could be advantages
of pooling buffers between sessions, while still tracking credits on
a per session basis.

In any event, these are all *local* questions. The only *wire* question
is whether the Data Source can know whether or not it is legal for it
to send a given untagged message.

Stating that for message types "x" it is legal as long as the Data
Source thinks it has a reason to send "x" is NOT flow control.

For each of the "exceptional" types, what is required is that a rule
be derived on how many of them can be outstanding, and how the sender
knows when they are no longer outstanding.

If, as claimed, it is a trivial matter for the Data Sink to make
these calculations, then it should be easy to enumerate these rules.
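
To show the shape such rules could take, here is a sketch of a
sender-side limiter with one fixed limit per message type. The limits
in the usage note are placeholders invented for the example, not values
from any draft, and `OutstandingLimiter` is a hypothetical name. What
retires a message (a response, an acknowledgement, something else) is
exactly the part the rules would have to spell out.

```python
class OutstandingLimiter:
    """Tracks outstanding untagged messages per type on the sender."""

    def __init__(self, limits: dict):
        self.limits = dict(limits)
        self.outstanding = {kind: 0 for kind in self.limits}

    def try_send(self, kind: str) -> bool:
        """Return True and count the message only if it is allowed."""
        if kind not in self.limits:
            raise KeyError(f"no flow control rule for {kind!r}")
        if self.outstanding[kind] >= self.limits[kind]:
            return False
        self.outstanding[kind] += 1
        return True

    def on_retired(self, kind: str) -> None:
        """Call when the rules say a message is no longer outstanding."""
        if self.outstanding[kind] == 0:
            raise ValueError(f"nothing outstanding for {kind!r}")
        self.outstanding[kind] -= 1
```

With placeholder limits such as `OutstandingLimiter({"SNACK": 1,
"NOP-In": 2})`, a second SNACK is refused until the first is retired.
No wire messages are needed for the sender to know where it stands.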

> The iSER design team thus believed that most storage implementations
> will use buffer pools to deal with this reality (as they have always
> been), and the rare "fringe" opcode types mentioned above could
> easily be dealt with in the statistical provisioning scheme of things,
> being so rare and infrequent.

It is totally incorrect for an Upper Layer Protocol to be designed with
presumptions as to implementation of the lower layers. If you believe
buffer pools are required for the correct functioning of an application
using iWARP then you should be arguing for that change to iWARP.

Otherwise, the Upper Layer Protocol must be defined so as to rely upon
the published protocol and nothing else.

iWARP requires the ULP to take responsibility for flow control of
untagged messages. Period.

> Despite this belief (in fact, even before we are convinced of this
> approach), we did a diligent analysis of a Send Message flow control
> protocol for iSER - the ultimate conclusion was that it's way too much
> overhead to run this protocol, it's slow-to-respond to changing I/O
> loads, reclaiming of credits is a burdensome process, requires RTT
> delays to announce new credits etc.
>

That is based upon the assumption that iSER flow control requires
iSER flow control messages. This is not a requirement. A requirement
that the Data Source MUST NOT submit more than one connection termination
notice upon any given connection would fully flow control that type of
message -- with no wire protocol messages being exchanged.

> I believe the approach adopted in the current iSER draft is appropriate,
> we do however need to polish the flow control discussion to include
> some of the design rationale.

Rationales are not constraints upon the sender of untagged
messages. Flow *control*, by definition, is a constraint on the
sender. The constraint does not have to take the form of dynamically
exchanged messages, or even per-session negotiated limits. But it
does require that a limit be unambiguously identified.

Otherwise it is not flow *control*.

Again, this has nothing to do with how many buffers the Data Sink
must provision and when. Dynamic binding of buffers is a totally
valid strategy, especially if the Data Sink has "low water mark"
warnings and processes responsible for responding to those alarms
to restock the buffer pool.

The point is that failure to provide true flow control *requires*
that *all* implementations build such an infrastructure. It is
taking a feature that is desirable for high volume servers and
making it a de facto requirement for *all* servers, even those
that only intend to support a single client.

Caitlin Bestler - cait@asomi.com - http://asomi.com/
