Re: iSCSI/iWARP drafts and flow control

To: <ips@ece.cmu.edu>, <rddp@ietf.org>
Subject: Re: iSCSI/iWARP drafts and flow control
From: "Mallikarjun C." <cbm@rose.hp.com>
Date: Wed, 30 Jul 2003 17:49:44 -0700

Responding to Caitlin's two messages...

First off, thanks for the review.

At a high level, I view Caitlin's comments as stressing
two points -
    a) The iSCSI/iSER peers must each know the precise
       number of control-type messages to expect at any
       given instant.
    b) Either a ULP has *positive* (absolutely deterministic)
       Send Message flow control or it doesn't have any.

I think several iSCSI practitioners (myself included) disagree
with (a) - iSCSI is not designed to allow each iSCSI endnode
to know precisely the number of inbound PDUs it can expect
(for good reasons).  iSER could attempt to "fix" what the
authors considered a non-issue (affecting < 1% of the PDUs)
with an expensive protocol, but the iSER designers chose not to.

On (b), I find it very hard to agree with this "positive flow control,
else you fail the test" argument.  It might make sense for IPC
applications, but in my view it makes no sense for storage,
particularly when making it "positive" yields negative returns.

iSER-assisted iSCSI *does* provide ULP-level flow control
for 99% (by volume) of the Untagged messages (control-type
PDUs).  But yes, it does not currently provide "positive" flow control.
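For readers less familiar with the CmdSN mechanism, here is a rough sketch of the window check an initiator applies before sending a non-immediate command: CmdSN must lie between ExpCmdSN and MaxCmdSN, compared with 32-bit serial number arithmetic (RFC 1982). The function names are illustrative and come from neither draft, and the sketch deliberately leaves out the "fringe" opcodes (SNACK, unsolicited NOP-In, Reject, Async Messages) that sit outside this window.

```python
# Sketch of iSCSI's CmdSN window check, using RFC 1982 serial number
# arithmetic on 32-bit sequence numbers. Names are illustrative only.
SERIAL_BITS = 32
MODULUS = 1 << SERIAL_BITS
HALF = 1 << (SERIAL_BITS - 1)


def sn_lt(a: int, b: int) -> bool:
    """RFC 1982 'a < b' for 32-bit serial numbers (undefined at exactly HALF apart)."""
    return (a < b and b - a < HALF) or (a > b and a - b > HALF)


def sn_le(a: int, b: int) -> bool:
    return a == b or sn_lt(a, b)


def window_size(exp_cmd_sn: int, max_cmd_sn: int) -> int:
    """Number of non-immediate commands the target currently permits.

    MaxCmdSN == ExpCmdSN - 1 (mod 2**32) means the window is closed.
    """
    return (max_cmd_sn - exp_cmd_sn + 1) % MODULUS


def may_send_command(cmd_sn: int, exp_cmd_sn: int, max_cmd_sn: int,
                     immediate: bool = False) -> bool:
    """True if a command with this CmdSN falls inside [ExpCmdSN, MaxCmdSN]."""
    if immediate:
        # Immediate commands are not governed by the CmdSN window; they are
        # one of the "fringe" cases discussed in this thread.
        return True
    return sn_le(exp_cmd_sn, cmd_sn) and sn_le(cmd_sn, max_cmd_sn)
```

For example, with ExpCmdSN = 10 and MaxCmdSN = 12 the initiator may send CmdSN 10 through 12, and lowering MaxCmdSN to 9 closes the window.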

Now on to specifics -

>There is an identical flow control issue for
> RDMA Reads.

Not true.  There's a built-in credit renewal in an RDMA Read.
The peer issuing the RDMA Read knows it can reuse the
Read credit when it receives the RDMA Read Response.
Send Messages carrying the "fringe" iSCSI PDUs would need both
a new wire protocol and cross-layer chit-chat within an end-node
between iSCSI and iSER in order to renew credits.
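The asymmetry described above can be sketched with two hypothetical counters: an RDMA Read credit comes back implicitly with the Read Response, whereas a Send credit can be replenished only by some explicit grant from the peer's ULP. The class and method names are made up for illustration; neither draft defines such an interface, and the Send side stands in for the kind of credit scheme the iSER team chose not to adopt.

```python
class ReadCredits:
    """Outstanding RDMA Read Requests, limited by a fixed credit count."""

    def __init__(self, limit: int):
        self.limit = limit
        self.outstanding = 0

    def try_issue_read(self) -> bool:
        if self.outstanding >= self.limit:
            return False
        self.outstanding += 1
        return True

    def on_read_response(self) -> None:
        # The Read Response itself returns the credit: no extra message needed.
        if self.outstanding == 0:
            raise RuntimeError("unexpected RDMA Read Response")
        self.outstanding -= 1


class SendCredits:
    """Send Message credits that only an explicit peer grant can replenish."""

    def __init__(self, initial: int):
        self.available = initial

    def try_send(self) -> bool:
        if self.available == 0:
            return False
        self.available -= 1
        return True

    def on_credit_grant(self, granted: int) -> None:
        # Hypothetical grant carried by some new wire message from the peer's ULP.
        self.available += granted
```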

>Wouldn't the appropriate rule for SNACK messages be "don't"?

As you point out, SNACK isn't useful in an iSCSI/iSER context.
That's why section 9.3.11 says that SNACKs "SHOULD NOT" be
issued.  We felt that a "MUST NOT" is too severe since SNACK
is an iSCSI-to-iSCSI PDU that would be properly dealt with 
at the iSCSI level even when used on an iSCSI/iSER connection.
--
Mallikarjun

Mallikarjun Chadalapaka
Networked Storage Architecture
Network Storage Solutions
Hewlett-Packard MS 5668 
Roseville CA 95747
cbm@rose.hp.com

----- Original Message ----- 
From: "Caitlin Bestler" <cait@asomi.com>
To: "Mallikarjun C." <cbm@rose.hp.com>
Cc: <ips@ece.cmu.edu>; <rddp@ietf.org>
Sent: Wednesday, July 30, 2003 9:07 AM
Subject: Re: iSCSI/iWARP drafts and flow control


> 
> On Tuesday, July 29, 2003, at 08:58 PM, Mallikarjun C. wrote:
> 
> > As Mike points out, the CmdSN-based flow control
> > in iSCSI is relevant here.  Let me note that the design
> > team behind the current iSER draft considered this topic
> > in great detail, but I can now clearly see that the draft
> > unfortunately does not capture the design rationale very well.
> >
> 
> It could be clearer. But even if it were clearer, it would
> not change the fact that it fails to provide ULP-level
> flow control for untagged messages.
> 
> The requirement here is that the ULP provide *flow control*
> for untagged messages. Control means that the Data Source
> either has permission to send an untagged message, or it
> does not. There is an identical flow control issue for
> RDMA Reads. You either are allowed to send one or you
> are not.
> 
> If you are allowed to send an untagged message, you have
> an expectation that the other side has the resources to
> handle it. Bugs, under-provisioning and hardware faults
> are all facts of life. So robust applications are prepared
> to deal with faults. But faults reflect a *failure*.
> 
> If an untagged message is allowed, then the Data Source
> has every reason to expect that the Data Sink will handle
> it properly. Failure to do so is a fault on the Data Sink.
> 
> If an untagged message is not allowed, then the Data Source
> had no right to send it. It cannot complain if the Data
> Sink terminates the stream. In this case the Data Source
> is the one committing the fault.
> 
> 
> > iSCSI does not provide a PDU-level positive flow control
> > but instead relies on the CmdSN feature, from which most
> > of the iSCSI "control-type" PDU traffic (as DA/iSER call it)
> > can be precisely estimated (note that only control-type PDUs
> > are candidates for Send Messages and thus relevant to this
> > discussion).  However, it turns out that there are certain
> > opcode types that are used very rarely that are not governed
> > by the CmdSN-based flow control - immediate commands,
> > SNACK, unsolicited NOP-In, Reject, and Async Messages.
> 
> There is no requirement that there be an explicit wire-level
> protocol. Merely that the ULP establishes a mechanism by
> which the sender knows whether it can send a given untagged
> message.
> 
> iSCSI CmdSN flow control already provides this flow control
> for most iSER packets. So the only issue is establishing
> rules for the remaining packets.
> 
> Controlling the flow of *most* packets is *not* flow control.
> It is somewhat akin to having a strictly balanced budget
> except for these three funds which are unrestricted.
> 
> 
> >
> >
> > Note that the above does not include the unsolicited Data-out
> > PDUs since the worst case number of these is precisely known from
> > CmdSN, but the worst case buffer provisioning for these would
> > be both unnecessary and extremely expensive in reality.
> >
> 
> Under-provisioning of buffers is a local issue, with the caveat
> that doing it improperly is a fault on the Data Sink's part. There
> can also be faults from exhaustion of CPU power, hardware faults
> and plain old software errors. The server is obviously expected to
> keep these to a minimum.
> 
> The key distinction that must be made is between granting credits,
> providing buffers and matching buffers.
> 
> The classic simple ordered Receive Queue is the one interface that
> I believe everyone agrees must be supported. With it the Data Sink
> ULP posts a receive buffer, and thereby grants a credit and pre-assigns
> a buffer to the QN/MSN.
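A rough model of the ordered Receive Queue described above, where posting a buffer both grants one credit and binds that buffer to a specific MSN. It assumes MSNs on a queue start at 1 and ignores 32-bit wraparound; the class and method names are hypothetical and not taken from any verbs specification.

```python
from collections import deque


class OrderedReceiveQueue:
    """Each posted buffer is one credit, pre-assigned to the next free MSN."""

    def __init__(self):
        self.posted = deque()  # posted buffers, oldest first
        self.base_msn = 1      # MSN bound to posted[0] (assumed to start at 1)

    def post_receive(self, buf) -> None:
        # Grants one credit; the buffer is bound to MSN base_msn + len(posted).
        self.posted.append(buf)

    def credits(self) -> int:
        return len(self.posted)

    def buffer_for(self, msn: int):
        """Buffer pre-assigned to msn, or None if msn is outside the granted credits."""
        idx = msn - self.base_msn
        return self.posted[idx] if 0 <= idx < len(self.posted) else None

    def complete_next(self):
        """Retire the oldest message in order, consuming its buffer and credit."""
        buf = self.posted.popleft()
        self.base_msn += 1
        return buf
```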
> 
> The Shared Receive Queue (proposed in draft-hilland) shares both
> buffers and credits across a pool. Buffers are assigned to the
> QN/MSN on an as needed basis. The implementation has an option of
> filling in buffers for the gap when a high MSN is received, otherwise
> the buffer is allocated when a portion of it is first received.
> Credits are consumed when buffers are allocated.
> 
> Note that Shared Receive Queues apply only to the RDMA Send queue.
> The RDMA Read queue is not documented, but given that a fixed limit
> is configured, it would presumably be a simple ordered Receive Queue.
> 
> Shared Buffer Pools place buffers in a pool, but assign credits on
> a per stream basis. If an MSN exceeds the range implied by the credits
> it is rejected as invalid whether there is a buffer available or not.
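The Shared Buffer Pool distinction (buffers pooled, credits tracked per stream) might be sketched like this. The class, the string results, and the MSN-starting-at-1 convention are assumptions for illustration; MSN wraparound and duplicate MSNs are ignored.

```python
class SharedBufferPool:
    """Buffers are shared by all streams; credits are tracked per stream."""

    def __init__(self, buffers: int):
        self.free = buffers
        self.highest_allowed_msn = {}  # stream id -> highest MSN covered by credits

    def grant(self, stream: str, credits: int) -> None:
        current = self.highest_allowed_msn.get(stream, 0)
        self.highest_allowed_msn[stream] = current + credits

    def on_untagged(self, stream: str, msn: int) -> str:
        # The credit check comes first: an MSN beyond the granted range is the
        # Data Source's fault even if a buffer happens to be free.
        if msn > self.highest_allowed_msn.get(stream, 0):
            return "reject: MSN outside credit range"
        # A credited message with no buffer available is the Data Sink's fault.
        if self.free == 0:
            return "fault: pool under-provisioned"
        self.free -= 1
        return "accepted"

    def release(self) -> None:
        self.free += 1
```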
> 
> iSER seems to call for the ability to pool credits across all streams
> in a session. But it would not necessarily be the same set of streams
> that you would want to share buffers over. There could be advantages
> to pooling buffers between sessions, while still tracking credits on
> a per session basis.
> 
> In any event, these are all *local* questions. The only *wire* question
> is whether the Data Source can know whether or not it is legal for it
> to send a given untagged message.
> 
> Stating that for message types "x" it is legal as long as the Data
> Source thinks it has a reason to send "x" is NOT flow control.
> 
> For each of the "exceptional" types, what is required is that a rule be
> derived on how many of them can be outstanding, and how the sender
> knows when they are no longer outstanding.
> 
> If, as claimed, it is a trivial matter for the Data Sink to make
> these calculations, then it should be easy to enumerate these rules.
> 
> > The iSER design team thus believed that most storage implementations
> > will use buffer pools to deal with this reality (as they always
> > have), and the rare "fringe" opcode types mentioned above could
> > easily be dealt with in the statistical provisioning scheme of things,
> > being so rare and infrequent.
> 
> It is totally incorrect for an Upper Layer Protocol to be designed with
> presumptions about the implementation of the lower layers. If you believe
> buffer pools are required for the correct functioning of an application
> using iWARP then you should be arguing for that change to iWARP.
> 
> Otherwise, the Upper Layer Protocol must be defined so as to rely upon
> the published protocol and nothing else.
> 
> iWARP requires the ULP to take responsibility for flow control of
> untagged messages. Period.
> 
> 
> > Despite this belief (in fact, even before we were convinced of this
> > approach), we did a diligent analysis of a Send Message flow control
> > protocol for iSER - the ultimate conclusion was that it's way too much
> > overhead to run this protocol: it's slow to respond to changing I/O
> > loads, reclaiming credits is a burdensome process, it requires RTT
> > delays to announce new credits, etc.
> >
> 
> That is based upon the assumption that iSER flow control requires
> iSER flow control messages. This is not a requirement. A requirement
> that the Data Source MUST NOT submit more than one connection
> termination notice upon any given connection would fully flow control
> that type of message -- with no wire protocol messages being exchanged.
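A rule-based limit of this kind needs no wire messages; it is just a counter kept by each side. Below is a sketch with a hypothetical `StaticLimit` class. The only concrete rule encoded is the one-termination-notice-per-connection example from the text, and the `on_retired` hook stands for whatever event a given rule would define as ending a message's outstanding state.

```python
class StaticLimit:
    """Sender-side limit fixed by a protocol rule, with no credit messages."""

    def __init__(self, max_outstanding: int):
        self.max_outstanding = max_outstanding
        self.outstanding = 0

    def may_send(self) -> bool:
        return self.outstanding < self.max_outstanding

    def on_send(self) -> None:
        if not self.may_send():
            # Sending anyway would make the Data Source the party at fault.
            raise RuntimeError("rule violated by Data Source")
        self.outstanding += 1

    def on_retired(self) -> None:
        """Called when the rule says the message is no longer outstanding."""
        if self.outstanding == 0:
            raise RuntimeError("nothing outstanding")
        self.outstanding -= 1


# Rule from the text: at most one termination notice per connection. It is
# never retired, so a new StaticLimit would be created per connection.
termination_notice = StaticLimit(max_outstanding=1)
```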
> 
> 
> > I believe the approach adopted in the current iSER draft is appropriate;
> > we do however need to polish the flow control discussion to include
> > some of the design rationale.
> 
> Rationales are not constraints upon the sender of untagged
> messages. Flow *control*, by definition, is a constraint on the
> sender. The constraint does not have to take the form of dynamically
> exchanged messages, or even per-session negotiated limits. But it
> does require that a limit be unambiguously identified.
> 
> Otherwise it is not flow *control*.
> 
> Again, this has nothing to do with how many buffers the Data Sink
> must provision and when. Dynamic binding of buffers is a totally
> valid strategy, especially if the Data Sink has "low water mark"
> warnings and processes responsible for responding to those alarms
> to restock the buffer pool.
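The "low water mark" strategy mentioned above could look roughly like the following. The thresholds, the class, and the split between raising the alarm and servicing it (standing in for a separate restocking process) are all illustrative assumptions.

```python
class RestockingPool:
    """Receive-buffer pool that raises an alarm when it drops to a low water mark."""

    def __init__(self, initial: int, low_water: int, restock_batch: int):
        self.free = initial
        self.low_water = low_water
        self.restock_batch = restock_batch
        self.alarm = False

    def take(self) -> None:
        if self.free == 0:
            raise RuntimeError("pool exhausted: Data Sink fault")
        self.free -= 1
        if self.free <= self.low_water:
            self.alarm = True  # a separate process is expected to respond

    def service_alarm(self) -> None:
        """Run by the restocking process: add buffers and clear the alarm."""
        if self.alarm:
            self.free += self.restock_batch
            self.alarm = False
```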
> 
> The point is that failure to provide true flow control *requires*
> that *all* implementations build such an infrastructure. It is
> taking a feature that is desirable for high volume servers and
> making it a de facto requirement for *all* servers. Even those
> who only intend to support a single client.
> 
> 
> 
> 
> Caitlin Bestler - cait@asomi.com - http://asomi.com/
> 
>
