Re: iSCSI/iWARP drafts and flow control

To: <pat_thaler@agilent.com>
Subject: Re: iSCSI/iWARP drafts and flow control
From: Caitlin Bestler <cait@asomi.com>
Date: Thu, 31 Jul 2003 15:25:15 -0500
Cc: <cbm@rose.hp.com>, <ips@ece.cmu.edu>, <rddp@ietf.org>
In-Reply-To: <1BEBA5E8600DD4119A50009027AF54A0132E0465@axcs04.cos.agilent.com>

On Thursday, July 31, 2003, at 02:42 PM, <pat_thaler@agilent.com> wrote:

> All, sorry for the empty reply - I'm not sure how that happened.
>
> Caitlin,
>
> You asked some questions about how the other messages are flow 
> controlled in iSCSI over TCP. The answer is that they aren't flow 
> controlled. If iSCSI gets a PDU it cannot handle, it drops it and 
> there are provisions to trigger it to be resent depending on the kind 
> of recovery level supported. The only control for PDUs to the target 
> is on non-immediate commands (both SCSI Command and Task Management 
> Function Request PDUs). Note that when unsolicited non-immediate data
> is permitted, iSCSI allows the command to generate a command PDU plus
> an unknown number of SCSI Data-out PDUs to carry the unsolicited data.
> For iSER, we require that the unsolicited SCSI Data-out PDUs be full
> when there is enough unsolicited data to fill them (and we created a
> key to negotiate that size). Therefore, when operating over iSER the 
> target does know the maximum number of PDUs that the initiator might 
> send per SCSI command.
>
> There is no deadlock in existing iSCSI because there is no flow 
> control on NOP-In and the target can always send a NOP-In to advance 
> MaxCmdSN.
>
> To summarize, in current iSCSI, each opening in CmdSN window allows 
> from 1 to ? PDUs while in iSCSI over iSER, each opening in CmdSN 
> window allows from 1 to n PDUs where n is the amount of unsolicited 
> data divided by data per PDU (rounded up of course).
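As a worked example of the bound in the summary above, here is a
minimal sketch. The function name and the byte counts are
illustrative assumptions, not values from either draft, and the
quoted text does not say whether the command PDU itself is counted
in n, so this computes only the quoted quotient.

```python
def unsolicited_pdu_bound(unsolicited_bytes: int, data_per_pdu: int) -> int:
    """Bound n from the quoted summary: unsolicited data divided by
    data per PDU, rounded up, and never less than 1.

    Hypothetical helper; both parameters are assumed to be in bytes.
    """
    if data_per_pdu <= 0:
        raise ValueError("data_per_pdu must be positive")
    if unsolicited_bytes < 0:
        raise ValueError("unsolicited_bytes must be non-negative")
    # Ceiling division using integer arithmetic only.
    n = -(-unsolicited_bytes // data_per_pdu)
    # Each opening in the CmdSN window allows at least one PDU.
    return max(1, n)


if __name__ == "__main__":
    print(unsolicited_pdu_bound(65536, 8192))
    print(unsolicited_pdu_bound(65537, 8192))
```

The bound exists only because iSER requires full unsolicited
Data-out PDUs. Without that rule the divisor is unknown, which is
the "1 to ?" case.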

On the contrary, existing iSCSI has buffer flow
control. It runs over TCP.

The receiving TCP stack declares a buffer window which
the sending TCP MUST comply with. (And it SHOULD have
enough buffers to match its promises, but that's a
separate issue: a TCP stack can under-provision for
the same reasons that the ULP finds it valuable.)

Even if a TCP segment will be recognized by a
rototilled receiver, and its payload placed directly
into a user buffer, the sending TCP is still flow
controlled by the buffer window.

The TCP window advertisement is not conditional. It is
"I will accept N bytes", not "N bytes as long as 90%
of them can be directly placed". This results in
head-of-line blocking. A limited supply of general
purpose buffering can prevent messages from being sent
that would have bypassed those buffers.
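To illustrate the head-of-line blocking described above, here is a
rough sketch that contrasts an unconditional byte window with a
scheme that charges only untagged messages. It is a toy model under
stated assumptions: messages are treated as indivisible (real TCP
would send partial segments), and the function names, message sizes
and credit counts are all hypothetical.

```python
def byte_window_send(messages, window_bytes):
    """Send in order under one unconditional byte window.

    messages: list of (name, size_bytes, directly_placeable).
    The window is charged for every byte, whether or not the payload
    would have bypassed the general purpose buffers.
    """
    sent = []
    for name, size, _directly_placeable in messages:
        if size > window_bytes:
            break  # in-order stream: everything behind this is blocked too
        window_bytes -= size
        sent.append(name)
    return sent


def credit_send(messages, untagged_credits):
    """Send in order, charging one credit per untagged message only."""
    sent = []
    for name, _size, directly_placeable in messages:
        if not directly_placeable:
            if untagged_credits == 0:
                break  # stall at the first untagged message without a credit
            untagged_credits -= 1
        sent.append(name)
    return sent


if __name__ == "__main__":
    msgs = [("cmd1", 48, False), ("data1", 8192, True), ("cmd2", 48, False)]
    print(byte_window_send(msgs, 4096))  # data1 blocks, and cmd2 behind it
    print(credit_send(msgs, 2))          # data1 is not charged at all
```

In the byte-window case the directly placeable message is held up by
a window sized for buffers it would never have used.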

In order to allow DDP to be implemented efficiently,
it must be able to assume that it will be able to
place data as soon as it accepts a segment/chunk from
the LLP for placement. The DDP layer does not do
buffering.

In order for this to work, the role of SCTP/TCP buffer
windows MUST be replaced by ULP flow control. SCTP/TCP
buffer windows are designed to ensure that there is a
place to accept each received buffer (and to slow down
the sender so that this condition can be maintained).

Tagged messages have a valid target, or the stream is
terminated. There is no condition where a valid tagged
message will lack a target buffer.

Untagged messages, however, consume resources. Without
flow control the sender can send messages which will
not have a buffer to receive them. A reliable protocol
prevents this with flow control. The only change from
iSCSI directly over TCP to iSER is that this flow
control has been refined to avoid false head-of-line
blocking. But doing that requires shifting the
mechanics of the flow control from the LLP to the ULP.

There is no reason for iSER flow control to stall
transmission of any untagged message that would not
have been stalled by SCTP/TCP buffer windows. In fact,
it should be able to avoid false blocking.

If iSCSI really required a command to be sent *now*,
it would not work over TCP. Since it does, there is
obviously a solution where the iSER layer would on
occasion stall an untagged message on the transmit
side.

Your analysis consistently focuses on the receive
side. Flow control is not about the receive side, it
is about limiting transmit side based upon feedback
from the receive side.
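A minimal sketch of what limiting the transmit side on receiver
feedback could look like for untagged messages. The class name, the
grant() call and the one-credit-per-message accounting are
assumptions made for illustration. Neither draft defines this
interface.

```python
from collections import deque


class UntaggedSender:
    """Transmit-side gate: an untagged message goes out only when the
    receiver has granted a credit for it (hypothetical model)."""

    def __init__(self, initial_credits, transmit):
        self.credits = initial_credits
        self.transmit = transmit   # callable that puts a message on the wire
        self.stalled = deque()     # messages waiting for credit, in order

    def send(self, msg):
        """Return True if sent now, False if stalled on the transmit side."""
        if self.stalled or self.credits == 0:
            self.stalled.append(msg)   # never overtake an earlier stalled message
            return False
        self.credits -= 1
        self.transmit(msg)
        return True

    def grant(self, n):
        """Feedback from the receive side: n more buffers are available."""
        self.credits += n
        while self.stalled and self.credits > 0:
            self.credits -= 1
            self.transmit(self.stalled.popleft())


if __name__ == "__main__":
    wire = []
    sender = UntaggedSender(2, wire.append)
    for m in ("a", "b", "c"):
        sender.send(m)
    print(wire)        # "c" is held back until the receiver grants a credit
    sender.grant(1)
    print(wire)
```

Nothing is dropped here. The sender stalls until the feedback
arrives, just as a TCP sender would under a closed window.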

What has to be done is to accept that constraint, and
then determine the most efficient form of feedback
available. It works over TCP, which is fairly crude in
terms of feedback. Therefore a solution is possible.

If you do not want to rely upon implicit buffer freeing,
a simple flag could request an explicit ack. It would
only be required under special circumstances. If it
were required more often, then the whole idea that
this could have been estimated on the receiving side
would be suspect. So far, I haven't disputed that
receiver estimation would work most of the time
-- only that it counts as flow control. It is not
a reliable protocol, which means that in the *long run*
it will not be robust. Unreliable protocols can be
made to work quite well, with amazingly few drops
and high performance -- until somebody changes
one end radically and/or the network topology.
Reliable protocols are supposed to prevent that.

> Note also that the CmdSN window is across a session.
> If you have connections in a session that are running
> over separate RNICs and are using CmdSN for flow control,
> each RNIC will have to have access to enough buffers
> for the whole window to land on it.
>

This is a valid reason why the credits cannot always
be enforced by the DDP layer. I have already agreed
that the DDP layer cannot enforce credit limits if it
does not know them, and that there are specialized
cases where the ULP would not find it
desirable/convenient to share this information.

But the *existence* of a limit is independent of
whether the receiving DDP is involved in its
enforcement. The critical factor is that the Data
Source ULP is aware of the limit.

> Between these two factors, CmdSN flow control will require over
> provisioning buffers much of the time. Perhaps memory is cheap
> enough that for an RNIC with a small number of connections this
> is acceptable in exchange for using an existing mechanism. On the
> other hand, we will have to create a mechanism to handle immediate
> commands and other PDUs that aren't covered by CmdSN so it isn't
> clear to me whether this is the right answer. The downside is
> overprovisioning buffers because of sessions spanning adapters and
> because each command might be a write with unsolicited data but many
> commands are reads. The upside is that CmdSN window can be managed
> to respond to changes in load while one has a less responsive simple
> mechanism to deal with the rest of the traffic.
>
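For a sense of scale, the two factors in the quoted paragraph can be
put into a back-of-the-envelope calculation. This is only a sketch:
the function name and the numbers are assumed for illustration, and
it takes the worst case literally, with the whole CmdSN window
landing on one RNIC and every command carrying the maximum number
of PDUs.

```python
def worst_case_buffers(cmd_window, max_pdus_per_cmd, num_rnics):
    """Return (buffers per RNIC, buffers across all RNICs) if each RNIC
    must be able to absorb the entire session-wide CmdSN window.

    Hypothetical helper; all three inputs are assumed values.
    """
    per_rnic = cmd_window * max_pdus_per_cmd
    return per_rnic, per_rnic * num_rnics


if __name__ == "__main__":
    # Assumed: a window of 32 commands, up to 8 PDUs each, 2 RNICs.
    print(worst_case_buffers(32, 8, 2))
```

The total grows with the number of adapters even though the session
window does not, which is the over-provisioning the quoted text
describes.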

Just as with TCP/SCTP, actual provisioning of buffers
is independent of the advertised flow control. With
the caveat that the advertised flow control is
expected to be reasonably reliable. But buffers can be
under-provisioned with amazing accuracy at any
protocol layer.

> What isn't flow controlled by iSCSI:
> initiator to target:
> immediate command PDUs - existing iSCSI allows for the target to
> drop these if it gets more than it can handle and the initiator
> can only count on buffering for two, but the initiator can send
> more than that and hope the target has buffering. One can't count
> on how many of these there might be.

An iSCSI target can drop these under a properly flow
controlled iSER as well. But it has to receive the
requests first. Are they being delivered over a
reliable protocol or not?

Deciding to "drop" a command at the ULP layer just
means that the buffer is returned to the pool quickly.
It does not mean that there didn't need to be a buffer
to receive the command.
>
> Is there a mechanism to disable flow control when the
> receiver doesn't require it, e.g. large shared buffer
> pool with statistical provisioning?
>
That would be an argument for allowing a session to
explicitly negotiate these "extraneous" credits.
If you have a large shared buffer, simply grant
more credits.

References:
- RE: iSCSI/iWARP drafts and flow control
  - From: <pat_thaler@agilent.com>
