
    Re: iSCSI/iWARP drafts and flow control

    On Thursday, July 31, 2003, at 02:42 PM, <> wrote:
    > All, sorry for the empty reply - I'm not sure how that happened.
    > Caitlin,
    > You asked some questions about how the other messages are flow 
    > controlled in iSCSI over TCP. The answer is that they aren't flow 
    > controlled. If iSCSI gets a PDU it cannot handle, it drops it and 
    > there are provisions to trigger it to be resent depending on the kind 
    > of recovery level supported. The only control for PDUs to the target 
    > is on non-immediate commands (both SCSI Command and Task Management 
    > Function Request PDUs). Note that when unsolicited non-immediate data 
    > is permitted, iSCSI allows the command to generate a command PDU plus 
    > an unknown number of SCSI Data-out PDUs to carry the unsolicited data. 
    > For iSER, we require that the unsolicited SCSI Data-out PDUs be full 
    > when there is enough unsolicited data to fill them (and we created a 
    > key to negotiate that size). Therefore, when operating over iSER the 
    > target does know the maximum number of PDUs that the initiator might 
    > send per SCSI command.
    > There is no deadlock in existing iSCSI because there is no flow 
    > control on NOP-In and the target can always send a NOP-In to advance 
    > MaxCmdSN.
    > To summarize, in current iSCSI, each opening in CmdSN window allows 
    > from 1 to ? PDUs while in iSCSI over iSER, each opening in CmdSN 
    > window allows from 1 to n PDUs where n is the amount of unsolicited 
    > data divided by data per PDU (rounded up of course).
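    The bound quoted above is simple ceiling arithmetic; a minimal sketch
    (function and parameter names are illustrative, not from any spec):

    ```python
    import math

    def max_pdus_per_opening(unsolicited_bytes: int, pdu_data_bytes: int) -> int:
        """Upper bound on PDUs admitted by one CmdSN window opening over iSER:
        one SCSI Command PDU plus enough SCSI Data-out PDUs to carry the
        unsolicited data, rounding up for a final partial PDU."""
        return 1 + math.ceil(unsolicited_bytes / pdu_data_bytes)

    # 64 KiB of unsolicited data carried in 8 KiB data PDUs:
    print(max_pdus_per_opening(64 * 1024, 8 * 1024))  # 1 command + 8 data = 9
    ```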
    On the contrary, existing iSCSI has buffer flow
    control. It runs over TCP.
    The receiving TCP stack declares a buffer window which
    the sending TCP MUST comply with. (And it SHOULD have
    enough buffers to match its promises, but that's a
    separate issue; a TCP stack can under-provision for
    the same reasons that a ULP finds it valuable.)
    Even if a TCP segment will be recognized by a
    rototilled receiver, and its payload placed directly
    into a user buffer, the sending TCP is still flow
    controlled by the buffer window.
    The TCP window advertisement is not conditional. It is
    "I will accept N bytes", not "N bytes as long as 90%
    of them can be directly placed." This results in
    head-of-line blocking: a limited supply of general
    purpose buffering can prevent messages from being sent
    that would have bypassed those buffers.
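    That head-of-line blocking can be illustrated with a toy model (all
    names hypothetical; this models the unconditional byte window, not a
    real TCP stack):

    ```python
    class ByteWindow:
        """Toy model of an unconditional TCP-style receive window. The
        window is sized to the general-purpose buffering behind it, and it
        charges every byte -- even bytes that could have been placed
        directly into a user buffer and needed no general buffer at all."""

        def __init__(self, general_buffer_bytes: int):
            self.window = general_buffer_bytes  # "I will accept N bytes"

        def try_send(self, nbytes: int, directly_placeable: bool) -> bool:
            if nbytes > self.window:
                return False        # stalled regardless of placeability
            self.window -= nbytes   # charged regardless of placement
            return True

    w = ByteWindow(general_buffer_bytes=4096)
    assert w.try_send(4096, directly_placeable=False)    # fills the window
    assert not w.try_send(512, directly_placeable=True)  # blocked anyway
    ```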
    In order to allow DDP to be implemented efficiently,
    it must be able to assume that it will be able to
    place data as soon as it accepts a segment/chunk from
    the LLP for placement. The DDP layer does not buffer
    data itself.
    In order for this to work, the role of SCTP/TCP buffer
    windows MUST be replaced by ULP flow control. SCTP/TCP
    buffer windows are designed to ensure that there is a
    place to accept each received buffer (and to slow down
    the sender so that this condition can be maintained).
    Tagged messages have a valid target, or the stream is
    terminated. There is no condition where a valid tagged
    message will lack a target buffer.
    Untagged messages, however, consume resources. Without
    flow control the sender can send messages which will
    not have a buffer to receive them. A reliable protocol
    prevents this with flow control. The only change from
    iSCSI directly over TCP to iSER is that this flow
    control has been refined to avoid false head-of-line
    blocking. But doing that requires shifting the
    mechanics of the flow control from the LLP to the ULP.
    There is no reason for iSER flow control to stall
    transmission of any untagged message that would not
    have been stalled by SCTP/TCP buffer windows. In fact,
    it should be able to avoid false blocking.
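    A minimal sketch of that shift (credit counts and method names are
    invented for illustration; this is not the iSER wire protocol):

    ```python
    class UlpFlowControl:
        """Toy ULP-level credits: only untagged messages consume a credit,
        so tagged (directly placed) traffic is never falsely blocked."""

        def __init__(self, untagged_credits: int):
            self.credits = untagged_credits

        def try_send_tagged(self) -> bool:
            return True  # a valid tagged message always has a target buffer

        def try_send_untagged(self) -> bool:
            if self.credits == 0:
                return False    # transmit-side stall until credit returns
            self.credits -= 1
            return True

        def grant(self, n: int) -> None:
            self.credits += n   # receiver feedback as buffers free up

    fc = UlpFlowControl(untagged_credits=1)
    assert fc.try_send_untagged()      # consumes the only credit
    assert not fc.try_send_untagged()  # untagged traffic stalls...
    assert fc.try_send_tagged()        # ...but tagged traffic still flows
    fc.grant(1)
    assert fc.try_send_untagged()
    ```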
    If iSCSI really required a command to be sent *now*,
    it would not work over TCP. Since it does, there is
    obviously a solution where the iSER layer would on
    occasion stall an untagged message on the transmit
    side.
    Your analysis consistently focuses on the receive
    side. Flow control is not about the receive side; it
    is about limiting the transmit side based upon feedback
    from the receive side.
    What has to be done is to accept that constraint, and
    then determine the most efficient form of feedback
    available. It works over TCP, which is fairly crude in
    terms of feedback. Therefore a solution is possible.
    If you do not want to rely upon implicit buffer freeing,
    a simple flag could request an explicit ack. It would
    only be required under special circumstances. If it
    were required more often, then the whole idea that
    this could have been estimated on the receiving side
    would be suspect. So far, I haven't argued that
    receiver estimation would not work most of the time
    -- just that doing so is not flow control. It is not
    a reliable protocol, which means that in the *long run*
    it will not be robust. Unreliable protocols can be
    made to work quite well, with amazingly few drops
    and high performance -- until somebody changes
    one end radically and/or the network topology.
    Reliable protocols are supposed to prevent that.
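    One way such an explicit-ack flag could look (field and function names
    are entirely hypothetical):

    ```python
    from dataclasses import dataclass

    @dataclass
    class UntaggedMessage:
        payload: bytes
        ack_requested: bool = False  # hypothetical flag: ask the receiver
                                     # for an explicit ack instead of
                                     # relying on implicit buffer freeing

    def on_untagged_receive(msg: UntaggedMessage, send_ack) -> None:
        # ... deliver msg.payload to the ULP, then return the buffer ...
        if msg.ack_requested:
            send_ack()  # explicit feedback, needed only in special cases

    acks = []
    on_untagged_receive(UntaggedMessage(b"cmd"), lambda: acks.append(1))
    assert acks == []  # no ack requested, implicit freeing suffices
    on_untagged_receive(UntaggedMessage(b"cmd", ack_requested=True),
                        lambda: acks.append(1))
    assert acks == [1]
    ```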
    > Note also that the CmdSN window is across a session.
    > If you have connections in a session that are running
    > over separate RNICs and are using CmdSN for flow control,
    > each RNIC will have to have access to enough buffers
    > for the whole window to land on it.
    This is a valid reason why the credits cannot always
    be enforced by the DDP layer. I have already agreed
    that the DDP layer cannot enforce credit limits if it
    does not know them, and that there are specialized
    cases where the ULP would not find it
    desirable/convenient to share this information.
    But the *existence* of a limit is independent of
    whether the receiving DDP is involved in its
    enforcement. The critical factor is that the Data
    Source ULP is aware of the limit.
    > Between these two factors, CmdSN flow control will require over
    > provisioning buffers much of the time. Perhaps memory is cheap
    > enough that for an RNIC with a small number of connections this
    > is acceptable in exchange for using an existing mechanism. On the
    > other hand, we will have to create a mechanism to handle immediate
    > commands and other PDUs that aren't covered by CmdSN so it isn't
    > clear to me whether this is the right answer. The downside is
    > overprovisioning buffers because of sessions spanning adapters and
    > because each command might be a write with unsolicited data but many
    > commands are reads. The upside is that CmdSN window can be managed
    > to respond to changes in load while one has a less responsive simple
    > mechanism to deal with the rest of the traffic.
    Just as with TCP/SCTP, actual provisioning of buffers
    is independent of the advertised flow control, with
    the caveat that the advertised flow control is
    expected to be reasonably reliable. But buffers can be
    under-provisioned with amazing accuracy at any
    protocol layer.
    > What isn't flow controlled by iSCSI:
    > initiator to target:
    > immediate command PDUs - existing iSCSI allows for the target to
    > drop these if it gets more than it can handle and the initiator
    > can only count on buffering for two, but the initiator can send
    > more than that and hope the target has buffering. One can't count
    > on how many of these there might be.
    An iSCSI target can drop these under a properly flow
    controlled iSER as well. But it has to receive the
    requests first. Are they being delivered over a
    reliable protocol or not?
    Deciding to "drop" a command at the ULP layer just
    means that the buffer is returned to the pool quickly.
    It does not mean that there didn't need to be a buffer
    to receive the command.
    > Is there a mechanism to disable flow control when the
    > receiver doesn't require it, e.g. large shared buffer
    > pool with statistical provisioning?
    That would be an argument for allowing a session to
    explicitly negotiate these "extraneous" credits.
    If you have a large shared buffer, simply grant
    more credits.
