    Re: iSCSI/iWARP drafts and flow control

    On Tuesday, July 29, 2003, at 08:58 PM, Mallikarjun C. wrote:
    > As Mike points out, the CmdSN-based flow control
    > in iSCSI is relevant here.  Let me note that the design
    > team behind the current iSER draft considered this topic
    > in great detail, but I can now clearly see that the draft
    > unfortunately does not capture the design rationale very well.
    It could be clearer. But even if it were clearer, it would
    not change the fact that it fails to provide ULP-level
    flow control for untagged messages.
    The requirement here is that the ULP provide *flow control*
    for untagged messages. Control means that the Data Source
    either has permission to send an untagged message, or it
    does not. There is an identical flow control issue for
    RDMA Reads. You either are allowed to send one or you
    are not.
    If you are allowed to send an untagged message, you have
    an expectation that the other side has the resources to
    handle it. Bugs, under-provisioning and hardware faults
    are all facts of life. So robust applications are prepared
    to deal with faults. But faults reflect a *failure*.
    If an untagged message is allowed, then the Data Source
    has every reason to expect that the Data Sink will handle
    it properly. Failure to do so is a fault on the Data Sink.
    If an untagged message is not allowed, then the Data Source
    had no right to send it. It cannot complain if the Data
    Sink terminates the stream. In this case the Data Source
    is the one committing the fault.
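    A minimal Python sketch of the permission model above. The class
    names (DataSource, DataSink, FlowControlViolation) are hypothetical
    and do not come from any iWARP or iSER document. Both sides hold
    the same grant: a compliant Data Source checks it before sending,
    and a Data Sink that receives a message beyond the grant may treat
    it as the sender's fault and terminate the stream.

```python
class FlowControlViolation(Exception):
    """Raised by the Data Sink when an untagged message arrives without a credit."""


class DataSink:
    def __init__(self, credits: int) -> None:
        self.credits = credits  # untagged messages the sink has agreed to accept

    def receive(self, msg: str) -> str:
        if self.credits == 0:
            # The Data Source had no right to send this: the fault is the
            # sender's, and the sink may terminate the stream.
            raise FlowControlViolation(f"untagged message {msg!r} sent without credit")
        self.credits -= 1
        return f"handled {msg}"


class DataSource:
    def __init__(self, credits: int) -> None:
        self.credits = credits  # the sender's copy of the same grant

    def may_send(self) -> bool:
        return self.credits > 0

    def send(self, sink: DataSink, msg: str) -> str:
        if not self.may_send():
            # A compliant Data Source waits instead of sending.
            raise RuntimeError("no credit available; must not send")
        self.credits -= 1
        return sink.receive(msg)


sink = DataSink(credits=2)
source = DataSource(credits=2)
print(source.send(sink, "a"), source.send(sink, "b"), source.may_send())
# handled a handled b False
```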
    > iSCSI does not provide a PDU-level positive flow control
    > but instead relies on the CmdSN feature, from which most
    > of the iSCSI (what DA/iSER call as the) "control-type" PDU traffic
    > can be precisely estimated (note that only control-type PDUs
    > are candidates for Send Messages and thus relevant to this
    > discussion).  However, it turns out that there are certain
    > opcode types that are used very rarely that are not governed
    > by the CmdSN-based flow control - immediate commands,
    > SNACK, unsolicited NOP-In, Reject, and Async Messages.
    There is no requirement that there be an explicit wire-level
    protocol, merely that the ULP establish a mechanism by
    which the sender knows whether it can send a given untagged
    message. iSCSI CmdSN flow control already provides this flow control
    for most iSER packets. So the only issue is establishing
    rules for the remaining packets.
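    A Python sketch of the CmdSN window check, assuming the rule RFC 3720
    gives for non-immediate commands: one may be sent only when
    ExpCmdSN <= CmdSN <= MaxCmdSN, compared using 32-bit serial number
    arithmetic (RFC 1982). The UNGOVERNED set merely restates the opcode
    types listed in the quoted message; nothing in this check constrains
    them, which is the gap under discussion.

```python
SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS
HALF = 1 << (SERIAL_BITS - 1)

# Opcode types the quoted message lists as outside CmdSN-based flow control.
UNGOVERNED = {"immediate command", "SNACK", "unsolicited NOP-In", "Reject", "Async Message"}


def sn_lte(a: int, b: int) -> bool:
    """Serial number comparison: True if a <= b modulo 2**32."""
    return (b - a) % MOD < HALF


def in_cmdsn_window(cmd_sn: int, exp_cmd_sn: int, max_cmd_sn: int) -> bool:
    """True if a non-immediate command with this CmdSN may be sent.

    The window is closed when MaxCmdSN == ExpCmdSN - 1 (mod 2**32).
    """
    return sn_lte(exp_cmd_sn, cmd_sn) and sn_lte(cmd_sn, max_cmd_sn)


print(in_cmdsn_window(10, 10, 12))      # True
print(in_cmdsn_window(13, 10, 12))      # False: beyond MaxCmdSN
print(in_cmdsn_window(0, MOD - 2, 1))   # True: the window wraps past 2**32 - 1
```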
    Controlling the flow of *most* packets is *not* flow control.
    It is somewhat akin to having a strictly balanced budget
    except for these three funds which are unrestricted.
    > Note that the above does not include the unsolicited Data-out
    > PDUs since the worst case number of these is precisely known from
    > CmdSN, but the worst case buffer provisioning for these would
    > be both unnecessary and extremely expensive in reality.
    Under-provisioning of buffers is a local issue, with the caveat
    that doing it improperly is a fault on the Data Sink's part. There
    can also be faults from exhaustion of CPU power, hardware faults
    and plain old software errors. The server is obviously expected to
    keep these to a minimum.
    The key distinction that must be made is between granting credits,
    providing buffers and matching buffers.
    The classic simple ordered Receive Queue is the one interface that
    I believe everyone agrees must be supported. With it the Data Sink
    ULP posts a receive buffer, and thereby grants a credit and pre-assigns
    a buffer to the QN/MSN.
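    A minimal Python sketch of the simple ordered Receive Queue, assuming
    MSNs start at 1 and increase by one per untagged message. The class
    and method names are hypothetical. Posting a buffer both grants one
    credit and binds that buffer to a specific MSN.

```python
class OrderedReceiveQueue:
    """Each posted buffer is pre-assigned to the next MSN on the queue."""

    def __init__(self, first_msn: int = 1) -> None:
        self.next_post_msn = first_msn  # MSN the next posted buffer will serve
        self.bound = {}                 # MSN -> pre-assigned buffer

    def post_receive(self, buf: bytearray) -> int:
        """Post a buffer: grants one credit and binds the buffer to an MSN."""
        msn = self.next_post_msn
        self.bound[msn] = buf
        self.next_post_msn += 1
        return msn

    def deliver(self, msn: int, payload: bytes) -> bytearray:
        if msn not in self.bound:
            raise ValueError(f"MSN {msn}: no buffer posted, so no credit was granted")
        buf = self.bound[msn]
        if len(payload) > len(buf):
            raise ValueError(f"MSN {msn}: payload larger than posted buffer")
        del self.bound[msn]
        buf[: len(payload)] = payload
        return buf
```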
    The Shared Receive Queue (proposed in draft-hilland) shares both
    buffers and credits across a pool. Buffers are assigned to the
    QN/MSN on an as-needed basis. The implementation has the option of
    filling in buffers for the gap when a higher MSN is received; otherwise
    the buffer is allocated when a portion of it is first received.
    Credits are consumed when buffers are allocated.
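    A Python sketch of the Shared Receive Queue behavior just described.
    The names are hypothetical, a "stream" key stands in for one
    connection's untagged Send queue, and MSNs are assumed to start at 1.
    A buffer, and with it a pooled credit, is consumed only when the first
    portion of a message arrives, optionally also for any skipped lower
    MSNs.

```python
class SharedReceiveQueue:
    """Buffers and credits pooled across streams; binding happens on demand."""

    def __init__(self, fill_gaps: bool = False) -> None:
        self.free = []       # posted buffers not yet bound to any message
        self.bound = {}      # (stream, msn) -> buffer
        self.highest = {}    # stream -> highest MSN bound so far
        self.fill_gaps = fill_gaps

    def post_receive(self, buf: bytearray) -> None:
        self.free.append(buf)  # each pooled buffer is one pooled credit

    def credits_left(self) -> int:
        return len(self.free)

    def _bind(self, stream, msn: int) -> None:
        if not self.free:
            raise RuntimeError("shared pool exhausted")
        self.bound[(stream, msn)] = self.free.pop(0)  # credit consumed here

    def on_first_segment(self, stream, msn: int) -> bytearray:
        """Called when the first portion of message `msn` on `stream` arrives."""
        if (stream, msn) not in self.bound:
            if self.fill_gaps:
                # Optionally bind buffers for skipped lower MSNs as well.
                for gap in range(self.highest.get(stream, 0) + 1, msn):
                    if (stream, gap) not in self.bound:
                        self._bind(stream, gap)
            self._bind(stream, msn)
            self.highest[stream] = max(self.highest.get(stream, 0), msn)
        return self.bound[(stream, msn)]
```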
    Note that Shared Receive Queues apply only to the RDMA Send queue.
    The RDMA Read queue is not documented, but given that a fixed limit
    is configured, it would presumably be a simple ordered Receive Queue.
    Shared Buffer Pools place buffers in a pool, but assign credits on
    a per-stream basis. If an MSN exceeds the range implied by the credits
    it is rejected as invalid whether there is a buffer available or not.
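    A Python sketch of a Shared Buffer Pool with per-stream credits, using
    hypothetical names and ignoring MSN wraparound for brevity. The credit
    check comes first, so an MSN outside the granted range is rejected
    even while pooled buffers are still free.

```python
class InvalidMSN(Exception):
    """An MSN outside the range implied by the stream's credits."""


class SharedBufferPool:
    """Buffers are pooled, but credits are tracked per stream."""

    def __init__(self, buffers) -> None:
        self.free = list(buffers)
        self.credit_limit = {}  # stream -> highest MSN the Data Source may use

    def grant(self, stream, up_to_msn: int) -> None:
        self.credit_limit[stream] = up_to_msn

    def accept(self, stream, msn: int) -> bytearray:
        limit = self.credit_limit.get(stream, 0)
        if msn > limit:
            # Invalid regardless of whether a pooled buffer happens to be free.
            raise InvalidMSN(f"stream {stream!r}: MSN {msn} exceeds credit limit {limit}")
        if not self.free:
            # Credits were valid but the pool ran dry: a local fault on the Data Sink.
            raise RuntimeError("buffer pool exhausted")
        return self.free.pop()
```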
    iSER seems to call for the ability to pool credits across all streams
    in a session. But it would not necessarily be the same set of streams
    that you would want to share buffers over. There could be advantages
    to pooling buffers between sessions, while still tracking credits on
    a per-session basis.
    In any event, these are all *local* questions. The only *wire* question
    is whether the Data Source can know whether or not it is legal for it
    to send a given untagged message.
    Stating that for message types "x" it is legal as long as the Data
    Source thinks it has a reason to send "x" is NOT flow control.
    For each of the "exceptional" types, what is required is that a rule be
    derived on how many of them can be outstanding, and how the sender
    knows when they are no longer outstanding.
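    A Python sketch of what such per-type rules could look like on the
    sender side. The limits below are hypothetical placeholders, not
    values from any iSER or iSCSI document, and the event that releases a
    message is deliberately left abstract, since defining it is exactly
    the rule each type needs.

```python
# Hypothetical per-type limits, for illustration only.
EXAMPLE_LIMITS = {"SNACK": 1, "Reject": 2, "Async Message": 1}


class ExceptionalTypeGate:
    """Sender-side gate: at most limits[t] messages of type t outstanding."""

    def __init__(self, limits: dict) -> None:
        self.limits = dict(limits)
        self.outstanding = {t: 0 for t in self.limits}

    def try_send(self, pdu_type: str) -> bool:
        if pdu_type not in self.limits:
            # No rule means the type is not flow controlled at all.
            raise KeyError(f"no flow control rule defined for {pdu_type!r}")
        if self.outstanding[pdu_type] >= self.limits[pdu_type]:
            return False  # must wait until one is no longer outstanding
        self.outstanding[pdu_type] += 1
        return True

    def release(self, pdu_type: str) -> None:
        """Call when the sender learns a message is no longer outstanding."""
        if self.outstanding[pdu_type] == 0:
            raise ValueError(f"release of {pdu_type!r} without a matching send")
        self.outstanding[pdu_type] -= 1
```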
    If, as claimed, it is a trivial matter for the Data Sink to make
    these calculations, then it should be easy to enumerate these rules.
    > The iSER design team thus believed that most storage implementations
    > will use buffer pools to deal with this reality (as they have always
    > been), and the rare "fringe" opcode types mentioned above could
    > easily be dealt with in the statistical provisioning scheme of things,
    > being so rare and infrequent.
    It is totally incorrect for an Upper Layer Protocol to be designed with
    presumptions about the implementation of the lower layers. If you believe
    buffer pools are required for the correct functioning of an application
    using iWARP then you should be arguing for that change to iWARP.
    Otherwise, the Upper Layer Protocol must be defined so as to rely upon
    the published protocol and nothing else.
    iWARP requires the ULP to take responsibility for flow control of
    untagged messages. Period.
    > Despite this belief (in fact, even before we are convinced of this
    > approach), we did a diligent analysis of a Send Message flow control
    > protocol for iSER - the ultimate conclusion was that it's way too much
    > overhead to run this protocol, it's slow-to-respond to changing I/O
    > loads, reclaiming of credits is a burdensome process, requires RTT
    > delays to announce new credits etc.
    That is based upon the assumption that iSER flow control requires
    iSER flow control messages. This is not a requirement. A requirement
    that the Data Source MUST NOT submit more than one connection 
    notice upon any given connection would fully flow control that type of
    message -- with no wire protocol messages being exchanged.
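    A Python sketch of such a static rule, with hypothetical names. Because
    both sides know the limit in advance, no credits cross the wire and
    nothing ever needs to be released, and the Data Sink can compute its
    worst-case provisioning for this message type up front.

```python
class ConnectionNoticeRule:
    """Static rule: at most one connection notice per connection, ever."""

    MAX_PER_CONNECTION = 1

    def __init__(self) -> None:
        self.sent = {}  # connection id -> notices sent so far

    def may_send(self, conn_id) -> bool:
        return self.sent.get(conn_id, 0) < self.MAX_PER_CONNECTION

    def record_send(self, conn_id) -> None:
        if not self.may_send(conn_id):
            raise RuntimeError(f"connection {conn_id!r}: notice limit already used")
        self.sent[conn_id] = self.sent.get(conn_id, 0) + 1

    def buffers_needed(self, connections: int) -> int:
        """Worst-case buffers the Data Sink must provision for this type."""
        return connections * self.MAX_PER_CONNECTION
```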
    > I believe the approach adopted in the current iSER draft is
    > appropriate, we do however need to polish the flow control discussion
    > to include some of the design rationale.
    A design rationale is not a constraint upon the sender of untagged
    messages. Flow *control*, by definition, is a constraint on the
    sender. The constraint does not have to take the form of dynamically
    exchanged messages, or even per-session negotiated limits. But it
    does require that a limit be unambiguously identified.
    Otherwise it is not flow *control*.
    Again, this has nothing to do with how many buffers the Data Sink
    must provision and when. Dynamic binding of buffers is a totally
    valid strategy, especially if the Data Sink has "low water mark"
    warnings and processes responsible for responding to those alarms
    to restock the buffer pool.
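    A Python sketch of dynamic binding with a low water mark, using
    hypothetical names and counting buffers rather than holding real
    memory. Taking a buffer raises the alarm once the free count falls
    to the mark; restocking is a separate step, standing in for the
    process that responds to the alarm.

```python
class RestockingPool:
    """Dynamically bound buffer pool with a low water mark warning."""

    def __init__(self, initial: int, low_water: int, restock_to: int) -> None:
        if restock_to <= low_water:
            raise ValueError("restock target must be above the low water mark")
        self.free = initial
        self.low_water = low_water
        self.restock_to = restock_to
        self.alarm = False

    def take(self) -> None:
        """Bind one buffer to an arriving message."""
        if self.free == 0:
            raise RuntimeError("pool empty: provisioning fault on the Data Sink")
        self.free -= 1
        if self.free <= self.low_water:
            self.alarm = True  # low water mark warning

    def restock(self) -> int:
        """Run separately in response to the alarm; returns buffers added."""
        if not self.alarm:
            return 0
        added = max(0, self.restock_to - self.free)
        self.free += added
        self.alarm = False
        return added
```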
    The point is that failure to provide true flow control *requires*
    that *all* implementations build such an infrastructure. It is
    taking a feature that is desirable for high volume servers and
    making it a de facto requirement for *all* servers. Even those
    who only intend to support a single client.
    Caitlin Bestler

