
    Re: iSCSI/iWARP drafts and flow control

    The proposed mapping of iSCSI onto iWARP offers an
    inadequate solution to the problem of flow control.

    iWARP shifts responsibility for flow control to the ULP. In
    doing so, it allows ULP-specific pacing based upon the number
    of requests in flight rather than relying on the bottleneck of
    transport buffering to flow control the application. The
    session is no longer throttled by the availability of
    buffers suitable for any message. This topic is covered in
    section 4.5 of the RDMAP/DDP Applicability statement.

    There are two excellent examples of ULP solutions to pacing
    untagged messages: DAFS and the mapping of RPC over iWARP
    for NFS. The latter offers the following section on flow
    control:
    3.3.  Flow Control

        It is critical to provide flow control for an RDMA
        connection.  RDMA receive operations will fail if a
        pre-posted receive buffer is not available to accept
        an incoming RDMA Send.  Such errors are fatal to the
        connection. This is a departure from conventional
        TCP/IP networking where buffers are allocated
        dynamically on an as-needed basis, and pre-posting is
        not required.

        It is not practical to provide for fixed credit limits
        at the RPC server.  Fixed limits scale poorly, since
        posted buffers are dedicated to the associated
        connection until consumed by receive operations.
        Additionally for protocol correctness, the server must
        be able to reply whether or not a new buffer can be
        posted to accept future receives.

        Flow control is implemented as a simple request/grant
        protocol in the transport header associated with each
        RPC message.  The transport header for RPC CALL
        messages contains a requested credit value for the
        server, which may be dynamically adjusted by the
        caller to match its expected needs.  The transport
        header for the RPC REPLY messages provide the granted
        result, which may have any value except it may not be
        zero when no in-progress operations are present at the
        server, since such a value would result in deadlock.
        The value may be adjusted up or down at each
        opportunity to match the server's needs or policies.

        While RPC CALLs may complete in any order, the current
        flow control limit at the RPC server is known to the
        RPC client from the Send ordering properties.  It is
        always the most recent server granted credits minus
        the number of requests in flight.
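
    The accounting in the quoted section can be sketched roughly as
    follows. This is only an illustration under stated assumptions:
    the names ClientCreditView and server_grant are hypothetical and
    do not come from the draft, and capping the grant at the requested
    value is an assumed policy (the draft only forbids a zero grant
    when nothing is in progress at the server).

```python
from dataclasses import dataclass


@dataclass
class ClientCreditView:
    """Client-side view of the server's flow control limit (hypothetical names)."""
    granted: int          # most recent credit grant seen in an RPC REPLY header
    in_flight: int = 0    # CALLs sent whose REPLY has not yet arrived

    def available(self) -> int:
        # "most recent server granted credits minus the number of requests in flight"
        return self.granted - self.in_flight

    def send_call(self) -> bool:
        """Record a CALL if a credit is available; False means the caller must wait."""
        if self.available() <= 0:
            return False
        self.in_flight += 1
        return True

    def on_reply(self, granted: int) -> None:
        """A REPLY retires one CALL and carries the server's latest grant."""
        self.in_flight -= 1
        self.granted = granted


def server_grant(requested: int, policy_limit: int, in_progress: int) -> int:
    """Choose a grant for the REPLY header.

    Assumed policy: never grant more than was requested or than the server's
    own limit. The one rule taken from the quoted text: the grant may not be
    zero when no operations are in progress, since that would deadlock.
    """
    grant = max(0, min(requested, policy_limit))
    if grant == 0 and in_progress == 0:
        grant = 1
    return grant
```

    The point of the sketch is that the client never has to guess:
    every REPLY tells it exactly how many CALLs it may have
    outstanding.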
    I believe this is quite a contrast with the iSCSI/iWARP proposal:
    10.1 Flow Control for RDMA Send Message Types 
        RDMAP Send Message Types are used by the iSER Layer to
        transfer iSCSI control-type PDUs.  Each RDMAP Send
        Message Type consumes an Untagged Buffer at the Data
        Sink.  However, neither the RDMAP layer nor the iSER
        Layer provides an explicit flow control mechanism for
        the RDMAP Send Message Types.  Therefore, the iSER
        Layer SHOULD provision enough Untagged buffers for
        handling incoming RDMAP Send Message Types to prevent
        a buffer underrun condition at the RDMAP layer. If a
        buffer underrun happens, it may result in the
        termination of the connection.  An implementation may
        choose to satisfy this requirement by using a common
        buffer pool shared across multiple connections, with
        usage limits on a per connection basis and usage
        limits on the buffer pool itself.  In such an
        implementation, exceeding the buffer usage limit for a
        connection or the buffer pool itself may trigger
        interventions from the iSER Layer to replenish the
        buffer pool and/or to isolate the connection causing
        the problem.
    Stating that the iSER Layer "SHOULD" provision enough
    Untagged buffers is an interesting use of the IETF
    "SHOULD". Implementations are *guaranteed* to have a
    valid reason to break the "SHOULD": they do not have
    enough information to comply. The Upper Layer Protocol
    has failed to provide it.

    How is the target supposed to estimate how many
    untagged messages the initiator will presume it is
    capable of handling? Or vice versa? How? Provision
    enough buffers to match your physical line rate under
    the worst case scenarios? Even if you're an economy
    model? Guess? Keep a table by model number? Limit
    yourself to one untagged message in flight? Even if
    you are supposed to be a high performance model?
    Keep trying until you crash the connection?

    True interoperability is not based upon tweaking or
    fine-tuning to match the peers. Peers work together
    because the protocol has enabled any peer to work
    with any other compliant peer. Period. Guesstimating
    has nothing to do with it.
    Fortunately, establishing a credit protocol that is
    compatible with normal iSCSI interactions is easily
    done. Generically, an RDMA-capable ULP flow control
    strategy requires three things:
    1) An initial credit level. This can be established
       during connection/stream establishment just as
       is proposed for RDMA Read Credits.
    2) A credit is consumed for each untagged message
       sent, exactly as sending each RDMA Read Request
       consumes an RDMA Read credit.
    3) The ULP reply restores credits. With RDMA Reads
       this is a simple one-to-one process. DAFS also
       has each reply replenish the credit drained by
       the request it is responding to. The
       NFS/RPC protocol allows the RPC layer to
       explicitly vary the number of credits
       restored in each untagged message.
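
    The three requirements above can be sketched as a small sender-side
    counter. This is a minimal illustration, not a proposed wire
    format: the class name UntaggedCredits and its methods are
    hypothetical, and how the initial credit level is actually
    negotiated at connection setup is left out.

```python
class UntaggedCredits:
    """Sender-side credit counter for untagged messages (hypothetical sketch)."""

    def __init__(self, initial: int):
        # 1) Initial credit level, assumed to have been agreed during
        #    connection/stream establishment.
        if initial < 1:
            raise ValueError("initial credit level must be at least 1")
        self.credits = initial

    def try_send(self) -> bool:
        # 2) Each untagged message sent consumes one credit.
        if self.credits == 0:
            return False      # no receive buffer is known to be posted; wait
        self.credits -= 1
        return True

    def on_reply(self, restored: int = 1) -> None:
        # 3) The ULP reply restores credits: one per reply in the DAFS style,
        #    or an explicit count in the NFS/RPC style.
        if restored < 0:
            raise ValueError("a reply cannot restore a negative credit count")
        self.credits += restored
```

    A sender that calls try_send() before every untagged message can
    never overrun the receiver's posted buffers, whatever the
    receiver's buffer budget happens to be.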
    The only special requirement that I can see is that
    there may be a sequence of untagged messages that are
    not individually acknowledged. That can be taken care
    of by the following rules:
    -- A ULP response to a ULP request implies that all
       prior ULP requests have been processed, even if
       they did not warrant an explicit response.
    -- A ULP response restores credits for itself and
       for any other "phantom" responses that it implies.
    -- If a ULP needs to send a sequence of untagged
       messages that will not be acknowledged and that will
       drain the credits, it needs to insert an untagged
       message that will be acknowledged. Any form of
       echoed NOP or Ping could be used.
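
    The three rules above can be sketched as follows. The sketch
    assumes that untagged messages carry consecutive sequence numbers
    and that responses arrive in order; the class ImpliedAckSender and
    the "nop"/"request" labels are hypothetical and are not taken from
    any draft.

```python
from typing import Optional, Tuple


class ImpliedAckSender:
    """Credit tracking with implied ("phantom") responses (hypothetical sketch)."""

    def __init__(self, initial_credits: int):
        self.credits = initial_credits
        self.next_seq = 0          # sequence number of the next untagged message
        self.acked_through = -1    # highest sequence number covered by a response
        self.responses_due = 0     # outstanding messages that will draw a response

    def _emit(self, kind: str, draws_response: bool) -> Tuple[int, str]:
        self.credits -= 1
        seq = self.next_seq
        self.next_seq += 1
        if draws_response:
            self.responses_due += 1
        return (seq, kind)

    def send(self, draws_response: bool) -> Optional[Tuple[int, str]]:
        """Return the (seq, kind) actually sent, or None if out of credits.

        If the last credit would go to a message that draws no response and
        no response is already due, a NOP that will be echoed is sent
        instead, so credits can still be restored. The caller retries later.
        """
        if self.credits == 0:
            return None
        if self.credits == 1 and not draws_response and self.responses_due == 0:
            return self._emit("nop", True)
        return self._emit("request", draws_response)

    def on_response(self, seq: int) -> int:
        """A response to `seq` implies everything up to `seq` was processed.

        Credits are restored for the response itself and for each phantom
        response it implies. Returns the number of credits restored.
        """
        if seq <= self.acked_through or seq >= self.next_seq:
            raise ValueError("response out of order or for an unsent message")
        restored = seq - self.acked_through
        self.acked_through = seq
        self.credits += restored
        self.responses_due -= 1
        return restored
```

    With three initial credits, two unacknowledged messages followed
    by a third attempt put a NOP on the wire as the third message; the
    echo of that NOP then restores all three credits at once.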
    Caitlin Bestler - -


Last updated: Tue Aug 05 12:46:10 2003