
    RE: ISCSI: flow control



    Mike,
    
    You will find a similar flow control proposal in the current fc-sctp-ip
    draft.
    http://www.ietf.org/internet-drafts/draft-otis-fc-sctp-ip-01.txt
    This proposal assumes the SCTP<->FC interface will receive credits and relay
    them as part of the general header.  In other words, every response may
    update the credit for the individual connector that terminates each
    stream.  The SCTP agent may also accommodate additional credit, owing to
    the FIFOs interconnecting to these connectors, to help allow greater
    network distance than FC normally supports.  Should the end point be a
    port to a controller rather than a connector, fabric login and credits
    would be handled directly by the SCTP agent, and additional credits may be
    retracted and made available to other ports once the retraction is
    acknowledged.
    
    The proposal advisory was included to illustrate use of the FC header and
    to allow comparison with the alternative proposals.  You will find it
    relatively easy to implement.  As FC starts out with limited credit due to
    expected low latency, a means to extend credit beyond login is essential.
    An Ordered Set Sequence or credit message can be sent without affecting
    credit as well.  Once flow-control is offered, the FC structures do not
    require change as these structures become independent of transport speed.
    Each stream would originate as an initiator.  The use of CRC is optional for
    native IP access and the entire non-FC frame information is placed within a
    prefix for easy firmware manipulation.  The intent of this proposal was to
    strike a balance between bridge-only encapsulation and native access.
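
To put rough numbers on why login-time credit runs out over distance, here
is a back-of-the-envelope sketch in Python.  It is not taken from the draft:
the function name, the 5 microseconds per km propagation figure, the
1 Gbit/s rate and the 2112-byte frame size are all illustrative assumptions,
and encoding and header overhead are ignored.

```python
import math

def credits_needed(distance_km, link_bits_per_s, frame_bytes, us_per_km=5.0):
    """Estimate how many frames must be in flight to keep a link busy.

    Assumption: one credit covers one full-size frame, and a credit only
    comes back after a full round trip.
    """
    round_trip_s = 2 * distance_km * us_per_km * 1e-6
    frame_time_s = frame_bytes * 8 / link_bits_per_s
    # At least one credit is always needed to send anything at all.
    return max(1, math.ceil(round_trip_s / frame_time_s))

if __name__ == "__main__":
    for km in (0.3, 100, 1000):
        print(km, "km ->", credits_needed(km, 1e9, 2112), "credits")
```

Under these assumptions a 300 meter link needs a single credit, while a
100 km path needs about 60, which is the gap that extended credit has to
cover.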
    
    The solutions found for FC access to controllers or devices remain unchanged
    by this proposal.  The communication structures are defined by existing
    standards which also remain unchanged.  All streams, ports and connectors
    are able to share a common flow.  There would be no blocking or credit
    uncertainty as credit would remain defined as an FC frame carried as an
    SCTP chunk.
    
    Doug
    
    
    
    > -----Original Message-----
    > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > Michael Krause
    > Sent: Tuesday, September 26, 2000 3:54 PM
    > To: Black_David@emc.com
    > Cc: ips@ece.cmu.edu
    > Subject: RE: ISCSI: flow control
    >
    >
    > At 11:05 AM 9/25/00 -0400, Black_David@emc.com wrote:
    > >Mike,
    > >
    > > > In essence, this is what InfiniBand does and others have been
    > > > advocating.  When the ACK (SCSI response) is returned it encodes a
    > > > credit to inform the sender of how many receive buffers (available
    > > > command queue slots) have been posted.
    > >
    > >Could you post, or provide a pointer to a self-contained specification of
    > >that mechanism?  If this is a pointer to InfiniBand specs, a heads-up on
    > >any intellectual property issues is in order.
    >
    > The V1.0 InfiniBand spec is about to be made public and as such, I would
    > refer people to it to understand the specifics of that architecture.  The
    > problem being addressed in InfiniBand and here is rather generic in nature
    > - how to avoid overflowing a receive queue using a credit scheme.
    > InfiniBand's scheme is unique in its specifics (encoding, ACK message
    > formats, etc.) but the essence is the same.  I'll try to paraphrase the
    > scheme here in more general purpose terms - if an RFC draft is required,
    > let me know.
    >
    > This credit scheme is implemented as follows:
    >
    > (1) Responder encodes an N-bit credit within the ACK (iSCSI response)
    > message.  Credits are absolute values, i.e. one "snapshots" the
    > responder's current credit value and encodes it in the ACK message.  If
    > the endnode does not support credits the requester shall assume an
    > infinite value.
    >
    > (2) Credits are on a per connection or per session basis.  Simplicity
    > favors the per connection basis but if the session layer is load
    > balancing commands across multiple connections and given the completion
    > processing and resource management for commands is at the session layer,
    > it may not be a performance / implementation inhibitor to implement this
    > within the session layer itself.  In general, this can be implemented
    > across multiple ports or multiple NICs, entirely in software or hardware
    > or a mix with minimal overhead.
    >
    > (3) Requester maintains a current credit count and decrements this value
    > for each outstanding request.  When new credit is received, the requester
    > updates its credit window and determines whether new requests may be
    > injected into the network.
    >
    > (4) If a requester does not receive any credits for a period of time and
    > there are no outstanding requests, it may probe the responder by issuing
    > a single request.  The responder may respond with an RNR NAK or an ACK
    > with a credit update.  This prevents deadlock.  Ideally, one would allow
    > an unsolicited ACK to be sent by the responder when new credit arrives
    > and there are no outstanding requests being processed.  The advantage of
    > unsolicited ACKs is simplicity - the requester never generates an
    > operation without credit and the responder only returns credit, thus
    > making the implementation simpler for both sides.
    >
    > (5) Responder increments its credit value each time a receive descriptor
    > / command queue element is posted / available.  Again this value may be
    > per connection or per session depending upon the resource / coherency
    > strategy pursued.
    >
    > (6) To support long-distance implementation, one would like to stretch
    > the number of credits under the assumption that a number of responses are
    > also in-flight at a given time.  If this is implemented, then an RNR NAK /
    > QUEUE FULL algorithm is needed as is an unsolicited ACK / grant credit
    > message.  An implementation would need to understand the dynamic rate of
    > command completions and perform optimistic calculations for what this
    > stretched "credit" window is.  When it receives an RNR NAK / QUEUE FULL
    > message, it would reduce the injection rate by a moderate amount (avoid
    > large oscillations) - some modeling would be needed to understand what
    > this reduction would be.
    >
    > (7) Requesters can transmit requests that do not consume responder
    > resources, e.g. RDMA READ, RDMA WRITE without immediate data, etc.
    >
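
Steps (1), (3), (4) and (5) above can be sketched in a few lines of Python.
This is only an illustration of the general idea, not InfiniBand's or
iSCSI's actual encoding: the class and method names are made up, the
initial credit is assumed to be learned at login, and messages are
delivered by direct method calls so a credit snapshot is never stale.
Reconciling a snapshot with requests still in flight, timers for the
probe, and unsolicited ACKs are not shown.

```python
import math
from collections import deque

class Responder:
    """Hypothetical target side: owns the command queue slots."""
    def __init__(self, queue_slots):
        self.free_slots = queue_slots      # step (5): posted receive buffers
        self.queue = deque()

    def receive(self, request):
        if self.free_slots == 0:
            return False                   # would be an RNR NAK / QUEUE FULL
        self.free_slots -= 1
        self.queue.append(request)
        return True

    def complete_one(self):
        request = self.queue.popleft()
        self.free_slots += 1               # step (5): slot is reposted
        return request, self.free_slots    # step (1): absolute snapshot in ACK

class Requester:
    """Hypothetical initiator side: tracks credit per connection."""
    def __init__(self, initial_credit):
        self.credit = initial_credit       # assumption: granted at login
        self.outstanding = 0

    def try_send(self, responder, request):
        if self.credit == 0 and self.outstanding > 0:
            return "no_credit"             # wait for an ACK to bring credit
        # With zero credit and nothing outstanding this is the step (4) probe.
        if not responder.receive(request):
            return "rnr_nak"
        if self.credit > 0:
            self.credit -= 1               # step (3): one credit per request
        self.outstanding += 1
        return "sent"

    def on_ack(self, snapshot=None):
        self.outstanding -= 1
        # Step (1): no credit field means the responder does not do credits.
        self.credit = math.inf if snapshot is None else snapshot

if __name__ == "__main__":
    target, initiator = Responder(queue_slots=2), Requester(initial_credit=2)
    print([initiator.try_send(target, r) for r in ("a", "b", "c")])
    _, snapshot = target.complete_one()
    initiator.on_ack(snapshot)
    print(initiator.try_send(target, "c"))
```

The third request is held back until the ACK for the first one returns a
fresh snapshot, which is the behaviour the list describes.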
    > >A concern that has been raised in this discussion is how credit
    > >information relates to the concurrency and ordering (esp. lack thereof)
    > >of transmission and processing of SCSI commands and the transmission of
    > >responses.  My understanding of the FCP approach to buffer management
    > >(and I assume InfiniBand is similar) is that traffic cannot be sent
    > >unless the sender knows that there is space in the receiver's buffer to
    > >accommodate it (i.e., the sender has a credit or credits indicating space
    > >in the receiver's buffer).
    >
    > In general, this is correct for InfiniBand - one cannot initiate a SEND
    > operation unless credit is available.  It should be kept in mind that
    > InfiniBand was designed for the data center, i.e. 300 meters for a given
    > link instance.  As such, some optimizations were made that may not be
    > acceptable w.r.t. this workgroup's focus.
    >
    > >This implies that if for some reason the receiver stalled, all the
    > >in-flight commands and data could be successfully received.  In contrast,
    > >I've seen discussion on this list of long distance connections in which
    > >there is potentially more traffic in flight than the receiver could
    > >accommodate if the receiver stopped.  I believe that whether to allow
    > >this is an open issue, but the underlying cause is valid - there is a
    > >desire to use iSCSI in situations where the initiator to target coupling
    > >is looser (in this case, due to distance) than is typical for SCSI and
    > >Fibre Channel.
    >
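
For the long-distance case, the stretched window of step (6) might look
something like the following sketch.  Everything here is an assumption made
for illustration: the class name, the idea of adding the completions
expected during one round trip to the advertised credit, and the 0.875
back-off factor, which simply stands in for "a moderate amount" pending the
modeling the text calls for.

```python
class StretchedWindow:
    """Hypothetical optimistic send window for a long-latency path."""
    def __init__(self, rtt_s, optimism=1.0, backoff=0.875):
        self.rtt_s = rtt_s            # measured round-trip time, in seconds
        self.optimism = optimism      # scales the optimistic extra credit
        self.backoff = backoff        # assumed moderate reduction factor

    def limit(self, advertised_credit, completions_per_s):
        # Slots expected to free up while requests are still in flight.
        extra = self.optimism * completions_per_s * self.rtt_s
        return advertised_credit + int(extra)

    def on_rnr_nak(self):
        # Receiver overflowed: be somewhat less optimistic next time,
        # without collapsing the window and causing large oscillations.
        self.optimism *= self.backoff

if __name__ == "__main__":
    window = StretchedWindow(rtt_s=0.125)
    print(window.limit(advertised_credit=8, completions_per_s=80))
    window.on_rnr_nak()
    print(window.limit(advertised_credit=8, completions_per_s=80))
```

With these numbers the window starts at 18 outstanding requests and drops
to 16 after one RNR NAK / QUEUE FULL, so traffic in flight can exceed what
the receiver could absorb if it stopped, which is exactly the open issue
raised above.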
    >
    > Mike
    >
    
    



Last updated: Tue Sep 04 01:07:03 2001