
    RE: Status summary on multiple connections -- iSCSI flow control



    > From: Robert Snively <rsnively@Brocade.COM>
    > To: "'David Robinson'" <David.Robinson@EBay.Sun.COM>, Robert Snively
    > <snip>  When there is no more dynamic space left and all the
    > pre-allocated locations for a particular ITL nexus are also full,
    > the next command gets a queue full indication returned.  Because of
    > the dynamic assignment area, this will typically be rare in a properly
    > configured system.  The initiator then resends the command and all
    > subsequent commands after at least one command comes back completed,
    > indicating that at least one (and probably a whole stack more) slots
    > are again available.  Note that there is a possibility that commands
    > that are inflight and have ordering constraints may be accepted out of
    > order, a question that has caused lots of agonizing, but is apparently
    > reasonably well managed by most file systems today by the selective
    > use of ordering only for blocking boundaries of a particular logical
    > stream of commands.
    
    Bob,
    
    I liked everything you said.  The SCSI protocol itself provides many
    mechanisms for a target device to manage its resources.  An initiator must
    commit resources for every request, whether it is queued, in flight, or
    being processed by the target.  However, your description of out-of-order
    execution calls for more clarification of retransmission and of the flow
    control needed to minimize it.
    
    While it is true that file systems ensure all outstanding commands can be
    executed out of order (because a target device may reorder them, for
    example by sorting them to improve performance), there is no out-of-order
    execution problem in a 1394 or SCSI adapter.  Commands that are in flight
    and executed out of order are an issue new to Fibre Channel and iSCSI.
    Having said that, retransmission is the more important part of this
    discussion.
    
    For a SCSI adapter, the queue-full status prevents any new command from
    being sent until at least one outstanding command has completed.  No
    command is ever in flight on a SCSI bus.  On 1394, the target never
    fetches another ORB when it has no room.  If a fetch fails, the target
    has the option to retry or terminate.  A 1394 target can prefetch multiple
    ORBs and decide whether to sort them or execute them in order.
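    The queue-full behavior Bob describes (reserved slots per I_T_L nexus plus
    a shared dynamic area) and the SCSI-adapter rule that nothing new is sent
    until something completes can be modeled in a few lines.  This is a
    minimal sketch, not any real target's algorithm: the class names, the slot
    counts, and the choice to free a dynamic slot before a reserved one are
    all assumptions made for illustration.

```python
from collections import deque

GOOD = "GOOD"
QUEUE_FULL = "QUEUE FULL"


class Target:
    """Hypothetical target: reserved slots per I_T_L nexus plus a shared dynamic pool."""

    def __init__(self, reserved_per_nexus, dynamic_slots):
        self.reserved_per_nexus = reserved_per_nexus
        self.dynamic_free = dynamic_slots
        self.reserved_used = {}   # nexus -> reserved slots in use
        self.dynamic_used = {}    # nexus -> dynamic slots in use

    def accept(self, nexus):
        used = self.reserved_used.get(nexus, 0)
        if used < self.reserved_per_nexus:          # a pre-allocated slot is free
            self.reserved_used[nexus] = used + 1
            return GOOD
        if self.dynamic_free > 0:                   # fall back to the shared area
            self.dynamic_free -= 1
            self.dynamic_used[nexus] = self.dynamic_used.get(nexus, 0) + 1
            return GOOD
        return QUEUE_FULL                           # both exhausted

    def complete(self, nexus):
        # Assumption: return a dynamic slot first so other nexuses can use it.
        if self.dynamic_used.get(nexus, 0) > 0:
            self.dynamic_used[nexus] -= 1
            self.dynamic_free += 1
        else:
            self.reserved_used[nexus] -= 1


class Initiator:
    """Hypothetical initiator that stops sending after QUEUE FULL until a completion."""

    def __init__(self, target, nexus):
        self.target = target
        self.nexus = nexus
        self.held = deque()      # commands not yet accepted, kept in submission order
        self.blocked = False
        self.outstanding = 0

    def submit(self, cmd):
        self.held.append(cmd)
        self._drain()

    def _drain(self):
        while self.held and not self.blocked:
            if self.target.accept(self.nexus) == QUEUE_FULL:
                self.blocked = True          # resend only after something completes
            else:
                self.held.popleft()
                self.outstanding += 1

    def on_completion(self):
        self.target.complete(self.nexus)
        self.outstanding -= 1
        self.blocked = False                 # at least one slot is free again
        self._drain()
```

    With two reserved slots and one dynamic slot, submitting five commands
    leaves three outstanding and two held; one completion lets exactly one
    held command through before the initiator is blocked again.  Because the
    model sends one command at a time, nothing is ever in flight, which is the
    SCSI-bus case rather than the Fibre Channel or iSCSI case.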
    
    For Fibre Channel and iSCSI, no flow control can prevent a command from
    being executed out of order, because there is always a non-zero
    probability that a command does not arrive at the target: the network may
    be busy, the packet may have a CRC error, the NIC receive buffer may be
    full, or the TCP window may be closed.  Although it is not necessary, if
    one insists, a CmdRN can be placed in the PDU to ensure sequential
    execution.  Fibre Channel uses BB-credit to eliminate the possibility of
    the adapter receive buffer being full, but it cannot prevent a frame from
    having a CRC error and being dropped by the target.  Only a timeout at the
    initiator will detect the loss of a command.  As you have said,
    out-of-order execution is NOT a problem.
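    To make the CmdRN remark concrete, here is a minimal sketch of what
    insisting on sequential execution costs the target: a reorder buffer that
    releases a command only after every lower-numbered command has arrived.
    Treating CmdRN as a plain integer counter starting at 0, ignoring
    wraparound, and the class name are all assumptions for illustration.  It
    also shows why a lost command stalls everything behind it until the
    initiator's timeout causes a resend.

```python
class InOrderDelivery:
    """Hypothetical target-side reorder buffer keyed by a command reference number.

    Commands may arrive out of order or not at all.  A command is released to
    the execution queue only when all lower-numbered commands have been
    released.  Sequence-number wraparound is ignored in this sketch.
    """

    def __init__(self, first_rn=0):
        self.expected = first_rn   # next CmdRN allowed to execute
        self.pending = {}          # CmdRN -> command waiting for a gap to fill

    def receive(self, rn, cmd):
        """Accept one arriving command; return the commands now released, in order."""
        if rn < self.expected or rn in self.pending:
            return []              # duplicate, e.g. a resend that raced the original
        self.pending[rn] = cmd
        released = []
        while self.expected in self.pending:
            released.append(self.pending.pop(self.expected))
            self.expected += 1
        return released
```

    If command 1 is dropped, commands 2 and 3 sit in `pending` and nothing
    behind the gap executes; when the initiator times out and resends
    command 1, all three are released together.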
    
    A SCSI command can be transmitted in a single frame or PDU, so
    retransmitting it is trivial once the loss of that frame or PDU is
    detected.  This is not the case for data PDUs.  The key question is the
    granularity of the retransmitted data.
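    The granularity question can be illustrated with simple arithmetic: if
    recovery works in units of a fixed size, losing one data PDU forces the
    sender to resend the whole unit containing it.  The function name and the
    sizes below (8 KiB per PDU, 64 KiB per sequence, a 1 MiB command) are
    hypothetical choices for the sketch, not values from any specification.

```python
def resend_range(transfer_len, unit_len, lost_offset):
    """Return the (start, end) byte range to resend when the data PDU carrying
    byte lost_offset is lost and recovery works in units of unit_len bytes."""
    if not 0 <= lost_offset < transfer_len:
        raise ValueError("lost_offset is outside the transfer")
    start = (lost_offset // unit_len) * unit_len
    end = min(start + unit_len, transfer_len)   # the last unit may be short
    return start, end
```

    For a 1 MiB transfer that loses the PDU carrying byte 300,000, per-PDU
    recovery (8 KiB units) resends 8,192 bytes, per-sequence recovery (64 KiB
    units) resends 65,536 bytes, and whole-command recovery resends all
    1,048,576 bytes.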
    
    For iSCSI, long latency and PDU loss are a fact of life.  There is a
    statistical possibility of hundreds of initiators sending PDUs to a target
    at the same time.  No flow control can prevent PDU loss when a busy host
    bus leaves the receive buffer full.  In addition to a busy host bus, a
    data PDU can be lost due to a CRC error, a closed TCP window, or a switch
    or router being too busy.  The lost PDU must be retransmitted.  In TCP/IP,
    where an ACK is needed for every PDU, the granularity of retransmission is
    the smallest.  However, ACK traffic is then the highest, which causes even
    more congestion and the highest probability of an ACK going missing.
    Waiting for an ACK on a network with a long delay is very costly.
    Streaming a large amount of data over a long-delay network without a huge
    buffer requires DMA/RDMA.  The tradeoff between a coarser retransmission
    granularity and less ACK traffic is truly the challenge for this WG.  The
    ACK-0 concept from Fibre Channel and the retransmission of a single
    sequence could be helpful to iSCSI.
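    The ACK-traffic versus granularity tradeoff, and the buffer a sender needs
    on a long-delay path, can be estimated with a small model.  This is a
    sketch under loudly stated assumptions: PDU losses are independent with a
    fixed probability, the receiver acknowledges every k PDUs, any loss in an
    acknowledged unit forces one resend of that whole unit, and losses during
    the resend are ignored.  The function names are hypothetical.

```python
import math


def ack_tradeoff(transfer_len, pdu_len, pdus_per_ack, loss_prob):
    """Return (acks, expected_resent_bytes) for one data transfer.

    Assumed model: each PDU is lost independently with probability loss_prob;
    one ACK covers pdus_per_ack PDUs; a loss anywhere in an acknowledged unit
    forces that whole unit to be resent once.
    """
    unit_len = pdu_len * pdus_per_ack
    acks = 0
    expected_resent = 0.0
    for start in range(0, transfer_len, unit_len):
        size = min(unit_len, transfer_len - start)     # last unit may be short
        n_pdus = math.ceil(size / pdu_len)
        p_unit_lost = 1 - (1 - loss_prob) ** n_pdus    # at least one PDU lost
        expected_resent += p_unit_lost * size
        acks += 1
    return acks, expected_resent


def bandwidth_delay_bytes(bits_per_sec, rtt_sec):
    """Unacknowledged data the sender must hold to keep the link busy for one round trip."""
    return bits_per_sec * rtt_sec / 8


if __name__ == "__main__":
    MiB, KiB = 1 << 20, 1 << 10
    for k in (1, 8, 128):   # ACK every PDU, every 8 PDUs, once per 1 MiB
        acks, resent = ack_tradeoff(1 * MiB, 8 * KiB, k, 1e-3)
        print(f"ACK every {k:3d} PDUs: {acks:3d} ACKs, ~{resent:,.0f} bytes resent")
    print(f"1 Gb/s x 50 ms RTT: {bandwidth_delay_bytes(1e9, 0.05):,.0f} bytes in flight")
```

    Under these assumptions, with 8 KiB PDUs and a 0.1% loss rate,
    acknowledging every PDU costs 128 ACKs per MiB but resends only about
    1 KiB on average, while a single ACK per MiB resends roughly 126 KB on
    average.  A 1 Gb/s path with a 50 ms round trip keeps about 6.25 MB
    unacknowledged, which the sender must be able to resend.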
    
    This WG assumes TCP/IP is such a reliable and proven technology that its
    flow and congestion control will solve all the problems magically.  It
    spends all its effort trying to fit iSCSI into a stream-oriented TCP/IP
    implementation.  I believe it would be helpful if this WG discussed the
    need to execute 100,000 IOs per second on an iSCSI adapter and the issue
    of data-retransmission granularity on long-delay networks with a high
    probability of PDU loss.
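    The 100,000 IOs per second figure translates into two numbers worth
    keeping in mind: the average adapter time per IO, and (by Little's law)
    how many commands must be outstanding, each with committed initiator
    resources, at a given latency.  The latencies used below are hypothetical
    examples, not measurements, and the function names are illustrative.

```python
def per_io_budget_us(iops):
    """Average adapter time available per IO, in microseconds, at a given IO rate."""
    return 1e6 / iops


def outstanding_ios(iops, latency_sec):
    """Little's law: average IOs in flight = IO rate x average time each IO spends in the system."""
    return iops * latency_sec


if __name__ == "__main__":
    print(f"budget per IO at 100,000 IOPS: {per_io_budget_us(100_000):.1f} us")
    for latency in (0.001, 0.01, 0.1):   # hypothetical end-to-end latencies
        print(f"latency {latency * 1000:5.0f} ms -> "
              f"{outstanding_ios(100_000, latency):,.0f} IOs outstanding")
```

    At 100,000 IOs per second the adapter has about 10 microseconds per IO on
    average, and a 10 ms end-to-end latency already implies around 1,000
    commands outstanding at once.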
    
    Y.P. Cheng, CTO, ConnectCom Solutions Corp.
    
    



Last updated: Tue Sep 04 01:06:58 2001