
    Re: Notes of 06/21 meeting



    At 12:04 AM 6/29/00 -0700, Costa Sapuntzakis wrote:
    
    >It was pointed out that the command reference number
    >as spec'ed was not long-lived enough to provide error
    >recovery. However, the task tag could be used to do
    >error recovery.
    
    This is why it should be at least a 32-bit value and possibly a 48-bit 
    value.  It should also be carried on all commands, which simplifies the 
    problems described later: multiple TCP op support; the ability to deal 
    with overflows (only receive what is within the window of supported ops 
    and let a SACK-like error recovery deal with anything lost); simpler 
    hardware (the number is always present and can be used to retain ordering 
    without much overhead); simpler mirroring, since one can immediately 
    forward the ops in the order the initiator wanted without stalls; etc.
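
To illustrate why a large counter makes the window check cheap, here is a minimal sketch, assuming a 32-bit command reference number that wraps modulo 2**32. The names `MOD` and `in_window` are hypothetical and not taken from any draft; the point is only that one modular subtraction decides whether an arriving command is inside the window of supported ops.

```python
MOD = 1 << 32  # assumption: 32-bit command reference number


def in_window(ref, expected, window):
    """Return True if ref lies in [expected, expected + window) modulo 2**32."""
    return (ref - expected) % MOD < window


# The check keeps working when the counter wraps around.
expected = MOD - 2
candidates = (MOD - 2, MOD - 1, 0, 1, 5)
accepted = [r for r in candidates if in_window(r, expected, 4)]
# accepted holds the four numbers starting at `expected`; 5 is outside the window.
```

Anything outside the window would be dropped and left to the SACK-like recovery; a 48-bit counter only changes `MOD`.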
    
    
    >--------------
    >
    >There was then a discussion about whether the command
    >reference number should be per LU or per session.
    
    Large value per session and then it does not matter.
    
    
    >There was a lot of talk about whether we want
    >to support multiple TCP connections/session.
    >
    >John Hufferd pointed out that SCSI load balancers already exist
    >that take advantage of multiple sessions (multiple SCSI busses)
    >to stripe commands to a target. He argued that multiple
    >TCP connections are unnecessary. He also argued that no applications
    >make effective use of SCSI ORDERED attribute, because the
    >interfaces are not there.
    
    A very simple implementation can be built with multiple TCP connections 
    per session.  With the command reference number sent on every operation, 
    the start / stop problem is mitigated because one is receiving / 
    processing the operations in the order they were received.  In addition, 
    one can develop the hooks that separate specs provide for arbitration 
    policies, QoS, etc. to deal with differing link attributes such as 
    bandwidth.
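
A rough sketch of how little state this needs, assuming one reference-number counter per session and a dictionary standing in for the hardware's pending queue. `SessionReorderer` and its methods are hypothetical names used for illustration only.

```python
class SessionReorderer:
    """Releases commands in reference-number order, whichever connection carried them."""

    def __init__(self, first_ref=0, bits=32):
        self.mod = 1 << bits        # assumption: counter wraps at 2**bits
        self.next_ref = first_ref   # next reference number to hand to SCSI
        self.pending = {}           # ref -> command held until its turn

    def receive(self, ref, command):
        """Accept one command; return the commands that are now deliverable, in order."""
        self.pending[ref] = command
        ready = []
        while self.next_ref in self.pending:
            ready.append(self.pending.pop(self.next_ref))
            self.next_ref = (self.next_ref + 1) % self.mod
        return ready


session = SessionReorderer()
first = session.receive(1, "WRITE b")   # arrived early on connection 2: held
second = session.receive(0, "WRITE a")  # gap filled: both released in order
```

The connection a command arrived on never enters the logic, which is why striping across links with different bandwidth does not disturb ordering.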
    
    
    >However, they have to stop and wait for ordered commands.
    >One application where stop and wait hurts is tape (where
    >all writes are ordered), so some tape applications write
    >self-describing blocks to tape which can be written in any order.
    >
    >Remote asynchronous mirroring can also be done with ordered
    >writes. Hufferd argued that remote asynchronous mirroring must
    >be solved at a higher layer and is being solved today.
    
    Not that difficult to do with what I described above.
    
    
    >Most of those arguing for multiple TCP connection said that
    >     - it isn't that hard
    >     - it would make iSCSI better than other SCSI transports
    >     - it would make high-perf apps easier to write
    
    Add in
       - Multi-path support is much easier to implement.
       - Higher performance can be achieved.
       - Implementations are fairly simple, with minimal state.
       - Application-transparent ability to take advantage of / recover from 
    hot-plug / removal of fabric components.
    
    
    
    >-------------
    >Deadlock:
    >
    >Luciano pointed out that it is possible to run out of
    >buffers and deadlock with multiple TCP connections.
    >
    >The source of the problem is
    >         1) receive too many out-of-order commands
    >         2) receiving too much unsolicited (immediate)
    >            data
    >
    >The solution to 1) is to either
    >    - limit the number of out-of-order commands that
    >      are read from each TCP pipe to 1 (requires NIC
    >      to know that command is out-of-order) and then
    >      stop reading from the connections (deskewing)
    >    - have a windowing mechanism on the command
    >      ordering queue in target
    >    - have a separate TCP pipe for emergency
    >      recovery commands
    >    - Nuspeed aborts command with SCSI status TASK QUEUE FULL
    >
    >The consensus seems to have resulted in windowing
    >being adopted.
    
    The NIC does not have to track this per se.  If the NIC has the SGL for 
    the target buffer, it can perform the DMA.  If the SGL does not exist, it 
    can drop the message without issuing a TCP ACK.  (The issue is whether one 
    wants to slow this down at the TCP level, or allow it to complete at the 
    TCP level but have the NIC still drop the buffers w.r.t. the DMA 
    targeting.  My preference is to complete from the TCP point of view but 
    drop the DMA operation.)  The operation target and buffers are locally 
    posted, so the rate can be controlled quite easily.
    
    The windowing proposal will work well as a control point for SGL posting to 
    individual commands, again with minimal, if any, complexity.  If the command 
    reference number is always present, life can be further simplified.
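
As a sketch of the window acting as the control point for buffer posting, assume the target posts a buffer (a `bytearray` standing in for an SGL) only for reference numbers inside the window, and drops data for which nothing is posted. The class `Target` and its methods are hypothetical; a real NIC would do this in hardware.

```python
class Target:
    """Window-gated buffer posting; data without a posted buffer is dropped."""

    def __init__(self, window, bits=32):
        self.mod = 1 << bits   # assumption: counter wraps at 2**bits
        self.window = window   # number of commands the target will buffer
        self.base = 0          # lowest reference number not yet completed
        self.sgl = {}          # ref -> posted buffer (stand-in for an SGL)
        self.done = set()      # completed refs at or beyond base

    def post(self, ref, size):
        """Post a buffer only if ref falls inside the current window."""
        if (ref - self.base) % self.mod >= self.window:
            return False
        self.sgl[ref] = bytearray(size)
        return True

    def on_data(self, ref, payload):
        """Place data if a buffer is posted; otherwise drop it for later recovery."""
        buf = self.sgl.get(ref)
        if buf is None or len(payload) > len(buf):
            return "dropped"
        buf[:len(payload)] = payload
        return "placed"

    def complete(self, ref):
        """Retire a command and slide the window past contiguous completions."""
        self.sgl.pop(ref, None)
        self.done.add(ref)
        while self.base in self.done:
            self.done.remove(self.base)
            self.base = (self.base + 1) % self.mod


target = Target(window=2)
posted_inside = target.post(0, 4)     # inside the window: buffer posted
posted_outside = target.post(2, 4)    # outside the window: refused
placed = target.on_data(0, b"abcd")   # buffer exists: data placed
dropped = target.on_data(1, b"x")     # nothing posted for ref 1: dropped
target.complete(0)                    # window slides to start at 1
posted_later = target.post(2, 4)      # ref 2 is now inside the window
```

Because posting is purely local, the target controls the rate simply by choosing when to call `post`.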
    
    
    >The consensus solution to 2) was to allow the
    >target to drop immediate data and request it be
    >retransmitted via ready-to-transmit (RTT).
    >
    >--------------
    >
    >Should task management commands be ordered with respect to tasks?
    >
    >Those against feared that ordering task management commands
    >would prevent their timely delivery.
    >
    >Those for feared that not ordering task management commands
    >would lead to surprising behaviors (like ABORT TASK SET
    >overtaking and not aborting all previously issued tasks).
    >
    >----------------
    >
    >Can a single iSCSI TCP connection use multiple paths in the network
    >simultaneously?
    >
    >Answer: Most networks keep a flow on one path to help ensure
    >minimal re-ordering, so no in that case. Of course, this being IP,
    >people could design a network that sprays packets of a flow across
    >multiple paths and it would still work...
    
    Most of us would prefer not to have a single connection flow through 
    different paths, since it would increase hardware complexity for what is 
    nominally a rare event.  A well-behaved environment is possible to 
    implement, but then one is asking IP to do this and creating additional 
    specification work.
    
    Mike
    
    



Last updated: Tue Sep 04 01:08:12 2001