
    iSCSI: SCSI timeout handling change



    All:
    
    Currently, if a command has not been acknowledged by the time the
    ULP timeout expires, iSCSI requires the initiator to tear down the
    session.  The rationale is that if the initiator could not get the
    command through (possibly after multiple retries) before the ULP
    timeout, there is a serious problem with the session.
    But this approach has some drawbacks -
    
            - tearing down a session due to a NIC failure is 
              disruptive to potentially several other active tasks
              on other NICs.
            - this puts those initiator implementations not wanting
              to do within-connection recovery (i.e. no retries) at
              a disadvantage, since one digest error would cause 
              potentially several active I/Os to be terminated.
            - (albeit not very serious) this behavior differs from
              what today's storage stacks expect - being able to
              selectively abort one I/O on a timeout (with no
              command retransmissions).
    
    To address these issues, and also to simplify the current Task
    Management request PDU, I propose the following changes to the
    handling of SCSI timeouts -
    
    Following changes to section 3.5:
    
    - Abort Task MUST always be sent as an immediate command. 
    
    - Abort Task task management function request MUST be sent 
      with its CmdSN equal to the CmdSN of the task to be aborted, 
      and the Referenced Task Tag initialized to the ITT of the 
      task to be aborted.
    
    - Consequent to the above, drop the RefCmdSN field in the 
      Task Management command payload that is currently only 
      used by the Abort Task function.
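
    As a rough illustration of the three changes above, here is a
    minimal sketch of how an initiator might fill in the Abort Task
    request.  The class and field names are made up for this example
    and are not taken from the draft; only the field assignments
    reflect the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScsiTask:
    """Hypothetical record of an outstanding task on the initiator."""
    itt: int      # Initiator Task Tag of the original command
    cmd_sn: int   # CmdSN the original command was sent with


@dataclass(frozen=True)
class AbortTaskRequest:
    """Hypothetical in-memory form of the proposed Abort Task request.

    Note there is no RefCmdSN field: under the proposal the request's
    own CmdSN already identifies the command being aborted.
    """
    immediate: bool
    cmd_sn: int
    referenced_task_tag: int


def build_abort_task(task: ScsiTask) -> AbortTaskRequest:
    # Always immediate; CmdSN copied from the task being aborted;
    # Referenced Task Tag set to that task's ITT.
    return AbortTaskRequest(
        immediate=True,
        cmd_sn=task.cmd_sn,
        referenced_task_tag=task.itt,
    )
```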
    
    Following changes to section 8.6:
    
    Propose the following text to replace the current -
    
    An iSCSI initiator MAY attempt to plug a command sequence gap on
    the target end (in the absence of an acknowledgement of the command
    by way of ExpCmdSN) before the ULP timeout by retrying the
    unacknowledged command, as described in section 8.1.
    
    On a ULP timeout for a command that carried a CmdSN of n, if the
    ExpCmdSN is still less than (n+1), the iSCSI initiator MUST abort
    the command using the Abort Task task management function request.
    In this process, the target may see the abort request before the
    original command itself for one of three reasons -
            - the original command was dropped due to a digest error, or
            - the Abort Task request was shipped out-of-order 
              on the same connection, or
            - the connection the original command was sent on was
              successfully logged out.
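
    The timeout check in the proposed text can be sketched as below.
    This assumes CmdSN is a 32-bit counter compared with wrap-around
    (serial number) arithmetic; the function names are hypothetical
    and the sketch is only meant to show the "ExpCmdSN < n+1" test.

```python
CMDSN_MASK = 0xFFFFFFFF   # assumption: CmdSN is a 32-bit counter
HALF_RANGE = 2 ** 31


def serial_lt(a: int, b: int) -> bool:
    """True if a precedes b in 32-bit wrap-around arithmetic."""
    diff = (b - a) & CMDSN_MASK
    return diff != 0 and diff < HALF_RANGE


def must_abort_on_ulp_timeout(cmd_sn: int, exp_cmd_sn: int) -> bool:
    """Decide, at ULP timeout, whether the command with CmdSN n is
    still unacknowledged, i.e. whether ExpCmdSN < n + 1."""
    return serial_lt(exp_cmd_sn, (cmd_sn + 1) & CMDSN_MASK)
```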
    
    If the abort request is received before the original command, 
    targets MUST consider the original command with that CmdSN to 
    have been received, and MUST discard the original command if and
    when it arrives - i.e. treat it as a duplicate CmdSN.  For this
    reason, initiators that want to preserve command ordering within
    the same session MUST NOT issue Abort Task for an unacknowledged
    command.
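
    A toy model of the target-side rule, for illustration only.  It
    tracks CmdSNs in a plain set and ignores ExpCmdSN advancement,
    windowing and wrap-around; the class name and return strings are
    invented for this sketch.

```python
class TargetCmdSNTracker:
    """Toy model of which CmdSNs a target considers received."""

    def __init__(self):
        self.seen = set()           # CmdSNs considered received
        self.aborted_early = set()  # aborted before the command arrived

    def on_abort_task(self, cmd_sn: int) -> str:
        if cmd_sn in self.seen:
            # Normal case: the command is already known to the target.
            return "abort-received-task"
        # Abort arrived first: treat the original command as received.
        self.seen.add(cmd_sn)
        self.aborted_early.add(cmd_sn)
        return "abort-before-command"

    def on_command(self, cmd_sn: int) -> str:
        if cmd_sn in self.seen:
            # Includes a late original whose abort arrived first.
            return "discard-duplicate"
        self.seen.add(cmd_sn)
        return "accept"
```

    With this model, an abort for CmdSN 7 followed by the late
    original command with CmdSN 7 results in the command being
    discarded as a duplicate.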
    
    Following changes to section 2.2.2.1:
    - The above approach exposes the possibility that some stale
      (aborted from the target's perspective) commands could be stuck
      in the TCP connection long enough for CmdSN to wrap - similar
      to the issue we dealt with for command retries.  So, aborting
      unacknowledged commands should require the same flushing
      actions described for command retries. [ At this point I would
      almost prefer to require flushing all connections every
      2^31 - 1 commands starting from InitCmdSN, rather than
      enumerating these cases individually...]
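
    The bracketed alternative (flush every 2^31 - 1 commands counted
    from InitCmdSN) could be tracked roughly as follows.  This is
    only a sketch of the bookkeeping, assuming a 32-bit CmdSN; the
    class and method names are hypothetical, and what "flushing a
    connection" actually involves is left out.

```python
CMDSN_MASK = 0xFFFFFFFF        # assumption: CmdSN is a 32-bit counter
FLUSH_INTERVAL = 2 ** 31 - 1   # commands allowed between flushes


class FlushTracker:
    """Tracks how many CmdSNs were consumed since the last flush."""

    def __init__(self, init_cmd_sn: int):
        self.last_flush_sn = init_cmd_sn & CMDSN_MASK

    def flush_due(self, next_cmd_sn: int) -> bool:
        # Wrap-safe distance from the last flush point to the CmdSN
        # about to be assigned.
        issued = (next_cmd_sn - self.last_flush_sn) & CMDSN_MASK
        return issued >= FLUSH_INTERVAL

    def mark_flushed(self, next_cmd_sn: int) -> None:
        # Call once all connections have been flushed.
        self.last_flush_sn = next_cmd_sn & CMDSN_MASK
```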
    
    Comments?
    -- 
    Mallikarjun 
    
    
    Mallikarjun Chadalapaka
    Networked Storage Architecture
    Network Storage Solutions Organization
    MS 5668	Hewlett-Packard, Roseville.
    cbm@rose.hp.com
    


Last updated: Thu Nov 15 02:18:05 2001