
    RE: iSCSI Reqts: In-Order Delivery



    Charles,
    
    Your solution requires a fair amount of tracking of commands based solely on
    their Client Tags.  These Tags are randomly generated but will need to
    retain sequential order for your scheme.  The transport must remember the
    type of each command sent, together with its relative placement, based only
    on the Client Tag.  In addition, these commands will need to be placed into
    different categories: commands executed out of sequence by means of a
    bypass flag, Task Management commands, and commands affected by either of
    these.  It seems that, in large part, these concerns can be met with proper
    handling of the transport without such laborious sorting of the Client
    Tags.  The out-of-sequence or bypass flag also depends on the transport
    sorting the Client Tag.  In addition to disabling flow control, this
    technique of not incrementing the serialization of these commands requires
    all commands with the same serialization value to be sent on the same
    connection without acknowledgment if they are also to be kept in sequence.
    This connection requirement is yet to be specified.
    
    Ver 6, Pg 12:
       "iSCSI may avoid delivering some command to the
       SCSI layer if so required by some prior SCSI or iSCSI action (e.g.,
       clear task set Task Management request received before all the
       commands it was supposed to act on)."
    
    Here, there seems to be an expectation that the iSCSI transport interprets
    the content of the SCSI commands.  How this is done is not obvious.  Is the
    transport expected to generate SCSI responses?
    
    In addition, although iSCSI presently relies on ACA, there are few
    applications that implement ACA.  It would appear that, for iSCSI to work
    with the present protocol, significant application changes are required.
    With the proposal I am suggesting, this is not a problem, as all bypassed
    commands are rejected back to the Initiator.  The drivers that implement
    iSCSI will be required to provide handling for the commands that bypass
    other commands.  The amount of information contained in a rejected command
    list should be relatively small, and the occasions for such Management
    should be rare.  Without proper handling of these events, there will be
    2:00 AM alarm pagers going off.
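
As a rough illustration of the "rejected command list" idea, here is a
minimal Python sketch.  It is not taken from the draft: the class names,
the `bypass` method, and the shape of the report are all hypothetical, and
the transport is modelled as nothing more than a direct method call.  It
only shows the Target reporting what it rejected and the Initiator deciding
what to do about it.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    # Tags of commands queued at the Target, in arrival order (hypothetical model).
    pending: list = field(default_factory=list)

    def submit(self, tag):
        self.pending.append(tag)

    def bypass(self, tag):
        """Run `tag` ahead of the queue and report every command it bypassed."""
        rejected, self.pending = self.pending, []
        # The rejection is reported, not silent: the list goes back to the Initiator.
        return {"executed": tag, "rejected": rejected}


@dataclass
class Initiator:
    target: Target
    reissue_log: list = field(default_factory=list)

    def send_bypass(self, tag):
        report = self.target.bypass(tag)
        # The decision about the rejected commands is made here, by the
        # Initiator, rather than by the transport.
        self.reissue_log.extend(report["rejected"])
        return report
```

In this sketch the rejected list carries only command tags, which is in
keeping with the expectation that such a list stays small.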
    
    Here in the proposal, sorting CmdSN based on LUN values takes place within a
    "Barrier List."  I cannot tell what is implied by these recovery
    instructions.  What is meant by Remove, Release, Drop, Cleanup, Placeholder,
    and ALL?  What is the intended feedback to the initiator for this Clean-up?
    It would appear the transport works on behalf of the target.  In the
    proposal that I am suggesting, there are no actions within the transport on
    behalf of the target.  All decisions are made either by the Target or the
    Initiator.  None by the transport.
    
    The concept is simple.  Keep the transport simple.  Do not expect the
    transport to decipher SCSI commands.  Do not expect the transport to respond
    on behalf of the Target.  Do not expect the transport to sort pending
    commands based on LUN value.  Do not expect the transport to require SCSI
    and iSCSI ACA.
    
    In the case of session-wide serialization, what is good for the goose is
    also good for the gander.  From the perspective of quickly detecting an
    error and knowing the server state, it is important to also use
    session-wide serialization from the server.  The technique of replicating
    Management commands down each connection, in addition to changing global
    commands into specific commands, already overburdens the set-aside that
    must be made to handle these non-serialized management commands.  My
    proposal eliminates the problem of set-aside resources and loss of server
    state.  Rather than silently rejecting commands out-of-sequence, these
    rejections are reported.  Once done, this feature can be used to extract
    pending commands in a simple and direct manner without burdening the
    transport.
    
    As attempts are made to support the SCSI architecture, efforts should be
    made to simplify the transport rather than to increase its intelligence.
    Every additional field that the transport must manipulate invites
    complexity and non-uniform implementation.
    
    See:
    http://www.ietf.org/internet-drafts/draft-otis-iscsi-fullack-00.txt
    
    Ver 6, Pg 92:
         "N.B. As an alternative to Logout and reissue commands, the
          initiator MAY instead reset the target and terminate all
          outstanding commands with a service response indicating
          Delivery Subsystem Failure. The initiator MUST perform one of
          the two actions."
    
    ...
    
    Ver 6, Pg 93:
       "The following general mechanism can be used to achieve the effect of
       ordered delivery for task management commands while enabling the
       "urgent" delivery that some of them imply and immediate execution of
       the task management commands without:
    
          At Initiator when a relevant task management command is issued:
    
             a) if ExpCmdSN is equal to CmdSN skip to step c
             b) mark all pending commands with a CmdSN field between
             ExpCmdSN and the current CmdSN and a relevant LUN as
             candidates for cleanup and retain CmdSN in a "barrier list".
             c) send the task management command for immediate delivery
             to the target
    
          At initiator when updating ExpCmdSN:
    
             a) if the "barrier list" is empty or ExpCmdSN is less than
             the first entry in the barrier list then skip to step d
             b) remove the barrier list entry and remove and drop all
             entries marked for cleanup having a CmdSN field less than
             ExpCmdSN
             c) go to step a
             d) release all queued entries between the old and new
             ExpCmdSN from the queue
    
          At target when receiving a relevant task management command for
          immediate delivery:
    
             a) if ExpCmdSN is equal to CmdSN skip to step c
             b) mark all pending entries (commands received and
             placeholders) with a CmdSN field between ExpCmdSN and the
             current CmdSN as candidates for cleanup and retain CmdSN in
             a "barrier list" including the referenced LUN (or an ALL
             marker)
             c) send the task management command to SCSI for immediate
             execution
    
          At target when updating ExpCmdSN (releasing ordered commands to
          SCSI):
    
             a) if the "barrier list" is empty or ExpCmdSN is less than
             the first entry in the barrier list then skip to step d
             b) remove the barrier list entry and remove and drop all
             entries marked for cleanup and having the same LUN as the
             barrier entry (any if the barrier is marked ALL) and a CmdSN
             field less than ExpCmdSN
             c) go to step a
             d) release all queued entries between the old and new
             ExpCmdSN from the queue
    
       Note that this scheme will withstand connection recovery."
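
For readers trying to follow the quoted target-side steps, here is one
possible reading of them as a minimal Python sketch.  The class and method
names are hypothetical, CmdSN wraparound is ignored, "between" is taken to
mean ExpCmdSN <= CmdSN < barrier CmdSN, and placeholders are modelled by
marking late arrivals that fall below an outstanding barrier.  It is an
interpretation of the quoted text, not the draft's definition.

```python
from dataclasses import dataclass, field

ALL = None  # hypothetical stand-in for the draft's "ALL" marker (any LUN)


@dataclass
class Pending:
    cmd_sn: int
    lun: int
    cleanup: bool = False  # "candidate for cleanup"


@dataclass
class TargetQueue:
    exp_cmd_sn: int
    pending: dict = field(default_factory=dict)   # CmdSN -> Pending
    barriers: list = field(default_factory=list)  # (CmdSN, LUN or ALL), assumed ascending

    def receive(self, cmd_sn, lun):
        # A command arriving below an outstanding barrier is treated like a
        # marked placeholder (assumption of this sketch).
        late = any(cmd_sn < b_sn for b_sn, _ in self.barriers)
        self.pending[cmd_sn] = Pending(cmd_sn, lun, cleanup=late)

    def task_management(self, cmd_sn, lun=ALL):
        # Step a: nothing is pending ahead of the request, so skip to step c.
        if self.exp_cmd_sn != cmd_sn:
            # Step b: mark pending entries and retain the CmdSN with its LUN.
            for p in self.pending.values():
                if self.exp_cmd_sn <= p.cmd_sn < cmd_sn:
                    p.cleanup = True
            self.barriers.append((cmd_sn, lun))
        # Step c: hand the request to SCSI immediately (not modelled here).

    def update_exp_cmd_sn(self, new_exp):
        old, self.exp_cmd_sn = self.exp_cmd_sn, new_exp
        # Steps a-c: consume every barrier that ExpCmdSN has reached.
        while self.barriers and not new_exp < self.barriers[0][0]:
            _, lun = self.barriers.pop(0)
            doomed = [sn for sn, p in self.pending.items()
                      if p.cleanup and sn < new_exp and lun in (ALL, p.lun)]
            for sn in doomed:
                del self.pending[sn]  # "remove and drop"
        # Step d: release what is left between the old and new ExpCmdSN.
        released = sorted(sn for sn in self.pending if old <= sn < new_exp)
        for sn in released:
            del self.pending[sn]
        return released  # CmdSNs delivered to SCSI, in order
```

Note that nothing in this reading tells the initiator which commands were
dropped, which is the feedback question raised above.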
    
    Doug
    
    > Hi Santosh:
    >
    > Please see below.
    >
    > > Charles Monia wrote:
    > >
    > > > > (1) MUST provide ordered delivery of SCSI commands from
    > > > >       the initiator to the target in the absence of transport
    > > > >       errors visible to iSCSI (e.g., iSCSI CRC failure,
    > > > >       unexpected TCP connection closure).
    > > >
    > > > Does the term "SCSI commands" include task management functions as
    > > > well?  If not, it should.
    > >
    > >
    > > Charles,
    > >
    > > Could iSCSI use a variant of the approach FCP-2 takes to solve the
    > > ordering issue for task mgmt error recovery ?
    > >
    > > The FCP-2 task management error recovery scheme is :
    > > - task mgmt function uses CRN 0
    > > - task mgmt function is executed immediately with no ordering
    > > latencies
    > > - both initiator & target clear all resources that can be cleared
    > > un-ambiguously.
    > > - any ambiguous exchanges shall be aborted by the port that detects the
    > > ambiguous state.
    > >
    > > In the case of iSCSI, an analogous approach could be :
    > > - task mgmt function uses immediate delivery flag for the task mgmt PDU.
    > > - task mgmt fn executed immediately avoiding any ordering latencies.
    > > - initiator & target clear all resources that can be cleared
    > > un-ambiguously.
    > > - initiator uses Abort Task to explicitly abort all active outstanding
    > > I/Os at the time the task mgmt fn was issued to avoid any ambiguous
    > > stale PDUs of an exchange from appearing at the target.
    > >
    > > Such an approach would avoid latencies on the execution of the task mgmt
    > > fn while still flushing out all the stale PDUs upon completion of the
    > > initiator actions for that task mgmt fn.
    > >
    >
    > The problem is to avoid scenarios where the initiator's and target's views
    > of the task set are out of step.  Specifically, we must avoid the case
    > where an initiator receives a PDU from a task it believes has been
    > terminated.
    >
    > In that respect, the technique you describe above should work for an ABORT
    > TASK operation.
    >
    > In the case of ABORT TASK SET, the function could be emulated by issuing a
    > series of ABORT TASK requests. For CLEAR TASK SET, an initiator would
    > probably want to do the individual ABORT TASK operations, followed by a
    > CLEAR TASK SET to terminate tasks from other initiators.  I assume TARGET
    > RESET and LUN RESET would be emulated in a manner similar to CLEAR TASK SET.
    > In all of these cases there may be some "atomicity" side effects caused by
    > doing things one at a time instead of all at once.
    >
    > The only sticky problem is ensuring that the CLEAR ACA function works
    > right.  By that I mean that you don't want to issue the function until all
    > prior SCSI commands that were in flight when the ACA occurred have been
    > terminated with the ACA ACTIVE status.  You can't simply replicate the
    > command on each connection since you might inadvertently clear a
    > subsequent ACA. (Yes -- I know these are all edge cases, but we may as well
    > try to get it right.)  Maybe the thing to do is implement the function such
    > that the ACA interlock is not cleared until the CLEAR ACA function is sent
    > on all the connections comprising the session.
    >
    > One minor distinction worth noting is that CRN is enforced in the SCSI
    > layer, whereas cmdSN is enforced in the iSCSI transport.  So, a CRN of 0
    > doesn't take effect until the transport presents the command to the SCSI
    > layer for processing.  In that case, leapfrogging of PDU ordering never
    > occurs.
    >
    > Incidentally, I've made the tacit assumption that commands on a given
    > connection are presented to the SCSI layer in the order they were sent,
    > regardless of whether or not cmdSN was set to 0.  I assume the framing
    > mechanisms that have been discussed for buffer offloading do not affect
    > this behavior.  I.e., a fully formed PDU slated for immediate delivery
    > won't be passed to the SCSI layer before a partially complete PDU that was
    > received earlier.
    >
    > If that's true, immediate delivery seems to have no meaning in a
    > single-connection scenario.  What's more, in all cases, the iSCSI layer
    > doesn't really have to be aware of task management semantics -- unless
    > someone decides to intermix immediate and sequential commands in a
    > multi-connection session.  Then all bets are off.
    >
    > Charles
    >
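
The CLEAR ACA suggestion in the quoted message (hold the ACA interlock
until the function has been seen on every connection of the session) could
be sketched roughly as follows.  The `AcaInterlock` class, its method
names, and the connection identifiers are hypothetical, and the sketch
assumes the set of connections in the session does not change while an ACA
is active.

```python
class AcaInterlock:
    """Clears the ACA only after CLEAR ACA has arrived on every connection."""

    def __init__(self, connections):
        self.connections = set(connections)  # connections comprising the session
        self.seen = set()                    # connections that delivered CLEAR ACA
        self.aca_active = False

    def raise_aca(self):
        # A new ACA starts a fresh round; earlier CLEAR ACA copies do not count.
        self.aca_active = True
        self.seen.clear()

    def clear_aca_received(self, conn):
        """Record a CLEAR ACA on `conn`; return True once the interlock clears."""
        if not self.aca_active or conn not in self.connections:
            return False
        self.seen.add(conn)
        if self.seen == self.connections:
            self.aca_active = False
            self.seen.clear()
            return True
        return False
```

Because each connection delivers its PDUs in order, waiting for the last
copy implies that every command sent ahead of CLEAR ACA on any connection
has already arrived.  A copy still in flight when a subsequent ACA is
raised would be counted toward the new round in this sketch, so it does not
by itself close that edge case.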
    
    



Last updated: Tue Sep 04 01:04:55 2001