
    RE: iSCSI Reqts: In-Order Delivery



    Charles,
    
    To quote John,
    
    "A target like an IBM Shark or EMC Symmetrix will have thousands of LUs and
    10s to 100s of Hosts connected to it, and you want to reset the whole
    Target?  I do not think that is a good idea.  Perhaps Task Reset or LU
    reset etc. but not Target Reset."
    
    There could be one pending Task Management request per Logical Unit, not per
    Target.  According to John, you could not hope to send only one such
    command.  With hundreds of hosts and thousands of Logical Units, there are
    more than a few potential commands to handle.  You MUST track each of these
    potential commands via its Client Tag and LUN.  It is also clear you will be
    expected to sort pending commands by their LUN value, determine which
    commands are potentially affected by the Task Management command, advance
    ExpCmdSN to the point then enabled by this "cleanup", and silently discard
    the related commands as if they had been delivered, assuming the Target
    behaves accordingly.  These lost commands are given placeholders so they do
    not leave holes in the part of the sequence not yet traversed, even though
    you have not yet acknowledged their delivery.  This implies a significant
    amount of SCSI-level processing done on behalf of the Target, and a bit of
    fudging as to what has and has not been delivered (if I understood the
    intent of the recovery comments).
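The following minimal Python sketch makes the bookkeeping described above concrete. It implements the target-side steps of the Ver 6, Pg 93 mechanism quoted further down this thread. It is one reading of the draft, not its reference behavior, and it rests on these assumptions:

- CmdSN never wraps.
- The task management command carries the CmdSN of the next non-immediate command.
- A command marked for cleanup is never passed to SCSI.

Instead of storing explicit placeholders, the sketch re-checks each command against the barrier list when ExpCmdSN reaches it. Under the assumptions above, this has the same effect. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

ALL: Optional[int] = None  # hypothetical marker: the barrier applies to every LUN


@dataclass
class Command:
    cmdsn: int
    lun: int
    tag: int  # Client Tag; not used by the ordering logic, kept for reporting


@dataclass
class Barrier:
    cmdsn: int          # CmdSN carried by the immediate task management command
    lun: Optional[int]  # referenced LUN, or ALL


class TargetBarrierQueue:
    """Sketch of the Ver 6, Pg 93 target-side steps (CmdSN wraparound ignored)."""

    def __init__(self, exp_cmdsn: int) -> None:
        self.exp_cmdsn = exp_cmdsn
        self.queued: Dict[int, Command] = {}  # received ahead of a hole in CmdSN
        self.barriers: List[Barrier] = []     # ascending by CmdSN
        self.delivered: List[Command] = []    # passed to the SCSI layer, in order
        self.dropped: List[Command] = []      # cleanup candidates discarded as if delivered

    def on_task_management(self, tmf_cmdsn: int, lun: Optional[int]) -> None:
        # a) nothing is pending ahead of the task management command: no barrier
        if tmf_cmdsn != self.exp_cmdsn:
            # b) every undelivered command below tmf_cmdsn with a matching LUN is a
            #    cleanup candidate; recording the barrier is enough to mark them
            self.barriers.append(Barrier(tmf_cmdsn, lun))
            self.barriers.sort(key=lambda b: b.cmdsn)
        # c) the task management command itself goes to SCSI at once (not modeled)

    def _is_cleanup_candidate(self, cmd: Command) -> bool:
        return any(cmd.cmdsn < b.cmdsn and (b.lun is ALL or b.lun == cmd.lun)
                   for b in self.barriers)

    def on_command(self, cmd: Command) -> None:
        if cmd.cmdsn < self.exp_cmdsn or cmd.cmdsn in self.queued:
            return  # stale or duplicate copy
        self.queued[cmd.cmdsn] = cmd
        # Advance ExpCmdSN across every contiguous received command
        while self.exp_cmdsn in self.queued:
            head = self.queued.pop(self.exp_cmdsn)
            if self._is_cleanup_candidate(head):
                self.dropped.append(head)    # fills its slot but never reaches SCSI
            else:
                self.delivered.append(head)  # d) released in CmdSN order
            self.exp_cmdsn += 1
            # Retire barriers that ExpCmdSN has now reached
            self.barriers = [b for b in self.barriers if b.cmdsn > self.exp_cmdsn]
```

For example, suppose CmdSN 11 (LUN 1) and 12 (LUN 2) arrive ahead of 10 (LUN 1), and a task management request for LUN 1 arrives carrying CmdSN 13. When 10 finally fills the hole, 10 and 11 are discarded, 12 is delivered, and the barrier retires. Even this small case needs per-LUN matching inside the transport.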
    
    The technique that I presented does not need to track Client Tags sorted by
    LUN, examine command content, decide which commands are related to the Task
    Management command, etc.  If a connection fails during this process (the
    failure may itself have caused the need for these Task Management commands),
    then the unacknowledged retries of these commands will again need to be
    sorted by Client Tag and LUN.  You will need to check for duplicates in this
    case, as the Client will not know which commands had arrived.  In effect,
    the present technique assumes the transport can store and sort a great many
    Tags and LUNs during normal processing.
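The duplicate check mentioned above can be illustrated with a hypothetical target-side filter keyed by (LUN, Client Tag). The class name is made up for illustration. So is the rule that a tag may be reused once its command completes. The point of the sketch is that the table grows with the number of outstanding commands, which is the per-command bookkeeping being objected to.

```python
from typing import Dict, Tuple


class DuplicateFilter:
    """Hypothetical target-side filter for commands retried after a
    connection failure, keyed by (LUN, Client Tag)."""

    def __init__(self) -> None:
        # (LUN, Client Tag) -> CmdSN of the copy already received
        self.seen: Dict[Tuple[int, int], int] = {}

    def accept(self, lun: int, tag: int, cmdsn: int) -> bool:
        key = (lun, tag)
        if key in self.seen:
            return False  # the initiator could not know this copy had arrived
        self.seen[key] = cmdsn
        return True

    def complete(self, lun: int, tag: int) -> None:
        # Assumption: once the command completes, its tag may be reused
        self.seen.pop((lun, tag), None)
```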
    
    Conversely, one could return responses serialized server-wide and reject all
    bypassed commands.  Then there is no ACA problem, no Client Tags to track, no
    loss of Server state, and flow control still works without the need to set
    aside room for 100,000 commands on John's IBM Shark controller (the extreme
    case of one Task Management request per Client, per Logical Unit).  You would
    also be hoping the client does not get clever and send an individual Abort
    Task for every command en masse.  If it does, you may have 1,000,000 extra
    commands to handle, sort, and remember until the Target responds.
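Here is a minimal sketch of the alternative described above. It assumes the following, none of which is spelled out in this thread:

- A bypass command carries its own session-wide CmdSN.
- A bypass command rejects every lower CmdSN not yet delivered, whether or not that command has arrived.
- Rejections are reported to the initiator as a list of CmdSNs.

BypassTarget and its fields are hypothetical names. The target looks only at CmdSN, never at LUNs or Client Tags.

```python
from typing import Dict, List


class BypassTarget:
    """Hypothetical sketch: session-wide CmdSN ordering in which a bypass
    command rejects, and reports, every command it jumps over.
    CmdSN wraparound is ignored."""

    def __init__(self, exp_cmdsn: int) -> None:
        self.exp_cmdsn = exp_cmdsn
        self.queued: Dict[int, str] = {}  # CmdSN -> opaque command, held behind a hole
        self.delivered: List[str] = []    # passed to the SCSI layer, in order
        self.rejected: List[int] = []     # CmdSNs reported back to the initiator

    def receive(self, cmdsn: int, command: str, bypass: bool = False) -> None:
        if cmdsn < self.exp_cmdsn:
            # Below ExpCmdSN: either bypassed (already reported) or a duplicate
            return
        if bypass:
            # Reject every CmdSN between ExpCmdSN and the bypass command,
            # whether or not that command has arrived yet
            for sn in range(self.exp_cmdsn, cmdsn):
                self.queued.pop(sn, None)
                self.rejected.append(sn)
            self.delivered.append(command)
            self.exp_cmdsn = cmdsn + 1
        else:
            self.queued[cmdsn] = command
        # Release any commands that are now in sequence
        while self.exp_cmdsn in self.queued:
            self.delivered.append(self.queued.pop(self.exp_cmdsn))
            self.exp_cmdsn += 1
```

In this reading, the rejected list is the only extra state, and the initiator already knows which command each CmdSN belongs to.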
    
    Brutally clever or brutally simple, it is your choice.
    
    (The bypass flag would be used to bypass all commands on all connections.)
    
    Doug
    
    > Hi:
    >
    > The point of my original posting was to suggest ways in which the semantics
    > of all the task management functions could be preserved in multi-connection,
    > command striping implementations without a lot of complicated bookkeeping.
    >
    > In that regard, the proposed solution imposes no additional tracking
    > requirements on initiators aside from those that would be needed anyhow to
    > issue the ABORT TASK request. For the most part, that amounts to keeping
    > track of each pending I/O request, including a handle by which the task can
    > be referenced and a pointer to the connection the SCSI command was issued
    > on.
    >
    > I did neglect one restriction, however: the initiator must have no more
    > than one task management request pending at a time to a specific target.
    >
    > In other respects, as long as ordered delivery to the SCSI layer is
    > preserved for individual connections, I don't see a problem.
    >
    > > ....Those commands executed out of sequence by means of a bypass flag,
    > > those commands that are Task Management .....
    >
    > I apparently don't understand how the bypass flag is supposed to work. I'd
    > assumed its function was to maximize the benefits of command striping by
    > allowing commands on other connections in the session to be bypassed. I'd
    > assumed that commands on the same connection are never bypassed (since
    > there appears to be no benefit in doing so).
    >
    > Hence my statement:
    >
    > > > .....I've made the tacit assumption that commands on a given
    > > > connection are presented to the SCSI layer in the order they were sent,
    > > > regardless of whether or not cmdSN was set to 0.  I assume the framing
    > > > mechanisms that have been discussed for buffer offloading do not affect
    > > > this behavior.  I.e., a fully formed PDU slated for immediate delivery
    > > > won't be passed to the SCSI layer before a partially complete PDU that
    > > > was received earlier.
    >
    > Is this assumption incorrect?
    >
    > Charles
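Charles's per-connection assumption can be stated as a small hypothetical Python model. Each connection keeps PDUs in arrival order, and only the head of the queue may go to the SCSI layer. A complete immediate PDU therefore waits behind an earlier PDU whose data is still arriving. The immediate flag is carried but deliberately ignored by the rule. All names are illustrative assumptions, not draft-defined structures.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Pdu:
    name: str
    immediate: bool          # deliberately ignored by the ordering rule below
    complete: bool = False   # True once all of the PDU's data has arrived


class ConnectionDelivery:
    """Hypothetical per-connection queue: PDUs reach the SCSI layer in arrival
    order, so a complete immediate PDU never overtakes an earlier, still
    incomplete PDU on the same connection."""

    def __init__(self) -> None:
        self.pending: Deque[Pdu] = deque()
        self.to_scsi: List[str] = []

    def arrive(self, pdu: Pdu) -> None:
        self.pending.append(pdu)
        self._deliver()

    def finish(self, pdu: Pdu) -> None:
        pdu.complete = True
        self._deliver()

    def _deliver(self) -> None:
        # Only the head of the queue may be delivered
        while self.pending and self.pending[0].complete:
            self.to_scsi.append(self.pending.popleft().name)
```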
    >
    > > -----Original Message-----
    > > From: Douglas Otis [mailto:dotis@sanlight.net]
    > > Sent: Monday, April 23, 2001 12:10 PM
    > > To: Charles Monia; Santosh Rao (E-mail)
    > > Cc: Ips (E-mail)
    > > Subject: RE: iSCSI Reqts: In-Order Delivery
    > >
    > >
    > > Charles,
    > >
    > > Your solution requires a fair amount of tracking of commands based solely
    > > on their Client Tags.  These Tags are randomly generated but will need to
    > > retain sequential order for your scheme.  The transport must remember the
    > > type of command sent together with its relative placement based only on
    > > the Client Tag.  In addition, these commands will need to be placed into
    > > different categories: those commands executed out of sequence by means of
    > > a bypass flag, those commands that are Task Management commands, and
    > > commands affected by these other types of commands.  It seems that in
    > > large part, these concerns can be met with proper handling by the
    > > transport without such laborious sorting of the Client Tags.  The
    > > out-of-sequence or bypass flag also depends on the transport sorting the
    > > Client Tag.  In addition to disabling flow control, this technique of not
    > > incrementing the serialization of these commands requires all commands
    > > with the same serialization value to be sent on the same connection
    > > without acknowledgment, if these commands are also to be kept in
    > > sequence.  This connection requirement is yet to be specified.
    > >
    > > Ver 6, Pg 12:
    > >    "iSCSI may avoid delivering some command to the
    > >    SCSI layer if so required by some prior SCSI or iSCSI action (e.g.,
    > >    clear task set Task Management request received before all the
    > >    commands it was supposed to act on)."
    > >
    > > Here, there seems to be an expectation that the iSCSI transport will
    > > interpret the content of the SCSI commands.  How this is done is not
    > > obvious.  Is the transport expected to generate SCSI responses?
    > >
    > > In addition, although iSCSI presently relies on ACA, few applications
    > > implement ACA.  It would appear that for iSCSI to work with the present
    > > protocol, significant application changes are required.  With the
    > > proposal I am suggesting, this is not a problem, as all bypassed commands
    > > are rejected back to the Initiator.  The drivers that implement iSCSI will
    > > be required to provide handling for these commands that bypass other
    > > commands.  The amount of information contained in a rejected-command list
    > > should be relatively small, and occasions for such Management rare.
    > > Without proper handling of these events, there will be 2:00 AM alarm
    > > pagers going off.
    > >
    > > Here in the proposal, sorting CmdSN based on LUN values takes place within
    > > a "Barrier List."  I cannot tell what is implied by these recovery
    > > instructions.  What is meant by Remove, Release, Drop, Cleanup,
    > > Placeholder, and ALL?  What is the intended feedback to the initiator for
    > > this Clean-up?  It would appear the transport works on behalf of the
    > > target.  In the proposal that I am suggesting, there are no actions within
    > > the transport on behalf of the target.  All decisions are made either by
    > > the Target or the Initiator.  None by the transport.
    > >
    > > The concept is simple.  Keep the transport simple.  Do not expect the
    > > transport to decipher SCSI commands.  Do not expect the transport to
    > > respond on behalf of the Target.  Do not expect the transport to sort
    > > pending commands based on LUN value.  Do not expect the transport to
    > > require SCSI and iSCSI ACA.
    > >
    > > In the case of session-wide serialization, what is good for the goose is
    > > also good for the gander.  From the perspective of quickly detecting an
    > > error and knowing the server state, it is important to also use
    > > session-wide serialization from the server.  The technique of replicating
    > > Management commands down each connection, in addition to changing global
    > > commands into specific commands, already overburdens the set-aside that
    > > must be made to handle these non-serialized management commands.  My
    > > proposal eliminates the problem of set-aside resources and loss of server
    > > state.  Rather than silently rejecting out-of-sequence commands, it
    > > reports these rejections.  Once done, this feature can be used to extract
    > > pending commands in a simple and direct manner without burdening the
    > > transport.
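The benefit of session-wide serialization from the server can be sketched as an initiator-side check on a hypothetical session-wide response sequence number. Such a field is not defined in the Ver 6 draft. A skipped number is noticed as soon as a later one arrives. With several connections, a skipped number may only mean that the response is still in flight on another connection. For that reason, the sketch records such numbers as "unseen" rather than declaring them lost.

```python
from typing import List, Tuple


class ResponseTracker:
    """Hypothetical initiator-side check of a session-wide response
    sequence number (not a field defined in the Ver 6 draft)."""

    def __init__(self, expected: int) -> None:
        self.expected = expected     # next sequence number the initiator expects
        self.unseen: List[int] = []  # numbers skipped over and not yet received

    def on_response(self, seq: int) -> Tuple[bool, List[int]]:
        """Return (arrived_in_order, numbers newly found to be skipped)."""
        if seq < self.expected:
            # A late arrival filling an earlier gap (or a duplicate)
            if seq in self.unseen:
                self.unseen.remove(seq)
            return False, []
        gap = list(range(self.expected, seq))
        self.unseen.extend(gap)
        self.expected = seq + 1
        return not gap, gap
```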
    > >
    > > As attempts are made to support the SCSI architecture, efforts should be
    > > made to simplify the transport rather than increase its intelligence.
    > > Every additional field the transport must manipulate invites complexity
    > > and non-uniform implementation.
    > >
    > > See:
    > > http://www.ietf.org/internet-drafts/draft-otis-iscsi-fullack-00.txt
    > >
    > > Ver 6, Pg 92:
    > >      "N.B. As an alternative to Logout and reissue commands, the
    > >       initiator MAY instead reset the target and terminate all
    > >       outstanding commands with a service response indicating
    > >       Delivery Subsystem Failure. The initiator MUST perform one of
    > >       the two actions."
    > >
    > > ...
    > >
    >
    > > Ver 6, Pg 93:
    > >    "The following general mechanism can be used to achieve the effect of
    > >    ordered delivery for task management commands while enabling the
    > >    "urgent" delivery that some of them imply and immediate execution of
    > >    the task management commands without:
    > >
    > >       At Initiator when a relevant task management command is issued:
    > >
    > >          a) if ExpCmdSN is equal to CmdSN skip to step c
    > >          b) mark all pending commands with a CmdSN field between
    > >          ExpCmdSN and the current CmdSN and a relevant LUN as
    > >          candidates for cleanup and retain CmdSN in a "barrier list".
    > >          c) send the task management command for immediate delivery
    > >          to the target
    > >
    > >       At initiator when updating ExpCmdSN:
    > >
    > >          a) if the "barrier list" is empty or ExpCmdSN is less than
    > >          the first entry in the barrier list then skip to step d
    > >          b) remove the barrier list entry and remove and drop all
    > >          entries marked for cleanup having a CmdSN field less than
    > >          ExpCmdSN
    > >          c) go to step a
    > >          d) release all queued entries between the old and new
    > >          ExpCmdSN from the queue
    > >
    > >       At target when receiving a relevant task management command for
    > >       immediate delivery:
    > >
    > >          a) if ExpCmdSN is equal to CmdSN skip to step c
    > >          b) mark all pending entries (commands received and
    > >          placeholders) with a CmdSN field between ExpCmdSN and the
    > >          current CmdSN as candidates for cleanup and retain CmdSN in
    > >          a "barrier list" including the referenced LUN (or an ALL
    > >          marker)
    > >          c) send the task management command to SCSI for immediate
    > >          execution
    > >
    > >       At target when updating ExpCmdSN (releasing ordered commands to
    > >       SCSI):
    > >
    > >          a) if the "barrier list" is empty or ExpCmdSN is less than
    > >          the first entry in the barrier list then skip to step d
    > >          b) remove the barrier list entry and remove and drop all
    > >          entries marked for cleanup and having the same LUN as the
    > >          barrier entry (any if the barrier is marked ALL) and a CmdSN
    > >          field less than ExpCmdSN
    > >          c) go to step a
    > >          d) release all queued entries between the old and new
    > >          ExpCmdSN from the queue
    > >
    > >    Note that this scheme will withstand connection recovery."
    > >
    > > Doug
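For comparison with the target side, here is a minimal Python sketch of the initiator-side steps quoted above from Ver 6, Pg 93. The draft does not say what happens to a marked command that ExpCmdSN passes before its barrier is reached; Doug's questions about Release and Drop point at this gap. This sketch assumes such a command stays queued until its barrier retires and is then dropped, so the initiator never waits for a response the target will not send. It also assumes no CmdSN wraparound. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class SentCommand:
    cmdsn: int
    lun: int


class InitiatorBarrierQueue:
    """Sketch of the Ver 6, Pg 93 initiator-side steps (no CmdSN wraparound)."""

    def __init__(self, exp_cmdsn: int, next_cmdsn: int) -> None:
        self.exp_cmdsn = exp_cmdsn    # last ExpCmdSN reported by the target
        self.next_cmdsn = next_cmdsn  # CmdSN for the next non-immediate command
        self.queue: Dict[int, SentCommand] = {}  # sent, not yet acknowledged
        self.marked: Set[int] = set()            # cleanup candidates
        self.barriers: List[int] = []            # CmdSNs carried by issued TMFs, ascending
        self.released: List[int] = []            # acknowledged; still await a SCSI response
        self.dropped: List[int] = []             # cleaned up; no response expected

    def send(self, lun: int) -> int:
        sn = self.next_cmdsn
        self.queue[sn] = SentCommand(sn, lun)
        self.next_cmdsn += 1
        return sn

    def issue_task_management(self, lun: int) -> int:
        # a) nothing unacknowledged: no barrier needed
        if self.exp_cmdsn != self.next_cmdsn:
            # b) mark pending commands for this LUN and remember the barrier
            for sn, cmd in self.queue.items():
                if self.exp_cmdsn <= sn < self.next_cmdsn and cmd.lun == lun:
                    self.marked.add(sn)
            self.barriers.append(self.next_cmdsn)
        # c) the TMF is sent immediate, carrying next_cmdsn without consuming it
        return self.next_cmdsn

    def update_exp_cmdsn(self, new_exp: int) -> None:
        old_exp = self.exp_cmdsn
        self.exp_cmdsn = new_exp
        # a)-c) retire each barrier ExpCmdSN has reached; drop marked entries below it
        while self.barriers and new_exp >= self.barriers[0]:
            self.barriers.pop(0)
            for sn in sorted(s for s in self.marked if s < new_exp):
                self.marked.discard(sn)
                self.queue.pop(sn, None)
                self.dropped.append(sn)
        # d) release unmarked entries between the old and new ExpCmdSN
        for sn in range(old_exp, new_exp):
            if sn in self.queue and sn not in self.marked:
                self.released.append(self.queue.pop(sn).cmdsn)
```

Choosing to hold marked entries back in step d is exactly the kind of decision the quoted text leaves to each implementer.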
    >
    > < remainder deleted >
    >
    
    



Last updated: Tue Sep 04 01:04:54 2001