RE: iSCSI: Out Of Sequence due to null sequence with multiple connections.

To: <sandeepj@research.bell-labs.com>
Subject: RE: iSCSI: Out Of Sequence due to null sequence with multiple connections.
From: "Douglas Otis" <dotis@sanlight.net>
Date: Wed, 4 Apr 2001 17:58:30 -0700
Cc: <Black_David@emc.com>, <ips@ece.cmu.edu>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;charset="iso-8859-1"
Importance: Normal
In-Reply-To: <3ACB59C4.BCD5D379@research.bell-labs.com>
Sender: owner-ips@ece.cmu.edu

Sandeep,

My comment concerned what could be happening at the sequencer, which I
view as an aspect of iSCSI separate from the SCSI target.  The iSCSI
sequencer makes a few comparisons to decide whether a CmdSN can be issued
to the target, which will be looking for the next PDU in the sequence.  For
this mechanism to work, I would recommend that all PDUs include a serial
number.  For a PDU that is treated non-sequentially, the code would look
something like this, using unsigned integers.

/* Unsigned difference wraps modulo 2^SERIAL_BITS (serial-number math). */
if ((pending_CmdSN - ns_reference_CmdSN) > (1u << (SERIAL_BITS - 1)))
	{
	reject_pdu(pending_CmdSN, SEQUENCER_INVALIDATION);
	}

An ability to force the sequencer up to this non-sequential PDU while
rejecting any pending or stuck PDUs seems like a reasonable solution.  As
the sequencer must already make this comparison, no additional work is being
done.  Nor is any understanding of the PDU content required.
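
As a rough illustration of the comparison above, here is a minimal C sketch.
It assumes a 32-bit CmdSN and a sequencer that remembers only the CmdSN of
the last non-sequential PDU.  The names struct sequencer, serial_before,
sequencer_flag and sequencer_must_reject are hypothetical, made up for this
sketch, and do not come from any iSCSI draft.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SERIAL_BITS 32u
#define SERIAL_HALF (UINT32_C(1) << (SERIAL_BITS - 1u))   /* 2^31 */

/* Hypothetical sequencer state: only the CmdSN of the last
 * non-sequential ("flagged") PDU is kept, plus a validity flag. */
struct sequencer {
    uint32_t ns_ref_cmd_sn;
    bool     ns_ref_valid;
};

/* True when a precedes b in serial-number order.  The unsigned
 * subtraction wraps modulo 2^32, so this also holds across rollover. */
static bool serial_before(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) > SERIAL_HALF;
}

/* Force the sequencer up to a non-sequential PDU. */
static void sequencer_flag(struct sequencer *seq, uint32_t cmd_sn)
{
    assert(seq != NULL);
    seq->ns_ref_cmd_sn = cmd_sn;
    seq->ns_ref_valid  = true;
}

/* True when a pending PDU falls before the flagged CmdSN and so
 * should be rejected.  The PDU content is never examined. */
static bool sequencer_must_reject(const struct sequencer *seq,
                                  uint32_t pending_cmd_sn)
{
    assert(seq != NULL);
    return seq->ns_ref_valid &&
           serial_before(pending_cmd_sn, seq->ns_ref_cmd_sn);
}
```

With the sequencer flagged at CmdSN 10, a pending CmdSN of 5 would be
rejected while 10 and 11 would pass.  The same holds across rollover: a
pending 0xFFFFFFF0 is treated as preceding a flagged 0x00000010.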

You cannot simply drop the connection and expect to retain the sequential
nature of the interface, avoid introducing errors reflected within the SCSI
layer, and avoid uncertainty about completion status.  The ability to keep
the connection running has benefits in this area.  I would expect there to
be some disagreement, but keeping the retry methods simple would also have
benefits.  Multiple connections leave PDUs pending within the sequencer,
and there should be a means of controlling them in these situations.  One
reaction would be to eliminate multiple connections and get rid of this
clumsy sequencer.

The same considerations apply to handling digest errors.  Handling errors
is hard; introducing additional state uncertainty as a means of handling an
error only seems to make a difficult situation worse.

Doug

> Doug,
>
> thanks.  If you (or anyone) could correct the pseudo-code below to
> illustrate your solution, it might help achieve quicker consensus
> and avoid some discussion.
>
> I see what I missed, in addition to Julian's point about the
> refTaskTag usage preventing ITT reuse.  But don't you still need
> the cmdSN of the original task to find out if the task_mgmt command
> is early or late?
> (assuming you are still sending the task_mgmt
> command with immediate delivery)
>
> **Event=task_mgmt at initiator:
>     purge PDUs in queue at initiator
>     send task_mgmt to target (cmdSN=0)
>
> **Event=task_mgmt at target:
>     compare refCmdSN with executing <min,max>CmdSN queue
>     if (refCmdSN < minCmdSN)
>         /*task_mgmt cmd is early */
>         must wait & drop the orig_task PDU when it arrives
>     else if (refCmdSN > maxCmdSN)
>         /*task_mgmt cmd is late, original task has completed at target*/
>         return task_response (response code=Task was not in task set)
>     else
>         /*task is executing*/
>         give task_mgmt command to SCSI layer
>
>
> -Sandeep
>
>
> Douglas Otis wrote:
> >
> > David,
> >
> > Sandeep missed a point found within serial math: you have a window that
> > rotates with respect to prior commands based on the magnitude of the
> > difference.  There is no need to maintain any state other than the
> > sequence of the flagged command, where prior commands pending to be sent
> > are rejected.  Obviously, before this window rotates more than 2 billion
> > PDUs, this prior value will need to be retired.  This is not a difficult
> > or high-overhead operation with respect to rejecting prior commands.
> > There would not be any decisions within the sequencer regarding the
> > content of any rejected PDU.  You should still want to purge PDUs waiting
> > in a queue pending to be sent to the target should an "immediate" command
> > be flagged.  Your concept creates an odd event with both sequential and
> > non-sequential delivery of a task management command.  You are then left
> > with a time interval where a non-sequential command reception must modify
> > behavior while waiting for a possible counterpart.  With all pending PDUs
> > rejected immediately, there is no waiting for status information or any
> > further activity to occur.  You would see reject-reject-status.  If the
> > initiator needs these rejected commands replayed, this becomes an option
> > of the initiator.
> >
> > Doug
> >
> > > > I would state this much stronger.  Applications had better not have
> > > > to know that it is iSCSI underneath vs. FCP or parallel SCSI else I
> > > > believe we missed the objective (granted, some things such as target
> > > > address space are unavoidably different, but I believe task
> > > > management functions should be the same).  The transport needs to
> > > > handle the transport issues without exposing quirks to the SCSI or
> > > > application layer.
> > >
> > > Unfortunately, I think we have an impossible situation.  It appears to
> > > me that we have to pick at most two of the following three goals, as I
> > > have yet to see any way to achieve all three for a single task
> > > management command on a multiple connection session:
> > >
> > > (1) The command takes effect immediately and its status/response
> > >       is available immediately.
> > > (2) The command affects all commands in flight, and its status/response
> > >       is delayed until all such effects are complete.
> > > (3) There is no significant visible departure from existing SCSI task
> > >       management behavior.
> > >
> > > The problem is that trying to do both (1) and (2) either requires SCSI
> > > to "execute" the task management command twice or requires that iSCSI
> > > do some task management (e.g., on the in-flight commands) on SCSI's
> > > behalf (or worse, like having SCSI prolong the execution of the task
> > > management command until everything in flight in iSCSI arrives).  All
> > > of these appear to lead to problems with (3) in one form or another -
> > > two executions result in two SCSI status/responses that have to be
> > > merged, and iSCSI task management will sooner or later do something
> > > different from SCSI (e.g., I sincerely doubt that a Target in a bridge
> > > will ever get this 100% identical to the devices that are being
> > > bridged).
> > >
> > > The current iSCSI draft provides the choice of [(1)] XOR [(2), (3)];
> > > the reason for not getting (3) with (1) is the possibility of the task
> > > management command bypassing commands that it's supposed to affect.
> > > Charles' original proposal is [(2), (3)] because it has to time out a
> > > stuck connection before executing the command, and is roughly
> > > equivalent to sending the command for ordered delivery and having the
> > > implementation treat any queue between iSCSI and SCSI as being on the
> > > SCSI side of the line.  Doug Otis's counter-proposal falls into the
> > > category of iSCSI doing task management on SCSI's behalf and provides
> > > an example of how this results in visible changes in behavior -- for
> > > the CLEAR ACA task management command, aborting all tasks that are
> > > queued or in flight is generally incorrect.
> > >
> > > I would note that this issue does not arise on single connection
> > > sessions, because sending the command for immediate delivery plus some
> > > care not to reorder things in the iSCSI Target (i.e., consider the
> > > iSCSI to SCSI queue to be in "SCSI" and hence subject to the task
> > > management command) obtains all of (1) through (3).
> > >
> > > Going out on a limb, I suspect applications will generally want
> > > [(2), (3)] -- send for ordered delivery and wait for the dust to settle
> > > because that provides the best odds of having some weird device get
> > > into a known state from which further progress is possible.  This
> > > allows the application to not know whether parallel SCSI, FCP or iSCSI
> > > is underneath and relies on other iSCSI recovery procedures to make
> > > sure that the task management command is delivered and executed (e.g.,
> > > unstick and/or close "stuck" connections).  There will be cases in
> > > which (1) is needed (e.g., observe tape robot doing something obviously
> > > wrong, and get it to stop immediately), but those may involve fairly
> > > blunt instruments (e.g., LUN RESET) and the need to clean up any
> > > collateral damage.
> > >
> > > Sandeep's proposal to create state in the target either fails to
> > > achieve (1) [if the response is delayed until the state is removed] or
> > > violates SAM2 [returns the response to the task management command
> > > before the task management command is complete].  Having state linger
> > > after a completed LUN or TARGET RESET is almost certainly wrong.
> > >
> > > So, I think I'm down to sending task management functions once,
> > > usually for ordered delivery with the application making the ordered
> > > vs. immediate delivery choice (and sending the task management function
> > > twice if it so chooses).  I think apps will generally choose ordered
> > > delivery, choosing predictable behavior over immediacy concerns.  Aside
> > > from a longer discussion of this issue, I still don't see the need for
> > > additional mechanism(s) to task management - what have I missed in the
> > > above discussion?
> > >
> > > --David
> > >
> > > ---------------------------------------------------
> > > David L. Black, Senior Technologist
> > > EMC Corporation, 42 South St., Hopkinton, MA  01748
> > > +1 (508) 435-1000 x75140     FAX: +1 (508) 497-8500
> > > black_david@emc.com       Mobile: +1 (978) 394-7754
> > > ---------------------------------------------------
> > >
> > >
>
