Re: iSCSI: need for new data SNACK code?

To: <Black_David@emc.com>
Subject: Re: iSCSI: need for new data SNACK code?
From: "Mallikarjun C." <cbm@rose.hp.com>
Date: Fri, 12 Jul 2002 12:20:10 -0700
Cc: <ips@ece.cmu.edu>
References: <277DD60FB639D511AC0400B0D068B71E0564C06F@CORPMX14>
Sender: owner-ips@ece.cmu.edu
David,

I see some convergence, but still disagreement on several aspects.  There's
an attempt to summarize the options at the bottom, and I'd prefer that others
or the process co-chair comment, rather than the two of us continuing to go
back and forth alone on this thread.

> The flag per task is not needed - I'd expect the Target to look
> at the Data PDUs it would have to resend, check them against the max
> Data PDU size for this connection and fail the regular SNACK if any
> PDU is too large

I am afraid there's no free lunch here.  In this description, you're now
expecting targets to remember the size of every PDU that they shipped
for each of the tasks, which causes a metadata explosion.  This was discussed
by Eddy Quicksall and me in an earlier thread - "changing MaxPDUDataLength".
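
To make the cost concrete, here is a rough Python sketch of the check you
describe.  All names (TaskState, sent_lengths, can_replay_exactly) are made up
for illustration and are not from the draft; the only point is that the target
has to keep one length per shipped Data PDU, per task, for the check to work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskState:
    # Hypothetical per-task record: data segment length of every Data PDU
    # already shipped, indexed by DataSN.
    sent_lengths: List[int] = field(default_factory=list)


def can_replay_exactly(task, first_datasn, count, current_max):
    """True if every requested PDU still fits under the current max size."""
    wanted = task.sent_lengths[first_datasn:first_datasn + count]
    return all(n <= current_max for n in wanted)


def metadata_entries(tasks):
    """Total number of per-PDU lengths the target has to remember."""
    return sum(len(t.sent_lengths) for t in tasks)


task = TaskState(sent_lengths=[8192, 8192, 4096])
print(can_replay_exactly(task, 0, 3, 8192))   # True
print(can_replay_exactly(task, 0, 3, 4096))   # False
```

The state grows with the number of Data PDUs shipped, not with the number of
tasks, which is the explosion I am referring to.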

>If the permission is not used, the Initiator's
> status SNACK is not needed but does no harm.

Well, the point is - shouldn't the target be detecting these obvious bugs, and
attempting to recover from these errors (it's a clear disconnect between
target and initiator state)?  It seems like additional complexity on either
end, just to cover implementation bugs in the prior synchronization.

> As the complexity of a protocol increases, that synchronized
> state machine assumption becomes more prone to failure.

I think this is where the major disconnect is between us.  As I responded to Dave
Sheehy yesterday, the iSCSI protocol specification *mandates* that a target must 
ship "exact replicas" of the data PDUs barring certain header fields unless the
PDU size was changed by an intermediate successful text negotiation.   What you're
suggesting is: despite this mandate, target may resegment illegally, so let's define a
new data SNACK code with identical wire semantics.  
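
As a back-of-the-envelope illustration of why "exact replicas" matter: once
the replayed PDUs are no longer replicas, the Data PDU count for the task
changes, so a status built against the old count no longer describes what was
sent.  The sketch below is Python, and the helper names (pdu_count, resegment)
and the sizes are assumptions of mine, not anything from the draft.

```python
def pdu_count(total_bytes, max_data_len):
    """Number of Data PDUs needed to carry total_bytes."""
    return -(-total_bytes // max_data_len)  # ceiling division


def resegment(lengths, new_max):
    """Split any PDU longer than new_max; shorter PDUs pass through as-is."""
    out = []
    for n in lengths:
        while n > new_max:
            out.append(new_max)
            n -= new_max
        if n:
            out.append(n)
    return out


original = [8192] * 8                  # a 64 KiB read shipped as 8 PDUs
replayed = resegment(original, 4096)   # same bytes once the max drops to 4 KiB
print(len(original), len(replayed))    # 8 16
```

The bytes are identical in both runs; only the PDU count differs, and that
count is what the two sides have to agree on.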

>and for an initiator to expects to be able to do
> this with uninterrupted high performance is unrealistic 
..
>right sort of incentives in discouraging
> initiators from changing the max Data PDU size.

You're making an incorrect assumption here that it's just the initiators that
are likely to change the max PDU size.  Either party can do it - and that was
the whole point behind the recent addition of the "negotiation prompt" Async
Message.  (The initiator, on the prompt, may respond with a blank Text Request
PDU.)  It's for this precise reason that my option (a) limits *data SNACKs*
while "any text negotiation" is in progress.

Now, on to your proposal...

> That strikes me as a productive direction that I could see enforcing
> An initiator that wants to be able to issue a Data SNACK for
> some or all of its commands then has to ensure that no such
> commands are outstanding when/while it changes (in particular
> reduces) the max Data PDU size. 

I am afraid you may have misunderstood what I was suggesting.  I was *not*
suggesting that PDU size must not be renegotiated with outstanding tasks. 
Instead, I was suggesting that there be no *data SNACKs* while any text
negotiation is going on (meaning that SNACKs can be issued after the negotiation 
completes, so the initiator can definitively throw out the status PDU for all
tasks with data recovery needs).
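
Roughly, the initiator-side rule I have in mind looks like the Python sketch
below.  The class and method names are hypothetical, and holding the SNACKs in
a queue is just one way an implementation might do it - the rule itself only
says that none go out while a negotiation is open.

```python
class InitiatorConnection:
    """Holds back data SNACKs while any text negotiation is in progress."""

    def __init__(self):
        self.negotiation_in_progress = False
        self.deferred_snacks = []   # (task_tag, first_datasn, count)
        self.sent = []              # SNACKs actually put on the wire

    def start_text_negotiation(self):
        self.negotiation_in_progress = True

    def request_data_snack(self, task_tag, first_datasn, count):
        """Send the SNACK now, or park it; returns True if sent."""
        req = (task_tag, first_datasn, count)
        if self.negotiation_in_progress:
            self.deferred_snacks.append(req)
            return False
        self.sent.append(req)
        return True

    def finish_text_negotiation(self):
        """Final Text Response received: release whatever was parked."""
        self.negotiation_in_progress = False
        pending, self.deferred_snacks = self.deferred_snacks, []
        self.sent.extend(pending)
        return pending
```

Tasks keep running throughout; only the recovery requests wait for the
negotiation to complete.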

There's one weird corner case in the simplified option you're suggesting here -
when a target wants to initiate a max PDU size change, it cannot know when the
initiator is likely to quiesce the I/Os, nor is there a way to tell the
initiator to stop.  One way to deal with this - the target issues a
"negotiation prompt", and the initiator responds with a Text Request PDU only
*after* all active tasks are completed and their statuses acknowledged.  This
of course has the obvious drawback that negotiation/declaration attempts by
the target for *any key* would be rebuffed by the initiator until the
connection is quiesced.
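
A minimal Python sketch of that workaround, with made-up names throughout.
Refusing new tasks while a prompt is pending is an assumption of mine (it
follows from your "not start any new ones" wording); nothing in the draft
requires it.

```python
class QuiescingInitiator:
    """Answers a negotiation prompt only once the connection is quiesced."""

    def __init__(self):
        self.active_tasks = set()
        self.prompt_pending = False
        self.text_requests_sent = 0

    def start_task(self, tag):
        # Assumption: no new I/O is started while a prompt is pending.
        if self.prompt_pending:
            return False
        self.active_tasks.add(tag)
        return True

    def on_negotiation_prompt(self):
        self.prompt_pending = True
        self._maybe_respond()

    def on_status_acknowledged(self, tag):
        self.active_tasks.discard(tag)
        self._maybe_respond()

    def _maybe_respond(self):
        # The Text Request goes out only when nothing is outstanding.
        if self.prompt_pending and not self.active_tasks:
            self.prompt_pending = False
            self.text_requests_sent += 1
```

Note that the target's prompt sits unanswered for as long as the slowest
active task takes, whatever key the target wanted to negotiate.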

With that said, let me survey the available options:

Option.A
    - Keep the rev13 text, plus add the two additional text segments I
      proposed at the beginning of this thread (initiators must drop status
      in one case, SNACK must be issued only before the status is ack'ed),
      *and* add "no data SNACKs while any text negotiation is on".
    Pros:  - No need for a new data SNACK code with identical wire semantics.
           - Can allow the PDU size change to happen with no wait for
             quiescing any long-running writes/reads (and those operations
             too benefit from ULPDU containment from this changed PDU size).
    Cons:  - Additional complexity (compared to the standard data SNACK) on
             the initiator to drop the status SNACK; and to mark all tasks
             that were active when the PDU size change happened, so their
             statuses can be dropped if necessary.

Option.B
    - Same as Option.A, but add the new resegmenting-Data SNACK code per
      David's Last Call comment.
    Pros:  - Precludes surprises due to implementation errors (also a con,
             see below).
           - Can allow the PDU size change to happen with no wait for
             quiescing any long-running writes/reads (and those operations
             too benefit from ULPDU containment from this changed PDU size).
    Cons:  - Attempts to address an implementation error by protocol means,
             which could be a slippery slope.
           - Requires a new data SNACK code which both sides have to handle,
             and which conveys completely redundant information about the
             changed PDU size.
           - Additional complexity (compared to the standard data SNACK) on
             the initiator to drop the status SNACK; and to mark all tasks
             that were active when the PDU size change happened, so their
             statuses can be dropped if necessary.

Option.C
    - Completely disallow PDU size changes (initiated by either party) while
      any tasks are active.  Rev13 text should be stripped of the
      resegmenting discussion.  Any data SNACK always gets exact replicas.
    Pros:  - Simpler approach; initiators don't need to drop status PDUs, nor
             mark the active tasks.
    Cons:  - Active tasks cannot dynamically adapt to PMTU degradation, so
             ULPDU containment isn't always guaranteed - particularly painful
             for long-running tasks for either party.
           - Desired changes in max PDU size would need to wait for all tasks
             to quiesce and the statuses to be acknowledged, forcing a pause
             in the I/O activity.
           - Any text negotiation prompted by the target can't be carried on
             until all active I/Os are quiesced (even if the target intends
             to negotiate other keys).

I prefer Option.A, followed by Option.B.  Option.C's cons appear to outweigh
its simplicity, so I wouldn't prefer it.
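
For what it's worth, the initiator bookkeeping listed under the cons of
Option.A is small.  A rough Python sketch follows; the names are hypothetical,
and it assumes the obligation to drop a status expires once the first status
PDU for the task has been dropped.

```python
class Task:
    def __init__(self, tag):
        self.tag = tag
        self.size_changed_while_active = False
        self.data_snack_issued = False
        self.first_status_dropped = False


class Initiator:
    def __init__(self):
        self.tasks = {}

    def start(self, tag):
        self.tasks[tag] = Task(tag)

    def on_pdu_size_change(self):
        # Mark every task that is active at the time of the change.
        for t in self.tasks.values():
            t.size_changed_while_active = True

    def on_data_snack(self, tag):
        self.tasks[tag].data_snack_issued = True

    def on_status(self, tag):
        """Decide what to do with an arriving status PDU for the task."""
        t = self.tasks[tag]
        if (t.size_changed_while_active and t.data_snack_issued
                and not t.first_status_dropped):
            t.first_status_dropped = True
            return "drop_and_snack"   # discard it, then issue a status SNACK
        del self.tasks[tag]
        return "accept"
```

That is two flags per task plus one that records the drop, and none of them
matter unless a data SNACK was actually issued for the task.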

Regards.
--
Mallikarjun

Mallikarjun Chadalapaka
Networked Storage Architecture
Network Storage Solutions
Hewlett-Packard MS 5668 
Roseville CA 95747
cbm@rose.hp.com


----- Original Message ----- 
From: <Black_David@emc.com>
To: <cbm@rose.hp.com>
Cc: <ips@ece.cmu.edu>
Sent: Thursday, July 11, 2002 8:42 PM
Subject: RE: iSCSI: need for new data SNACK code?


> Mallikarjun,
> 
> > > The new code
> > > provides the initiator with a more robust way to detect resegmentation
> > > by requiring the initiator to explicitly ask for it.  The initiator
> > > can take a simple approach of always starting with the existing
> > > Data SNACK code that does not resegment and only using the new code
> > > when the non-resegmenting SNACK doesn't work.  
> > 
> > That's one approach.  So, let's note that you're expecting the target
> > to maintain a "PDU-size-changed" flag for every active task, and 
> > expecting it to fail the "regular" data SNACK if the flag is set.
> 
> The flag per task is not needed - I'd expect the Target to look
> at the Data PDUs it would have to resend, check them against the max
> Data PDU size for this connection and fail the regular SNACK if any
> PDU is too large.  This could even be lazily evaluated at the time
> each PDU is to be sent because the odds are that if resegmenting
> is necessary, the first Data PDU to be resent is going to need it.
> The initiator still has to time out the failure of all the Data
> PDUs to arrive in order to deal with header corruption (unfortunately,
> the F bit doesn't help) - when it times out the "regular" Data SNACK,
> it issues a "resegmenting" one (this deals with resegmenting that
> becomes necessary after the first Data PDU for a Data SNACK has
> been sent).
> 
> > Some issues -
> >     a) would it really cover the (impossible, IMHO) case you're attempting
> >         to cover, in the face of multiple PDU size changes?
> 
> Should work - permission to resegment includes permission to
> re-resegment or worse.  Independent of how many size changes
> happen, the status SNACK at the end returns a new status with
> an ExpDataSN that reflects the right number of Data PDUs sent.
>  
> >     b) assuming that there indeed is a disconnect b/n the two,
> >         what should the target do when a resegmenting data SNACK
> >         is received, but there's no PDU size change?  I hope you
> >         aren't mandating the specific approach.
> 
> Resegmenting SNACK is "permission to resegment", and the target need
> not use that permission.  If the permission is not used, the Initiator's
> status SNACK is not needed but does no harm.
> 
> >     c) this approach costs one additional round-trip delay, where
> >         none is necessary (as argued below).
> 
> I make no apologies for spending a round trip to remove a data
> corruption risk.  This is a rare case with a possibly nasty failure
> mode - I'm much more interested in this working right than fast.
> 
> >     d) seems like it would need new Reject code(s) to distinguish
> >         a "regular" reject from that of the PDU size change ones.
> 
> Could be useful, but is not strictly necessary.
> 
> > >If the target makes
> > > its own choice to resegment, and the initiator doesn't think the
> > > target resegmented, 
> > 
> > Now this is beginning to feel more like the option A vs B vs C debate
> > we had a while ago.  If the protocol works correctly, both sides would
> > be *completely synchronized* on the fact of PDU size change.
> 
> As the complexity of a protocol increases, that synchronized
> state machine assumption becomes more prone to failure.  The
> whole discussion of default values for text keys and the resulting
> "if in doubt, negotiate it" maxim was one example.  The alternative
> of relying on every default key value to be what was expected was
> significantly less robust.
> 
> > There are two options for initiators to deal with this - 
> > 
> > a) don't issue any data SNACKs while any text negotiation is 
> >     in progress - wait till the text response is received successfully.
> 
> That strikes me as a productive direction that I could see enforcing
> with some "MUST"s / "MUST NOT"s - the initiator is causing this mess
> by changing the max Data PDU size on the connection.  This is not a
> friendly thing to do, and for an initiator to expects to be able to do
> this with uninterrupted high performance is unrealistic ... so imposing
> costs on the initiator for making this disruptive size change makes sense.
> 
> Suppose we went back to the old approach where Data SNACKs *never*
> resegment and required that:
> - Initiators MUST NOT issue Data SNACKs that could require
> resegmentation?
> - Targets MUST reject or ignore Data SNACKs that require
> resegmentation.
> - If resegmentation becomes necessary during retransmission
> of Data PDUs for a Data SNACK, PDUs retransmission
> MUST cease for that Data SNACK.
> An initiator that wants to be able to issue a Data SNACK for
> some or all of its commands then has to ensure that no such
> commands are outstanding when/while it changes (in particular
> reduces) the max Data PDU size.  In the limit, the initiator
> has to wait for all of its commands on the connection to
> complete before changing the max Data PDU size, and not
> start any new ones until the size change is complete.
> 
> This is simpler and more robust than any of the options under
> discussion and has the right sort of incentives in discouraging
> initiators from changing the max Data PDU size.
> 
> Can you accept this? I would expect widespread support for
> the resulting removal of target resegmentation from iSCSI.
> 
> I will however answer one more question ...
> 
> > > This requires additional Initiator
> > > state per command for something that almost never happens, and if it
> > > gets one of these markings wrong, 
> > 
> > Sorry, how is it different from the target getting wrong one 
> > of its aforementioned
> > "PDU-size-changed" flags for tasks?
> 
> (1) The per-task flags aren't needed - see above.
> (2) The failure is harmless - if the target fails to resegment and
> sends a PDU that is too large, the initiator discards it,
> and then decides what to do about the broken target.  There's
> no possibility of completing a READ command without all of its
> data.
> 
> Thanks,
> --David
> 
> ---------------------------------------------------
> David L. Black, Senior Technologist
> EMC Corporation, 42 South St., Hopkinton, MA  01748
> +1 (508) 249-6449            FAX: +1 (508) 497-8018
> black_david@emc.com       Mobile: +1 (978) 394-7754
> ---------------------------------------------------
> 
> 
> > -----Original Message-----
> > From: Mallikarjun C. [mailto:cbm@rose.hp.com]
> > Sent: Thursday, July 11, 2002 7:46 PM
> > To: Black_David@emc.com
> > Cc: ips@ece.cmu.edu
> > Subject: Re: iSCSI: need for new data SNACK code?
> > 
> > 
> > David, comments in text.
> > 
> > > I disagree with the "careful enough" characterization.  The new code
> > > provides the initiator with a more robust way to detect resegmentation
> > > by requiring the initiator to explicitly ask for it.  The initiator
> > > can take a simple approach of always starting with the existing
> > > Data SNACK code that does not resegment and only using the new code
> > > when the non-resegmenting SNACK doesn't work.  
> > 
> > That's one approach.  So, let's note that you're expecting 
> > the target to maintain 
> > a "PDU-size-changed" flag for every active task, and 
> > expecting it to fail the 
> > "regular" data SNACK if the flag is set.
> > 
> > Some issues -
> >     a) would it really cover the (impossible, IMHO) case 
> > you're attempting
> >         to cover, in the face of multiple PDU size changes? 
> >     b) assuming that there indeed is a disconnect b/n the 
> > two, what should the
> >         target do when a resegmenting data SNACK is received, 
> > but there's no
> >         PDU size change?  I hope you aren't mandating the 
> > specific approach.
> >     c) this approach costs one additional round-trip delay, 
> > where none is
> >         necessary (as argued below).
> >     d) seems like it would need new Reject code(s) to 
> > distinguish a "regular" reject
> >         from that of the PDU size change ones.
> > 
> > >If the target makes
> > > its own choice to resegment, and the initiator doesn't think the
> > > target resegmented, 
> > 
> > Now this is beginning to feel more like the option A vs B vs C debate
> > we had a while ago.  If the protocol works correctly, both sides would
> > be *completely synchronized* on the fact of PDU size change.  
> > 
> > There are two options for initiators to deal with this - 
> > 
> > a) don't issue any data SNACKs while any text negotiation is 
> > in progress - 
> >     wait till the text response is received successfully.
> > 
> > OR
> > 
> > b) issue a data SNACK regardless, and if the text response 
> > (that indicates 
> >     a PDU size change) arrives before the data burst 
> > completes, discard the 
> >     status PDU, and ask for its retransmission.
> > 
> > Option a is what I suggest, and b is for the adventurous sort.
> > 
> > >there are error scenarios that combine this with
> > > corrupt Data PDU headers to cause the initiator to successfully
> > > complete a SCSI command that has not delivered all its data
> > > (the resegmented PDUs caused the Data PDU count to match 
> > the ExpDataSN
> > > value in the response that should have been discarded, but wasn't).
> > 
> > Which is precisely why I'm suggesting that we mandate discarding the 
> > status PDU.  What am I missing?
> > 
> > > While these should be rare, their consequences can be catastrophic.
> > > 
> > > It is conveying the Initiator's instructions that resegmentation is
> > > permitted.  I am not comfortable with the last sentence 
> > above that assumes
> > > that the Initiator and Target will always have identical 
> > views of all of
> > > the effects of a full feature phase PDU size change - (which is
> > > a rare event to begin with, and hence likely to involve code that
> > > isn't well exercised/tested).
> > 
> > Obviously, I cannot guarantee the lack of bugs in any implementation.
> > But again, let's not attempt to address implementation bugs 
> > by protocol
> > means (that's why we picked option A in the A vs B vs C debate I 
> > referred to above - see the "reusing ISID for recovery" 
> > thread; it's for the 
> > same reason we removed the X-bit for connection reinstatement -
> > see the "X-bit in Login" thread).
> > 
> > > 
> > > > The only two changes from the rev14 text that I propose 
> > are that we add:
> > > >
> > > >    a) The first status PDU must always be dropped after a
> > > > MaxRecvDataSegmentLength change, if ever a data SNACK is
> > > > employed for the task.
> > > 
> > > When does this obligation to drop the first status PDU expire?  
> > 
> > As it says: when the first status PDU is dropped for the task 
> > - for each 
> > active task during a PDU size change, *and* for which a data SNACK 
> > is/was issued.
> > 
> > >I think
> > > the Initiator has to mark all commands that are outstanding 
> > or become
> > > outstanding between the time it starts the negotiation that changes
> > > MaxRecvDataSegmentLength and the time that it gets the 
> > final Text Response
> > > of that negotiation from the target.  This requires 
> > additional Initiator
> > > state per command for something that almost never happens, and if it
> > > gets one of these markings wrong, 
> > 
> > Sorry, how is it different from the target getting wrong one 
> > of its aforementioned
> > "PDU-size-changed" flags for tasks?
> > 
> > I believe that the onus should be on the initiator to do what 
> > it takes to 
> > do the right recovery - as is the general error recovery 
> > philosophy everywhere.
> > Target cannot predict if the initiator would be interested in 
> > recovering a
> > particular I/O (regardless of the operational ErrorRecoveryLevel).
> > 
> > >it's vulnerable to failing to deliver
> > > all the data for a SCSI command in a compound error situation.  An
> > > alternative with the new code could involve a single bit 
> > per connection
> > > that records whether the PDU size was ever changed (if so, retry any
> > > failed Data SNACK as a resegmenting Data SNACK). 
> > 
> > Or, use just "the Data SNACK", if we define only one.  I 
> > can't see why this
> > optimization needs two data SNACK codes.
> > 
> > > 
> > > > Initiator MUST issue a status SNACK to recover the
> > > > status PDU (i.e. move the onus of retransmitting
> > > > status from the target to the initiator).
> > > >     b) A SNACK requesting an R2T, Data or Status PDU for 
> > a task MUST be 
> > > >           issued before the status for the task is acknowledged.
> > > 
> > > I have no problem with these two.
> > > 
> > > > I'll be glad to see any technical reasons that I am 
> > > > overlooking, that require two codes.
> > > 
> > > See above.  This is somewhat analogous to the "if in doubt, 
> > negotiate
> > > it" principle for login - telling the other side *exactly* 
> > what is wanted
> > > is more robust than assuming that it will do what is wanted, and in
> > > this resegmenting Data SNACK case, there are potentially nasty
> > > consequences to an incorrect assumption.  Does this make any sense?
> > 
> > I see what you're trying to get at.  However, IMHO, there is 
> > no "assuming"
> > involved here.  If the protocol works right, it should do the 
> > right thing.  Or else, 
> > we are in serious trouble despite this change.
> > 
> > Regards.
> > --
> > Mallikarjun
> > 
> > Mallikarjun Chadalapaka
> > Networked Storage Architecture
> > Network Storage Solutions
> > Hewlett-Packard MS 5668 
> > Roseville CA 95747
> > cbm@rose.hp.com
> > 
> > 
>