Re: iSCSI: Out of order commands

To: Julian Satran <Julian_Satran@il.ibm.com>
Subject: Re: iSCSI: Out of order commands
From: Santosh Rao <santoshr@cup.hp.com>
Date: Fri, 09 Nov 2001 11:40:25 -0800
Cc: ips@ece.cmu.edu
Organization: Hewlett Packard, Cupertino.
References: <OF91ED7F43.D3695895-ONC2256AFF.00301D4B@telaviv.ibm.com>
Sender: owner-ips@ece.cmu.edu

Julian Satran wrote:
> 
> Mallikarjun,
> 
> There are several other mechanisms that will have to be changed to allow
> for OOO.
> Task abort relies on the fact that the task management request is
> delivered after the task to be aborted and on the same connection.

Julian,

I don't agree with your example above. The initiator will only issue an
Abort Task after issuing the command (on experiencing a timeout of the
command, or for other error recovery reasons). The in-order properties
of TCP ensure that the Abort Task is always delivered after the command,
as long as the Abort Task is sent on the same connection as the command.

The case under discussion is that multiple unrelated iSCSI command PDUs
may be shipped out of order on a given connection. Obviously, this does
not apply to an Abort Task for an already issued command.
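
The distinction above can be sketched as a simple check over the PDUs
written to one connection. This is an illustration only, not text from the
draft: the Pdu record, its "cmd" / "abort_task" labels and the function
name are hypothetical. It verifies that an Abort Task never precedes the
command it names, and deliberately places no constraint on the CmdSN order
of unrelated commands.

```python
from dataclasses import dataclass


@dataclass
class Pdu:
    kind: str        # "cmd" or "abort_task" (hypothetical labels, not draft terms)
    task_tag: int    # tag of the command, or of the task being aborted
    cmd_sn: int = 0  # CmdSN carried by a command; unused for abort_task here


def abort_follows_command(send_order):
    """Return True if every abort_task is sent after the command it names.

    send_order lists the PDUs in the order they were written to one TCP
    connection. The CmdSN order of unrelated commands is not checked.
    """
    sent_tags = set()
    for pdu in send_order:
        if pdu.kind == "cmd":
            sent_tags.add(pdu.task_tag)
        elif pdu.kind == "abort_task" and pdu.task_tag not in sent_tags:
            return False
    return True


# Unrelated commands leave out of CmdSN order (7 before 5), but the abort
# for task 0x10 still follows its command on the same connection.
ok = [Pdu("cmd", 0x11, 7), Pdu("cmd", 0x10, 5), Pdu("abort_task", 0x10)]
# Here the abort is written before the command it refers to.
bad = [Pdu("abort_task", 0x10), Pdu("cmd", 0x10, 5)]
```

Since TCP delivers bytes in the order they were written, a send order that
passes this check is also the arrival order at the target.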

Again, OOO commands within a connection are not a new design. There was
nothing preventing this behaviour up until now and nothing in the spec
is broken by this behaviour. I have not heard a SINGLE case of some
feature in the spec being broken by this behaviour.

I would think the newly introduced restriction of requiring initiators
to issue commands in order within a connection is a new design
restriction and not the other way around!


> Retry command cleaning relies on the fact that the "cleaning command" is
> shipped after the retried command.
> And I suspect we may find others.

What do you mean by "cleaning command"? I see nothing called "cleaning
command" in the draft. If you are referring to a re-issued command for
the purpose of plugging a hole, again, the original command and the
re-issued one are related and OOO will not occur in this case.
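
For reference, the hole-plugging behaviour discussed in this thread can be
sketched as a small target-side scoreboard. This is an illustrative sketch
only, not text from the draft: the class and method names are hypothetical,
CmdSN wrap-around is ignored, and the command window (MaxCmdSN) is not
modelled.

```python
class CmdSNScoreboard:
    """Holds commands that arrive ahead of a hole and releases them in order."""

    def __init__(self, exp_cmd_sn):
        self.exp_cmd_sn = exp_cmd_sn  # next CmdSN the target will execute
        self.pending = {}             # CmdSN -> command held behind a hole

    def receive(self, cmd_sn, command):
        """Accept one command; return the commands now deliverable, in order."""
        if cmd_sn < self.exp_cmd_sn or cmd_sn in self.pending:
            return []  # duplicate of a command already seen; drop it
        self.pending[cmd_sn] = command
        ready = []
        # Release every command that is now contiguous with ExpCmdSN.
        while self.exp_cmd_sn in self.pending:
            ready.append(self.pending.pop(self.exp_cmd_sn))
            self.exp_cmd_sn += 1
        return ready


sb = CmdSNScoreboard(exp_cmd_sn=1)
first = sb.receive(2, "READ B")    # CmdSN 1 is missing: held behind the hole
second = sb.receive(3, "READ C")   # still held
third = sb.receive(1, "WRITE A")   # hole plugged: all three are released
```

The same structure handles a hole left by a digest error and a hole left by
an initiator that sends unrelated commands out of CmdSN order; the
scoreboard cannot tell the two apart.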

I would like to see some real scenarios where the current draft is
broken due to the initiator shipping multiple unrelated commands OOO
within a given connection. What are the REAL reasons for imposing this
new design restriction?

- Santosh




> 
> This thread is going nowhere IMHO for two reasons:
> 
> - the proponents of OOO (Rod, Santosh and Bob Russell) have presented us
>   with an issue, not a design, and its validity is hotly contested
> - I and the other authors do not seem to be willing to do all the work and
>   make a design (or complete set of design changes) based on this sketch.
>   Unlike other requests I could not see anything appealing enough to
>   justify the time spent on doing it.
> 
> I suggest that Rod, Santosh and whoever else is willing to work with them
> put together a draft based on their ideas that should:
> 
> - clearly state what is to be gained
> - indicate in sufficient detail what changes are needed in the current
>   draft to accommodate the new design, not only in command delivery but in
>   all other areas (task management, recovery, logout etc.).
> 
> Once we have this draft I assure you that we will give it serious
> consideration.
> 
> Julo
> 
> "Mallikarjun C." <cbm@rose.hp.com>
> Sent by: owner-ips@ece.cmu.edu
> 09-11-01 03:28
> Please respond to "Mallikarjun C."
> 
> 
>         To:     John Hufferd/San Jose/IBM@IBMUS
>         cc:     <ips@ece.cmu.edu>
>         Subject:        Re: iSCSI: Out of order commands
> 
> 
> 
> John,
> 
> Sorry, this note got a little longer than I would've liked, but....
> 
> I believe there are cases where OOO CmdSN handling is a
> legitimate requirement on targets due to exception events -
>     a) retransmitting a CmdSN on a command acknowledgement
>       timeout (within-connection recovery class).  This manifests as
>       an OOO CmdSN on the connection to a target if it didn't see
>       the original copy due to a digest error.
>     b) retransmitting the last few "lost" commands due to a connection
>       failure on a new connection. If this new connection had already
>       carried a CmdSN greater than these retransmitted commands
>       (prior to connection failure), this again manifests as OOO CmdSN
>       on the new connection to the target.
> 
> OTOH, I believe sending OOO CmdSNs on a connection as a
> regular practice is counterproductive, since the target must continuously
> re-order the initiator "optimization" leading to a zero-sum game.  I
> would argue that the need to dispatch CmdSNs OOO due to immediate
> data DMA (brought up by Rod) can be addressed by simple NIC
> changes to prefetch data for the next command (or more simply use
> unsolicited separate data PDUs, if negotiated).  [ You got to deal
> with the case of all commands being writes anyway! ]
> 
> If we allow OOO CmdSNs on a connection (I'd advocate discouraging
> it as a regular practice), I don't believe any of the stuff in error
> recovery breaks (nor does it affect the current reliance on ExpCmdSN).
> Julian perhaps can comment.
>     - All the in-order assumptions are for DataSNs/R2TSNs/StatSNs, not
>       for CmdSNs.
>     - Any multi-connection session by definition must deal with OOO
>       CmdSNs.
>     - I believe that the current abort task scheme for immediate commands
>       detailed in section 9.3 caters to OOO CmdSNs on a connection
>       as well (we must be dealing with an immediate Abort arriving before
>       the command today, since the command could have been hit with
>       a digest error).
> 
> To summarize, here is what I suggested to Julian in a private email -
> 
> a) I suggest using a SHOULD for in-order dispatch of
>    commands on a connection - for an initiator.
> 
> b) I suggest using a SHALL for handling out-of-order commands
>    on a connection - for the target (as Barry pointed out).
> 
> Hope that was useful.
> --
> Mallikarjun
> 
> Mallikarjun Chadalapaka
> Networked Storage Architecture
> Network Storage Solutions Organization
> Hewlett-Packard MS 5668
> Roseville CA 95747
> 
> ----- Original Message -----
> From: "John Hufferd" <hufferd@us.ibm.com>
> To: <cbm@rose.hp.com>
> Cc: <ips@ece.cmu.edu>
> Sent: Thursday, November 08, 2001 1:40 PM
> Subject: Re: iSCSI: Out of order commands
> 
> >
> > Mallikarjun,
> > Could you comment on the concept of OOO on the ErrorRecoveryLevel>0?  I
> > had thought that "in order delivery" was part of the detection of missing
> > PDUs and needed for timely Recovery.  I was wondering if this changes the
> > way we would use the ExpCmdSN, etc.
> >
> > I think your opinions on this part of the OOO discussion would be
> > valuable. For example, how would you contrast the differences in
> > detecting a problem and recovering from that problem etc., today vs. the
> > OOO approach (if any)?
> >
> >
> > .
> > .
> > .
> > John L. Hufferd
> > Senior Technical Staff Member (STSM)
> > IBM/SSG San Jose Ca
> > Main Office (408) 256-0403, Tie: 276-0403,  eFax: (408) 904-4688
> > Home Office (408) 997-6136, Cell: (408) 499-9702
> > Internet address: hufferd@us.ibm.com
> >
> >
> > "Mallikarjun C." <cbm@rose.hp.com>@ece.cmu.edu on 11/07/2001 09:41:05 AM
> >
> > Please respond to cbm@rose.hp.com
> >
> > Sent by:  owner-ips@ece.cmu.edu
> >
> >
> > To:   Santosh Rao <santoshr@cup.hp.com>, ips@ece.cmu.edu
> > cc:
> > Subject:  Re: iSCSI: Out of order commands
> >
> >
> >
> > Santosh,
> >
> > I have only one comment on your responses.
> >
> > > Even a single connection target *MUST* implement a scoreboard. The
> > > reason being that it can see out-of-order arrival of commands due to
> > > commands being dropped on digest errors. In such a case, it must block
> > > further command processing until holes are filled.
> >
> > I made two convenient assumptions if you noticed, :-), one of which
> > is that target forces session recovery on *any* error that it sees
> > (ErrorRecoveryLevel=0) - including a dropped command due to a digest
> > error.  With that assumption, a target can afford not to implement
> > a scoreboard.
> >
> > As I said in a private note, I guess what primarily bothers me about
> > OOO commands on a connection is that it requires the receiver to
> > undo this "optimization" on its end - most notably on a single
> > connection.  TCP experts may comment on how/if they dealt with a
> > similar issue.
> >
> > OTOH, you had some valid comments on exceptions to ordering during
> > connection recovery.  Perhaps we can move on by making Julian's
> > proposed stipulation a SHOULD....
> > --
> > Mallikarjun
> >
> >
> > Mallikarjun Chadalapaka
> > Networked Storage Architecture
> > Network Storage Solutions Organization
> > MS 5668   Hewlett-Packard, Roseville.
> > cbm@rose.hp.com
> >
> >
> > Santosh Rao wrote:
> > >
> > > Mallikarjun,
> > >
> > > Some comments below.
> > >
> > > Regards,
> > > Santosh
> > >
> > > "Mallikarjun C." wrote:
> > > >
> > > > Rod and Julian,
> > > >
> > > > This has been an interesting thread of discussion.  Some
> > > > comments -
> > > >
> > > > 1.My first reaction was - allowing out-of-order command
> > > >   transmission on the same connection deprives targets of
> > > >   an implementation choice.  Targets which support only
> > > >   single-connection sessions and only support session
> > > >   recovery (reasonable assumptions in my mind) can no
> > > >   longer afford *not to* implement a command scoreboard.
> > >
> > > Even a single connection target *MUST* implement a scoreboard. The
> > > reason being that it can see out-of-order arrival of commands due to
> > > commands being dropped on digest errors. In such a case, it must block
> > > further command processing until holes are filled.
> > >
> > > Thus, there is no getting away from implementing a sequencer at the
> > > target. Given this, I think it is unreasonable to restrict initiator
> > > implementation flexibility by imposing a strict ordering requirement
> > > within the connection.
> > >
> > > > 2.Any end-node efficiency that is sought to be achieved
> > > >   by transmitting CmdSNs out-of-order from the initiator
> > > >   would be lost on the other end-node, since the target
> > > >   now must wait for re-ordering the commands.
> > >
> > > It has to handle this situation anyway to deal with holes caused by
> > > digest errors. This scenario occurs even with initiators that issue
> > > commands in order.
> > >
> > > >
> > > > 3.The flipside is that out-of-order transmission saves
> > > > >   link bandwidth (albeit at the expense of end-node efficiency),
> > > >   compared to idling the link waiting for outbound DMA.
> > > >   We have to determine if this is a reasonable trade-off.
> > > >
> > > > 4.I can see Rod's point that prefetching all immediate
> > > >   data can be a burden on the NIC resources.  But, two
> > > >   questions -
> > > >         - could the NIC not use unsolicited separate data
> > > >           PDUs in these cases? [ I realize that InitialR2T
> > > >           has to be "no" to let it happen... ]
> > > >         - could the NIC have a memory architecture that
> > > >           allows data prefetching for the next command (so
> > > >           this is a non-issue from the protocol perspective)?
> > > >           This scheme incurs one DMA delay for every new
> > > >           burst of commands.
> > > >
> > > > 5.Another (perhaps radical at this point) option is to do
> > > >   away with immediate unsolicited data, to stick only with
> > > >   separate unsolicited data.  I would personally be okay
> > > >   with the choice, particularly if this feature (that
> > > >   helps software implementations) starts making hardware
> > > >   design complicated/expensive.
> > > >
> > > > So, to summarize -
> > > >
> > > > option                         immediate         allow
> > > >                                data in spec?     out-of-order?
> > > >
> > > > (A) (5) above                  no                no
> > > > (B) No real reason to do this. no                yes
> > > > (C) (4) above                  yes               no
> > > > (D) pros & cons (1), (2) & (3) yes               yes
> > > >
> > > > > From the arguments I heard so far, I am leaning towards
> > > > option A, and option C in that order.
> > > >
> > > > Comments?
> > > > --
> > > > Mallikarjun
> > > >
> > > > Mallikarjun Chadalapaka
> > > > Networked Storage Architecture
> > > > Network Storage Solutions Organization
> > > > MS 5668 Hewlett-Packard, Roseville.
> > > > cbm@rose.hp.com
> > > >
> > > > Rod Harrison wrote:
> > > > >
> > > > > Julian,
> > > > >
> > > > > >         I don't understand what you are proposing here, what do
> > > > > > you mean by "multiplexed" DMA?
> > > > > >
> > > > > >         The problem is that the DMAs take some time, the more
> > > > > > there are queued the longer the last DMAs queued take to
> > > > > > complete. Some commands require DMAs to complete before they can
> > > > > > be sent, i.e. Writes with immediate data, some commands do not,
> > > > > > i.e. Reads and writes with no immediate data. The iSCSI HBA
> > > > > > wants to be able to send commands as soon as possible, which for
> > > > > > a read after a write can be before the write's DMA has
> > > > > > completed. Maintaining an ordered queue for commands to be sent
> > > > > > on the HBA is expensive and redundant since the target already
> > > > > > knows how to queue commands before committing them to its SCSI
> > > > > > layer.
> > > > > >
> > > > > >         The iSCSI HBA and its host driver are not at liberty to
> > > > > > change the order of commands from the OS, but the DMAs those
> > > > > > commands need are unlikely to complete in the same order, and as
> > > > > > I mentioned some commands need no DMA. If the HBA can't send
> > > > > > commands out of CmdSN order it has to maintain an ordered queue
> > > > > > of commands waiting to be sent, and potentially buffer a lot of
> > > > > > data. For an HBA this makes immediate data almost impossible to
> > > > > > support.
> > > > > >
> > > > > >         I don't see the problem with allowing out of order
> > > > > > commands given that the target already has to deal with very
> > > > > > similar problems. I think we are getting into the area of
> > > > > > implementation choices here, which is inappropriate for a
> > > > > > specification.
> > > > >
> > > > >         - Rod
> > > > >

-- 
##################################
Santosh Rao
Software Design Engineer,
HP-UX iSCSI Driver Team,
Hewlett Packard, Cupertino.
email : santoshr@cup.hp.com
Phone : 408-447-3751
##################################