
    Re: iSCSI: Out of order commands



    John Hufferd wrote:
    ....
    
    > In-order command arrival on a connection (as the normal case) seems to
    > provide quick detection of an error in a Command PDU, and to tell the
    > recovery code quickly to get the missing command resent.  What do you think
    > will be the effect of not knowing whether the command is delayed in the
    > initiator or was dropped because of a Header Digest Error?
    > 
    
    On single-connection sessions, yes, there's additional non-determinism
    on the target.  But keep in mind that the target can at best
    send a NOP to prompt the initiator to retransmit, even if it knows for
    sure - it cannot definitively communicate its knowledge about the
    missing command to the initiator.
    
    On multi-connection sessions, nothing really changes.
    -- 
    Mallikarjun 
    
    
    Mallikarjun Chadalapaka
    Networked Storage Architecture
    Network Storage Solutions Organization
    MS 5668	Hewlett-Packard, Roseville.
    cbm@rose.hp.com
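
The target-side bookkeeping discussed in this thread (the "scoreboard" or "sequencer") can be sketched roughly as below. This is only an illustrative sketch, not anything taken from the iSCSI draft: the class name `CmdScoreboard`, its methods, and the starting CmdSN of 1 are all assumptions, and 32-bit CmdSN wraparound is deliberately ignored. It shows why a hole looks the same to the target whether the missing command is merely delayed or was dropped on a digest error: either way, later commands wait until the hole is filled.

```python
from dataclasses import dataclass, field


@dataclass
class CmdScoreboard:
    """Hypothetical target-side scoreboard (names are illustrative only)."""
    exp_cmd_sn: int = 1                           # next CmdSN deliverable to the SCSI layer
    pending: dict = field(default_factory=dict)   # CmdSN -> command held behind a hole

    def receive(self, cmd_sn, command):
        """Accept a command and return the commands now deliverable, in CmdSN order."""
        if cmd_sn < self.exp_cmd_sn or cmd_sn in self.pending:
            return []                             # duplicate, e.g. a retransmission
        self.pending[cmd_sn] = command
        ready = []
        while self.exp_cmd_sn in self.pending:    # drain every command no longer blocked
            ready.append(self.pending.pop(self.exp_cmd_sn))
            self.exp_cmd_sn += 1
        return ready

    def holes(self):
        """CmdSNs that are missing below the highest buffered CmdSN."""
        if not self.pending:
            return []
        return [sn for sn in range(self.exp_cmd_sn, max(self.pending))
                if sn not in self.pending]


sb = CmdScoreboard()
first = sb.receive(1, "read A")      # in order: delivered immediately
blocked = sb.receive(3, "write C")   # CmdSN 2 is missing: held back
missing = sb.holes()                 # the target only knows that 2 has not arrived
filled = sb.receive(2, "write B")    # hole filled: 2 and 3 are released together
```

The same logic serves both the exception cases (retransmission after a digest error or connection failure) and an initiator that sends out of order as a regular practice; the difference is only in how often commands end up waiting in `pending`.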
    
    
    > .
    > .
    > .
    > John L. Hufferd
    > Senior Technical Staff Member (STSM)
    > IBM/SSG San Jose Ca
    > Main Office (408) 256-0403, Tie: 276-0403,  eFax: (408) 904-4688
    > Home Office (408) 997-6136, Cell: (408) 499-9702
    > Internet address: hufferd@us.ibm.com
    > 
    > "Mallikarjun C." <cbm@rose.hp.com> on 11/08/2001 05:28:21 PM
    > 
    > To:   John Hufferd/San Jose/IBM@IBMUS
    > cc:   <ips@ece.cmu.edu>
    > Subject:  Re: iSCSI: Out of order commands
    > 
    > John,
    > 
    > Sorry, this note got a little longer than I would've liked, but....
    > 
    > I believe there are cases where OOO CmdSN handling is a
    > legitimate requirement on targets due to exception events -
    >     a) retransmitting a CmdSN on a command acknowledgement
    >       timeout (within-connection recovery class).  This manifests as
    >       an OOO CmdSN on the connection to a target if it didn't see
    >       the original copy due to a digest error.
    >     b) retransmitting the last few "lost" commands due to a connection
    >       failure on a new connection. If this new connection had already
    >       carried a CmdSN greater than these retransmitted commands
    >       (prior to connection failure), this again manifests as OOO CmdSN
    >       on the new connection to the target.
    > 
    > OTOH, I believe sending OOO CmdSNs on a connection as a
    > regular practice is counterproductive, since the target must continuously
    > re-order the initiator "optimization" leading to a zero-sum game.  I
    > would argue that the need to dispatch CmdSNs OOO due to immediate
    > data DMA (brought up by Rod) can be addressed by simple NIC
    > changes to prefetch data for the next command (or more simply use
    > unsolicited separate data PDUs, if negotiated).  [ You got to deal
    > with the case of all commands being writes anyway! ]
    > 
    > If we allow OOO CmdSNs on a connection (I'd advocate discouraging
    > it as a regular practice), I don't believe any of the stuff in error
    > recovery breaks (nor does it affect the current reliance on ExpCmdSN).
    > Julian perhaps can comment.
    >     - All the in-order assumptions are for DataSNs/R2TSNs/StatSNs, not
    >        for CmdSNs.
    >     - Any multi-connection session by definition must deal with OOO CmdSNs.
    >     - I believe that the current abort task scheme for immediate commands
    >       detailed in section 9.3 caters to OOO CmdSNs on a connection
    >       as well (we must be dealing with an immediate Abort arriving before
    >       the command today, since the command could have been hit with
    >       a digest error).
    > 
    > To summarize, here is what I suggested to Julian in a private email -
    > 
    > a)I suggest using a SHOULD for in-order dispatch of
    >   commands on a connection - for an initiator.
    > 
    > b)I suggest using a SHALL handle out-of-order commands
    >   on a connection - for the target (as Barry pointed out).
    > 
    > Hope that was useful.
    > --
    > Mallikarjun
    > 
    > Mallikarjun Chadalapaka
    > Networked Storage Architecture
    > Network Storage Solutions Organization
    > Hewlett-Packard MS 5668
    > Roseville CA 95747
    > 
    > ----- Original Message -----
    > From: "John Hufferd" <hufferd@us.ibm.com>
    > To: <cbm@rose.hp.com>
    > Cc: <ips@ece.cmu.edu>
    > Sent: Thursday, November 08, 2001 1:40 PM
    > Subject: Re: iSCSI: Out of order commands
    > 
    > >
    > > Mallikarjun,
    > > Could you comment on the concept of OOO on the ErrorRecoveryLevel>0?  I
    > > had thought that "in order delivery" was part of the detection of
    > > missing PDUs and needed for timely Recovery.  I was wondering if this
    > > changes the way we would use the ExpCmdSN, etc.
    > >
    > > I think your opinions on this part of the OOO discussion would be
    > > valuable.  For example, how would you contrast the differences in
    > > detecting a problem and recovering from that problem etc., today vs.
    > > the OOO approach (if any)?
    > >
    > >
    > > .
    > > .
    > > .
    > > John L. Hufferd
    > > Senior Technical Staff Member (STSM)
    > > IBM/SSG San Jose Ca
    > > Main Office (408) 256-0403, Tie: 276-0403,  eFax: (408) 904-4688
    > > Home Office (408) 997-6136, Cell: (408) 499-9702
    > > Internet address: hufferd@us.ibm.com
    > >
    > >
    > > "Mallikarjun C." <cbm@rose.hp.com>@ece.cmu.edu on 11/07/2001 09:41:05 AM
    > >
    > > Please respond to cbm@rose.hp.com
    > >
    > > Sent by:  owner-ips@ece.cmu.edu
    > >
    > >
    > > To:   Santosh Rao <santoshr@cup.hp.com>, ips@ece.cmu.edu
    > > cc:
    > > Subject:  Re: iSCSI: Out of order commands
    > >
    > >
    > >
    > > Santosh,
    > >
    > > I have only one comment on your responses.
    > >
    > > > Even a single connection target *MUST* implement a scoreboard. The
    > > > reason being that it can see out-of-order arrival of commands due to
    > > > commands being dropped on digest errors. In such a case, it must block
    > > > further command processing until holes are filled.
    > >
    > > I made two convenient assumptions if you noticed, :-), one of which
    > > is that the target forces session recovery on *any* error that it sees
    > > (ErrorRecoveryLevel=0) - including a dropped command due to a digest
    > > error.  With that assumption, a target can afford not to implement
    > > a scoreboard.
    > >
    > > As I said in a private note, I guess what primarily bothers me about
    > > OOO commands on a connection is that it requires the receiver to
    > > undo this "optimization" on its end - most notably on a single
    > > connection.  TCP experts may comment on how/if they dealt with a
    > > similar issue.
    > >
    > > OTOH, you had some valid comments on exceptions to ordering during
    > > connection recovery.  Perhaps we can move on by making Julian's
    > > proposed stipulation a SHOULD....
    > > --
    > > Mallikarjun
    > >
    > >
    > > Mallikarjun Chadalapaka
    > > Networked Storage Architecture
    > > Network Storage Solutions Organization
    > > MS 5668   Hewlett-Packard, Roseville.
    > > cbm@rose.hp.com
    > >
    > >
    > > Santosh Rao wrote:
    > > >
    > > > Mallikarjun,
    > > >
    > > > Some comments below.
    > > >
    > > > Regards,
    > > > Santosh
    > > >
    > > > "Mallikarjun C." wrote:
    > > > >
    > > > > Rod and Julian,
    > > > >
    > > > > This has been an interesting thread of discussion.  Some
    > > > > comments -
    > > > >
    > > > > 1.My first reaction was - allowing out-of-order command
    > > > >   transmission on the same connection deprives targets of
    > > > >   an implementation choice.  Targets which support only
    > > > >   single-connection sessions and only support session
    > > > >   recovery (reasonable assumptions in my mind) can no
    > > > >   longer afford *not to* implement a command scoreboard.
    > > >
    > > > Even a single connection target *MUST* implement a scoreboard. The
    > > > reason being that it can see out-of-order arrival of commands due to
    > > > commands being dropped on digest errors. In such a case, it must block
    > > > further command processing until holes are filled.
    > > >
    > > > Thus, there is no getting away from implementing a sequencer at the
    > > > target. Given this, I think it is unreasonable to restrict initiator
    > > > implementation flexibility by imposing a strict ordering requirement
    > > > within the connection.
    > > >
    > > > > 2.Any end-node efficiency that is sought to be achieved
    > > > >   by transmitting CmdSNs out-of-order from the initiator
    > > > >   would be lost on the other end-node, since the target
    > > > >   now must wait for re-ordering the commands.
    > > >
    > > > It has to handle this situation anyway to deal with holes caused by
    > > > digest errors. This scenario occurs even with initiators that issue
    > > > commands in order.
    > > >
    > > > >
    > > > > 3.The flipside is that out-of-order transmission saves
    > > > >   link bandwidth (albeit at the expense of end-node efficiency),
    > > > >   compared to idling the link waiting for outbound DMA.
    > > > >   We have to determine if this is a reasonable trade-off.
    > > > >
    > > > > 4.I can see Rod's point that prefetching all immediate
    > > > >   data can be a burden on the NIC resources.  But, two
    > > > >   questions -
    > > > >         - could the NIC not use unsolicited separate data
    > > > >           PDUs in these cases? [ I realize that InitialR2T
    > > > >           has to be "no" to let it happen... ]
    > > > >         - could the NIC have a memory architecture that
    > > > >           allows data prefetching for the next command (so
    > > > >           this is a non-issue from the protocol perspective)?
    > > > >           This scheme incurs one DMA delay for every new
    > > > >           burst of commands.
    > > > >
    > > > > 5.Another (perhaps radical at this point) option is to do
    > > > >   away with immediate unsolicited data, to stick only with
    > > > >   separate unsolicited data.  I would personally be okay
    > > > >   with the choice, particularly if this feature (that
    > > > >   helps software implementations) starts making hardware
    > > > >   design complicated/expensive.
    > > > >
    > > > > So, to summarize -
    > > > >
    > > > > option                         immediate         allow
    > > > >                                data in spec?     out-of-order?
    > > > >
    > > > > (A) (5) above                  no                no
    > > > > (B) No real reason to do this. no                yes
    > > > > (C) (4) above                  yes               no
    > > > > (D) pros & cons (1), (2) & (3) yes               yes
    > > > >
    > > > > From the arguments I heard so far, I am leaning towards
    > > > > option A, and option C in that order.
    > > > >
    > > > > Comments?
    > > > > --
    > > > > Mallikarjun
    > > > >
    > > > > Mallikarjun Chadalapaka
    > > > > Networked Storage Architecture
    > > > > Network Storage Solutions Organization
    > > > > MS 5668 Hewlett-Packard, Roseville.
    > > > > cbm@rose.hp.com
    > > > >
    > > > > Rod Harrison wrote:
    > > > > >
    > > > > > Julian,
    > > > > >
    > > > > >         I don't understand what you are proposing here, what do you
    > > > > > mean by "multiplexed" DMA?
    > > > > >
    > > > > >         The problem is that the DMAs take some time, the more there
    > > > > > are queued the longer the last DMAs queued take to complete. Some
    > > > > > commands require DMAs to complete before they can be sent, i.e. Writes
    > > > > > with immediate data, some commands do not, i.e. Reads and writes with
    > > > > > no immediate data. The iSCSI HBA wants to be able to send commands as
    > > > > > soon as possible, which for a read after a write can be before the
    > > > > > write's DMA has completed. Maintaining an ordered queue for commands
    > > > > > to be sent on the HBA is expensive and redundant since the target
    > > > > > already knows how to queue commands before committing them to its
    > > > > > SCSI layer.
    > > > > >
    > > > > >         The iSCSI HBA and its host driver are not at liberty to
    > > > > > change the order of commands from the OS, but the DMAs those commands
    > > > > > need are unlikely to complete in the same order, and as I mentioned
    > > > > > some commands need no DMA. If the HBA can't send commands out of CmdSN
    > > > > > order it has to maintain an ordered queue of commands waiting to be
    > > > > > sent, and potentially buffer a lot of data. For an HBA this makes
    > > > > > immediate data almost impossible to support.
    > > > > >
    > > > > >         I don't see the problem with allowing out of order commands
    > > > > > given that the target already has to deal with very similar problems.
    > > > > > I think we are getting into the area of implementation choices here,
    > > > > > which is inappropriate for a specification.
    > > > > >
    > > > > >         - Rod
    > > > > >
    



Last updated: Fri Nov 09 16:17:36 2001