
    RE: iSCSI: Out of order commands



    Julian,
    
    	The deadlock situation you outlined is not related to out of order
    commands; it arises because the target implementation is broken. A
    target that advertises a command window larger than the number of
    commands it can support, with the intent of leveraging TCP buffering,
    is making big and invalid assumptions about the initiator. The target
    has no idea how long it might take to receive / generate the data
    associated with commands. For example, if a target offers a command
    window of 10 with the intent of processing 8 commands at a time and
    having the other two in flight, it will deadlock if the oldest command
    needs an R2T. This happens even if the commands are sent in order. It
    is never acceptable for a target to stop reading its TCP stream unless
    its command window is full, since there may be unread DATA-OUTs needed
    to allow commands to be committed to SCSI.
    
    	There are very real performance gains associated with sending out of
    order commands, the simplest of which was outlined by Bob Russell when
    this thread started. We do not have a zero latency network, so queuing
    as close as possible to the ultimate data sink will always be a win.
    
    	- Rod
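
As a rough illustration of the deadlock Rod describes, here is a minimal Python sketch. It assumes a toy model that is not taken from any iSCSI draft: the TCP stream is a simple FIFO, every command is a write that needs an R2T, and the names `run_target`, `slots`, `window` and `keep_reading_when_full` are hypothetical.

```python
from collections import deque

def run_target(slots, window, keep_reading_when_full):
    """Toy model: return the CmdSNs the target manages to complete.

    slots  - commands the target can really process at once (assumed)
    window - command window the target advertised (assumed)
    """
    # The initiator sends `window` writes in CmdSN order, none with immediate data.
    wire = deque(("CMD", sn) for sn in range(1, window + 1))
    active = set()      # commands holding a processing slot, waiting for data
    parked = deque()    # commands read off the wire but not yet given a slot
    completed = []

    def start(sn):
        active.add(sn)
        # The target sends an R2T; the initiator's DATA-OUT lands in the
        # stream behind everything that is already in flight.
        wire.append(("DATA-OUT", sn))

    while wire:
        kind, sn = wire[0]
        if kind == "CMD":
            if len(active) < slots:
                wire.popleft()
                start(sn)
            elif keep_reading_when_full:
                wire.popleft()
                parked.append(sn)
            else:
                break   # target stops reading its TCP stream: deadlock
        else:
            wire.popleft()
            active.remove(sn)
            completed.append(sn)
            if parked:
                start(parked.popleft())
    return completed

print(run_target(slots=8, window=10, keep_reading_when_full=False))
print(run_target(slots=8, window=10, keep_reading_when_full=True))
```

In this model the target that stops reading completes nothing, because the DATA-OUT for the oldest command sits behind commands 9 and 10 in the stream, even though every command was sent in order. The target that keeps reading completes all ten, but only because it really has room to hold ten commands, which is what its advertised window promised.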
    
    -----Original Message-----
    From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    Julian Satran
    Sent: Wednesday, November 07, 2001 10:00 AM
    To: ips@ece.cmu.edu
    Subject: Re: iSCSI: Out of order commands
    
    
    Mallikarjun,
    
    I did not see a SINGLE performance improvement that results from OOO
    shipping.
    It would be bad engineering to give away the "no-deadlock" mechanism we
    have now for nothing.
    I also have the impression that the point about deadlock that I keep
    repeating is ignored or not understood.
    As we stand today commands can be shipped with immediate data or
    without, and an implementer determined to squeeze maximum bandwidth
    and overlap command start with delivery will choose not to work with
    immediate data (as you have pointed out), while a low performance
    software implementation will use immediate data to minimize CPU cycles
    consumed.  However, both will be guaranteed to work without deadlock,
    as source and sink use the same ordering.
    Recovery is still a low probability event and should be handled with a
    different set of considerations in mind.
    As for the strictness of the recommendation - yes, we could settle on
    SHOULD.
    
    Julo
    
    
    
    
    "Mallikarjun C." <cbm@rose.hp.com>
    Sent by: owner-ips@ece.cmu.edu
    07-11-01 19:41
    Please respond to cbm
    
    
            To:     Santosh Rao <santoshr@cup.hp.com>, ips@ece.cmu.edu
            cc:
            Subject:        Re: iSCSI: Out of order commands
    
    
    
    Santosh,
    
    I have only one comment on your responses.
    
    > Even a single connection target *MUST* implement a scoreboard. The
    > reason being that it can see out-of-order arrival of commands due to
    > commands being dropped on digest errors. In such a case, it must
    > block further command processing until holes are filled.
    
    I made two convenient assumptions, if you noticed :-), one of which
    is that the target forces session recovery on *any* error that it sees
    (ErrorRecoveryLevel=0) - including a dropped command due to a digest
    error.  With that assumption, a target can afford not to implement
    a scoreboard.
    
    As I said in a private note, I guess what primarily bothers me about
    OOO commands on a connection is that it requires the receiver to
    undo this "optimization" on its end - most notably on a single
    connection.  TCP experts may comment on how/if they dealt with a
    similar issue.
    
    OTOH, you had some valid comments on exceptions to ordering during
    connection recovery.  Perhaps we can move on by making Julian's
    proposed stipulation a SHOULD....
    --
    Mallikarjun
    
    
    Mallikarjun Chadalapaka
    Networked Storage Architecture
    Network Storage Solutions Organization
    MS 5668          Hewlett-Packard, Roseville.
    cbm@rose.hp.com
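
For readers unfamiliar with the "scoreboard" being debated here, a minimal Python sketch of the idea follows. It is an illustration under stated assumptions, not any implementation from the thread: the class name `CmdSequencer` and its methods are hypothetical, `exp_cmd_sn` merely echoes the draft's ExpCmdSN, and 32-bit sequence number wraparound, immediate commands and window checks are all ignored.

```python
class CmdSequencer:
    """Hold commands that arrive ahead of a hole; release them in CmdSN order."""

    def __init__(self, exp_cmd_sn):
        self.exp_cmd_sn = exp_cmd_sn   # next CmdSN to hand to the SCSI layer
        self.held = {}                 # CmdSN -> command waiting behind a hole

    def receive(self, cmd_sn, command):
        """Record one arrival and return the commands now deliverable, in order."""
        self.held[cmd_sn] = command
        ready = []
        while self.exp_cmd_sn in self.held:
            ready.append(self.held.pop(self.exp_cmd_sn))
            self.exp_cmd_sn += 1
        return ready


seq = CmdSequencer(exp_cmd_sn=1)
print(seq.receive(1, "write A"))   # delivered at once
print(seq.receive(3, "read C"))    # held: CmdSN 2 is missing
print(seq.receive(2, "write B"))   # fills the hole, releases 2 then 3
```

The same structure handles a hole left by a command dropped on a digest error and a hole left by an initiator that sent commands out of order, which is Santosh's point. Under the ErrorRecoveryLevel=0 assumption above, a single-connection target could instead treat any gap as a session failure and skip the buffer entirely.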
    
    
    Santosh Rao wrote:
    >
    > Mallikarjun,
    >
    > Some comments below.
    >
    > Regards,
    > Santosh
    >
    > "Mallikarjun C." wrote:
    > >
    > > Rod and Julian,
    > >
    > > This has been an interesting thread of discussion.  Some
    > > comments -
    > >
    > > 1.My first reaction was - allowing out-of-order command
    > >   transmission on the same connection deprives targets of
    > >   an implementation choice.  Targets which support only
    > >   single-connection sessions and only support session
    > >   recovery (reasonable assumptions in my mind) can no
    > >   longer afford *not to* implement a command scoreboard.
    >
    > Even a single connection target *MUST* implement a scoreboard. The
    > reason being that it can see out-of-order arrival of commands due to
    > commands being dropped on digest errors. In such a case, it must
    > block further command processing until holes are filled.
    >
    > Thus, there is no getting away from implementing a sequencer at the
    > target. Given this, I think it is unreasonable to restrict initiator
    > implementation flexibility by imposing a strict ordering requirement
    > within the connection.
    >
    > > 2.Any end-node efficiency that is sought to be achieved
    > >   by transmitting CmdSNs out-of-order from the initiator
    > >   would be lost on the other end-node, since the target
    > >   now must wait for re-ordering the commands.
    >
    > It has to handle this situation anyway to deal with holes caused by
    > digest errors. This scenario occurs even with initiators that issue
    > commands in order.
    >
    > >
    > > 3.The flipside is that out-of-order transmission saves
    > >   link bandwidth (albeit at the expense of end-node efficiency),
    > >   compared to idling the link waiting for outbound DMA.
    > >   We have to determine if this is a reasonable trade-off.
    > >
    > > 4.I can see Rod's point that prefetching all immediate
    > >   data can be a burden on the NIC resources.  But, two
    > >   questions -
    > >         - could the NIC not use unsolicited separate data
    > >           PDUs in these cases? [ I realize that InitialR2T
    > >           has to be "no" to let it happen... ]
    > >         - could the NIC have a memory architecture that
    > >           allows data prefetching for the next command (so
    > >           this is a non-issue from the protocol perspective)?
    > >           This scheme incurs one DMA delay for every new
    > >           burst of commands.
    > >
    > > 5.Another (perhaps radical at this point) option is to do
    > >   away with immediate unsolicited data, to stick only with
    > >   separate unsolicited data.  I would personally be okay
    > >   with the choice, particularly if this feature (that
    > >   helps software implementations) starts making hardware
    > >   design complicated/expensive.
    > >
    > > So, to summarize -
    > >
    > > option                         immediate         allow
    > >                                data in spec?     out-of-order?
    > >
    > > (A) (5) above                  no                no
    > > (B) No real reason to do this. no                yes
    > > (C) (4) above                  yes               no
    > > (D) pros & cons (1), (2) & (3) yes               yes
    > >
    > > From the arguments I heard so far, I am leaning towards
    > > option A, and option C in that order.
    > >
    > > Comments?
    > > --
    > > Mallikarjun
    > >
    > > Mallikarjun Chadalapaka
    > > Networked Storage Architecture
    > > Network Storage Solutions Organization
    > > MS 5668 Hewlett-Packard, Roseville.
    > > cbm@rose.hp.com
    > >
    > > Rod Harrison wrote:
    > > >
    > > > Julian,
    > > >
    > > >         I don't understand what you are proposing here. What do
    > > > you mean by "multiplexed" DMA?
    > > >
    > > >         The problem is that the DMAs take some time; the more
    > > > there are queued, the longer the last DMAs queued take to
    > > > complete. Some commands require DMAs to complete before they can
    > > > be sent, i.e. writes with immediate data; some commands do not,
    > > > i.e. reads and writes with no immediate data. The iSCSI HBA wants
    > > > to be able to send commands as soon as possible, which for a read
    > > > after a write can be before the write's DMA has completed.
    > > > Maintaining an ordered queue for commands to be sent on the HBA
    > > > is expensive and redundant since the target already knows how to
    > > > queue commands before committing them to its SCSI layer.
    > > >
    > > >         The iSCSI HBA and its host driver are not at liberty to
    > > > change the order of commands from the OS, but the DMAs those
    > > > commands need are unlikely to complete in the same order, and as
    > > > I mentioned some commands need no DMA. If the HBA can't send
    > > > commands out of CmdSN order it has to maintain an ordered queue
    > > > of commands waiting to be sent, and potentially buffer a lot of
    > > > data. For an HBA this makes immediate data almost impossible to
    > > > support.
    > > >
    > > >         I don't see the problem with allowing out of order
    > > > commands given that the target already has to deal with very
    > > > similar problems. I think we are getting into the area of
    > > > implementation choices here, which is inappropriate for a
    > > > specification.
    > > >
    > > >         - Rod
    > > >
    > > > -----Original Message-----
    > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > > > Julian Satran
    > > > Sent: Monday, November 05, 2001 10:06 PM
    > > > To: ips@ece.cmu.edu
    > > > Subject: Re: iSCSI: Out of order commands, was current UNH Plugfest
    > > >
    > > > Rod,
    > > >
    > > > I don't see any reason why DMA operations can't be "multiplexed"
    > > > with commands.
    > > > If you have scheduled a long outbound DMA you are doomed
    > > > regardless of the command ordering.
    > > > And if you have scheduled DMA operations piecemeal then you can
    > > > insert your commands in correct order.
    > > >
    > > > Julo
    > > >
    > > > "Rod Harrison" <rod.harrison@windriver.com>
    > > > 05-11-01 20:48
    > > > Please respond to "Rod Harrison"
    > > >
    > > >         To:     Julian Satran/Haifa/IBM@IBMIL, <ips@ece.cmu.edu>
    > > >         cc:
    > > >         Subject:        iSCSI: Out of order commands, was
    > > > current UNH Plugfest
    > > >
    > > >                  [ Subject changed ]
    > > >
    > > > Julian,
    > > >
    > > >                  The ordering difference is introduced between
    > > > the host side driver and the iSCSI HBA. The host side driver must
    > > > present SCSI commands to the HBA in the order they are received
    > > > from the OS to prevent read after write dependency failures. The
    > > > HBA might reorder the commands depending on when DMA completes.
    > > > The reordering can't be done ahead of time in the host driver
    > > > since it doesn't know how long each DMA might take. As long as
    > > > the HBA assigns CmdSN in the order it receives commands the
    > > > desired host ordering is preserved.
    > > >
    > > >                  - Rod
    > > >
    > > > -----Original Message-----
    > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > > > Julian Satran
    > > > Sent: Monday, November 05, 2001 12:35 AM
    > > > To: ips@ece.cmu.edu
    > > > Subject: RE: iSCSI: current UNH Plugfest
    > > >
    > > > Rod,
    > > >
    > > > In all the examples given, the point I find hard to understand
    > > > is why the ordering on the wire is different from the
    > > > presentation order to the initiator.  You can get as many
    > > > overlaps as you want by presenting the commands to the initiator
    > > > in the desired order.
    > > > What we are considering here is the case in which you want to
    > > > ship in an order different than the one in which you present the
    > > > commands.
    > > >
    > > > Julo
    > > >
    > > > "Rod Harrison" <rod.harrison@windriver.com>
    > > > Sent by: owner-ips@ece.cmu.edu
    > > > 04-11-01 04:42
    > > > Please respond to "Rod Harrison"
    > > >
    > > >         To:     "Barry Reinhold" <bbrtrebia@mediaone.net>,
    > > > "Dave Sheehy" <dbs@acropora.rose.agilent.com>,
    > > > "IETF IP SAN Reflector" <ips@ece.cmu.edu>
    > > >         cc:
    > > >         Subject:        RE: iSCSI: current UNH Plugfest
    > > >
    > > > Barry,
    > > >
    > > >                  In general I agree, but I don't think this is
    > > > as much of a corner case as it at first appears. Targets will
    > > > have code very similar to that needed to handle out of order
    > > > commands to deal with digest errors. Targets also need to queue
    > > > commands whilst waiting for both solicited and unsolicited data
    > > > to arrive. Queuing out of order commands seems little extra work.
    > > >
    > > >                  From an initiator's point of view there are
    > > > efficiency, and probably performance, gains to be had from
    > > > sending commands out of order. Bob Russell gave the example of a
    > > > read being sent whilst write data DMA is happening, and a similar
    > > > situation can arise with DMA for writes overtaking that of
    > > > earlier writes if the initiator has multiple DMA engines. In this
    > > > case the initiator might be forced to let the wire go idle if it
    > > > can't send the data from completed DMAs as soon as possible.
    > > >
    > > >                  We already have a command queue at the target
    > > > to enforce correct serialisation of commands; doing the same
    > > > thing at the initiator is redundant.
    > > >
    > > >                  Finally, I don't believe we should be writing a
    > > > standard to work around poor coding and test coverage, especially
    > > > at the cost of potential efficiency gains.
    > > >
    > > >                  I agree with Dave and Santosh that commands
    > > > being sent out of order on a single session should be allowed by
    > > > the standard.
    > > >
    > > >                  - Rod
    > > >
    > > > -----Original Message-----
    > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > > > Barry Reinhold
    > > > Sent: Friday, November 02, 2001 5:24 PM
    > > > To: Dave Sheehy; IETF IP SAN Reflector
    > > > Subject: RE: iSCSI: current UNH Plugfest
    > > >
    > > > Using features such as out of order command delivery on a
    > > > connection tends to be the sort of thing that leads to
    > > > interoperability problems. It is unexpected and probably going to
    > > > hit poorly tested code paths even if the standard is written to
    > > > allow it.
    > > >
    > > > >-----Original Message-----
    > > > >From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > > > >Dave Sheehy
    > > > >Sent: Friday, November 02, 2001 4:19 PM
    > > > >To: IETF IP SAN Reflector
    > > > >Subject: Re: iSCSI: current UNH Plugfest
    > > > >
    > > > >
    > > > >
    > > > >> 3. Can commands be sent out of order on the same connection?
    > > > >>
    > > > >>    The behavior of targets is clearly specified in Section
    > > > >>    2.2.2.3 on page 25 of draft 8, which says:
    > > > >>      "Except for the commands marked for immediate delivery
    > > > >>      the iSCSI target layer MUST deliver the commands for
    > > > >>      execution in the order specified by CmdSN."
    > > > >>
    > > > >>    Section 2.2.2.3 on page 26 of draft 8 also says:
    > > > >>      "- CmdSN - the current command Sequence Number advanced
    > > > >>      by 1 on each command shipped except for commands marked
    > > > >>      for immediate delivery."
    > > > >>    but the meaning of the term "shipped" is vague, and does
    > > > >>    not necessarily require that the PDUs arrive on the other
    > > > >>    end of a TCP connection in the same order that the CmdSN
    > > > >>    values were assigned to these PDUs.
    > > > >>
    > > > >>    Some initiators have been designed to send commands out of
    > > > >>    CmdSN order on one connection.  Consider the situation
    > > > >>    where there is only one connection and a high-level
    > > > >>    dispatcher creates a PDU for a SCSI command that involves
    > > > >>    writing immediate data to the target.  This PDU is enqueued
    > > > >>    to a lower-level layer which has to set up, start, and wait
    > > > >>    for a DMA operation to move the immediate data into an
    > > > >>    onboard buffer before the PDU can be put onto the wire.
    > > > >>    While this is happening, the dispatcher creates another
    > > > >>    unrelated PDU for a SCSI read command (for example), and
    > > > >>    when this PDU is passed to the lower-level layer it can be
    > > > >>    sent immediately, ahead of the previous write PDU and
    > > > >>    therefore out of order on this connection.
    > > > >>
    > > > >>    The standard clearly allows this to happen if the two PDUs
    > > > >>    were sent on different connections, and seems to imply that
    > > > >>    this can also happen when the two PDUs are sent on the same
    > > > >>    connection.
    > > > >>
    > > > >>    The suggestion is to put in the standard an explicit
    > > > >>    statement that this is allowed or not allowed, as
    > > > >>    appropriate.
    > > > >>
    > > > >>    If this is allowed, such a statement would avoid the
    > > > >>    erroneous assumption being made by some target implementers
    > > > >>    that within a single connection, commands will arrive in
    > > > >>    order.
    > > > >>
    > > > >>    If this is not allowed, such a statement would avoid the
    > > > >>    erroneous assumption being made by some initiator
    > > > >>    implementers that within a single connection, commands can
    > > > >>    be put on the wire out of order.
    > > > >>
    > > > >> +++
    > > > >>
    > > > >> will add an explicit statement saying that this behaviour is
    > > > >> forbidden.
    > > > >> 2.2.2.1 will contain:
    > > > >>
    > > > >> On any given connection, the iSCSI initiator MUST send the
    > > > >> commands in the order specified by CmdSN.
    > > > >>
    > > > >> +++
    > > > >
    > > > >Why do you feel this behavior should be forbidden? Targets
    > > > >already have to order commands across the session. I don't see
    > > > >why it's a problem to extend that to the connection as well. I,
    > > > >for one, believe we should take a liberal stance on this.
    > > > >
    > > > >Dave Sheehy
    > > > >
    >
    > --
    > ##################################
    > Santosh Rao
    > Software Design Engineer,
    > HP-UX iSCSI Driver Team,
    > Hewlett Packard, Cupertino.
    > email : santoshr@cup.hp.com
    > Phone : 408-447-3751
    > ##################################
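
The scenario from the original plugfest question, a read overtaking a write whose immediate data is still being moved by DMA, can be sketched in Python as follows. This is only an illustration: the function `wire_schedule`, the time units and the DMA completion times are made-up assumptions, and the model ignores transmission time and the command window.

```python
def wire_schedule(commands, in_order):
    """commands: list of (label, dma_ready_time) in the order the OS issued them.

    CmdSN is assigned in that host order. Returns a list of
    (send_time, cmd_sn, label) in the order PDUs go onto the wire.
    """
    numbered = [(sn, label, ready)
                for sn, (label, ready) in enumerate(commands, start=1)]
    if in_order:
        # A command may not go out before every earlier CmdSN has gone out.
        schedule, t = [], 0
        for sn, label, ready in numbered:
            t = max(t, ready)
            schedule.append((t, sn, label))
        return schedule
    # Out of order: each PDU goes out as soon as its own DMA is done.
    return sorted((ready, sn, label) for sn, label, ready in numbered)


# Hypothetical example: a write needing 5 time units of DMA, then a read needing none.
cmds = [("WRITE with immediate data", 5), ("READ", 0)]
print(wire_schedule(cmds, in_order=True))
print(wire_schedule(cmds, in_order=False))
```

With strict ordering the read waits behind the write's DMA and the wire sits idle until time 5. Without it the read (CmdSN 2) leaves at time 0 and the target's sequencer holds it until CmdSN 1 arrives, so the host's ordering is still preserved.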
    
    
    
    



Last updated: Wed Nov 07 19:17:32 2001