RE: iSCSI: Out of order commands
Ron,
Targets will most likely advertise a total (command and data) window
larger than they can accommodate on any long haul link. With the current
ordering rules nothing bad will happen.
And both initiator and target will gain. I did not see a SINGLE reason
(performance, memory etc.) to remove this ordering requirement.
And the ordering requirement is not global; it is per connection.
It says that if you send c1, c2, c3, c4, c5 with c1 and c3 going on
connection 1, you should not send c3 before c1 on connection 1, but you can
ship c3 before c2 if c2 goes on connection 2. Many other schemes, like
some of the recovery, task abort, connection cleanup, are all based on
ordering being preserved within the connection.
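
A minimal sketch of the per-connection rule described above, assuming each
transmission is recorded as a (connection, CmdSN) pair in the order it goes
on the wire. The function name and the record format are illustrative only,
not taken from the draft, and CmdSN wraparound is ignored for simplicity.

```python
def first_order_violation(sends):
    """Return the first (connection, cmd_sn) sent out of order, or None.

    sends: iterable of (connection_id, cmd_sn) pairs in wire order.
    Ordering is checked per connection only, never across connections.
    Assumption: CmdSN values do not wrap around in this sketch.
    """
    highest_sent = {}  # connection_id -> highest CmdSN sent so far
    for conn, cmd_sn in sends:
        if conn in highest_sent and cmd_sn < highest_sent[conn]:
            return (conn, cmd_sn)
        highest_sent[conn] = cmd_sn
    return None


# c1 and c3 go on connection 1, c2 goes on connection 2.
allowed = [(1, 1), (1, 3), (2, 2)]    # c3 before c2: different connections
forbidden = [(1, 3), (2, 2), (1, 1)]  # c3 before c1 on connection 1
```

With these inputs, the first sequence passes the check and the second is
flagged at (1, 1), which is the c1-after-c3 case on connection 1.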
The whole discussion thread seems also related to some perceived gain from
relaxing this restriction - but no one was able to show a single scenario
showing a real gain.
This type of source and sink ordering is a common requirement in most
distributed systems.
Julo
Ron Grinfeld <Rong@siliquent.com>
08-11-01 11:28
Please respond to Ron Grinfeld
To: Julian Satran/Haifa/IBM@IBMIL
cc:
Subject: RE: iSCSI: Out of order commands
Julian,
Can you clarify the deadlock scenario a little bit more (taking into account
that a target will not advertise a command window larger than the number of
commands it can support)?
Rong
-----Original Message-----
From: Julian Satran [mailto:Julian_Satran@il.ibm.com]
Sent: Thursday, November 08, 2001 8:02 AM
To: ips@ece.cmu.edu
Subject: RE: iSCSI: Out of order commands
Robert,
I am not saying that handling OOO commands will create more complexity
(targets already do that over several connections and it does not matter
for them). However, allowing initiators to ship them out of order creates a
potential deadlock that does not appear otherwise.
If a target is missing a command in a queue (and there are no errors) then
this command is bound to be first on some connection under the current set
of rules.
If we allow OOO shipping then the missing command can be somewhere
"inside" the window on some connection, and if the target has just filled
its queue and has room in the staging buffer only for the command it is
waiting for, and that command happens to be the first to pass to SCSI, you
have a deadlock.
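
The scenario above can be sketched as a toy simulation. This is a
simplified model under stated assumptions, not a description of any real
target: each connection is a FIFO that can only be read from its head (as
with a TCP stream), CmdSN values are exactly 1..N with no wraparound, and
`staging_slots` is a hypothetical limit on how many out-of-order commands
the target can hold while waiting for the expected one.

```python
from collections import deque


def run_target(connections, staging_slots):
    """Simulate a target draining per-connection command streams.

    connections: list of lists of CmdSN values, each in wire order.
    staging_slots: how many out-of-order commands the target can stage.
    Returns (delivered, deadlocked).
    Assumption: CmdSN values are exactly 1..N, each appearing once.
    """
    queues = [deque(c) for c in connections]
    staged = set()
    delivered = []
    expected = 1
    total = sum(len(q) for q in queues)
    while len(delivered) < total:
        # Deliver the expected command if it is already staged.
        if expected in staged:
            staged.remove(expected)
            delivered.append(expected)
            expected += 1
            continue
        # Otherwise take it from the head of a connection, if it is there.
        head = next((q for q in queues if q and q[0] == expected), None)
        if head is not None:
            delivered.append(head.popleft())
            expected += 1
            continue
        # The expected command is not at any head: stage some other head.
        if len(staged) < staging_slots:
            q = next(q for q in queues if q)
            staged.add(q.popleft())
            continue
        # No room to stage and the expected command is buried: stuck.
        return delivered, True
    return delivered, False
```

In this model, in-order shipping per connection, e.g.
`run_target([[1, 3], [2, 4, 5]], 0)`, completes even with no staging room,
because the missing command is always at the head of some connection.
Out-of-order shipping on one connection, e.g.
`run_target([[2, 3, 4, 1]], 2)`, gets stuck once the staging slots are full
while command 1 is still behind command 4 in the stream.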
Julo
"Robert D. Russell" <rdr@mars.iol.unh.edu>
07-11-01 23:13
Please respond to "Robert D. Russell"
To: Somesh Gupta <somesh_gupta@silverbacksystems.com>
cc: Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
Subject: RE: iSCSI: Out of order commands
Somesh, Julian:
You state that dealing with OOO commands on the target
will add substantial complexity on the target.
Do you have any basis for that claim? My impression from the
plugfest is that most targets are already dealing with
it. Perhaps we need to hear from someone who is actually
building a target for which this would be a real problem.
If anything, what we are hearing from people who really
are building initiators is that dealing with the requirement
to send commands in order will introduce substantial complexity
on the initiator.
So why should we be saving complexity on (hypothetically) simple
targets yet requiring complexity on real initiators?
As far as the deadlock issue is concerned, if the only way
that deadlock can occur with OOO commands on the same
connection is during the use of immediate data (which is I
think what Julian was saying), then shouldn't we change
the standard to just say that -- if the initiator sends
commands out of order on a single connection, then immediate
data MUST NOT be used on that connection in order to avoid deadlock.
This gives everybody what they want, since initiators who find
it beneficial to deliver commands OOO will just negotiate
immediate data off. Those who really want to use immediate data
will have to ensure that commands are sent in order.
The tradeoff then becomes an implementation issue, not a
standards issue, which is where it belongs.
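
The proposed rule can be stated as a small predicate. This is only a
sketch of the proposal as worded above; the function and parameter names
are hypothetical, and CmdSN wraparound is ignored.

```python
def send_allowed(immediate_data_enabled, highest_cmd_sn_sent, cmd_sn):
    """Check one send on one connection against the proposed rule.

    immediate_data_enabled: True if immediate data is in use on the connection.
    highest_cmd_sn_sent: highest CmdSN already sent here, or None if none yet.
    cmd_sn: CmdSN of the command about to be sent.
    Assumption: CmdSN values do not wrap around in this sketch.
    """
    in_order = highest_cmd_sn_sent is None or cmd_sn > highest_cmd_sn_sent
    # Out-of-order sends are acceptable only when immediate data is off.
    return in_order or not immediate_data_enabled
```

An initiator that wants to send out of order would negotiate immediate
data off, so the second argument of the `or` holds; one that keeps
immediate data on must keep `in_order` true for every send.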
Bob Russell
InterOperability Lab
University of New Hampshire
rdr@iol.unh.edu
603-862-3774
On Wed, 7 Nov 2001, Somesh Gupta wrote:
> I think we should either have it as a MUST or not require
> it (at both ends to get the real benefit). SHOULD is one
> of those things that leads to implementation
> burden and confusion, without perhaps the feature being
> used. There are implementation as well as protocol
> considerations mixed in here.
>
> If we are to remove the restriction, we should (SHOULD)
> get the maximum benefit from it, rather than to
> accommodate an implementation choice. Out of sequence
> commands, combined with the possibility of digest errors,
> will add substantial complexity on the target side,
> without corresponding benefit in performance. If we change
> this to SHOULD, we should also relax the requirement
> to present commands on the target side to a SHOULD.
>
>
>
> > -----Original Message-----
> > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu] On Behalf Of
> > Julian Satran
> > Sent: Wednesday, November 07, 2001 10:00 AM
> > To: ips@ece.cmu.edu
> > Subject: Re: iSCSI: Out of order commands
> >
> >
> > Mallikarjun,
> >
> > I did not see a SINGLE performance improvement that results from OOO
> > shipping.
> > It would be bad engineering to give away the "no-deadlock" mechanism we
> > have now for nothing.
> > I also have the impression that the point about deadlock that I keep
> > repeating is ignored or not understood.
> > As we stand today commands can be shipped with or without immediate data,
> > and an implementer determined to squeeze maximum bandwidth and overlap
> > command start with delivery will choose not to work with immediate data
> > (as you have pointed out), while a low performance software implementation
> > will use immediate data to minimize CPU cycles consumed. However, both
> > will be guaranteed to work without deadlock as source and sink use the
> > same ordering.
> > Recovery is still a low probability event and should be handled with a
> > different set of considerations in mind.
> > As for the strictness of the recommendation - yes we could settle on
> > SHOULD.
> >
> > Julo
> >
> >
> >
> >
> > "Mallikarjun C." <cbm@rose.hp.com>
> > Sent by: owner-ips@ece.cmu.edu
> > 07-11-01 19:41
> > Please respond to cbm
> >
> >
> > To: Santosh Rao <santoshr@cup.hp.com>, ips@ece.cmu.edu
> > cc:
> > Subject: Re: iSCSI: Out of order commands
> >
> >
> >
> > Santosh,
> >
> > I have only one comment on your responses.
> >
> > > Even a single connection target *MUST* implement a scoreboard. The
> > > reason being that it can see out-of-order arrival of commands due to
> > > commands being dropped on digest errors. In such a case, it must block
> > > further command processing until holes are filled.
> >
> > I made two convenient assumptions if you noticed, :-), one of which
> > is that target forces session recovery on *any* error that it sees
> > (ErrorRecoveryLevel=0) - including a dropped command due to a digest
> > error. With that assumption, a target can afford not to implement
> > a scoreboard.
> >
> > As I said in a private note, I guess what primarily bothers me about
> > OOO commands on a connection is that it requires the receiver to
> > undo this "optimization" on its end - most notably on a single
> > connection. TCP experts may comment on how/if they dealt with a
> > similar issue.
> >
> > OTOH, you had some valid comments on exceptions to ordering during
> > connection recovery. Perhaps we can move on by making Julian's
> > proposed stipulation a SHOULD....
> > --
> > Mallikarjun
> >
> >
> > Mallikarjun Chadalapaka
> > Networked Storage Architecture
> > Network Storage Solutions Organization
> > MS 5668 Hewlett-Packard, Roseville.
> > cbm@rose.hp.com
> >
> >
> > Santosh Rao wrote:
> > >
> > > Mallikarjun,
> > >
> > > Some comments below.
> > >
> > > Regards,
> > > Santosh
> > >
> > > "Mallikarjun C." wrote:
> > > >
> > > > Rod and Julian,
> > > >
> > > > This has been an interesting thread of discussion. Some
> > > > comments -
> > > >
> > > > 1.My first reaction was - allowing out-of-order command
> > > > transmission on the same connection deprives targets of
> > > > an implementation choice. Targets which support only
> > > > single-connection sessions and only support session
> > > > recovery (reasonable assumptions in my mind) can no
> > > > longer afford *not to* implement a command scoreboard.
> > >
> > > > Even a single connection target *MUST* implement a scoreboard. The
> > > > reason being that it can see out-of-order arrival of commands due to
> > > > commands being dropped on digest errors. In such a case, it must block
> > > > further command processing until holes are filled.
> > > >
> > > > Thus, there is no getting away from implementing a sequencer at the
> > > > target. Given this, I think it is unreasonable to restrict initiator
> > > > implementation flexibility by imposing a strict ordering requirement
> > > > within the connection.
> > >
> > > > 2.Any end-node efficiency that is sought to be achieved
> > > > by transmitting CmdSNs out-of-order from the initiator
> > > > would be lost on the other end-node, since the target
> > > > now must wait for re-ordering the commands.
> > >
> > > > It has to handle this situation anyway to deal with holes caused by
> > > > digest errors. This scenario occurs even with initiators that issue
> > > > commands in order.
> > >
> > > >
> > > > 3.The flipside is that out-of-order transmission saves
> > > > > link bandwidth (albeit at the expense of end-node efficiency),
> > > > compared to idling the link waiting for outbound DMA.
> > > > We have to determine if this is a reasonable trade-off.
> > > >
> > > > 4.I can see Rod's point that prefetching all immediate
> > > > data can be a burden on the NIC resources. But, two
> > > > questions -
> > > > - could the NIC not use unsolicited separate data
> > > > PDUs in these cases? [ I realize that InitialR2T
> > > > has to be "no" to let it happen... ]
> > > > - could the NIC have a memory architecture that
> > > > allows data prefetching for the next command (so
> > > > this is a non-issue from the protocol perspective)?
> > > > This scheme incurs one DMA delay for every new
> > > > burst of commands.
> > > >
> > > > 5.Another (perhaps radical at this point) option is to do
> > > > away with immediate unsolicited data, to stick only with
> > > > separate unsolicited data. I would personally be okay
> > > > with the choice, particularly if this feature (that
> > > > helps software implementations) starts making hardware
> > > > design complicated/expensive.
> > > >
> > > > So, to summarize -
> > > >
> > > > > option                            immediate       allow
> > > > >                                   data in spec?   out-of-order?
> > > > >
> > > > > (A) (5) above                     no              no
> > > > > (B) No real reason to do this.    no              yes
> > > > > (C) (4) above                     yes             no
> > > > > (D) pros & cons (1), (2) & (3)    yes             yes
> > > >
> > > > > From the arguments I heard so far, I am leaning towards
> > > > option A, and option C in that order.
> > > >
> > > > Comments?
> > > > --
> > > > Mallikarjun
> > > >
> > > > Mallikarjun Chadalapaka
> > > > Networked Storage Architecture
> > > > Network Storage Solutions Organization
> > > > MS 5668 Hewlett-Packard, Roseville.
> > > > cbm@rose.hp.com
> > > >
> > > > Rod Harrison wrote:
> > > > >
> > > > > Julian,
> > > > >
> > > > > I don't understand what you are proposing
here, what do
you
> > mean by
> > > > > "multiplexed" DMA?
> > > > >
> > > > > The problem is that the DMAs take some time, the more
there
> > are
> > > > > queued the longer the last DMAs queued take to complete. Some
> > commands
> > > > > require DMAs to complete before they can be sent, i.e. Writes
with
> > > > > immediate data, some commands do not, i.e. Reads and
writes with
no
> > > > > immediate data. The iSCSI HBA wants to be able to
send commands
as
> > > > > soon a possible, which for a read after a write can be before
the
> > > > > write's DMA has completed. Maintaining an ordered queue for
commands
> > > > > to be sent on the HBA is expensive and redundant since the
target
> > > > > already knows how to queue commands before committing them to
its
> > SCSI
> > > > > layer.
> > > > >
> > > > > The iSCSI HBA and its host driver are not at
liberty to
> > change the
> > > > > order of commands from the OS, but the DMAs those
commands need
are
> > > > > unlikely to complete in the same order, and as I
mentioned some
> > > > > commands need no DMA. If the HBA can't send commands out of
CmdSN
> > > > > order it has to maintain an ordered queue of commands
waiting to
be
> > > > > sent, and potentially buffer a lot of data. For an HBA this
makes
> > > > > immediate data almost impossible to support.
> > > > >
> > > > > I don't see the problem with allowing out of order
commands
> > given
> > > > > that the target already has to deal with very similar
problems.
I
> > > > > think we are getting in to the area of implementation choices
here,
> > > > > which is inappropriate for a specification.
> > > > >
> > > > > - Rod
> > > > >
> > > > > -----Original Message-----
> > > > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu] On Behalf Of
> > > > > > Julian Satran
> > > > > > Sent: Monday, November 05, 2001 10:06 PM
> > > > > > To: ips@ece.cmu.edu
> > > > > > Subject: Re: iSCSI: Out of order commands, was current UNH Plugfest
> > > > >
> > > > > Rod,
> > > > >
> > > > > > I don't see any reason why DMA operations can't be "multiplexed" with
> > > > > > commands.
> > > > > > If you have scheduled a long outbound DMA you are doomed regardless of
> > > > > > the command ordering.
> > > > > > And if you have scheduled DMA operations piecemeal then you can insert
> > > > > > your commands in correct order.
> > > > >
> > > > > Julo
> > > > >
> > > > > "Rod Harrison" <rod.harrison@windriver.com>
> > > > > 05-11-01 20:48
> > > > > Please respond to "Rod Harrison"
> > > > >
> > > > > > To: Julian Satran/Haifa/IBM@IBMIL, <ips@ece.cmu.edu>
> > > > > > cc:
> > > > > > Subject: iSCSI: Out of order commands, was current UNH Plugfest
> > > > >
> > > > > [ Subject changed ]
> > > > >
> > > > > Julian,
> > > > >
> > > > > The ordering difference is
introduced between
the
> > > > > host
> > > > > side driver
> > > > > and the iSCSI HBA. The host side driver must present SCSI
commands
> > to
> > > > > the HBA in the order they are received from the OS to prevent
read
> > > > > after write dependency failures. The HBA might reorder the
commands
> > > > > depending on when DMA completes. The reordering can't be done
ahead
> > of
> > > > > time in the host driver since it doesn't know how
long each DMA
> > might
> > > > > take. As long as the HBA assigns CmdSN in the order
it receives
> > > > > commands the desired host ordering is preserved.
> > > > >
> > > > > - Rod
> > > > >
> > > > > -----Original Message-----
> > > > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu] On Behalf Of
> > > > > > Julian Satran
> > > > > Sent: Monday, November 05, 2001 12:35 AM
> > > > > To: ips@ece.cmu.edu
> > > > > Subject: RE: iSCSI: current UNH Plugfest
> > > > >
> > > > > Rod,
> > > > >
> > > > > > In all the examples given, the point I find hard to understand is why
> > > > > > the ordering on the wire is different from the presentation order to the
> > > > > > initiator. You can get as many overlaps as you want by presenting the
> > > > > > commands to the initiator in the desired order.
> > > > > > What we are considering here is the case in which you want to ship in
> > > > > > an order different than the one in which you present the commands.
> > > > >
> > > > > Julo
> > > > >
> > > > > "Rod Harrison" <rod.harrison@windriver.com>
> > > > > Sent by: owner-ips@ece.cmu.edu
> > > > > 04-11-01 04:42
> > > > > Please respond to "Rod Harrison"
> > > > >
> > > > > > To: "Barry Reinhold" <bbrtrebia@mediaone.net>, "Dave Sheehy"
> > > > > > <dbs@acropora.rose.agilent.com>, "IETF IP SAN Reflector"
> > > > > > <ips@ece.cmu.edu>
> > > > > > cc:
> > > > > > Subject: RE: iSCSI: current UNH Plugfest
> > > > >
> > > > > Barry,
> > > > >
> > > > > In general I agree but I don't think
this is as
> > much
> > > > > of a
> > > > > corner case
> > > > > as it at first appears. Targets will have code very
similar to
that
> > > > > needed to handle out of order commands to deal with digest
errors.
> > > > > Targets also need to queue commands whilst waiting for both
> > solicited
> > > > > and unsolicited data to arrive. Queuing out of order commands
seems
> > > > > little extra work.
> > > > >
> > > > > From an initiators point of view there are
> > > > > efficiency,
> > > > > and probably
> > > > > performance gains to be had from sending commands out
of order.
Bob
> > > > > Russell gave the example of a read being sent whilst
write data
DMA
> > is
> > > > > happening, and a similar situation can arise with DMA
for writes
> > > > > overtaking that of earlier writes if the initiator
has multiple
DMA
> > > > > engines. In this case the initiator might be forced
to let the
wire
> > go
> > > > > idle if it can't send the data from completed DMAs as soon as
> > > > > possible.
> > > > >
> > > > > We already have a command queue at
the target
to
> > > > > enforce
> > > > > correct
> > > > > serialisation of commands, doing the same thing at
the initiator
is
> > > > > redundant.
> > > > >
> > > > > Finally, I don't believe we should
be writing a
> > > > > standard
> > > > > to work
> > > > > around poor coding and test coverage, especially at
the cost of
> > > > > potential efficiency gains.
> > > > >
> > > > > I agree with Dave and Santosh that commands
being
> > > > > sent
> > > > > out of order
> > > > > on a single session should be allowed by the standard.
> > > > >
> > > > > - Rod
> > > > >
> > > > > -----Original Message-----
> > > > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu] On Behalf Of
> > > > > > Barry Reinhold
> > > > > > Sent: Friday, November 02, 2001 5:24 PM
> > > > > > To: Dave Sheehy; IETF IP SAN Reflector
> > > > > > Subject: RE: iSCSI: current UNH Plugfest
> > > > > >
> > > > > > Using features such as out of order command delivery on a connection
> > > > > > tends to be the sort of thing that leads to interoperability problems.
> > > > > > It is unexpected and probably going to hit poorly tested code paths even
> > > > > > if the standard is written to allow it.
> > > > >
> > > > > >-----Original Message-----
> > > > > > >From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu] On Behalf Of
> > > > > > >Dave Sheehy
> > > > > >Sent: Friday, November 02, 2001 4:19 PM
> > > > > >To: IETF IP SAN Reflector
> > > > > >Subject: Re: iSCSI: current UNH Plugfest
> > > > > >
> > > > > >
> > > > > >
> > > > > > >> 3. Can commands be sent out of order on the same connection?
> > > > > > >>
> > > > > > >> The behavior of targets is clearly specified in Section 2.2.2.3 on
> > > > > > >> page 25 of draft 8, which says:
> > > > > > >> "Except for the commands marked for immediate delivery the iSCSI
> > > > > > >> target layer MUST deliver the commands for execution in the order
> > > > > > >> specified by CmdSN."
> > > > > > >>
> > > > > > >> Section 2.2.2.3 on page 26 of draft 8 also says:
> > > > > > >> "- CmdSN - the current command Sequence Number advanced by 1 on
> > > > > > >> each command shipped except for commands marked for immediate
> > > > > > >> delivery."
> > > > > > >> but the meaning of the term "shipped" is vague, and does not
> > > > > > >> necessarily require that the PDUs arrive on the other end of a TCP
> > > > > > >> connection in the same order that the CmdSN values were assigned to
> > > > > > >> these PDUs.
> > > > > >>
> > > > > > >> Some initiators have been designed to send commands out of CmdSN
> > > > > > >> order on one connection. Consider the situation where there is only
> > > > > > >> one connection and a high-level dispatcher creates a PDU for a SCSI
> > > > > > >> command that involves writing immediate data to the target. This PDU
> > > > > > >> is enqueued to a lower-level layer which has to set up, start, and
> > > > > > >> wait for a DMA operation to move the immediate data into an onboard
> > > > > > >> buffer before the PDU can be put onto the wire. While this is
> > > > > > >> happening, the dispatcher creates another unrelated PDU for a SCSI
> > > > > > >> read command (for example), and when this PDU is passed to the
> > > > > > >> lower-level layer it can be sent immediately, ahead of the previous
> > > > > > >> write PDU and therefore out of order on this connection.
> > > > > >>
> > > > > > >> The standard clearly allows this to happen if the two PDUs were sent
> > > > > > >> on different connections, and seems to imply that this can also happen
> > > > > > >> when the two PDUs are sent on the same connection.
> > > > > > >>
> > > > > > >> The suggestion is to put in the standard an explicit statement that
> > > > > > >> this is allowed or not allowed, as appropriate.
> > > > > > >>
> > > > > > >> If this is allowed, such a statement would avoid the erroneous
> > > > > > >> assumption being made by some target implementers that within a single
> > > > > > >> connection, commands will arrive in order.
> > > > > > >>
> > > > > > >> If this is not allowed, such a statement would avoid the erroneous
> > > > > > >> assumption being made by some initiator implementers that within a
> > > > > > >> single connection, commands can be put on the wire out of order.
> > > > > >>
> > > > > >> +++
> > > > > >>
> > > > > > >> will add an explicit statement saying that this behaviour is forbidden.
> > > > > > >> 2.2.2.1 will contain:
> > > > > > >>
> > > > > > >> On any given connection, the iSCSI initiator MUST send the commands in
> > > > > > >> the order specified by CmdSN.
> > > > > >>
> > > > > >> +++
> > > > > >
> > > > > > >Why do you feel this behavior should be forbidden? Targets already
> > > > > > >have to order commands across the session. I don't see why it's a
> > > > > > >problem to extend that to the connection as well. I, for one, believe
> > > > > > >we should take a liberal stance on this.
> > > > > >
> > > > > >Dave Sheehy
> > > > > >
> > >
> > > --
> > > ##################################
> > > Santosh Rao
> > > Software Design Engineer,
> > > HP-UX iSCSI Driver Team,
> > > Hewlett Packard, Cupertino.
> > > email : santoshr@cup.hp.com
> > > Phone : 408-447-3751
> > > ##################################
> >
> >
> >
> >
>