
    RE: Status summary on multiple connections



    
    ------------- Begin Forwarded Message -------------
    
    From: Robert Snively <rsnively@Brocade.COM>
    To: "'David Robinson'" <David.Robinson@EBay.Sun.COM>, Robert Snively 
    <rsnively@Brocade.COM>
    Subject: RE: Status summary on multiple connections
    Date: Thu, 28 Sep 2000 09:36:50 -0700
    
    Dave,
    
    You ask some interesting questions with non-short answers.  If you would
    like, and if you think it useful, you may post this out to the ips reflector.
    
    >  > The single connection alternative allows a simplistic ordering
    >  > structure, a simple recovery mechanism, and does not require
    >  > state sharing among multiple NICs.  It allows bandwidth aggregation
    >  > across any set of boundaries that is required.  Because command
    >  > queuing is the rule among high performance SCSI environments,
    >  > latency appears only as an increment in host buffer requirements
    >  > except during writes that perform a commit function.  
    >  > Those traditionally
    >  > have been taken out of the performance path by using local
    >  > non-volatile RAM to perform the commit functions, using slower
    >  > high latency writes with less strict ordering requirements relative
    >  > to reads to actually perform the write to media.
    >  
    >  Can you clarify something for me?  In my previous questions on
    >  flow control it was strongly indicated that the target must drain
    >  the stream in order to allow commands to flow when the command queue
    >  filled up.  You seem to indicate here that command queuing and the
    >  flow control needed to handle overflow is being done at the
    >  SCSI layer and not the transport.  Is it correct that it really
    >  is a command level function and not a transport function? Without
    >  considering the TCP window management, SCSI will cause 
    >  command flow to
    >  stop when the queue fills up?  If this is true then most of
    >  the arguments for at least two connections are not relevant.  You
    >  may still need to discard commands if the target over-advertises its
    >  total queue space, but that seems to be more of an implementation
    >  bug.
    
    SCSI manages two independent sets of resources.  One is the resources
    required to receive and process command states.  The other is the
    resources required to buffer and process data to be transferred
    as a result of processing the commands.
    
    At the initiator, all resources for the execution of a command,
    including both the command state resources and the explicitly specified
    buffer area, are defined at the time the command is delivered to
    the SCSI stack.  Those resources are locked down until the SCSI
    command is finished, at which time the command state resources (by this
    time a response packet) and control of the buffer are passed back
    to the application client (user, driver, operating system, file system,
    application program or whatever).
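
    As a rough illustration of that lifecycle, here is a minimal Python
    sketch (the class and method names are hypothetical, not any particular
    SCSI stack's API):

        import dataclasses

        @dataclasses.dataclass
        class CommandContext:
            """Initiator-side state pinned for the life of one command."""
            tag: int              # queue tag, part of the ITLQ nexus
            cdb: bytes            # the command descriptor block
            buffer: bytearray     # data buffer, locked until completion

        class InitiatorStack:
            def __init__(self):
                self.outstanding = {}    # tag -> CommandContext

            def submit(self, tag, cdb, buffer):
                # Command state and buffer are committed here and stay
                # reserved until the SCSI response comes back.
                self.outstanding[tag] = CommandContext(tag, cdb, buffer)

            def complete(self, tag, response):
                # On completion, the response and control of the buffer
                # are handed back to the application client.
                ctx = self.outstanding.pop(tag)
                return response, ctx.buffer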
    
    The beauty of SCSI is that all the transfer management is done by
    the target (which knows exactly what is going on and exactly what is
    needed), not by the initiator.
    
    The target also has two sets of resources.  The command is received
    into a command buffer (perhaps implemented as a large single buffer or
    perhaps implemented as a large number of smaller buffers at each logical
    unit).  I explained the rules on command queueing before, but
    basically all commands are posted into the same buffer, from whatever
    initiator they were received, with some kind of time/order stamp.
    A well-behaved device is always capable of receiving at least one
    simple or ordered queued command and one head of queue command for
    each logical unit/initiator nexus that is supported by the device.
    Present devices support from 16 to 64 initiators per logical unit.
    Typically at least one additional slot is available for task management
    functions.
    The remaining locations for commands in the queue are dynamically
    portioned out to whatever commands come in, regardless of initiator
    or logical unit.  When there is no more dynamic space left and all the
    pre-allocated locations for a particular ITL nexus are also full, 
    the next command gets a queue full indication returned.  Because of
    the dynamic assignment area, this will typically be rare in a properly
    configured system.  The initiator then resends the command and all
    subsequent commands after at least one command comes back completed,
    indicating that at least one (and probably a whole stack more) slots
    are again available.  Note that there is a possibility that commands
    that are in flight and have ordering constraints may be accepted out of
    order, a question that has caused lots of agonizing but is apparently
    reasonably well managed by most file systems today through the selective
    use of ordered commands only at the blocking boundaries of a particular
    logical stream of commands.
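
    A toy model of that slot accounting might look something like the
    following Python sketch (the reserved/dynamic split and the names are
    illustrative assumptions, not a description of any specific device):

        class TargetQueue:
            """Toy model of target command-slot accounting."""

            def __init__(self, dynamic_slots, reserved_per_nexus=1):
                self.dynamic_free = dynamic_slots     # shared pool
                self.reserved_per_nexus = reserved_per_nexus
                self.reserved_in_use = {}             # ITL nexus -> slots in use

            def enqueue(self, itl_nexus):
                used = self.reserved_in_use.get(itl_nexus, 0)
                if used < self.reserved_per_nexus:
                    # Guaranteed slot for this initiator/logical-unit nexus.
                    self.reserved_in_use[itl_nexus] = used + 1
                    return "ACCEPTED", "reserved"
                if self.dynamic_free > 0:
                    # Shared pool, handed out regardless of initiator or LU.
                    self.dynamic_free -= 1
                    return "ACCEPTED", "dynamic"
                # Per-nexus slots and shared pool both exhausted.
                return "QUEUE FULL", None

            def complete(self, itl_nexus, pool):
                # A completion frees a slot; the initiator that saw QUEUE FULL
                # can now resend that command and any subsequent ones.
                if pool == "dynamic":
                    self.dynamic_free += 1
                else:
                    self.reserved_in_use[itl_nexus] -= 1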
    
    The target then begins sorting commands for optimum execution order,
    to exploit pre-buffered data and to coalesce streaming operations to
    the device, and begins to execute the commands in ITS desired order,
    modified by the ordered queueing restrictions, if any.  If data is
    required from the initiator, buffers
    are set aside for the data in the target and the data (already locked
    down in the initiator buffers) is requested from the initiator.
    If data is to be sent to the initiator, it is assembled in the target
    buffers and shipped off to the specified buffers in the initiator.
    In large storage subsystems, this is typically going on for multiple
    initiators and in both directions at the same time.
    The initiator buffers are identified by the command context in the
    initiator.  The command context is selected by the 
    Initiator/Target/Logical unit/Queue Tag (ITLQ) nexus carried from
    the target with the data or the data request.
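
    In other words, the ITLQ nexus acts as the lookup key for the initiator
    buffer context.  A minimal sketch of that addressing (the names are
    illustrative only):

        from typing import Dict, Tuple

        # ITLQ nexus: (initiator, target, logical unit, queue tag)
        Nexus = Tuple[str, str, int, int]

        class InitiatorBufferTable:
            def __init__(self):
                self.buffers: Dict[Nexus, bytearray] = {}

            def register(self, nexus, buffer):
                # The buffer was locked down when the command was issued.
                self.buffers[nexus] = buffer

            def on_data_in(self, nexus, offset, data):
                # Data arriving from the target lands directly in the buffer
                # selected by the ITLQ nexus carried with the data.
                buf = self.buffers[nexus]
                buf[offset:offset + len(data)] = data

            def on_data_request(self, nexus, offset, length):
                # A data-out request from the target reads from the same
                # pre-locked buffer, again selected by the nexus.
                return bytes(self.buffers[nexus][offset:offset + length])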
    
    As a result, SCSI, independent of transport (IEEE 1394, Parallel SCSI,
    FC, and I hope iSCSI) has complete flow control at the initiator and
    at the target with respect to all command and data transfers.
    
    SCSI, being a storage protocol and having bursty traffic characteristics,
    is traditionally configured such that over-subscription is of
    short duration and has little effect on average latencies except for
    very short periods.  Of course, underconfigured SCSI transports
    will increase latency, but they will still be well-behaved in terms
    of throughput and they will not block.  Depending on the particular 
    transport implementation, they may be more or less well-behaved in 
    terms of IT nexus fairness.  As an example, older parallel SCSI
    implementations may exhibit higher throughput on high priority IT 
    nexi than on low priority IT nexi.
    
    However, if additional flow controls or congestion management
    exist in the transport layer, they can interact in some pathological
    ways with the basic SCSI function.  I believe it is possible that
    such structures could create head of queue blocking or throttling
    behaviors in the transport switches if those mechanisms are not
    implemented properly.  Note that this is 100% outside the scope of
    the SCSI behaviors.
    
    I believe that the conclusions in your note are well founded.
    
     
    
    ------------- End Forwarded Message -------------
    
    
    

