Re: Notes of 06/21 meeting

To: ips@ece.cmu.edu
Subject: Re: Notes of 06/21 meeting
From: Michael Krause <krause@cup.hp.com>
Date: Thu, 29 Jun 2000 07:23:06 -0700
Content-Type: text/plain; charset="us-ascii"; format=flowed
In-Reply-To: <Pine.GSO.4.10.10006290000270.2987-100000@csapuntz-u1.cisco.com>
Sender: owner-ips@ece.cmu.edu

At 12:04 AM 6/29/00 -0700, Costa Sapuntzakis wrote:

>It was pointed out that the command reference number
>as spec'ed was not long-lived enough to provide error
>recovery. However, the task tag could be used to do
>error recovery.

This is why it should be at least a 32-bit value and possibly a 48-bit
value.  It should also be carried on all commands, which simplifies the
problems described later: multiple TCP op support; the ability to deal
with overflows (only receive what is within the window of supported ops
and let a SACK-like error recovery deal with anything lost); simpler
hardware (the value is always present and can be used to retain ordering
without much overhead); and simpler mirroring, since one can immediately
forward the ops in the order the initiator wanted without stalls, etc.
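
To make the size and window argument concrete, here is a minimal sketch.
The 32-bit width is the low end of what I suggest above; the function
names and the command rate in the usage note are assumptions for
illustration only, not anything from the draft.

```python
CMD_RN_BITS = 32                 # assumed width (at least 32 bits is argued above)
CMD_RN_MOD = 1 << CMD_RN_BITS    # reference numbers wrap modulo 2**32


def in_window(cmd_rn, expected, window):
    """True if cmd_rn lies in [expected, expected + window), allowing wraparound."""
    return (cmd_rn - expected) % CMD_RN_MOD < window


def seconds_to_wrap(bits, commands_per_second):
    """How long a counter of the given width lasts at a steady command rate."""
    return (1 << bits) / commands_per_second
```

At a hypothetical 65536 commands per second, seconds_to_wrap(16, 65536)
is one second, while a 32-bit value lasts 65536 seconds.  Anything for
which in_window() is false is simply not received and is left to the
recovery mechanism.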

>--------------
>
>There was then a discussion about whether the command
>reference number should be per LU or per session.

Large value per session and then it does not matter.

>There was a lot of talk about whether we want
>to support multiple TCP connections/session.
>
>John Hufferd pointed out that SCSI load balancers already exist
>that take advantage of multiple sessions (multiple SCSI busses)
>to stripe commands to a target. He argued that multiple
>TCP connections are unnecessary. He also argued that no applications
>make effective use of SCSI ORDERED attribute, because the
>interfaces are not there.

A very simple implementation can be built with multiple TCP connections
per session.  With the command reference number always sent on every
operation, the start / stop problem is mitigated because one is receiving /
processing the operations in the order they were received.  In addition,
one can develop the hooks that separate specs provide for arbitration
policies, QoS, etc. to deal with different link bandwidth and similar
attributes.
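
A rough sketch of how little state this needs on the target side,
assuming a per-session command reference number and a fixed window.  The
class name, the window size, and the return conventions are mine, purely
for illustration.

```python
class SessionReorderQueue:
    """Per-session queue that releases commands in reference-number order."""

    def __init__(self, first_cmd_rn=0, window=8, bits=32):
        self.mod = 1 << bits          # reference numbers wrap at this value
        self.expected = first_cmd_rn  # next number that may be released
        self.window = window          # how far ahead we are willing to hold
        self.pending = {}             # cmd_rn -> command held out of order

    def receive(self, cmd_rn, command):
        """Accept a command from any connection of the session.

        Returns the list of commands that are now deliverable in order, or
        None if the command is outside the window and was not accepted.
        """
        offset = (cmd_rn - self.expected) % self.mod
        if offset >= self.window:
            return None
        self.pending[cmd_rn] = command
        ready = []
        while self.expected in self.pending:
            ready.append(self.pending.pop(self.expected))
            self.expected = (self.expected + 1) % self.mod
        return ready
```

Commands arriving on different connections are all fed to the same
receive() call; whichever connection delivers the missing number
releases everything queued behind it.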

>However, they have to stop and wait for ordered commands.
>One application where stop and wait hurts is tape (where
>all writes are ordered), so some tape applications write
>self-describing blocks to tape which can be written in any order.
>
>Remote asynchronous mirroring can also be done with ordered
>writes. Hufferd argued that remote asynchronous mirroring must
>be solved at a higher layer and is being solved today.

Not that difficult to do with what I described above.

>Most of those arguing for multiple TCP connections said that
>     - it isn't that hard
>     - it would make iSCSI better than other SCSI transports
>     - it would make high-perf apps easier to write

Add in
   - Multi-path support is much easier to implement.
   - Higher performance can be achieved
   - Implementations are fairly simple - minimal state
   - Application-transparent ability to take advantage of / recover from
     hot-plug / removal of fabric components

>-------------
>Deadlock:
>
>Luciano pointed out that it is possible to run out of
>buffers and deadlock with multiple TCP connections.
>
>The source of the problem is
>         1) receive too many out-of-order commands
>         2) receiving too much unsolicited (immediate)
>            data
>
>The solution to 1) is to either
>    - limit the number of out-of-order commands that
>      are read from each TCP pipe to 1 (requires NIC
>      to know that command is out-of-order) and then
>      stop reading from the connections (deskewing)
>    - have a windowing mechanism on the command
>      ordering queue in target
>    - have a separate TCP pipe for emergency
>      recovery commands
>    - Nuspeed aborts command with SCSI status TASK QUEUE FULL
>
>The consensus seems to have resulted in windowing
>being adopted.

The NIC does not have to track this per se.  If the NIC has the SGL
(scatter/gather list) for the target buffer, it can perform the DMA.  If
the SGL does not exist, then it can drop the message without issuing a TCP
ACK.  (The issue is whether one wants to slow this down at the TCP level,
or allow it to complete at the TCP level but have the NIC still drop the
buffers w.r.t. the DMA targeting; my preference is to complete from the
TCP point of view but drop the DMA operation.)  The operation target and
buffers are locally posted, so the rate can be controlled quite easily.

The windowing proposal will work well as a control point for SGL posting to
individual commands - again with minimal, if any, complexity.  If the
command reference number is always present, life can be further simplified.
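
As a sketch of the control point, assume the host posts one buffer per
command reference number it is prepared to accept, and the NIC places
data only when a buffer has been posted.  NicModel and its methods are
hypothetical names, and a bytearray stands in for a real SGL; this models
the "complete at TCP, drop the DMA" preference and nothing more.

```python
class NicModel:
    """Toy model: data is placed only if a buffer was posted for the command."""

    def __init__(self):
        self.posted = {}    # cmd_rn -> bytearray standing in for an SGL
        self.dropped = []   # cmd_rn values whose data was discarded

    def post_sgl(self, cmd_rn, length):
        """Host side: posting a buffer is what opens the window for cmd_rn."""
        self.posted[cmd_rn] = bytearray(length)

    def on_message(self, cmd_rn, payload):
        """NIC side: place the payload if a large enough buffer exists, else drop."""
        buf = self.posted.get(cmd_rn)
        if buf is None or len(payload) > len(buf):
            self.dropped.append(cmd_rn)
            return False
        buf[:len(payload)] = payload
        return True
```

Because posting is local to the target, the rate at which buffers are
posted is the rate control; the NIC itself tracks no ordering state.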

>The consensus solution to 2) was to allow the
>target to drop immediate data and request it be
>retransmitted via ready-to-transmit (RTT).
>
>--------------
>
>Should task management commands be ordered with respect to tasks?
>
>Those against feared that ordering task management commands
>would prevent their timely delivery.
>
>Those for feared that not ordering task management commands
>would lead to surprising behaviors (like ABORT TASK SET
>overtaking and not aborting all previously issued tasks).
>
>----------------
>
>Can a single iSCSI TCP connection use multiple paths in the network
>simultaneously?
>
>Answer: Most networks keep a flow on one path to help ensure
>minimal re-ordering, so no in that case. Of course, this being IP,
>people could design a network that sprays packets of a flow across
>multiple paths and it would still work...

Most of us would prefer not to have a single connection flow through
different paths - it would add hardware complexity for what is nominally
a rare event.  A well-behaved environment is possible to implement, but
then one is asking IP to do this, which creates additional specification
work.

Mike

References:
- Notes of 06/21 meeting
  - From: Costa Sapuntzakis <csapuntz@cisco.com>

Last updated: Tue Sep 04 01:08:12 2001