Comments to Comments!

To: Jack Harwood <harwood_jack@emc.com>
Subject: Comments to Comments!
From: julian_satran@il.ibm.com
Date: Sun, 12 Mar 2000 19:39:30 +0200
cc: ips@ece.cmu.edu
Content-Disposition: inline
Content-type: text/plain; charset=us-ascii
Delivery-Date: Sun Mar 12 12:43:00 2000
Sender: owner-ips@ece.cmu.edu



Jack,

Thanks for your attention and detailed comments. I sincerely hope that we
can all work together to get to a better standard.

And here are our thoughts as expressed by several authors:


--- start forwarded message by harwood, jack ---
> From: "harwood, jack" <harwood_jack@emc.com>
> To: ips@ece.cmu.edu
> Subject: Comments on the current iSCSI draft
> Date: Fri, 10 Mar 2000 20:26:05 -0500
...
> Architectural
> -------------
> * There is an issue with the separation of the Control and Data Channel.
> NAT (address translation), firewall, or load balancing products will not
> support iSCSI without changes which in turn is a barrier to adoption for
> large networks.  If the goal is to provide interleaving of control
> commands with large data transfers we feel this can be accomplished in
> other ways.
>    - Use smaller data frames to allow better interleaving of control and
>    data on a single connection
>    - Use multiple connections between the same source and destination
>    pair where each connection is independent of other connections
>    (i.e., data/control are combined on each connection).
> Separation of control and data also adds new failure modes where one
> channel closes but the other does not.


True, separating data from control introduces some new problems that could
be avoided if we interleave. We briefly considered such a design aiming at
one TCP connection per LUN.  But this is inordinately expensive.

If we multiplex LUNs (as we do in the current draft) keeping to a short TCP
frame will leave us open to all sorts of troubles (possible deadlocks) due
to the limited TCP window and our lack of control over the data source and
sink. By separating the control and data streams we could resort to
selective resets to get out of trouble - while with a common connection we
might have to resort to radical means (e.g., closing connections).

In addition, in a "permissive" environment (like a video server) we might
require CRC on the control connection while leaving the data connections up
to the user.

It is a bit more difficult to implement but worth the trouble.

> * The use of DNS addressing in the protocol as described in sections
> 3.13, Open Data Connection, and section 3.17, Third Party Copy, will force
> all parties to depend on DNS in order for the protocol to work. While
> system and network administrators should be free to make this choice (and
> invest the effort in making DNS suitably robust), this protocol design
> should NOT be based on the assumption that DNS is a robust highly
> available service. The protocol should be based on IP addresses.

It is true that the system recommends using DNS.  However, the
administrator is free to choose names such as "123.45.67.89" and the
initiators and targets will interpret that as IPv4 (or IPv6) as necessary.

It was felt that we should be completely independent of IP addresses
because of firewall and IP masquerading issues with setting up new TCP
connections. IP addresses /can/ be used, but only in dotted decimal
notation.
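
As a rough illustration only (not text from the draft), an initiator or
target could tell a dotted-decimal name from a DNS name along these lines.
The function name classify_name and the host name in the usage note are
hypothetical, and the sketch covers only IPv4 dotted decimal:

```python
import ipaddress


def classify_name(name: str) -> str:
    """Return 'ipv4' if name is a dotted-decimal IPv4 literal, else 'dns'.

    Hypothetical helper: the draft does not define this function.
    """
    try:
        ipaddress.IPv4Address(name)
        return "ipv4"
    except ValueError:
        # Anything that is not a valid dotted-decimal literal is treated
        # as a name to be resolved through DNS.
        return "dns"
```

With this sketch, classify_name("123.45.67.89") yields "ipv4", while a
made-up name such as "disk1.example.com" yields "dns" and would need a
resolver.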

Note that no addresses need be provided for simple systems, and all
key:value pairs can be safely ignored by the target.

> Conceptual
> ----------
> * The iSCSI protocol requires a strong authentication mechanism. In its
> current form, without an implementation and corresponding specification,
> it is impossible to write an interoperable authentication implementation
> from the document as it stands, hence at least one strong authentication
> mechanism must be mapped onto the protocol, possibly in a separate
> document or documents.

Correct.  We decided to make a flexible framework for authentication,
rather than specify a particular method.  Specific authentication schemes
could be described in other documents.

We briefly considered (and are not outright rejecting) other schemes - most
notably the one used in SST (SCSI over ST), in which the connection can go
through 3 stages - Idle - Authenticating - Active. One bit in the login
indicates whether authentication is required and puts the state machine in
either the Authenticating stage or the Active stage. The standard does not
address how you go from Authenticating to Active.
This design enables non-authenticating machines to interoperate and leaves
the whole authentication process open to other standards. We felt that we
have to have a minimal authentication specified, at least to avoid "good
faith" mistakes, but we are open to discussing this in the working group at
some length.
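
To make the three stages concrete, here is a minimal sketch of such a
connection state machine. The names Stage, on_login and on_auth_result are
hypothetical, and the Authenticating-to-Active transition shown here is
purely an assumption, since the standard leaves it undefined:

```python
from enum import Enum


class Stage(Enum):
    IDLE = "idle"
    AUTHENTICATING = "authenticating"
    ACTIVE = "active"


def on_login(stage: Stage, auth_required: bool) -> Stage:
    """Apply a login: the single 'authentication required' bit picks the stage."""
    if stage is not Stage.IDLE:
        raise ValueError("login is only valid from the Idle stage")
    return Stage.AUTHENTICATING if auth_required else Stage.ACTIVE


def on_auth_result(stage: Stage, success: bool) -> Stage:
    """ASSUMPTION: success moves to Active, failure drops back to Idle.

    The standard referred to above does not define this transition.
    """
    if stage is not Stage.AUTHENTICATING:
        raise ValueError("no authentication in progress")
    return Stage.ACTIVE if success else Stage.IDLE
```

A non-authenticating machine simply logs in with the bit cleared and goes
straight from Idle to Active.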

> * The parameter negotiation, described in sections 3.9-12, is very
> general. The free-form text/value format will cost code to parse and may
> not be justified.

We designed the system so that any non-response to a TEXT command is
treated as "not supported".  On targets or initiators where text:value
handling is too complex, a set of defaults should be chosen and no TEXT
commands supported.  For targets, MODE SELECT can set SCSI-like things.
The TEXT command covers network-like things.
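
A small sketch of how an initiator might apply this rule follows. It is an
interpretation, not the draft's procedure: the function negotiate, the key
ExampleKey and all default values are hypothetical (AllowNoRTT is the only
key taken from the draft), and falling back to a default for an unanswered
key is our assumption about what "not supported" implies:

```python
def negotiate(requested, reply, defaults):
    """Merge a peer's reply with local defaults.

    requested: key -> value pairs the initiator sent
    reply:     key -> value pairs the peer answered (may be empty)
    defaults:  key -> value used when the peer did not answer a key
    """
    agreed = {}
    for key in requested:
        if key in reply:
            agreed[key] = reply[key]
        else:
            # No response for this key: treat it as not supported
            # and fall back to the default (assumption).
            agreed[key] = defaults.get(key)
    return agreed
```

A target that implements no TEXT handling at all corresponds to an empty
reply, in which case every key ends up at its default.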

> * The action of killing all outstanding IOs on a login or operation
> timeout seems too severe for this process and provides an opening for a
> denial of service attack.  Also there is no other rationale in the
> document as to why this semantic is useful.

I assume you are referring to what is written in the section on Error
Handling (section 4.0). Denial of service is a problem inherent in all
IP-based protocols, and we cannot completely solve it. The initiator can
wait a long time before it determines that it has timed out. TCP ensures
ordered delivery as long as there is a connection. What alternative is
there to cleaning up completely, once it has been decided that we have a
connection problem?


> * A general mapping of error recovery for iSCSI is needed, i.e. what
> parts need definition versus what will use TCP error recovery mechanisms.

Did you have a particular situation in mind that iSCSI does not cover?

> * In section 3.17, Third party copy needs a much better explanation about
> authentication, login and how the entire process works.

Again, this is a framework.  When devices start offering third party
commands that go beyond the provisions of iSCSI, we will extend it.
We know about the extended copy commands considered by the SCSI working
group, and we think we have covered them.

> Specifics
> ---------
> * It should be stated specifically in sections 2.4 and 3.8 that iSCSI
> data segments cannot overlap.

We agree that the iSCSI draft should state that data segments should not
overlap (and will do this in the next version). However, we would be
reluctant to require that receiver implementations check for this type of
error and report it in the status. Is this acceptable?
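
For anyone who does want such a check (say, in a test tool rather than in
a receiver), a minimal sketch is below. Representing a data segment as an
(offset, length) pair and the name find_overlap are assumptions made for
the example, not something the draft defines:

```python
def find_overlap(segments):
    """Return the first overlapping pair of (offset, length) segments, or None.

    After sorting by offset, any overlap in the set shows up between two
    neighbouring segments, so one linear pass is enough.
    """
    ordered = sorted(segments)
    for (off_a, len_a), (off_b, len_b) in zip(ordered, ordered[1:]):
        if off_a + len_a > off_b:
            return (off_a, len_a), (off_b, len_b)
    return None
```

Segments that merely touch, such as (0, 512) followed by (512, 512), are
not reported.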


> * The expected data length and flags, i.e. command direction, should be
> described in the SCB itself and not as separate fields in the SCSI
> command, see section 3.3.

As stated by SAM, the SCB contains only the number of data blocks, not the
transfer length. SAM also mandates that the "execution request" include the
data length, and CAM (as well as other standard software interfaces)
requires a residual count report with reference to the length. Including
the length makes all the implementations "more compliant".
For all hardware bridge providers it also makes more sense to have the
length and direction in a "common" header than to scan SCBs.



> * Using the task tag and TCP connection 4-tuple (source and destination
> IP addresses and ports) we should have a fully qualified identifier and
> should not need LUN number in the response and task management response,
> see section 3.3 and 3.6.

You are right - it has been in and out so many times! It ended up being
there to make all "target-to-initiator" controls identical. The last
argument for putting it in was a "proxy LUN" - i.e. the work was done by a
"third party". If the returned LUN disagrees with the transmitted LUN,
then it may mean that a proxy satisfied the request.  However, we have not
specified what action should be taken and I cannot at present think of
anything useful to do with any proxy-LUN information. We (the working
group, including you hopefully, in its infinite wisdom!) might decide to
remove it.

> * The LUN number should be embedded in the data for the AEN, see section
> 3.4.

We do not specify what goes in the data that is sent in an Asynchronous
Event Notification. I think SAM-2 requires LUN to be specified (as a
parameter). We want to be independent of whatever data is packaged, and we
therefore have to specify the LUN in the header.

> * In section 5.1 a recommendation is made to use 8k as the upper limit
> for small TCP segments.  Depending on the MTU size this recommendation
> may cause fragmentation.  More detail and analysis are needed to justify
> this recommendation.

8k is an upper limit. If MTU size is smaller, then a smaller data size
should be used, as implied by the note to the implementer. 8k is also an
upper limit for good CRC algorithms (perhaps 8k is too big for this also).
We welcome a more detailed analysis to provide a better recommendation.
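
As a back-of-the-envelope sketch of "8k or less, depending on MTU": the
calculation below assumes IPv4 and TCP headers of 20 bytes each with no
options, and ignores any iSCSI header overhead. The function name
max_data_size is hypothetical:

```python
UPPER_LIMIT = 8 * 1024  # the 8k recommendation discussed above
IPV4_HEADER = 20        # bytes, assuming no IP options
TCP_HEADER = 20         # bytes, assuming no TCP options


def max_data_size(mtu: int, upper_limit: int = UPPER_LIMIT) -> int:
    """Largest data size that fits one unfragmented packet, capped at upper_limit."""
    payload = mtu - IPV4_HEADER - TCP_HEADER
    if payload <= 0:
        raise ValueError("MTU too small to carry any TCP payload")
    return min(upper_limit, payload)
```

On a 1500-byte Ethernet MTU this gives 1460 bytes, well under the 8k
ceiling; only with an MTU above roughly 8k does the ceiling itself apply.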

> * A standard CRC should be required, see section 6.1.

I agree that a good CRC is a good thing to have. I think that a TCP-CRC
should be mandated for the control channel. This should be set when
opening the TCP connection for the control channel. There are cases where
CRC is not desirable for the data connection, as when transferring
transient voice or video. Hence there ought to be some kind of negotiation
as to whether CRC will be used for the data channel (like a parameter for
open). Let's talk some more about it.

> * The target should not get its name from the initiator, see section
> 10.1.

The target can ignore any key:value pairs sent by the initiator, so it need
not receive its name from the initiator. This feature is useful in case the
target is actually a front end for many machines and/or disks, in which
case the initiator can specify which target it really wants to interact
with.

> * Section 10.3 needs to provide details on how to prevent replay/reuse.
> Also this text seems to allow passwords in the clear which is not
> acceptable.


The example given is conceptual. You can use encryption if the initiator
and target can agree on it, or if it is automatically provided by the TCP
layer. But we are ready to work some more on it.

> * In section 10.5 it states "Once AllowNoRTT has been set to 'yes', it
> cannot be set back to no".  It should clarify this is for the open
> connection and closing this connection and opening a new connection will
> clear this condition.

This was the intention. We will clarify.
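
The intended per-connection semantics can be sketched as follows. The
class and method names are hypothetical, and the initial value of "no" is
an assumption inferred from the quoted sentence:

```python
class Connection:
    """Per-connection negotiation state (hypothetical illustration)."""

    def __init__(self):
        # ASSUMPTION: a freshly opened connection starts with AllowNoRTT=no.
        self.allow_no_rtt = False

    def set_allow_no_rtt(self, value: str) -> None:
        if value == "yes":
            self.allow_no_rtt = True
        elif value == "no":
            if self.allow_no_rtt:
                # Once set to yes it stays yes for this connection's lifetime.
                raise ValueError("AllowNoRTT cannot be set back to no")
        else:
            raise ValueError("AllowNoRTT must be 'yes' or 'no'")
```

Closing the connection and opening a new one corresponds to creating a new
Connection object, which starts again from "no".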

> Questions
> --------
> * What value does the ability to do an iSCSI ping add to the existing
> ability to do an ICMP ECHO?  If little or none, this should be omitted,
> see section 3.15.

This is very valuable.  First, ICMP may be blocked by a firewall.  Second,
it is very useful to test certain pathological data sets over particular
networks.  Third, when a TCP link is not being used, no data is sent.  This
makes it almost impossible to detect if the connection has been broken.
Having a ping command allows the TCP connection to be tested periodically.
And it tests more than just the TCP/IP stack - a valuable add-on in many
settings.
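
To show the idea of probing an otherwise idle connection, here is a
self-contained sketch over a local socket pair. It is not the iSCSI ping
format: the peer simply echoes raw bytes, and the names echo_peer,
recv_exact and ping are hypothetical:

```python
import socket
import threading


def echo_peer(sock):
    """Hypothetical peer: echo back whatever arrives until the other side closes."""
    while True:
        data = sock.recv(4096)
        if not data:
            break
        sock.sendall(data)
    sock.close()


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def ping(sock, payload, timeout=2.0):
    """Send payload; return True only if identical bytes come back in time."""
    sock.settimeout(timeout)
    try:
        sock.sendall(payload)
        return recv_exact(sock, len(payload)) == payload
    except OSError:  # covers timeouts, resets and closed peers
        return False


if __name__ == "__main__":
    a, b = socket.socketpair()
    threading.Thread(target=echo_peer, args=(b,), daemon=True).start()
    print("alive:", ping(a, b"\x00\xff" * 64))
    a.close()
```

Because the payload is chosen by the sender, the same probe can carry a
particular data pattern end to end, which an ICMP ECHO blocked at a
firewall cannot do.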

> TCP-RDMA
> --------
> Although the premise of TCP acceleration is quite useful the concept of
> RDMA does not apply for our application of internet SCSI.  We will handle
> the moving of data as implementation specific and not as generic design
> such as RDMA.

As they say - we all live in a free world... I would agree that you have a
strong case for a controller, but I am not that confident about a general
purpose host adapter - like a NIC (not SCSI specific).

> --- end forwarded message by harwood, jack ---

Regards,
Julo

Julian Satran (on behalf of all my colleagues),
IBM Research at Haifa