RE: Comments to Comments!

To: ips@ece.cmu.edu
Subject: RE: Comments to Comments!
From: Black_David@emc.com
Date: Thu, 23 Mar 2000 20:03:59 -0500
Content-Type: text/plain;charset="iso-8859-1"
Delivery-Date: Thu Mar 23 20:05:04 2000
Sender: owner-ips@ece.cmu.edu

My turn to reply to both Julian and Costa - apologies
for the delay involved.  I hope folks have a chance
to read this prior to Adelaide, and I apologize for
the fact that both the timing (last week of the
calendar quarter) and distance from here make it
impossible for me to attend the BOF in Adelaide.
This email includes input from EMC folks in addition
to yours truly.

-- DNS:

> Parties who wish to use IP addresses for extra reliability may encode the
> IP addresses in the target names (e.g. 10.0.4.5/dvd). After all, domain
> names can represent IP addresses. These type of domain names do not
> require DNS for resolution.

Keep in mind that the storage side of this is a closed box that may not be
fully configurable.  Allowing the storage to use hostnames may force the
initiator to perform host name resolution whether it wants to or not, and
vice versa.  Host name resolution issues should be confined to hosts, with
the storage kept out of this.  This objection is primarily about Open Data
Connection; the use of DNS names may be defensible for third party copy.
Moving to a combined data/control connection model would go a long way
towards resolving this.

> It was felt that we should be completely independent of IP addresses
> because of firewall and IP masquerading issues with setting up new TCP
> connections. IP addresses /can/ be used, but only in dotted decimal
> notation.

But this doesn't solve the fundamental problem with NAT/NAPT (of which IP
masquerade is an example) because there's no way of knowing whether
the namespaces and conventions for name resolution match on both sides
of the NAT/NAPT.  For example, if I pass "foo" as an identifier, it may
resolve to foo.emc.com here and foo.eng.cisco.com there.  FQDNs eliminate
this simple example but cause other problems because DNS need not be a
globally uniform namespace - in general one is now at the mercy of all
sorts of peculiar DNS configuration oddities.  Non-DNS resolution
mechanisms are not a magic cure - different YP/NIS domains and
out-of-sync /etc/hosts files are capable of causing problems.  OTOH,
the use of a combined control/data connection, and avoiding passing
endpoint addresses in the payload, makes this problem vanish, except for
Third Party Copy, which is a much more involved story that probably
requires a discussion of ALGs and some serious SHOULD NOTs.

The notion of passing IP addresses as text strings reproduces one of the
most irritating (in 20/20 hindsight) design mistakes of ftp.  The problem is
that remapping an IP address may change the length of the text string,
causing all sorts of complications (e.g., what if the packet is at MTU and
the string gets longer?).  This won't work in a NAT/NAPT, and makes writing
an ALG to put in such a box unnecessarily painful.
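
To make the length problem concrete, here is a minimal sketch of what an
ALG would have to do to a text-encoded address.  The key name
"TargetAddress" and both addresses are hypothetical, chosen only for
illustration; nothing here is taken from the draft.

```python
def rewrite_text_address(payload: bytes, inside: str, outside: str) -> bytes:
    """Replace a dotted-decimal address embedded in a payload, ALG-style."""
    return payload.replace(inside.encode("ascii"), outside.encode("ascii"))


# Hypothetical payload: the key name and the addresses are assumptions.
original = b"TargetAddress:10.0.4.5\r\n"
rewritten = rewrite_text_address(original, "10.0.4.5", "203.0.113.200")

# The translated address has more characters than the original one, so the
# payload grows and everything after it in the byte stream shifts.
growth = len(rewritten) - len(original)
print(len(original), len(rewritten), growth)
```

A fixed-width binary address would be overwritten in place with no change
in length, which is the contrast being drawn above.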

> On the plus side, domain names decouple the iSCSI protocol from the
> underlying addressing architecture. The potential deployment of IPv6 will
> not require any changes to the iSCSI protocol.

Good Grief!  Costa can't be serious about this as a reason.  Designing
a variable length address field that accommodates IPv4 and IPv6 addresses
is so easy ... and besides, TCP requires no changes for IPv6 and it has
no clue about host names.
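
As a rough illustration of how little is involved, here is one possible
variable length address field: a one-byte length followed by the packed
address.  The layout is an assumption made up for this sketch, not a
proposal for the actual encoding.

```python
import ipaddress


def encode_address(text: str) -> bytes:
    """Encode an IPv4 or IPv6 address as <1-byte length><packed address>."""
    packed = ipaddress.ip_address(text).packed  # 4 bytes for IPv4, 16 for IPv6
    return bytes([len(packed)]) + packed


def decode_address(field: bytes):
    """Decode a field produced by encode_address()."""
    if not field:
        raise ValueError("empty address field")
    length = field[0]
    if length not in (4, 16) or len(field) < 1 + length:
        raise ValueError("malformed address field")
    return ipaddress.ip_address(field[1:1 + length])


for text in ("10.0.4.5", "2001:db8::1"):
    field = encode_address(text)
    print(text, len(field), decode_address(field))
```

For a given address family the field length is fixed, so a translator that
rewrites one IPv4 address into another never changes the size of the
message.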

-- Parameter negotiation

> An implementation can ignore all free-form text/value pairs and still
> operate just fine.

Provided that all the defaults are acceptable.  For example, Section 3.7
says: 

     In order to allow write operations without RTT, the initiator and
     target must have agreed to do so by both sending the AllowNoRTT:yes
     key-pair attribute to each other (either during Login or through
     the Text Command/Response mechanism).

In this case, the default (RTT required on write) appears to be correct; the
bad news is that implementations that want to negotiate it away must buy
into the text processing, which is heavyweight by comparison to the small
number of bits that are negotiated in FCP.  One of the more important
defaults that must be accepted is No Authentication, and I think the
key:value pairing is an ok way to support authentication - my concern is
that it's overkill for the small number of bits that SCSI needs to
negotiate.

In general, this sort of arbitrary extensibility can be both a virtue and a
vice because while extensions don't require on-the-wire format changes, they
do require complicated rules about what key:value pairs are supposed to be
in which message and how to deal with situations in which some of them are
missing.
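
A small sketch of the kind of rule this implies, using the AllowNoRTT
example quoted above.  The one-pair-per-line framing, the treatment of a
missing key as "no", and the "VendorHint" key are all assumptions made for
illustration; the draft text quoted here does not specify them.

```python
def parse_pairs(text: str) -> dict:
    """Parse key:value pairs, one per line (this framing is an assumption)."""
    pairs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a key:value pair: " + repr(line))
        pairs[key.strip()] = value.strip()
    return pairs


def no_rtt_allowed(initiator_text: str, target_text: str) -> bool:
    """Writes without RTT are allowed only if BOTH sides sent AllowNoRTT:yes.

    A missing key falls back to the default (RTT required); keys this code
    does not know about are simply ignored.
    """
    from_initiator = parse_pairs(initiator_text).get("AllowNoRTT", "no")
    from_target = parse_pairs(target_text).get("AllowNoRTT", "no")
    return from_initiator == "yes" and from_target == "yes"


# "VendorHint" is a hypothetical key, present only to show it being ignored.
print(no_rtt_allowed("AllowNoRTT:yes\nVendorHint:abc\n", "AllowNoRTT:yes\n"))
print(no_rtt_allowed("AllowNoRTT:yes\n", ""))
```

Even this single flag needs a parser, a default, and a rule for the missing
case, where FCP would spend one bit.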

-- CRC

> A CRC at the TCP layer or higher?

Higher.  The read and write data need to be covered by a real CRC.  TCP and
IP checksums are too weak to be acceptable, and routers strip/regenerate
layer 2 checksums, leaving no CRC to cover corruption in the router.
Restricting the CRC to data only avoids any requirement that an ALG
recalculate it, as CRCs are much more difficult to adjust than the 1's
complement TCP and IP checksums.
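
Both halves of that argument can be seen in a short sketch.  The 1's
complement checksum cannot tell when two 16-bit words trade places, yet it
can be patched from the old and new word alone (the incremental update of
RFC 1624).  zlib.crc32 is used here only as a stand-in for whichever CRC
would actually be chosen, and the sample data is made up.

```python
import zlib


def fold(total: int) -> int:
    """Fold carries back into the low 16 bits (1's complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """16-bit 1's complement checksum as used by IP and TCP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    return ~fold(total) & 0xFFFF


def adjust_checksum(old_sum: int, old_word: int, new_word: int) -> int:
    """Incremental update, RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m')."""
    total = (~old_sum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~fold(total) & 0xFFFF


# Weakness: swapping two 16-bit words leaves the checksum unchanged,
# while the CRC of the two buffers differs.
a = b"\x00\x01\x00\x02"
b = b"\x00\x02\x00\x01"
print(internet_checksum(a) == internet_checksum(b))
print(zlib.crc32(a) == zlib.crc32(b))

# Adjustability: rewrite one 16-bit word and patch the checksum without
# touching the rest of the data.
before = b"\x0a\x00\x04\x05payload!"
after = b"\xcb\x00\x04\x05payload!"
patched = adjust_checksum(internet_checksum(before), 0x0A00, 0xCB00)
print(patched == internet_checksum(after))
```

The sketch offers no comparably cheap patch for the CRC; the simple course
is to recompute it over everything it covers, which is why keeping
ALG-rewritable fields outside the CRC matters.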

-- Ping

> > * What value does the ability to do an iSCSI ping add to the existing
> > ability to do an ICMP ECHO?  If little or none, this should be omitted,
> > see section 3.15.
> 
> It makes sure the iSCSI device server is still alive and kicking.

I'm not sure about this one, as a SCSI Inquiry command seems to do about
the same thing, and verifies that the iSCSI engine can actually do something
SCSI, as opposed to just answer a ping.  One complication is that Inquiry
can return a check condition that requires further action, in contrast to a
self-contained ping.  

-- Combined control and data connections

> If we multiplex LUNs (as we do in the current draft) keeping to a short TCP
> frame will leave us open to all sorts of troubles (possible deadlocks) due
> to the limited TCP window and our lack of control over the data source and
> sink. Separating the control and data stream we could resort to selective
> resets to get out of trouble - while with a common connection we might have
> to resort to radical means (e.g., closing connections).

Multiplexing LUNs is the right decision to avoid massive proliferation of
TCP session state.  Separating the control and data streams is not the only
way out of "all sorts of troubles".  With combined control and data
connections, holding another control connection open works for selective
resets, with the possible exception of Abort Task (which may have to be
issued on the connection that the task was initiated on).

> In addition in a "permissive" environment (like a video server) we might
> require CRC on the control connection while leaving the data connections up
> to the user.

But there's more than enough flexibility to negotiate this behavior.  In any
case, EMC lives in a part of the world where omitting CRCs is a generally
bad idea, even for video data.

-- Killing all I/Os

There are two important cases here.  In the first case, if the control
connection times out and closes (or closes for any other reason), then
clearly all I/Os have to be killed.  The problem of concern is in the second
case: opening up a new control connection CAUSES the old one to be closed as
a side effect.  That seems unnecessary, especially because resets of various
forms can be issued down the new connection to cause the device to clean up
and get into a known state.  This interacts with the issue above about
combining data/control onto one connection and allowing multiple connections
between an initiator and responder pair.

-- Target Name from Initiator

> The target can ignore any key:value pairs sent by the initiator, so it need
> not receive its name from the initiator. This feature is useful in case the
> target is actually a front end for many machines and/or disks, in which
> case the initiator can specify to which target it really wants to interact
> with.

I wonder if going there is a good idea, vs. something like a front end
simply exporting each machine and/or disk on a different TCP port.
The problem with handing the target its address inband is that the
connection address no longer fully specifies what the initiator is
talking to, and that seems wrong.

--David

---------------------------------------------------
David L. Black, Senior Technologist
EMC Corporation, 42 South St., Hopkinton, MA  01748
+1 (508) 435-1000 x75140, FAX: +1 (508) 497-6909
black_david@emc.com  Cellular: +1 (978) 394-7754
---------------------------------------------------