    RE: A Transport Protocol Without ACK



    Jim,
    
    I agree drives are primarily used in sector units of 512 bytes, but even
    the lowly CD uses 2048 and includes the headers.  A header that targets
    the initiator could act to exchange credits.  The danger in not keeping
    units confined to FC structures is that the two ends could end up in
    disagreement.  A signal indicating when login credits are available (in
    case the initiator becomes confused) should provide a means to reconcile
    accounts.  The same signal echoed back (Now at Login Credit) could act
    as an acknowledgement.  The PDU used to acknowledge would then be the
    PDU size below login.  Rather than using a single token, a token count
    could act to dynamically allocate additional credit in addition to
    acknowledging used credit being returned.  If you signal login credit,
    then even should there be an over-allocation and not just a reminder,
    waiting for acknowledgement does not seem onerous.  At least things stay
    compatible.  512 or 2k, hard to decide.  To match FC, it would be 2k.
    What is 1536 bytes between friends?
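
    A minimal sketch, in Python, of the credit accounting described above.
    The names (CREDIT_UNIT, TargetCredits) and the structure are assumptions
    for illustration only, not a proposed wire format:

        CREDIT_UNIT = 2048  # 2k to match FC; 512 would work the same way

        class TargetCredits:
            """Target-side view of the credit granted to one initiator."""

            def __init__(self, login_credits):
                self.login_credits = login_credits  # credit granted at login
                self.outstanding = 0                # credit currently in use

            def consume(self, pdu_bytes):
                """Charge an arriving PDU against the initiator's credit."""
                self.outstanding += -(-pdu_bytes // CREDIT_UNIT)  # ceiling

            def grant(self, returned, extra=0):
                """Return used credit and, via a token count rather than a
                single token, optionally allocate additional credit."""
                self.outstanding -= returned
                self.login_credits += extra

            def at_login_credit(self):
                """True when all credit is back home -- the 'Now at Login
                Credit' signal a confused initiator can use to reconcile."""
                return self.outstanding == 0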
    
    Doug
    
    > -----Original Message-----
    > From: Jim McGrath [mailto:Jim.McGrath@quantum.com]
    > Sent: Wednesday, September 20, 2000 6:00 PM
    > To: 'Douglas Otis'; Jim McGrath; 'Randall Stewart'
    > Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
    > Subject: RE: A Transport Protocol Without ACK
    >
    >
    >
    > Doug,
    >
    > I agreed that FCP offers some instructive ideas.  I would like to
    > decouple the allocation of initial credits from the login process (as
    > per a previous message) and allow the target to really dynamically
    > allocate them on a per-initiator basis.
    >
    > On the general BB credit model, the only real issue I have there is
    > that in FC the credits are for frames (in FC often 2K bytes), not
    > bytes.  I agree that commands can just be stored by the target like
    > any other data, but there is a big difference in size between a
    > command frame and a user data frame.  We don't need byte-level
    > granularity, but keeping the "credit unit" to something like 512 bytes
    > (or smaller) would allow for more efficient target memory management
    > at modest controller complexity.  Note you could still send things
    > like 2K byte payloads; you would just end up using four 512 byte
    > credits rather than a single frame credit.  It was the coupling of 1
    > credit per frame, and then the need for large frames for efficient bus
    > utilization, that got us into trouble.
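
    A small illustration of the credit-unit arithmetic Jim describes; the
    helper name credits_for() and the 512-byte unit are taken from the
    discussion, not from any draft:

        CREDIT_UNIT = 512

        def credits_for(payload_bytes):
            """512-byte credits consumed by a payload (ceiling division)."""
            return (payload_bytes + CREDIT_UNIT - 1) // CREDIT_UNIT

        assert credits_for(2048) == 4  # a 2K byte payload costs four credits
        assert credits_for(64) == 1    # a small command frame costs only one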
    >
    > In FC, one objection to making many more, smaller credits is the
    > number of primitive tokens you would have to send (since each maps to
    > one credit), but here we need control packets anyway, so we can free
    > up and use credits in bunches rather than individually (similar to
    > what is done in FCP).
    >
    > In theory an initiator could send down multiple commands and then
    > start sending down data sort of randomly between the commands,
    > creating potential starvation issues.  But no initiator that I know of
    > does anything like that.  I've never seen one that will send down some
    > write data, jump to another command and send data, and then go back to
    > the first command (maybe someone else has?).  As long as the amount of
    > data you can send with credits is smaller than the TCP window size,
    > then you should never get starvation as far as I can tell (am I
    > missing something)?
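
    A rough statement of the no-starvation condition above, with purely
    illustrative numbers (real window and credit values would be negotiated):

        CREDIT_UNIT = 512
        granted_credits = 64               # 64 x 512 B = 32 KB of credit
        tcp_window = 64 * 1024             # a 64 KB TCP window

        creditable_bytes = granted_credits * CREDIT_UNIT
        if creditable_bytes <= tcp_window:
            print("credit-limited before window-limited: no starvation")
        else:
            print("credit exceeds the TCP window: data could stall")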
    >
    > Jim
    >
    >
    >
    >
    > -----Original Message-----
    > From: Douglas Otis [mailto:dotis@sanlight.net]
    > Sent: Wednesday, September 20, 2000 12:17 AM
    > To: Jim McGrath; 'Randall Stewart'
    > Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
    > Subject: RE: A Transport Protocol Without ACK
    >
    >
    > Jim,
    >
    > I understand a desire to stick with what works.  Regardless of the IP
    > transport, the traffic to each LUN will require some other flow
    > control mechanism.  Yes, we could re-invent the wheel as it applies to
    > SCSI, but if you examine FC Class 3 FCP, you will see an appropriate
    > flow control mechanism already in place.  It uses Buffer-to-Buffer
    > credit tokens generated by comma frame delimiters.  These frame
    > delimiters are defined within the FC-encapsulation documentation.
    > Simple, direct and easy.  See
    > http://search.ietf.org/internet-drafts/draft-otis-fc-sctp-ip-00.txt.
    > To facilitate processing frame delimiters within software, both
    > delimiters could be presented before the frame rather than as shown in
    > the rough draft.  Perhaps even a Null CRC option could be added if one
    > trusts the SCTP checksum for software implementations.
    >
    > Once flow control is in place, there is little need for extending
    > command tags, CRN, or anything associated with the LUN, as these
    > structures are then independent of transport bandwidth.  I doubt there
    > is a great benefit in having more than 256 commands pending on a
    > nexus.  Expanding any field only makes converting to a drive interface
    > stateful, difficult, and far less reliable.  With the FCP flow control
    > mechanism, T10 does not need to redefine SCSI for initiators that
    > overwhelm the target.  The target would have adequate control of its
    > resources.
    >
    > Should IP-SCSI be driven by controller design?  Caching, volume
    > management, reservations, and nearly every other feature offered by a
    > controller are significantly reduced in value should the controller be
    > placed next to the drive.  If you are in a facility 35 miles from a
    > location holding drives, you may find 50 miles of fiber traversed,
    > creating some 800+ microseconds of round-trip time simply due to the
    > speed of light.  You may shudder to think about any NIC buffer.  Where
    > would you want the controller, and where would you want the drive?
    > The controller must remain on the client side of the network.  As
    > such, drive design should steer the IP-SCSI standard.  At least with
    > FCP, the drive manufacturers have already spoken.  Those making
    > controllers will just have to make more of them and develop controller
    > locking protocols should this controller be part of a remote cluster.
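
    The arithmetic behind the "800+ microseconds" figure, assuming light
    travels at roughly two-thirds of c in fibre (about 2.0e8 m/s):

        fiber_miles_one_way = 50
        meters = fiber_miles_one_way * 1609.34
        speed_in_fiber = 2.0e8                  # m/s, approximate

        round_trip_us = 2 * meters / speed_in_fiber * 1e6
        print(f"round trip ~ {round_trip_us:.0f} microseconds")   # ~805 us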
    >
    > If you examine the FCP documentation, you will find that you can send
    > data with the command as an option.  You can also send the response at
    > the end of data as an option.  Every vital feature used to justify
    > tossing FCP structures becomes moot.  Should just an 8M byte FIFO
    > buffer be placed between an IP agent and an FC agent, as much as 65
    > milliseconds of latency can be created.  This additional latency alone
    > will go a long way toward absorbing rate differences between these two
    > agents.  FCP flow control and burst limits could easily finish the
    > task.
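
    One way to arrive at the "as much as 65 milliseconds" figure: an 8M byte
    FIFO draining at roughly gigabit speed.  The drain rate is an
    assumption; the message does not state one:

        fifo_bytes = 8 * 2**20      # 8M byte FIFO
        drain_rate = 125e6          # ~1 Gb/s in bytes per second (assumed)

        worst_case_ms = fifo_bytes / drain_rate * 1e3
        print(f"a full FIFO adds ~{worst_case_ms:.0f} ms of latency")  # ~67 ms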
    >
    > You speak of TCP as a proven technology, but TCP alone is not what is
    > being suggested for IP-SCSI.  TCP plus some other mechanism is being
    > used to solve the ills created by a persistent single byte stream.
    > That combination is not proven technology, nor is it likely to
    > function properly without major tweaking.  At least, if you wish to
    > have a hand in creating a suitable API for multi-object streams, far
    > and away more suitable for SCSI, now is the time.  Perhaps either
    > Randall Stewart's U-SCTP or a stale-frame timer should be added to
    > prevent overlapping retry mechanisms if this protocol is used as a
    > bridge to FC.
    >
    > As for the configuration effort, convert these requirements into LDAP
    > structures.  This would allow a single database to manage all aspects
    > of configuration.  Stuffing this information across the transport only
    > weakens security; it is a bad idea and makes deciding who manages what
    > difficult.  Networks will always have a means to identify equipment in
    > some binary fashion, and LDAP and DHCP servers combine this
    > information into meaningful structures with meaningful names.  All
    > values required for the various transport layers would be derived from
    > these standard servers.
    >
    > As for what to do with Stream 0: revision negotiations, FC-domain
    > mapping, and SRC-DST filtering done in purely binary form would be the
    > best means of getting equipment to accept commands without high
    > overhead.  The equipment does not care what the binary number
    > represents.  As for a clever means of doing remote DNS, SCTP has that
    > covered.  Again, this information comes from an LDAP server accessed
    > by the driver, not from the SAM interface or the SCSI transport layer.
    >
    > Yes, there are many options within FC that should be excluded.  If FCP
    > structures can be used, perhaps while holding one's nose, they should
    > be.  There are far too many benefits for doing so, and too few
    > benefits for not.  In the end, a better product would have a common
    > set of structures to speak SAN.  If you wish to make round wheels out
    > of square blocks, don't let me stop you.  I think I see a set of
    > wheels already.
    >
    > Doug
    >
    > > -----Original Message-----
    > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > > Jim McGrath
    > > Sent: Tuesday, September 19, 2000 4:05 PM
    > > To: 'Randall Stewart'; Jim McGrath
    > > Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
    > > Subject: RE: A Transport Protocol Without ACK
    > >
    > >
    > >
    > > Actually, the burden-of-proof issue is why I suggest we look at some
    > > things that are actually being used today (since you don't have to
    > > guess how they behave).  That is one of TCP's great strengths, and a
    > > bit of a weakness for SCTP (no offense to SCTP supporters, but it
    > > certainly does not have a big and long "track record" yet, and so I
    > > can understand the concerns others may have as to whether things
    > > would work out as well in practice as they do in proposal).
    > >
    > > Jim
    > >
    > > PS Personally, I'm a big believer in copying stuff that works,
    > > making the minimum amount of required changes, and then doing a
    > > rapid but controlled deployment (I've been involved in a lot of
    > > those sorts of things in ATA and SCSI).  Having been involved in
    > > both these sorts of endeavors and the opposite (big,
    > > clean-sheet-of-paper efforts, like 1394 (no offense to 1394/Firewire
    > > supporters, but I was working on it a decade ago)), I know how easy
    > > it is to underestimate the work required by the latter, and to be
    > > turned off by the "inelegance" of the former.  For me, life has
    > > become too short - I'm willing to accept inelegance as the price for
    > > speed of deployment.
    > >
    > >
    > > -----Original Message-----
    > > From: Randall Stewart [mailto:rrs@cisco.com]
    > > Sent: Tuesday, September 19, 2000 4:38 AM
    > > To: Jim McGrath
    > > Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
    > > Subject: Re: A Transport Protocol Without ACK
    > >
    > >
    > > Jim:
    > >
    > > Any transport protocol proposal is OK, as long as it can be seen and
    > > reviewed.  So far I have seen only two: TCP and SCTP.
    > >
    > > Oh, a little side note: any transport protocol proposed MUST be able
    > > to show TCP-like behavior in the face of congestion.  And I think,
    > > IMHO, that this means that if it is NOT using RFC 2581 procedures,
    > > it MUST show that it does back off and share with TCP.  It also
    > > carries a HEAVY burden of proof to show this facility, at least in
    > > my mind, and I would think in the IESG's mind as well...
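
    A toy sketch of the RFC 2581 behaviour being required here: slow start,
    congestion avoidance, and multiplicative backoff.  The variable names
    (cwnd, ssthresh) follow the RFC; everything else is illustrative:

        MSS = 1460

        class AimdWindow:
            def __init__(self):
                self.cwnd = MSS              # start with one segment
                self.ssthresh = 64 * 1024    # arbitrary initial threshold

            def on_ack(self, acked_bytes):
                if self.cwnd < self.ssthresh:
                    self.cwnd += min(acked_bytes, MSS)   # slow start
                else:
                    self.cwnd += MSS * MSS // self.cwnd  # ~1 MSS per RTT

            def on_loss(self):
                # back off: halve the window (fast-recovery style)
                self.ssthresh = max(self.cwnd // 2, 2 * MSS)
                self.cwnd = self.ssthresh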
    > >
    > > R
    > >
    > >
    > > Jim McGrath wrote:
    > >
    > > > I would expand your search to include non-standard protocols (i.e.
    > > > proprietary ones) as well, if they offered something and were
    > > > adequately understood by the outside world.  We do that in storage
    > > > quite a lot - indeed, some standard protocols are direct
    > > > descendants of what were once proprietary protocols (e.g. ATA, the
    > > > most widely used desktop disk interface, and ESCON, a dominant
    > > > mainframe-class interface (both of which originated from IBM
    > > > proprietary technologies)).
    > > >
    > > > Jim
    > > >
    > > > -----Original Message-----
    > > > From: Y P Cheng [mailto:ycheng@advansys.com]
    > > > Sent: Monday, September 18, 2000 5:52 PM
    > > > To: 'Ips@Ece. Cmu. Edu'
    > > > Subject: RE: A Transport Protocol Without ACK
    > > >
    > > > From: randall@stewart.chicago.il.us
    > > > > I see no viable transport protocol here and I don't see this
    > > > > conversation of any use unless you get exact details AND point
    > > > > to an internet draft that defines EXACTLY how it works (or possibly
    > > > > some other standards document).
    > > >
    > > > Both I2O and VI are transport protocols which define the format of
    > > > a request to a transport service provider, i.e. an adapter card.
    > > > I2O is used for, but not limited to, delivering SCSI requests, and
    > > > VI is used for any payload, including IP packets.  VI is mapped
    > > > into FC with the device headers between the FC header and the data
    > > > payload.  VI can certainly be used for delivery of SCSI requests
    > > > too.  Both protocols require the service provider to provide
    > > > reliable delivery and reception.  VI defines different QoS levels.
    > > >
    > > > > > I don't claim any credit about this transport layer protocol.
    > > > > > Every fibre channel and Infiniband adapter designer knows
    > > > > > about this protocol -- although there is no standard.  I am
    > > > > > sure the TCP accelerator card is doing the same.  This
    > > > > > protocol is a great alternative to the use of TCP/IP and
    > > > > > should be incorporated into iSCSI.
    > > > >
    > > > > No it is not. You are not offering an alternative yet.
    > > >
    > > > I did not imply iSCSI should use I2O or VI.  In fact, the purpose
    > > > of iSCSI is to map SCSI requests into IP packets as well as to
    > > > define the delivery.  It seems to me that the working group has
    > > > set its mind on TCP/IP and believes this is the only solution.
    > > > The consensus seems to be that if there were any other solution
    > > > that addressed flow control and congestion, it would end up like
    > > > TCP/IP.  I am simply pointing out that if we keep an iSCSI request
    > > > as a single atomic transaction, without separating it into the
    > > > TCP/IP-stream-oriented Writes and Reads that each deal with a
    > > > single DU, then the deadlock problem goes away.  While the working
    > > > group thinks we should take advantage of the flow control and
    > > > congestion management of TCP/IP, there are alternatives known as
    > > > BB-credit and EE-credit management.  Fibre channel adapters
    > > > provide reliable delivery, lost-packet detection, and
    > > > retransmission without TCP/IP.
    > > >
    > > > Randall, you are right, I did not spend time to provide the
    > > > working group a draft defining such a transaction-oriented
    > > > protocol.  All I have provided is an idea besides TCP/IP.  The
    > > > designers of SCSI and fibre channel adapters have solved the
    > > > head-of-queue blocking, congestion, and retransmission problems.
    > > > The transaction-oriented WRITE-REQUEST and READ-RESPONSE, in my
    > > > humble opinion, allow us to implement iSCSI more simply than WRITE
    > > > and READ stream requests.  Requiring ACKs on every DU with size
    > > > greater than the MTU, on a network with long latency, is very
    > > > expensive.  Defining a greater ACK granularity is an attempt to
    > > > solve this performance problem.  If we do wish to ACK every DU,
    > > > then, on a long-latency network, we must have a method to stream
    > > > the PDUs to maintain performance.  The method should not consume a
    > > > large amount of memory space.  One should never ignore the TCP/IP
    > > > memory-to-memory copy overhead when the backbone will be running
    > > > at OC-192 speed in the near future.  Finally, please don't ever
    > > > ask two NIC cards to synchronize with each other.  It is really
    > > > hard to do, as those of us in the business of designing NIC cards
    > > > can testify.
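
    A back-of-the-envelope look at the cost being described: if every DU
    must be acknowledged before the next is sent, throughput collapses to DU
    size divided by RTT, while keeping an OC-192 pipe full instead requires
    roughly bandwidth x RTT of buffering.  All numbers are illustrative:

        du_size = 64 * 1024               # a 64 KB data unit (assumed)
        rtt = 800e-6                      # the ~800 us round trip from above
        oc192_bytes_per_s = 9.953e9 / 8   # OC-192 line rate, roughly

        stop_and_wait = du_size / rtt             # one DU per round trip
        pipe_fill = oc192_bytes_per_s * rtt       # buffer to keep pipe full

        print(f"ACK-per-DU throughput: ~{stop_and_wait / 1e6:.0f} MB/s")
        print(f"memory to keep OC-192 full: ~{pipe_fill / 1e6:.1f} MB")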
    > > >
    > > > Y.P. Cheng, CTO, ConnectCom Solutions Corp.
    > >
    >
    
    

