
    RE: A Transport Protocol Without ACK



    
    Doug,
    
    I agree that FCP offers some instructive ideas.  I would like to decouple
    the allocation of initial credits from the login process (as per a previous
    message) and allow the target to truly allocate them dynamically on a
    per-initiator basis.
    
    On the general BB credit model, the only real issue I have there is that in
    FC the credits are for frames (in FC often 2K bytes), not bytes.  I agree
    that commands can just be stored by the target like any other data, but
    there is a big difference in size between a command frame and a user data
    frame.  We don't need byte-level granularity, but keeping the "credit unit"
    to something like 512 bytes (or smaller) would allow for more efficient
    target memory management at modest controller complexity.  Note you could
    still send things like 2K byte payloads; you just end up using four 512 byte
    credits rather than a single frame credit.  It was the coupling of 1 credit
    per frame, and then the need for large frames for efficient bus utilization,
    that got us into trouble.
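A minimal sketch of the credit arithmetic described above, assuming a 512-byte credit unit and 2K-byte frames as in the message. The function names and the 64-byte command size are hypothetical, chosen only to compare how much target buffer a single payload ties up under a one-credit-per-frame model versus a 512-byte-unit model.

```python
# Hypothetical illustration: sizes come from the message (512-byte credit
# unit, 2K-byte FC frames); the function names are not from any standard.
CREDIT_UNIT_BYTES = 512
FRAME_BYTES = 2048


def unit_credits_needed(payload_bytes: int, unit: int = CREDIT_UNIT_BYTES) -> int:
    """Credits consumed when each credit covers `unit` bytes of target buffer."""
    if payload_bytes <= 0:
        raise ValueError("payload must be positive")
    return (payload_bytes + unit - 1) // unit  # ceiling division


def buffer_reserved(payload_bytes: int, model: str) -> int:
    """Target buffer bytes tied up by one payload (assumed to fit in one frame)."""
    if model == "frame":
        # One credit per frame: a whole frame buffer, however small the payload.
        return FRAME_BYTES
    if model == "unit":
        # 512-byte credits: only as many units as the payload actually needs.
        return unit_credits_needed(payload_bytes) * CREDIT_UNIT_BYTES
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    for size in (64, 2048):  # a small command vs. a full 2K data payload
        print(size, unit_credits_needed(size),
              buffer_reserved(size, "frame"), buffer_reserved(size, "unit"))
    # prints "64 1 2048 512" then "2048 4 2048 2048"
```

A 2K-byte payload costs four unit credits, matching the message, while a small command occupies one 512-byte unit instead of a full 2K frame buffer.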
    
    In FC, one objection to using many more, smaller credits is the number of
    primitive tokens you would have to send (since each maps to one credit),
    but here we need control packets anyway, so we can free up and use credits
    in bunches rather than individually (similar to what is done in FCP).
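A minimal sketch of returning freed credits in bunches rather than one token per credit, assuming the target announces them in a control packet. `CreditReturner`, `batch_size`, and the threshold of 16 are hypothetical names and values for illustration, not FCP or iSCSI fields; the "control packet" is just recorded in a list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CreditReturner:
    """Hypothetical target-side helper that batches credit returns."""
    batch_size: int = 16      # assumed threshold, not taken from any spec
    pending: int = 0          # credits freed but not yet returned
    sent_grants: List[int] = field(default_factory=list)  # one entry per control packet

    def free(self, credits: int) -> None:
        """Record credits freed as target buffers drain; grant once a batch is ready."""
        self.pending += credits
        if self.pending >= self.batch_size:
            self._send_grant(self.pending)
            self.pending = 0

    def flush(self) -> None:
        """Return any leftover credits, e.g. when the target goes idle."""
        if self.pending:
            self._send_grant(self.pending)
            self.pending = 0

    def _send_grant(self, n: int) -> None:
        # Stand-in for building and transmitting one control packet.
        self.sent_grants.append(n)


if __name__ == "__main__":
    r = CreditReturner(batch_size=16)
    for _ in range(40):
        r.free(1)
    r.flush()
    print(r.sent_grants)  # three control packets instead of 40 individual tokens
```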
    
    In theory an initiator could send down multiple commands and then start
    sending down data sort of randomly between the commands, creating potential
    starvation issues.  But no initiator that I know of does anything like that.
    I've never seen one that will send down some write data, jump to another
    command and send data, and then go back to the first command (maybe someone
    else has?).  As long as the amount of data you can send with credits is
    smaller than the TCP window size, you should never get starvation as far as
    I can tell (am I missing something?).
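A small sketch of the condition stated above, assuming 512-byte credits and the 65,535-byte maximum TCP window available without window scaling. It only evaluates the stated rule of thumb; it does not prove that starvation cannot occur, which is the open question in the message. The function names are hypothetical.

```python
# Hypothetical check of the rule of thumb in the message: total data sendable
# against granted credits should stay below the TCP window size.
CREDIT_UNIT_BYTES = 512
MAX_UNSCALED_TCP_WINDOW = 65_535  # largest window without window scaling


def credits_fit_window(granted_credits: int, tcp_window_bytes: int,
                       unit: int = CREDIT_UNIT_BYTES) -> bool:
    """True if all data sendable against the granted credits is below the window."""
    return granted_credits * unit < tcp_window_bytes


def max_safe_credits(tcp_window_bytes: int, unit: int = CREDIT_UNIT_BYTES) -> int:
    """Largest credit grant whose total byte count stays strictly below the window."""
    return (tcp_window_bytes - 1) // unit


if __name__ == "__main__":
    limit = max_safe_credits(MAX_UNSCALED_TCP_WINDOW)
    print(limit, credits_fit_window(limit, MAX_UNSCALED_TCP_WINDOW),
          credits_fit_window(limit + 1, MAX_UNSCALED_TCP_WINDOW))
```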
    
    Jim
    
     
    
    
    -----Original Message-----
    From: Douglas Otis [mailto:dotis@sanlight.net]
    Sent: Wednesday, September 20, 2000 12:17 AM
    To: Jim McGrath; 'Randall Stewart'
    Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
    Subject: RE: A Transport Protocol Without ACK
    
    
    Jim,
    
    I understand a desire to stick with what works.  Regardless of the IP
    transport, the traffic to each LUN will require some other flow control
    mechanism.  Yes, we could re-invent the wheel as it applies to SCSI, but if
    you examine FC Class 3 FCP, you will see an appropriate flow control
    mechanism in place.  It uses Buffer-to-Buffer credit tokens generated by
    comma frame delimiters.  These frame delimiters are defined within the
    FC-encapsulation documentation.  Simple, direct and easy.  See
    http://search.ietf.org/internet-drafts/draft-otis-fc-sctp-ip-00.txt.  To
    facilitate processing frame delimiters in software, both could be
    presented before the frame rather than as shown in the rough draft.  Perhaps
    even a Null CRC option could be added for software implementations if one
    trusts the SCTP checksum.
    
    Once flow control is in place, there is little need for extending command
    tags, CRN, or anything associated with the LUN as these structures are then
    independent of transport bandwidth.  I doubt there is a great benefit in
    having more than 256 commands pending on a nexus.  Expanding any field only
    makes converting to a drive interface stateful, difficult, and far less
    reliable.  With the FCP flow control mechanism, T10 does not need to
    redefine SCSI for initiators that overwhelm the target. The target would
    have adequate control of resources.
    
    Should IP-SCSI be driven by controller design?  Caching, volume management,
    reservations, and nearly every other feature offered by a controller are
    significantly reduced in value should the controller be placed next to the
    drive.  If you are in a facility 35 miles from a location holding drives,
    you may find 50 miles of fiber traversed, creating some 800+ microseconds
    of round-trip time simply due to the speed of light.  You may shudder to
    think about any NIC buffer.  Where would you want the controller and where
    would you want the drive?  The controller must remain on the client side of
    the network.  As such, drive design should steer the IP-SCSI standard.  At
    least with FCP, the drive manufacturers have already spoken.  Those making
    controllers will just have to make more of them and develop controller
    locking protocols should this controller be part of a remote cluster.
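The round-trip figure above can be checked with a short calculation. This sketch assumes light in fiber travels at roughly 2/3 of c (about 200,000 km/s), reads "50 miles of fiber" as the one-way path, and uses a hypothetical 1 Gb/s link to show how many bytes would be in flight, which is what a NIC buffer would have to cover. Switching and queuing delays are ignored.

```python
# Assumptions: ~200,000 km/s propagation in fiber; 50 miles is the one-way
# fiber length; the 1 Gb/s link rate is hypothetical.
FIBER_KM_PER_S = 200_000.0
KM_PER_MILE = 1.609344


def fiber_rtt_us(one_way_miles: float) -> float:
    """Round-trip propagation delay in microseconds (no switching or queuing)."""
    one_way_km = one_way_miles * KM_PER_MILE
    return 2 * one_way_km / FIBER_KM_PER_S * 1e6


def bytes_in_flight(rtt_us: float, link_bits_per_s: float) -> float:
    """Bandwidth-delay product: bytes that must be buffered to keep the link busy."""
    return rtt_us * 1e-6 * link_bits_per_s / 8


if __name__ == "__main__":
    rtt = fiber_rtt_us(50)
    print(f"RTT ~ {rtt:.0f} us, in flight at 1 Gb/s ~ {bytes_in_flight(rtt, 1e9):,.0f} bytes")
```

This gives roughly 805 microseconds of round trip, consistent with the 800+ figure, and about 100 KB in flight at 1 Gb/s.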
    
    If you examine FCP documentation, you will find that you can send data with
    the command as an option.  You can also send the response at the end of data
    as an option.  Every vital feature used to justify tossing FCP structures
    becomes moot.  If just an 8M byte FIFO buffer were placed between an IP
    agent and an FC agent, as much as 65 milliseconds of latency could be
    created.  This additional latency alone would greatly help accommodate rate
    differences between these two agents.  FCP flow control and burst limits
    could easily finish the task.
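The 65 ms figure can be reproduced by assuming the FIFO drains at about 1 Gb/s; the message does not state a rate, so that drain rate is an assumption. The sketch computes the worst-case wait for a byte entering a full FIFO, for both decimal and binary readings of "8M byte".

```python
# Assumption: the FIFO drains at a 1 Gb/s line rate (not stated in the message).
def fifo_latency_ms(fifo_bytes: int, drain_bits_per_s: float) -> float:
    """Worst-case wait (ms) for a byte entering a full FIFO drained at line rate."""
    return fifo_bytes * 8 / drain_bits_per_s * 1e3


if __name__ == "__main__":
    print(f"{fifo_latency_ms(8_000_000, 1e9):.1f} ms")  # 8 MB (decimal): 64.0 ms
    print(f"{fifo_latency_ms(8 * 2**20, 1e9):.1f} ms")  # 8 MiB (binary): 67.1 ms
```

Either reading lands near the 65 ms quoted; a slower drain rate would make the latency proportionally larger.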
    
    You speak of TCP as a proven technology, but TCP is not being suggested for
    IP-SCSI.  TCP with some other mechanism is used to solve ills created by a
    persistent single byte stream.  This is not proven technology, nor is it
    likely to function properly without major tweaking.  If you wish to have a
    hand in creating an API for multi-object streams, which are far and away
    more suitable for SCSI, now is the time.  Perhaps either Randall Stewart's
    U-SCTP or a stale frame timer should be added to prevent overlapping retry
    mechanisms if this protocol is used as a bridge to FC.
    
    As far as the configuration effort, convert these requirements into LDAP
    structures.  This would allow a single database to manage all aspects of
    configuration.  Stuffing this information across the transport only weakens
    security; it is a bad idea, and it makes deciding who manages the
    configuration difficult.  Networks will always have a means to identify
    equipment in some binary fashion, and
    LDAP and DHCP servers combine this information into meaningful structures
    with meaningful names.  All values required for the various transport layers
    would be derived from these standard servers.
    
    As far as what to do with Stream 0: revision negotiation, FC-domain
    mapping, and SRC-DST filtering done in purely binary form would be the best
    means of getting equipment to accept commands without a high overhead.  The
    equipment does not care what the binary number represents.  As far as a
    clever means of doing remote DNS, SCTP has that covered.  Again, this
    information comes from an LDAP server accessed by the driver and not the SAM
    interface or the SCSI transport layer.
    
    Yes, there are many options within FC that should be excluded.  If FCP
    structures can be used, perhaps while holding one's nose, they should be.
    There are far too many benefits for doing so, and too few benefits for not.
    In the end, a better product would have a common set of structures to speak
    SAN.  If you wish to make round wheels out of square blocks, don't let me
    stop you.  I think I see a set of wheels already.
    
    Doug
    
    > -----Original Message-----
    > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > Jim McGrath
    > Sent: Tuesday, September 19, 2000 4:05 PM
    > To: 'Randall Stewart'; Jim McGrath
    > Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
    > Subject: RE: A Transport Protocol Without ACK
    >
    >
    >
    > Actually, the burden-of-proof issue is why I suggest we look at some
    > things that are actually being used today (since you don't have to guess
    > how they behave).  That is one of TCP's great strengths, and a bit of a
    > weakness for SCTP (no offense to SCTP supporters, but it certainly does
    > not have a big and long "track record" yet, and so I can understand the
    > concerns others may have as to whether things would work out as well in
    > practice as they do in proposal).
    >
    > Jim
    >
    > PS: Personally, I'm a big believer in copying stuff that works, making the
    > minimum amount of required changes, and then doing a rapid but controlled
    > deployment (I've been involved in a lot of those sorts of things in ATA
    > and SCSI).  Having been involved in both these sorts of endeavors and the
    > opposite (big, clean-sheet-of-paper efforts, like 1394 (no offense to
    > 1394/FireWire supporters, but I was working on it a decade ago)), I know
    > how easy it is to underestimate the work required by the latter, and to
    > be turned off by the "inelegance" of the former.  For me, life has become
    > too short; I'm willing to accept inelegance as the price for speed of
    > deployment.
    >
    >
    > -----Original Message-----
    > From: Randall Stewart [mailto:rrs@cisco.com]
    > Sent: Tuesday, September 19, 2000 4:38 AM
    > To: Jim McGrath
    > Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
    > Subject: Re: A Transport Protocol Without ACK
    >
    >
    > Jim:
    >
    > Any transport protocol proposal is OK, as long as it can be seen and
    > reviewed.  So far I have seen only two: TCP and SCTP.
    >
    > Oh, a little side note: any transport protocol proposed MUST be able to
    > show TCP-like behavior in the face of congestion.  And I think, IMHO, that
    > this means that if it is NOT using RFC 2581 procedures, it MUST show that
    > it does back off and share with TCP.  It also has a HEAVY burden of proof
    > to show this facility, at least in my mind, and I would think in the
    > IESG's mind as well...
    >
    > R
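For reference, the back-off behavior that RFC 2581 procedures require can be sketched as a congestion-window state machine. This is a simplified sketch, not a complete implementation: it uses byte counting with an assumed SMSS of 1460 bytes, and it omits the window inflation and deflation of fast recovery by setting `cwnd` straight to `ssthresh` after a triple duplicate ACK. The class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rfc2581Window:
    """Simplified sketch of RFC 2581 congestion-window rules (byte counting)."""
    smss: int = 1460          # sender maximum segment size (assumed value)
    cwnd: int = 0             # 0 means "use the initial window"
    ssthresh: int = 1 << 30   # initial ssthresh may be arbitrarily high

    def __post_init__(self) -> None:
        if self.cwnd == 0:
            self.cwnd = 2 * self.smss  # initial window of at most 2 * SMSS

    def on_ack(self) -> None:
        """Per ACK of new data: slow start below ssthresh, else congestion avoidance."""
        if self.cwnd < self.ssthresh:
            self.cwnd += self.smss
        else:
            # cwnd += SMSS*SMSS/cwnd, rounded up to at least 1 byte
            self.cwnd += max(1, self.smss * self.smss // self.cwnd)

    def _back_off(self, flight_size: int) -> None:
        self.ssthresh = max(flight_size // 2, 2 * self.smss)

    def on_timeout(self, flight_size: int) -> None:
        """Retransmission timeout: halve ssthresh, restart from a 1-segment window."""
        self._back_off(flight_size)
        self.cwnd = self.smss

    def on_triple_dupack(self, flight_size: int) -> None:
        """Fast retransmit (simplified): halve ssthresh and continue from it."""
        self._back_off(flight_size)
        self.cwnd = self.ssthresh
```

The multiplicative decrease on loss is the "back off and share with TCP" property; any alternative transport would need to demonstrate an equivalent response.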
    >
    >
    > Jim McGrath wrote:
    >
    > > I would expand your search to include non-standard protocols (i.e.
    > > proprietary ones) as well if they offered something and were adequately
    > > understood by the outside world.  We do that in storage quite a lot;
    > > indeed, some standard protocols are direct descendants of what were once
    > > proprietary protocols (e.g. ATA, the most widely used desktop disk
    > > interface, and ESCON, a dominant mainframe-class interface (both of which
    > > originated from IBM proprietary technologies)).
    > >
    > > Jim
    > >
    > > -----Original Message-----
    > > From: Y P Cheng [mailto:ycheng@advansys.com]
    > > Sent: Monday, September 18, 2000 5:52 PM
    > > To: 'Ips@Ece. Cmu. Edu'
    > > Subject: RE: A Transport Protocol Without ACK
    > >
    > > From: randall@stewart.chicago.il.us
    > > > I see no viable transport protocol here and I don't see this
    > > > conversation of any use unless you get exact details AND point
    > > > to an Internet-Draft that defines EXACTLY how it works (or possibly
    > > > some other standards document).
    > >
    > > Both I2O and VI are transport protocols which define the format of a
    > > request to a transport service provider, i.e. an adapter card.  I2O is
    > > used for, but not limited to, delivering SCSI requests, and VI is used
    > > for any payload, including IP packets.  VI is mapped into FC with the
    > > device headers between the FC header and the data payload.  VI can
    > > certainly be used for delivery of SCSI requests too.  Both protocols
    > > require the service provider to have reliable delivery and reception.
    > > VI defines different QoS.
    > >
    > > > > I don't claim any credit for this transport layer protocol.  Every
    > > > > fibre channel and Infiniband adapter designer knows about this
    > > > > protocol -- although there is no standard.  I am sure the TCP
    > > > > accelerator card is doing the same.  This protocol is a great
    > > > > alternative to the use of TCP/IP and should be incorporated into
    > > > > iSCSI.
    > > >
    > > > No, it is not.  You are not offering an alternative yet.
    > >
    > > I did not imply iSCSI should use I2O or VI.  In fact, the purpose of
    > > iSCSI is to map SCSI requests into IP packets as well as to define the
    > > delivery.  It seems to me that the working group has set its mind on
    > > TCP/IP and believes this is the only solution.  The consensus seems to be
    > > that if there are any other solutions that address flow control and
    > > congestion, they would end up like TCP/IP.  I am simply pointing out that
    > > if we keep an iSCSI request as a single atomic transaction, without
    > > separating it into TCP/IP-stream-oriented Writes and Reads that each
    > > deal with a single DU, then the deadlock problem goes away.  While the
    > > working group thinks we should take advantage of the flow control and
    > > congestion management of TCP/IP, there are alternatives known as
    > > BB-credit and EE-credit management.  Fibre channel adapters provide
    > > reliable delivery, lost packet detection, and retransmission without
    > > TCP/IP.
    > >
    > > Randall, you are right, I did not spend time providing the working group
    > > a draft defining such a transaction-oriented protocol.  All I have
    > > provided is an idea other than TCP/IP: the designers of SCSI and fibre
    > > channel adapters have solved the head-of-queue blocking, congestion, and
    > > retransmission problems.  The transaction-oriented WRITE-REQUEST and
    > > READ-RESPONSE, in my humble opinion, allow us to implement iSCSI more
    > > simply than WRITE and READ stream requests do.  The performance cost of
    > > requiring ACKs on every DU with size greater than the MTU on a network
    > > with long latency is very high.  Defining a greater ACK granularity is an
    > > attempt to solve this performance problem.  If we do wish to ACK on every
    > > DU, then, on a long-latency network, we must have a method to stream the
    > > PDUs to ensure performance.  The method should not consume a large
    > > amount of memory space.  One should never ignore the TCP/IP
    > > memory-to-memory copy overhead when the backbone will be running at
    > > OC-192 speed in the near future.  Finally, please don't ever ask two NIC
    > > cards to synchronize with each other.  It is really hard to do, as those
    > > of us in the business of designing NIC cards can testify.
    > >
    > > Y.P. Cheng, CTO, ConnectCom Solutions Corp.
    >
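The ACK-granularity argument above can be illustrated with a back-of-envelope bound. This sketch assumes each round costs one round-trip time plus the serialization time of one data unit, and that at most a fixed number of DUs may be unacknowledged; the 64 KB DU, 1 ms RTT, and 1 Gb/s link are hypothetical values, not figures from the thread.

```python
# Back-of-envelope model (assumed): each round costs one RTT plus one DU's
# serialization time, and at most `dus_outstanding` DUs may be unacknowledged.
def ack_limited_throughput_bps(du_bytes: int, rtt_s: float, link_bps: float,
                               dus_outstanding: int = 1) -> float:
    """Upper bound on throughput when ACKs gate the sending of data units."""
    du_bits = du_bytes * 8
    round_s = rtt_s + du_bits / link_bps
    return min(link_bps, dus_outstanding * du_bits / round_s)


if __name__ == "__main__":
    # Hypothetical: 64 KB DUs, 1 ms RTT, 1 Gb/s link.
    for k in (1, 3):
        bps = ack_limited_throughput_bps(65_536, 1e-3, 1e9, dus_outstanding=k)
        print(f"{k} DU(s) outstanding: {bps / 1e6:.0f} Mb/s")
```

With one DU ACKed at a time the link runs at roughly a third of its rate; allowing a few DUs to be streamed before their ACKs return fills it, at the cost of buffering those DUs until acknowledged.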
    



Last updated: Tue Sep 04 01:07:09 2001
6315 messages in chronological order