
    Re: A Transport Protocol Without ACK



    Y P Cheng wrote:
    > 
    > (My apologies for this long reply.  I hope it is worth your reading.)
    > > From: randall@stewart.chicago.il.us
    > > [mailto:randall@stewart.chicago.il.us]
    > > I am a bit confused by the above, Y.P.  You state "by the returning of
    > > status PDU."... Both SCTP and TCP will carry a piggybacked
    > > ACK with that PDU, so you end up accomplishing the same thing. What are
    > > you trying to say that I am missing???
    > 
    > The piggybacked ACK saves extra PDUs but does not solve the buffer
    > requirement for long latency.  In my example, if we have 20 milliseconds
    > of round-trip time on an IP network with a gigabit backbone, in order to
    > keep the data streaming on the net we must have 2MB of buffer just in
    > case we need to retransmit the data.  By the way, in a SCSI read/write
    > data transfer, the receiver sends nothing back until the status phase.
    > Therefore, there is nothing to piggyback on.  The SCSI protocol for data
    > transfer is basically half-duplex, not full-duplex.  Sending ACKs
    > requires extra PDUs.
    > 
    > > Y.P. please enumerate the protocols that have this property that also
    > > provide TCP friendly congestion control. If you could enumerate the
    > > exact
    > > protocols and pointers to the specifications I would be more than
    > > glad to have a look at these and see if I can support them. Making vague
    > > references to "not limiting itself to TCP/IP" does not do anything for
    > > me and I think nothing for the WG. We need specific transport protocols
    > > listed that are capable of transporting iSCSI AND have TCP friendly
    > > congestion control principles built into them...
    > 
    > I am not an expert in making a transport protocol proposal.  However, let me
    > use a bottom-up approach by saying how an iSCSI transport layer should work.
    > Other people can help in making it an IETF proposal. I participated in this
    > discussion with an intention to provide the working group information on the
    > latest NIC adapter technology so the iSCSI proposal can better serve the NIC
    > adapter industry as well as the community who uses TCP/IP. In this response,
    > I will address two topics:
    >   1) The inefficiency of using TCP/IP to implement iSCSI
    >   2) What we can do in the transport layer to overcome the inefficiency
    >      of iSCSI on TCP/IP (In here, I am stealing ideas from VI, TCP/RDMA, and
    > FCP.)
    > 
    > (Disclaimer: my apology in advance if my view on TCP/IP is incorrect
    > herein.  After all, I am a career adapter designer.)  For iSCSI to use
    > TCP/IP, it uses SOCKET, CONNECT, or BIND to first make a connection
    > point, which is an (IP address, TCP port) pair.  The asymmetric model
    > provides a second TCP port; a multi-path to another node has a second IP
    > address.  After connecting -- with one or more connection points and
    > paths -- the iSCSI driver creates multiple PDUs: command, data, and
    > status.  A SCSI initiator uses a WRITE call to tell an IP NIC to send
    > the PDUs.  The iSCSI driver is aware that there could be multiple NIC
    > cards.  A SCSI target LISTENs for the incoming PDUs.  It may listen to
    > multiple NIC cards.  I will not repeat the queuing and blocking problems
    > of the iSCSI driver in dealing with multiple applications with many
    > TCP/IP ports, nor the issues of connecting to multiple targets or
    > initiators.  We will address only the performance issue of the stream-
    > and connection-oriented delivery of TCP/IP.  As in the example of my
    > previous posting, to keep write data streaming on a 1 gigabit connection
    > with 20 milliseconds of round-trip latency, the initiator must have 2000
    > 1K buffers hanging around for retransmitting lost data packets.  If it
    > has only 200 1K buffers allocated for a target, the initiator can send
    > just 200K of data in 2 milliseconds and then wait 18 milliseconds for
    > the first ACK to come back.  Therefore, it runs at 10% of the possible
    > maximum throughput.  A target uses RTT to control how much of its
    > resources each initiator can consume.  However, it has no choice but to
    > provide 2000 1K buffers to receive the incoming data for maximum
    > possible performance.  To get TCP/IP data, the target uses READs to get
    > data from the IP NIC cards.  The memory-to-memory copy done while
    > processing the TCP stack, looking for a TCP port number in the IP
    > packet, is the greatest culprit of all the TCP/IP performance problems.
    > Companies like Alacritech build special TCP adapters that solve the
    > performance problem by doing the port lookup in the adapter.
    > 
    > The good news is that the above performance problem has already been
    > addressed by the VI and FCP implementations in the latest NIC adapters.
    > Here is my proposed iSCSI transport layer protocol: a
    > TRANSACTION-ORIENTED protocol WITH BULK ACKNOWLEDGMENT.
    > 
    > Instead of using READs and WRITEs for data streams as in TCP/IP, an
    > iSCSI driver should send a SCSI request or response to the transport
    > layer using SEND-REQUEST and RECEIVE-RESPONSE MESSAGEs.  These messages
    > contain the IP end-point connection, the SCSI command bytes, and data
    > buffer descriptors that are supplied by the application software.  Each
    > message describes a transaction EXCHANGE, which can have an exchange-ID.
    > (iSCSI calls this the Initiator Task Tag, although a task can have
    > multiple SCSI commands.)  The iSCSI driver still uses SOCKET, CONNECT,
    > and BIND to create connections.
    > 
    > It is true that if we use a totally connectionless protocol like UDP to
    > transmit 10 megabytes of data on a busy Internet, we will be forever
    > retransmitting due to lost-frame errors.  However, instead of sending an
    > ACK for every data frame, we can steal the idea from Fibre Channel of
    > breaking down a transaction exchange into data sequences, each with a
    > collection of data frames.  The receiver needs only to acknowledge a
    > sequence, which has a unique sequence ID.  A sequence with a lost data
    > frame will be retransmitted.  Using a sliding window, multiple sequences
    > can be transmitted.  This is how we keep data frames streaming on a
    > network with a long latency time.  The size of a data sequence is of
    > course network dependent.
    > 
    
    So unless you have a very, very small sequence (network dependent, as you
    say), you end up with one lost frame causing the WHOLE sequence to be
    retransmitted, reducing the network goodput to a crawl.
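
    To put numbers on that point, here is a rough sketch (the 1% per-frame
    loss rate is my own illustrative figure, not from the thread) of the
    goodput when any single lost frame forces the whole sequence to be
    resent:

```python
# Sketch: goodput collapse under whole-sequence retransmission.
# A sequence is delivered only if every one of its frames arrives.

def sequence_goodput(frame_loss: float, frames_per_seq: int) -> float:
    """Fraction of line rate delivered when one lost frame forces
    retransmission of the entire sequence."""
    return (1 - frame_loss) ** frames_per_seq

p = 0.01  # assumed 1% per-frame loss
for n in (1, 10, 100, 1000):
    print(f"{n:5d} frames/sequence -> goodput {sequence_goodput(p, n):.1%}")
```

    At 1000 frames per sequence, goodput is effectively zero -- which is why
    the sequence size matters so much.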
    
    If the "sequence" is very, very small (say, an MTU for the network), then
    what you have described is a connectionless UDP protocol -- OR SCTP, if
    it is connection oriented and reliable....
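
    As an aside, the buffer arithmetic quoted earlier (2000 1K buffers on a
    gigabit path with 20 ms round trip) falls out of the bandwidth-delay
    product; a quick sketch of the example's numbers (the message rounds
    2.5 MB down to 2 MB and 8% up to 10%):

```python
# Sketch of the quoted example's arithmetic: 1 Gbit/s link, 20 ms RTT.

def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the pipe full."""
    return link_bps / 8 * rtt_s

def window_limited_throughput(window_bytes: float, rtt_s: float) -> float:
    """Max throughput (bytes/s) when the send window caps data in flight."""
    return window_bytes / rtt_s

LINK = 1e9   # 1 Gbit/s backbone
RTT = 0.020  # 20 ms round-trip time

full_pipe = bdp_bytes(LINK, RTT)                    # ~2.5 MB of buffers
small_win = window_limited_throughput(200e3, RTT)   # 200 KB window
print(f"buffers to fill the pipe: {full_pipe / 1e6:.1f} MB")
print(f"200 KB window achieves:   {small_win * 8 / 1e6:.0f} Mbit/s "
      f"({small_win * 8 / LINK:.0%} of line rate)")
```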
    
    
    > Having the data descriptors provided by the application software for the
    > transport layer is the greatest benefit of this proposal.  There is no
    > data buffering as in TCP/IP.  The transport layer does not have to
    > allocate a huge buffer to keep data frames streaming on a network with a
    > long latency delay.  It uses the buffers provided by the application
    > software.  It can always retransmit a data frame because the application
    > software must stay around until the transaction exchange is complete.
    > In VI, the application software allocates a memory segment, gives it a
    > handle, and passes it to a remote node to allow remote DMA.  Therefore,
    > the data descriptors of this transport protocol can simply be a
    > memory-handle for a memory segment previously created.  TCP/RDMA is
    > copying this idea.
    
    Not having the transport protocol copy data from the user and just
    passing the buffers is NOT a new idea. It has been around in TCP
    for quite some time, and you have been able to find implementations
    that do this for years.  Now, one question I have for you: you
    nicely describe the sender side above.. what about the receiver? It will
    still need some sort of buffer to read data into..
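
    A sketch of that receive-side point (a hypothetical NIC model, not VI or
    any real stack): descriptors remove the sender-side copy, but frames
    arriving at the receiver still need pre-posted buffers to land in:

```python
# Sketch: a toy NIC receive path. Zero-copy on the send side does not
# remove the receiver's obligation to post buffers in advance.

from collections import deque

class Nic:
    def __init__(self):
        self.rx_pool = deque()   # buffers the receiver has posted

    def post_receive(self, buf: bytearray) -> None:
        """Receiver lends memory to the NIC ahead of time."""
        self.rx_pool.append(buf)

    def deliver(self, frame: bytes) -> bool:
        """An incoming frame still needs somewhere to go."""
        if not self.rx_pool:
            return False         # no posted buffer: frame is dropped
        buf = self.rx_pool.popleft()
        buf[:len(frame)] = frame
        return True

nic = Nic()
nic.post_receive(bytearray(1500))
print(nic.deliver(b"data"))   # True: landed in a posted buffer
print(nic.deliver(b"more"))   # False: pool exhausted, frame dropped
```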
    
    > 
    > Each transaction exchange is executed by a NIC driver atomically.
    > Hundreds or even thousands of SEND-REQUESTs and RECEIVE-RESPONSEs can be
    > outstanding in the driver.  After sending the SCSI command PDUs, the
    > data and status PDUs are handled on demand by the NIC driver.  There are
    > no queuing or deadlock problems.  The detection of lost data frames is a
    > function of the transport layer, which specifies the QoS (Quality of
    > Service).
    
    Your proposal does not describe how you detect a lost frame. The only
    thing I can see is that you must use some sort of ordering of the
    sequences. Otherwise, if you lost a whole sequence, you would never be
    aware that you lost one...
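
    One way to make that ordering requirement concrete (a hypothetical
    receiver, not anything defined in the proposal): gaps in consecutive
    sequence IDs are the only signal that an entire sequence vanished:

```python
# Sketch: a receiver that detects wholly-lost sequences by tracking the
# next expected sequence ID. Any skipped ID is a sequence to request again.

class SequenceReceiver:
    def __init__(self):
        self.expected = 0    # next sequence ID we should see
        self.missing = []    # IDs that were skipped over

    def on_sequence(self, seq_id: int) -> None:
        """Record every gap between the expected ID and the one received."""
        if seq_id > self.expected:
            self.missing.extend(range(self.expected, seq_id))
        self.expected = max(self.expected, seq_id + 1)

rx = SequenceReceiver()
for sid in (0, 1, 3, 4, 7):          # sequences 2, 5, and 6 never arrive
    rx.on_sequence(sid)
print("lost sequences detected:", rx.missing)   # [2, 5, 6]
```

    Without consecutive (or otherwise ordered) IDs, the receiver has no way
    to notice the loss at all.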
    
    > 
    > Flow control is done by EE-credit granting by the receiver, so no one
    > can overflow its resources.  This is the same as the Max---RN discussed
    > in iSCSI.  Congestion control is managed by alternative NICs or IP
    > endpoints.  Both should be a part of the transport protocol.
    
    How is congestion control managed by the NIC cards or IP endpoint? These
    are CRITICAL questions that you MUST have answers for... passing this
    off to an undefined NIC card is UNACCEPTABLE. 
    
    You MUST be able to show a transport protocol that is responsive
    to network congestion the way TCP is.
    
    1) What is the indication of network congestion?
    2) How does the protocol respond to network congestion when
       detected by (1)?
    3) When do retransmissions happen? Is there a timer? If
       so, does it back off properly?
    4) If you have a timer, what algorithm are you using to
       establish it?
    5) What limits your data flow? EE-credit granting would need
       to be shown to be as responsive as TCP's cwnd.
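
    For reference, the behavior questions (1)-(5) are probing for is roughly
    TCP's congestion-avoidance loop; a minimal sketch (standard
    additive-increase/multiplicative-decrease, not anything from this
    thread) of how an EE-credit limit would have to react to loss:

```python
# Sketch: AIMD, the response pattern TCP's cwnd follows in congestion
# avoidance -- grow slowly while the network is clear, cut sharply on loss.

def aimd(cwnd: float, loss_detected: bool,
         incr: float = 1.0, decr: float = 0.5, floor: float = 1.0) -> float:
    """One RTT of TCP-style congestion window adjustment (in segments)."""
    if loss_detected:
        return max(floor, cwnd * decr)   # multiplicative decrease
    return cwnd + incr                   # additive increase

cwnd = 10.0
trace = []
# 4 clear RTTs, then one loss event, then 3 clear RTTs
for loss in [False] * 4 + [True] + [False] * 3:
    cwnd = aimd(cwnd, loss)
    trace.append(cwnd)
print(trace)   # [11.0, 12.0, 13.0, 14.0, 7.0, 8.0, 9.0, 10.0]
```

    A credit-granting scheme that never shrinks its grant in response to a
    congestion signal has no equivalent of this loop.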
    
    
    So far you have described nothing but "let the iSCSI designer use an
    unspecified protocol that sends packets", a.k.a. UDP.
    
    I see no viable transport protocol here, and I don't see this
    conversation being of any use unless you give exact details AND point
    to an Internet-Draft that defines EXACTLY how it works (or possibly
    some other standards document).
    
    
    
    
    > 
    > I don't claim any credit for this transport layer protocol.  Every
    > Fibre Channel and InfiniBand adapter designer knows about this protocol
    > -- although there is no standard.  I am sure the TCP accelerator cards
    > are doing the same.  This protocol is a great alternative to the use of
    > TCP/IP and should be incorporated into iSCSI.
    
    No, it is not. You are not offering an alternative yet..
    
    R
    
    -- 
    Randall R. Stewart
    randall@stewart.chicago.il.us or rrs@cisco.com
    815-342-5222 (cell) 815-477-2127 (work)
    

