    RE: A Transport Protocol Without ACK



    (My apologies for this long reply.  I hope it is worth your reading.)
    > From: randall@stewart.chicago.il.us
    > [mailto:randall@stewart.chicago.il.us]
    > I am a bit confused by the above Y.P. you state " by the returning of
    > status PDU."... Both SCTP and TCP will carry a piggyback
    > ACK with that PDU, so you end up accomplishing the same thing. What are
    > you trying to say that I am missing???
    
    The piggybacked ACK saves extra PDUs but does not solve the buffer
    requirement for long latency.  In my example, if we have 20 milliseconds of
    round-trip time on an IP network with a gigabit backbone, in order to keep
    the data streaming on the net we must have about 2MB of buffer just in case
    we need to retransmit the data.  By the way, in SCSI read/write data
    transfer, the receiver sends nothing back until the status phase.
    Therefore, there is nothing to piggyback on.  The SCSI protocol for data
    transfer is basically half-duplex, not full-duplex.  Sending ACKs requires
    extra PDUs.
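
    The 2MB figure is a bandwidth-delay product.  A minimal sketch of the
    arithmetic follows; the function name is hypothetical, and the 100 MB/s
    payload rate is an assumption chosen to match the 2MB figure (a raw
    1 Gbit/s with no encoding overhead would give 2.5MB instead).

```python
def retransmit_buffer_bytes(payload_bytes_per_sec: float, rtt_sec: float) -> float:
    """Bytes the sender must keep buffered so the link never goes idle.

    Everything sent during one round trip is still unacknowledged, so it
    must stay available for retransmission.
    """
    return payload_bytes_per_sec * rtt_sec


if __name__ == "__main__":
    # Assumption: roughly 100 MB/s of payload on a gigabit link.
    needed = retransmit_buffer_bytes(100e6, 0.020)
    print(f"{needed / 1e6:.1f} MB must be held for retransmission")
```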
    
    > Y.P. please enumerate the protocols that have this property that also
    > provide TCP friendly congestion control. If you could enumerate the
    > exact
    > protocols and pointers to the specifications I would be more than
    > glad to have a look at these and see if I can support them. Making vague
    > references to "not limiting itself to TCP/IP" does not do anything for
    > me and I think nothing for the WG. We need specific transport protocols
    > listed that are capable of transporting iSCSI AND have TCP friendly
    > congestion control principles built into them...
    
    I am not an expert in making a transport protocol proposal.  However, let me
    use a bottom-up approach by describing how an iSCSI transport layer should
    work.  Other people can help in making it an IETF proposal.  I joined this
    discussion intending to give the working group information on the latest NIC
    adapter technology, so that the iSCSI proposal can better serve the NIC
    adapter industry as well as the community that uses TCP/IP.  In this
    response, I will address two topics:
      1) The inefficiency of using TCP/IP to implement iSCSI
      2) What we can do in the transport layer to overcome the inefficiency
         of iSCSI on TCP/IP (here I am stealing ideas from VI, TCP/RDMA, and
         FCP)
    
    (Disclaimer: my apologies in advance if my view of TCP/IP here is incorrect.
    After all, I am a career adapter designer.)  For iSCSI to use TCP/IP, it
    uses SOCKET, CONNECT or BIND to first make a connection point, which is an
    (IP address, TCP port) pair.  The asymmetric model provides a second TCP
    port; a multi-path to another node has a second IP address.  After
    connecting -- with one or more connection points and paths -- iSCSI creates
    multiple PDUs: command, data, and status.  A SCSI initiator uses a WRITE
    call to tell an IP NIC to send the PDUs.  The iSCSI driver is aware that
    there could be multiple NIC cards.  A SCSI target LISTENs for the incoming
    PDUs.  It may listen on multiple NIC cards.  I will not repeat the queuing
    and blocking problems of the iSCSI driver in dealing with multiple
    application software with many TCP/IP ports, or the issues of connecting to
    multiple targets or initiators.  I will address only the performance issue
    of the stream- and connection-oriented delivery of TCP/IP.

    As in the example of my previous posting, to keep write data streaming on a
    1 gigabit connection with 20 milliseconds of round-trip latency, the
    initiator must have 2000 1K buffers hanging around for retransmitting lost
    data packets.  If it has only 200 1K buffers allocated for a target, the
    initiator can send only 200K of data in 2 milliseconds and must then wait 18
    milliseconds for the first ACK to come back.  Therefore, it runs at 10% of
    the maximum possible throughput.  A target uses RTT to control how many
    resources each initiator can consume.  However, it has no choice but to
    provide 2000 1K buffers to receive the incoming data for maximum possible
    performance.

    To get TCP/IP data, the target uses READs to get data from the IP NIC cards.
    The memory-to-memory copy needed to process the TCP stack, looking for a TCP
    port number in the IP packet, is the greatest culprit of all the TCP/IP
    performance problems.  Companies like Alacritech build special TCP adapters
    that solve the performance problem by doing the port lookup in the adapter.
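
    The 10% throughput figure for 200 1K buffers can be checked with a short
    calculation.  This is only a sketch: the function names are hypothetical,
    1K is taken as 1000 bytes, and the 100 MB/s payload rate is an assumption
    consistent with the 2000-buffer figure.

```python
def send_and_idle_time(window_bytes: float, rate_bytes_per_sec: float, rtt_sec: float):
    """Return (seconds spent sending, seconds spent waiting for the first ACK)."""
    send_time = window_bytes / rate_bytes_per_sec
    idle_time = max(0.0, rtt_sec - send_time)
    return send_time, idle_time


def link_utilization(window_bytes: float, rate_bytes_per_sec: float, rtt_sec: float) -> float:
    """Fraction of the link rate achievable when at most window_bytes are unacknowledged."""
    bandwidth_delay_product = rate_bytes_per_sec * rtt_sec
    return min(1.0, window_bytes / bandwidth_delay_product)


if __name__ == "__main__":
    RATE = 100e6   # assumed payload rate of a gigabit link, bytes/second
    RTT = 0.020    # 20 milliseconds round trip
    for buffers in (200, 2000):
        window = buffers * 1000  # assumption: 1K buffer = 1000 bytes
        send, idle = send_and_idle_time(window, RATE, RTT)
        print(buffers, "buffers:", link_utilization(window, RATE, RTT), send, idle)
```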
    
    The good news is the above performance problem has already been addressed by
    VI and FCP implementation in the latest NIC adapters.  Here is my proposed
    iSCSI transport layer protocol: A TRANSACTION ORIENTED WITH BULK
    ACKNOWLEDGMENT protocol.
    
    Instead of using READs and WRITEs on data streams as with TCP/IP, an iSCSI
    driver should send a SCSI request or response to the transport layer using
    SEND-REQUEST and RECEIVE-RESPONSE MESSAGEs.  These messages contain the IP
    end-point connection, the SCSI command bytes, and data buffer descriptors
    that are supplied by the application software.  Each message describes a
    transaction EXCHANGE, which can have an exchange-ID.  (iSCSI calls this the
    Initiator Task Tag, although a task can have multiple SCSI commands.)  The
    iSCSI driver still uses SOCKET, CONNECT, and BIND to create connections.
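
    To make the message contents concrete, here is one possible shape for a
    SEND-REQUEST message.  It is a sketch only: the class and field names are
    hypothetical, not taken from any iSCSI draft, and the address and port in
    the usage lines are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BufferDescriptor:
    """Points at application-owned memory; the transport does not copy it."""
    buffer: memoryview
    offset: int
    length: int


@dataclass
class SendRequest:
    """One transaction exchange handed to the transport layer."""
    exchange_id: int              # plays the role of the Initiator Task Tag
    endpoint: Tuple[str, int]     # (IP address, port) connection point
    cdb: bytes                    # SCSI command bytes
    data: List[BufferDescriptor] = field(default_factory=list)

    def total_length(self) -> int:
        """Total number of data bytes described by this exchange."""
        return sum(d.length for d in self.data)


if __name__ == "__main__":
    app_buffer = bytearray(4096)  # owned by the application, not the transport
    request = SendRequest(
        exchange_id=1,
        endpoint=("192.0.2.10", 5003),  # placeholder address and port
        cdb=bytes(10),                  # placeholder 10-byte command block
        data=[BufferDescriptor(memoryview(app_buffer), 0, len(app_buffer))],
    )
    print(request.exchange_id, request.total_length())
```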
    
    It is true that if we use a totally connectionless protocol like UDP to
    transmit 10 megabytes of data on a busy Internet, we will be forever trying
    to retransmit because of lost frames.  However, instead of sending an ACK
    for every data frame, we can steal the idea from Fibre Channel of breaking
    a transaction exchange down into data sequences, each with a collection of
    data frames.  The receiver needs only to acknowledge a sequence, which has a
    unique sequence ID.  A sequence with a lost data frame will be
    retransmitted.  Using a sliding window, multiple sequences can be in flight
    at once.  This is how we keep data frames streaming on a network with long
    latency.  The size of a data sequence is of course network dependent.
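
    The sequence-level acknowledgment can be sketched as a small simulation.
    Everything here is illustrative: the function names, the frame_lost
    callback (a hypothetical loss model), and the default window size are
    assumptions, and a real transport would add timers and retry limits.

```python
from typing import Callable, List, Tuple


def split_into_sequences(data: bytes, frame_size: int, frames_per_seq: int) -> List[List[bytes]]:
    """Break one exchange into sequences, each a list of data frames."""
    frames = [data[i:i + frame_size] for i in range(0, len(data), frame_size)]
    return [frames[i:i + frames_per_seq] for i in range(0, len(frames), frames_per_seq)]


def transfer(
    sequences: List[List[bytes]],
    frame_lost: Callable[[int, int, int], bool],
    window: int = 4,
) -> Tuple[bytes, int, int]:
    """Deliver all sequences, acknowledging whole sequences rather than frames.

    frame_lost(seq_id, frame_no, attempt) decides whether a frame is dropped.
    Returns (reassembled data, ACKs sent, frames sent).  Loops forever if a
    frame is lost on every attempt, since this sketch has no retry limit.
    """
    received = {}
    attempts = [0] * len(sequences)
    acks = 0
    frames_sent = 0
    base = 0  # oldest sequence not yet acknowledged
    while base < len(sequences):
        # At most `window` sequences may be outstanding at once.
        for seq_id in range(base, min(base + window, len(sequences))):
            if seq_id in received:
                continue
            attempts[seq_id] += 1
            arrived = []
            for frame_no, frame in enumerate(sequences[seq_id]):
                frames_sent += 1
                if not frame_lost(seq_id, frame_no, attempts[seq_id]):
                    arrived.append(frame)
            # One ACK per complete sequence; an incomplete one is resent whole.
            if len(arrived) == len(sequences[seq_id]):
                received[seq_id] = b"".join(arrived)
                acks += 1
        while base < len(sequences) and base in received:
            base += 1
    return b"".join(received[i] for i in range(len(sequences))), acks, frames_sent
```

    With ten 1K frames grouped four to a sequence, the receiver sends three
    ACKs instead of ten, at the cost of resending a whole sequence when one of
    its frames is lost.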
    
    Having the data descriptors provided by the application software to the
    transport layer is the greatest benefit of this proposal.  There is no
    data buffering as in TCP/IP.  The transport layer does not have to allocate
    a huge buffer to keep data frames streaming on a network with long latency.
    It uses the buffers provided by the application software.  It can always
    retransmit a data frame because the application software must stay around
    until the transaction exchange is complete.  In VI, the application software
    allocates a memory segment, gives it a handle, and passes the handle to a
    remote node to allow remote DMA.  Therefore, the data descriptors of this
    transport protocol can simply be a memory handle for a memory segment
    previously created.  TCP/RDMA is copying this idea.
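
    As a rough software analogy for the memory-handle idea, the sketch below
    registers an application buffer under a handle and builds data frames as
    views into it, so a retransmission needs no private copy.  The class and
    method names are hypothetical and are not the VI or TCP/RDMA API.

```python
from typing import Dict


class MemoryRegistry:
    """Maps handles to application-owned memory segments (no copies made)."""

    def __init__(self) -> None:
        self._segments: Dict[int, memoryview] = {}
        self._next_handle = 1

    def register(self, buf: bytearray) -> int:
        """Register a segment and return the handle that describes it."""
        handle = self._next_handle
        self._next_handle += 1
        self._segments[handle] = memoryview(buf)
        return handle

    def frame(self, handle: int, offset: int, length: int) -> memoryview:
        """Return a view of one data frame; valid for first send or resend."""
        return self._segments[handle][offset:offset + length]

    def release(self, handle: int) -> None:
        """Forget the segment once the transaction exchange is complete."""
        del self._segments[handle]


if __name__ == "__main__":
    registry = MemoryRegistry()
    app_buffer = bytearray(b"0123456789abcdef")
    handle = registry.register(app_buffer)
    first_send = registry.frame(handle, 4, 4)
    resend = registry.frame(handle, 4, 4)  # same memory, nothing buffered
    print(bytes(first_send), bytes(resend))
    registry.release(handle)
```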
    
    Each transaction exchange is executed by a NIC driver atomically.  Hundreds
    or even thousands of SEND-REQUESTs and RECEIVE-RESPONSEs can be outstanding
    in the driver.  After sending the SCSI command PDUs, the data and status
    PDUs are handled on demand by the NIC driver.  There is no queuing or
    deadlock problem.  The detection of lost data frames is a function of the
    transport layer, which specifies the QoS (Quality of Service).
    
    Flow control is done by the receiver granting EE-credits, so no one can
    overflow its resources.  This is the same as the Max---RN discussed in
    iSCSI.  Congestion control is managed by alternative NIC or IP endpoints.
    Both should be a part of the transport protocol.
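
    Credit-based flow control can be sketched in a few lines.  This shows only
    the general idea of a receiver granting credits; the class and method
    names are hypothetical and do not reproduce the Fibre Channel EE-credit
    rules or any iSCSI field.

```python
class CreditReceiver:
    """Grants one credit per free receive buffer."""

    def __init__(self, buffers: int) -> None:
        self.free = buffers   # buffers not yet promised to the sender
        self.held = 0         # buffers holding frames not yet processed

    def grant(self) -> int:
        """Promise every currently free buffer to the sender."""
        credits, self.free = self.free, 0
        return credits

    def on_frame(self) -> None:
        """A frame arrived and now occupies one promised buffer."""
        self.held += 1

    def drain(self) -> None:
        """Process the held frames, making their buffers free again."""
        self.free += self.held
        self.held = 0


class CreditSender:
    """Sends a frame only while it holds credit."""

    def __init__(self) -> None:
        self.credits = 0
        self.sent = 0

    def add_credits(self, n: int) -> None:
        self.credits += n

    def try_send(self) -> bool:
        """Consume one credit and send, or refuse if none are left."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.sent += 1
        return True
```

    Because the sender can never have more frames outstanding than the
    receiver has promised buffers for, the receiver's resources cannot be
    overrun.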
    
    I don't claim any credit for this transport layer protocol.  Every Fibre
    Channel and InfiniBand adapter designer knows about this protocol --
    although there is no standard.  I am sure the TCP accelerator cards are
    doing the same.  This protocol is a great alternative to the use of TCP/IP
    and should be incorporated into iSCSI.
    
    
    


Last updated: Tue Sep 04 01:07:14 2001