
    Synchronization problem and TCP big window problem



    
    
    Hello,
    
    
    This mail is about three problems raised in the recent discussions:
    
    a) the loss of synchronization in the TCP byte stream
    
    b) the extra delay in command completion on the initiator
       side when a packet is lost or arrives out of order. This
       can block the initiator because its command window is closed.
    
    c) the amount of TCP dedicated storage needed by a fully
       TCP-compliant implementation on a fast link with a
       long round trip time.
    
    About c)
    --------
    The problem is that the amount of memory needed is
    unbounded. The faster the link and the farther the target is
    from the initiator, the more TCP dedicated storage is needed.
    Calculations posted by several people showed how big this memory
    can be, and we don't know where this is heading as
    links get faster and faster.
    
    This TCP dedicated memory is needed
    to cope with out-of-order or lost datagrams.
    To get good performance one wants to use SACK.
    If a datagram is lost, the receive side has to store
    all of the byte stream arriving through the TCP pipe until
    the send side retransmits. This byte stream, even
    if acknowledged with SACK, has to be
    stored in a temporary TCP dedicated buffer, because
    iSCSI can't process it: iSCSI has lost synchronization
    because of the missing datagram. The amount of memory
    needed depends on the RTT and on the link speed,
    so it can be very large and will keep growing
    in the future. A large memory buffer
    will therefore be needed just to handle error cases.
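
    As a rough illustration of the dependence on RTT and link speed,
    the amount of data in flight during one round trip is about the
    bandwidth-delay product. The sketch below assumes the receiver has
    to hold roughly that much while it waits for a retransmission; the
    function name and the link speeds and RTTs used are hypothetical
    examples, not figures from the earlier discussions.

```python
def bytes_in_flight(link_bits_per_second: int, rtt_ms: int) -> int:
    """Approximate bandwidth-delay product in bytes.

    Assumption: one RTT worth of data arrives behind a lost datagram
    before the retransmission fills the hole.
    """
    # bits/s * ms / 1000 gives bits; dividing by 8 gives bytes.
    return link_bits_per_second * rtt_ms // 8000


if __name__ == "__main__":
    # Hypothetical link speeds (Gbit/s) and round trip times (ms).
    for gbps, rtt_ms in [(1, 10), (1, 100), (10, 100)]:
        size = bytes_in_flight(gbps * 10**9, rtt_ms)
        print(f"{gbps} Gbit/s, RTT {rtt_ms} ms: about {size / 1e6:.2f} Mbytes")
```

    The figure grows linearly with both link speed and RTT, which is
    why no fixed buffer size is safe without some other bound.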
    
    
    Proposal to solve these problems
    ================================
    
    Add a "pad" command to iSCSI. This pad command is only one
    byte long: the opcode.
    
    How does it work?
    -----------------
    
    At login time the initiator and the target agree on
    two synchronization periods (SPEs), one for each direction.
    
    
    A synchronization period is the number of bytes that
    separates two synchronization points (SPOs). At each SPO
    the sender guarantees that an iSCSI header begins,
    if necessary by adding padding before it
    with the pad command.
    The SPE value is implementation dependent and could be
    chosen based on the memory capacity of the receiver.
    The shorter the SPE, the less memory the receiver needs
    to handle lost or out-of-order datagrams.
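
    The sender-side rule can be sketched as follows. This is only one
    reading of the proposal: it assumes that a PDU (header plus data)
    never straddles an SPO, so when the next PDU does not fit before
    the next SPO the sender fills the gap with one-byte pad commands.
    The function name, the offsets and the PDU sizes are hypothetical.

```python
def pad_before_pdu(stream_offset: int, pdu_len: int, spe: int) -> int:
    """Number of one-byte pad commands to send before the next PDU.

    stream_offset: bytes already sent in this direction since login
    pdu_len:       length of the next PDU, header included
    spe:           synchronization period agreed at login, in bytes
    """
    if pdu_len > spe:
        # Assumption: a PDU longer than the SPE cannot satisfy the rule.
        raise ValueError("PDU does not fit in one synchronization period")
    next_spo = (stream_offset // spe + 1) * spe
    if stream_offset % spe == 0 or stream_offset + pdu_len <= next_spo:
        # Either the PDU starts exactly on an SPO, or it ends at or
        # before the next one, so no padding is needed.
        return 0
    # Otherwise pad up to the next SPO so the header begins there.
    return next_spo - stream_offset


if __name__ == "__main__":
    print(pad_before_pdu(950, 100, 1000))  # PDU would cross offset 1000
```

    Under this reading the padding per SPE is always smaller than one
    PDU, so the overhead shrinks as the SPE grows.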
    
    On the receiver side, when a hole appears in the TCP data stream
    (a datagram is lost), the receiver continues to SACK
    the incoming data and stores it in a TCP dedicated buffer
    up to the next SPO. From that SPO on, it stops copying
    into the TCP dedicated buffer and resumes interpreting
    and processing the data stream. That means,
    for example: for WRITE data, copying the data to disk;
    for READ data, copying it into the host reception buffer;
    for a command completion, doing the cleanup; and so on.
    When it receives the missing datagram it empties the
    TCP dedicated buffer.
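
    The receiver-side arithmetic can be sketched in the same way.
    Given where a hole starts and how long it is, the sketch computes
    the SPO at which interpretation can resume and how many bytes have
    to sit in the TCP dedicated buffer until the hole is filled. It
    assumes SPOs fall at exact multiples of the SPE counted from the
    start of the stream; the function names and numbers are
    hypothetical.

```python
def next_spo(offset: int, spe: int) -> int:
    """First synchronization point at or after the given stream offset."""
    return -(-offset // spe) * spe  # ceiling division, then scale


def resync_plan(hole_start: int, hole_len: int, spe: int) -> tuple[int, int]:
    """Return (resume_offset, buffered_bytes) for a single hole.

    Bytes between the end of the hole and the next SPO cannot be
    interpreted, so they are buffered; from resume_offset on, the
    receiver processes the stream directly.
    """
    hole_end = hole_start + hole_len
    resume_offset = next_spo(hole_end, spe)
    return resume_offset, resume_offset - hole_end


if __name__ == "__main__":
    resume, buffered = resync_plan(hole_start=1200, hole_len=100, spe=1000)
    print(f"resume at offset {resume}, buffer {buffered} bytes")
```

    For a single hole the buffered amount is always less than one SPE,
    which is what lets the receiver size its dedicated memory from the
    SPE rather than from the RTT and link speed.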
    
    
    For example, if the receiver can dedicate up to 5 Mbytes of TCP
    memory per TCP connection, it could choose
    an SPE of 5 Mbytes.
    
    On a poor-quality line, if its dedicated memory gets
    full (because more holes appeared in the data stream after
    the re-synchronization and the first holes have not yet been
    filled in by the sender), it drops everything new
    it receives until it gets the missing datagrams.
    
    Advantages of this proposal
    ===========================
    1) Reduces the memory needed on the receive side while
       maintaining good performance
    
    2) Caps the memory needed for TCP even with long RTTs and
       increasing bandwidth
    
    3) Allows a synchronization check once per SPE
    
    4) Negligible loss of bandwidth (padding)
    
    
    
    Regards,
    
    
    Pierre


Last updated: Tue Sep 04 01:08:11 2001