
    Re: iSCSI: Flow Control



    Pierre,
    I don't know if I completely understand what you are proposing,
    but it seems that you are proposing to process TCP segments out
    of order.  As I have said in a previous message, this is extremely
    dangerous, as the TCP layer will not ACK any segment until all previous
    segments are processed. Without an ACK the segment may be retransmitted
    many times, and that will require iSCSI to track what has been processed
    and what has not by adding a segment number, essentially duplicating
    TCP sequence numbers. SACK doesn't help either.
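
    The cumulative-ACK behaviour referred to above can be sketched as
    follows. This is a minimal illustration, not a TCP implementation: it
    assumes fixed-size segments, ignores SACK, and the function name and
    parameters are made up for the example. The point it shows is that the
    ACK number cannot advance past a missing segment, so later segments
    stay unacknowledged (and eligible for retransmission) until the gap
    is filled.

```python
def cumulative_ack(arrivals, segment_size=1000):
    """Return the ACK number a receiver would report after each arrival.

    arrivals: byte offsets of the segments, in the order they arrive.
    Assumption for this sketch: every segment is exactly segment_size bytes.
    """
    received = set()
    next_expected = 0          # first byte not yet received in order
    acks = []
    for start in arrivals:
        received.add(start)    # a duplicate (retransmission) is harmless
        # Advance only across a contiguous run of received segments.
        while next_expected in received:
            next_expected += segment_size
        acks.append(next_expected)
    return acks


if __name__ == "__main__":
    # Segment at offset 1000 is delayed; 2000 and 3000 arrive first.
    print(cumulative_ack([0, 2000, 3000, 1000]))
```

    In the usage line, the ACK stays at 1000 while segments 2000 and 3000
    arrive, and only jumps forward once segment 1000 shows up.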
    
    If you are proposing that iSCSI simply keep receiving data without
    doing the SCSI layer processing, and thus that the SCSI layer process
    the PDUs out of order, this is feasible, but it is still subject to
    buffer restrictions that would cause data to be discarded. Minimizing
    discarded data is the whole point of these flow control discussions.
    
    What I strongly object to is any feature in the iSCSI layer that
    requires any direct manipulation of TCP layer features like the
    window pointers. Implementations are free to violate layering as an
    optimization, but it MUST be possible to have a functional
    implementation without knowing any details of the TCP implementation.
    
    	-David
    
    Pierre Labat wrote:
    > 
    > julian_satran@il.ibm.com wrote:
    > 
    > > Pierre,
    > >
    > > You are wrong again. When the target reopens the window - i.e., reads some
    > > data from the pipe at its end - you get to put your Read command, but it
    > > goes after the rest of the window, and the window can be several megabytes.
    > 
    > Julian,
    > 
    > The TCP window is not a buffer on the receive side.
    > On the receive side, in our case (the target), and as long as TCP segments
    > arrive in order, there is no opaque FIFO containing a full window size of
    > command/data waiting to be processed. You can avoid that.
    > What the target does is: receive bytes through the TCP connection, do the
    > TCP work, and form an iSCSI PDU. The maximum you have to store is a few TCP
    > segments to rebuild the PDU. As soon as the PDU is built it is processed.
    > When the target wants to close the TCP window it updates the window
    > accordingly and CONTINUEs to process the incoming PDUs.
    > At that point you assume that the incoming PDUs are put in an opaque FIFO,
    > but rather than that, the target can process them and put the data at the
    > right location in the target cache.
    > Then, when the window is opened again and the read PDU comes, it is processed
    > immediately.
    > 
    > In fact, as Y P Cheng described in a previous mail in this thread, the model
    > that can be used for iSCSI traffic is different from the common model we
    > have for regular TCP/IP networking, although a TCP fully compliant with the
    > RFCs can be used for iSCSI.
    > In regular TCP/IP networking the application (on the transmit side) fills
    > a FIFO that the adapter empties. In our case, as explained by Y P Cheng,
    > you replace the FIFO by an "exchange table", what I called a flat
    > array. It allows you to avoid the head of queue blocking at this level.
    > 
    > On the receive side (the target in our case) in regular networking,
    > the incoming data are tossed in a FIFO by TCP. The application
    > empties this FIFO and can block (in this case the FIFO grows),
    > and yes, when the application unblocks, it has a large number
    > of PDUs to process.
    > But in the model described the application never blocks. Hence there is no
    > big opaque receive FIFO on the target. In our case the application is the
    > module that processes the iSCSI PDUs. The application never blocks because
    > it is able to pace down the flow coming from the initiator with the TCP
    > window and the command flow control (MaxCmdRN).
    > 
    > Regards,
    > 
    > Pierre
    > 
    > >
    > >
    > > Pierre Labat <pierre_labat@hp.com> on 10/10/2000 19:50:48
    > >
    > > Please respond to Pierre Labat <pierre_labat@hp.com>
    > >
    > > To:   ips@ece.cmu.edu
    > > cc:
    > > Subject:  Re: iSCSI: Flow Control
    > >
    > > Julian_Satran@il.ibm.com wrote:
    > >
    > > > Pierre,
    > > >
    > > > The only point you are missing is that the TCP window may be closed when
    > > > you want to send your Read command
    > >
    > > Julian,
    > >
    > > Yes, but as soon as the target re-opens the window it receives the read
    > > first.
    > >
    > > > and even if not it will reach the other end after all the data
    > > > before it regardless of how clever your adapter is.
    > >
    > > The time used to reach the other end of the wire (for the read in our case)
    > > is the same whether or not data was sent on the wire before. On the
    > > target, as soon as the read is sampled from the wire it can be
    > > processed.
    > >
    > > Regards,
    > >
    > > Pierre
    > >
    > > > The FIFO you have in mind is certainly not equivalent to the pipe
    > > > capacity.
    > > >
    > > > Julo
    > > >
    > > > Pierre Labat <pierre_labat@hp.com> on 10/10/2000 02:58:41
    > > >
    > > > Please respond to Pierre Labat <pierre_labat@hp.com>
    > > >
    > > > To:   ips@ece.cmu.edu
    > > > cc:
    > > > Subject:  Re: iSCSI: Flow Control
    > > >
    > > > Julian_Satran@il.ibm.com wrote:
    > > >
    > > > > Pierre,
    > > > >
    > > > > It does not matter how or from where you send the data on the wire.
    > > > > If you have a long wire and you want to cover the latency you will
    > > > > send data as soon as you can and then commands get stuck behind.
    > > >
    > > > Julian,
    > > >
    > > > The command can NOT  be stuck because there is "data on the wire".
    > > > Let me give you an example,
    > > > Let's talk again about the "pull model" adapter on the initiator.
    > > > Imagine you have 100Mbytes of (write) data outstanding
    > > > because 1000 large write commands have been posted to
    > > > the adapter.
    > > > The adapter sends this data as fast as it can. But very important,
    > > > the data are not tossed in any kind of buffer on the adapter.
    > > > What the adapter does is: pull some kbytes of data from host memory,
    > > > encapsulate it, send it on the wire. Again and again, as fast as it can.
    > > >
    > > > Now, imagine that a read is posted to the adapter after the 1000 writes.
    > > > Here is the point. The interface between the host and the adapter is not
    > > > a FIFO but a flat array, and the adapter can work in parallel on
    > > > all the commands. Immediately when the host posts the read
    > > > (in the flat array), the adapter sees it. The adapter, as soon as it
    > > > completes transmitting the current data PDU, sends the read command.
    > > >
    > > > The read command is not stuck behind the 100Mbytes of data.
    > > > The maximum latency for the command is the time to
    > > > transmit one iSCSI PDU on the wire.
    > > > That is (size of PDU)/throughput.
    > > > Then the adapter continues to send the write data of the
    > > > 100Mbytes. And as soon as a new command is posted,
    > > > it will send a command PDU immediately after the current
    > > > data PDU.
    > > >
    > > > Commands are not stuck behind data because there is no FIFO
    > > > before the wire, and because data "on the wire" doesn't block anything.
    > > > The wire is always able to deliver its throughput.
    > > >
    > > > Regards,
    > > >
    > > > Pierre
    > > >
    > > > >
    > > > >
    > > > > And nobody is suggesting you should park the data on the NIC card if
    > > > > you know better.
    > > > >
    > > > > Julo
    > > > >
    > > > > Pierre Labat <pierre_labat@hp.com> on 09/10/2000 20:41:14
    > > > >
    > > > > Please respond to Pierre Labat <pierre_labat@hp.com>
    > > > >
    > > > > To:   Julian Satran/Haifa/IBM@IBMIL
    > > > > cc:
    > > > > Subject:  Re: iSCSI: Flow Control
    > > > >
    > > > > julian_satran@il.ibm.com wrote:
    > > > >
    > > > > > Pierre,
    > > > > >
    > > > > > Sorry I missed a point about a - I thought you were saying that
    > > > > > unsolicited data are not allowed. On this we are in agreement.
    > > > > >
    > > > > > On the rest - I can hardly follow. The model you suggest, while valid
    > > > > > in a close scheme like a bus or short serial connection - in which the
    > > > > > target fetches data - is closely matched by the R2T for data, with no
    > > > > > such match for commands.
    > > > > > Keeping track of how many commands were shipped for what LU is
    > > > > > impractical, as we don't want per-LU state at the initiator (for the
    > > > > > same reason we rejected the connection per LU model).
    > > > > >
    > > > > > As for D - the point is that when you have a command to send and the
    > > > > > command window is open you might have to wait a long time, as the TCP
    > > > > > window is closed and/or you have a lot of data ahead.
    > > > >
    > > > > I think there is a misunderstanding about the model I was talking about.
    > > > > It's a pull model as implemented in some FC cards today, and it is
    > > > > assumed that TCP/IP is handled on the adapter. It is the "no memory on
    > > > > adapter" model Somesh talked about.
    > > > >
    > > > > When a command comes out of the SCSI layer, it is posted to the adapter.
    > > > > At this point it is not posted in a queue but in a flat array of
    > > > > commands.
    > > > > The data is still in host memory.
    > > > > Let's assume the card can handle 1000 commands in parallel; the array
    > > > > has 1000 entries.
    > > > > The adapter is able to process these commands the way it wants
    > > > > as long as it respects the protocol (iSCSI in our case). It could
    > > > > be able to process them all in parallel if needed.
    > > > > As it is a flat array, no command is blocked by another command
    > > > > or by data. The adapter can pick (pull) whatever command or data
    > > > > from host memory and send
    > > > > it on the wire (again as long as it respects the protocol).
    > > > >
    > > > > Regards,
    > > > >
    > > > > Pierre
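
    The latency argument in the quoted 10/10/2000 02:58:41 message,
    (size of pdu)/throughput, can be put into rough numbers. The sketch
    below is only an illustration of the initiator-side queueing claim.
    The 100Mbytes figure comes from the thread; the 1 Gbit/s link rate,
    the 8192-byte PDU size, and the function names are assumptions made
    for the example. It also ignores the closed-TCP-window case raised
    elsewhere in the thread.

```python
def command_wait_fifo(bytes_queued_ahead, throughput_bytes_per_s):
    """FIFO interface: the command waits for everything queued ahead of it."""
    return bytes_queued_ahead / throughput_bytes_per_s


def command_wait_flat_array(pdu_size_bytes, throughput_bytes_per_s):
    """Flat-array interface: the command waits at most for the data PDU
    currently being transmitted, i.e. (size of PDU) / throughput."""
    return pdu_size_bytes / throughput_bytes_per_s


if __name__ == "__main__":
    outstanding = 100 * 10**6      # 100 Mbytes of write data (from the thread)
    throughput = 125_000_000       # assumed: 1 Gbit/s expressed in bytes/s
    pdu_size = 8192                # assumed: data PDU size in bytes

    fifo = command_wait_fifo(outstanding, throughput)
    flat = command_wait_flat_array(pdu_size, throughput)
    print(f"FIFO:       {fifo:.3f} s")
    print(f"flat array: {flat * 1e6:.1f} microseconds")
```

    Under these assumed numbers the FIFO case costs on the order of a
    second and the flat-array case tens of microseconds, which is the gap
    the message is arguing about.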
    


Last updated: Tue Sep 04 01:06:42 2001
6315 messages in chronological order