
    RE: ISCSI: Urgent Flag requirement violates TCP.



    Y.P.
    
    TCP does not technically allow zero copy, as PDU headers may be split
    across segments and urgent pointers may coalesce beyond received segments.
    You are advocating a modification to TCP specifically for iSCSI, and you
    appear to advocate segment alignment for PDU headers with a
    one-packet-at-a-time standard.  Do you wish to see the TCP sender and
    receiver modified as indicated within the proposal?  In your view TCP
    software must be modified, so why stop at specifying the urgent pointer?
    Include frame alignment as well.  Clearly, you view wire compatibility as
    the only constraint.  Is TCP just a wire specification?
    
    At the initiator side, status should be held pending missing data.  At the
    target, commands must be held, and data can be processed out of sequence
    only where the command has been seen or the R2T has been sent.  There
    would be no way to know the queue status of the missing command, so all
    commands and status must be held.  If processed at the network level,
    status sent with data becomes difficult, as this status must be sent first.
    
    Doug
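
A minimal sketch of the target-side hold rule described above, under loud
assumptions: the `Pdu` and `Target` classes, the "command"/"data" kinds, and
the per-PDU `seq` counter are hypothetical stand-ins and do not model real
iSCSI opcodes or TCP sequence numbers.  It only illustrates holding commands
behind a gap while letting data for an already-known task be placed early.

```python
from dataclasses import dataclass, field


@dataclass
class Pdu:
    kind: str   # "command" or "data" (toy categories, not real iSCSI opcodes)
    task: int   # hypothetical task identifier
    seq: int    # position in the stream, counted in PDUs (toy stand-in)


@dataclass
class Target:
    next_seq: int = 0
    known_tasks: set = field(default_factory=set)  # command seen or R2T sent
    held: dict = field(default_factory=dict)       # seq -> PDU waiting on a gap
    log: list = field(default_factory=list)

    def _deliver(self, pdu):
        if pdu.kind == "command":
            self.known_tasks.add(pdu.task)
            self.log.append(("command", pdu.task))
        elif pdu.kind == "data":
            self.log.append(("data", pdu.task))
        # kind == "placed": the data was already placed early, nothing to do

    def receive(self, pdu):
        if pdu.seq < self.next_seq or pdu.seq in self.held:
            return  # duplicate, ignore
        if pdu.seq == self.next_seq:
            self._deliver(pdu)
            self.next_seq += 1
            # Drain anything that was held behind the gap and is now in order.
            while self.next_seq in self.held:
                self._deliver(self.held.pop(self.next_seq))
                self.next_seq += 1
        elif pdu.kind == "data" and pdu.task in self.known_tasks:
            # Out-of-sequence data for a task whose command was already seen.
            self.log.append(("early-data", pdu.task))
            self.held[pdu.seq] = Pdu("placed", pdu.task, pdu.seq)
        else:
            # Commands (and data for unknown tasks) wait for the gap to fill.
            self.held[pdu.seq] = pdu


target = Target()
target.receive(Pdu("command", 1, 0))
target.receive(Pdu("command", 2, 2))  # seq 1 is missing, so this is held
target.receive(Pdu("data", 1, 3))     # task 1 is known, so it is placed early
target.receive(Pdu("data", 1, 1))     # the gap fills and held PDUs drain
print(target.log)
```

In this toy run command 2 is not delivered until the missing PDU arrives,
while the later data for task 1 is placed as soon as it is received.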
    
    > Dave Black wrote:
    > > It is not clear to me that the Urgent feature is "required for
    > > interoperation or to limit behavior which has potential for
    > > causing harm". I'm prepared to be convinced otherwise, and would
    > > like to hear from implementers other than Matt on this subject,
    > > and specifically comments on his statement that:
    > >    "... high speed implementations will require framing in order
    > > 	to prevent a massive amount of buffer resources to 'buffer up' TCP
    > > 	segments that arrive after a dropped TCP segment."
    >
    > My apology for taking so long to respond to this request.
    >
    > I support Matt wholeheartedly.  A TCP-Offload-Engine (TOE), a
    > hardware-aided TCP implementation with zero-copy function, is essential
    > for Gigabit-plus Ethernet and Fibre Channel and InfiniBand adapters
    > supporting iSCSI over TCP.  Using the urgent bit to identify the
    > beginning of an iSCSI message enables the TOE adapter to parse an
    > incoming TCP/IP segment quickly and deal with out-of-order and
    > duplicated frames efficiently.  Most arguments against Matt's position
    > were based on existing software TCP implementations.  While supporting
    > TCP 100%, the TOE adapter does require some changes to the TCP
    > implementation at installation.  The changes are necessary to enable the
    > zero-copy function.  However, a TOE adapter with its hardware and
    > software will inter-operate with any existing TCP implementation on any
    > client or server by following the TCP spec.
    >
    > A TOE is a multi-function adapter that supports both TCP/IP and iSCSI.
    > The NFS implementation with UDP or TCP over IP can be supported by a
    > scatter/gather DMA list which splits the TCP/IP header from the data
    > payload such that the latter is copied directly from and to the NFS
    > cache buffers.  This is the essence of the zero-copy function.  To deal
    > with out-of-order and duplicated frames, the TOE adapter works with one
    > IP packet at a time with a score card that tracks all incoming segments.
    > The maximum IP packet size is 65K.  Some implementations prefer "jumbo"
    > packets or frames.  Inside each IP packet, the TOE adapter finds the
    > UDP/TCP headers.
    >
    > For iSCSI support, the TOE adapter receives its SCSI requests directly
    > from the iSCSI driver instead of from a TCP/IP driver.  To avoid
    > duplicating the TCP run-time parameters, the "OPTIONAL" second
    > connection will be used for SCSI commands while another TCP connection
    > will be used for login, logout, and other task administrative matters.
    > Needless to say, there are many concurrent sessions to many different
    > targets or initiators.  Hundreds or thousands of concurrent SCSI
    > commands to different TCP endpoints are handled through an exchange
    > table, as I have discussed before this posting.  The adapter must move
    > the incoming or outgoing TCP segments at the speed of the media and
    > generate ACKs (or SACKs) quickly.  As a target, the TOE adapter passes
    > incoming SCSI commands directly back to the waiting application
    > software, such as RAID, tape, or JBOD storage devices.  For outgoing
    > frames or packets, the TOE adapter creates TCP segments with IP headers.
    > It may bundle several iSCSI PDUs into one TCP/IP segment destined for
    > the same target.  For incoming frames, the TOE adapter must know the
    > iSCSI message boundary.  This is why the urgent bit is extremely useful.
    > Without it, the TOE adapter must buffer the whole IP packet before it
    > can process an iSCSI header.  While a TOE adapter can deal with a 65K IP
    > packet with ease, the "jumbo" frame places a large SRAM demand on the
    > adapter.  Several incoming packets from different sources only add to
    > the SRAM requirements.  For iSCSI, jumbo frames are very useful for
    > clients or servers thousands of miles away.  The TOE adapter must deal
    > with hundreds of jumbo frames in flight.  We are dealing with many
    > gigantic TCP windows from many connections.
    >
    > Many objections to the word "MUST" were based on out-of-order delivery
    > of SCSI commands.  For outgoing SCSI commands, the TOE adapter will
    > deliver them in the same order received from its iSCSI driver.  There is
    > no problem here.  For incoming SCSI commands, while honoring the TCP
    > sequence numbers, a TOE adapter operates in the same manner as current
    > SCSI, 1394, and Fibre Channel adapters, meaning there is no guarantee of
    > in-order command execution.
    >
    > To illustrate this, let's use an example of command A being followed
    > closely by command B.  For a SCSI adapter, if a target gets command A
    > with a bus parity check, it would return check status to command A and
    > proceed to accept B happily, even if there is a dependency between
    > commands A and B.  For an initiator device, the check status on A never
    > blocks the delivery of B.  This out-of-order delivery is OK because if B
    > depended on A, all file system software would hold command B until the
    > completion of command A.  For a 1394 adapter, commands A and B are
    > stored in two ORBs.  A target 1394 device will fetch the ORBs.  Again,
    > after encountering an error in fetching the ORB for A, a target device
    > will proceed to fetch the ORB for B.  For a Fibre Channel adapter, an
    > initiator will send commands A and B in two separate FCP_CMD frames.  If
    > frame A arrives with a bad CRC, a target device simply throws the frame
    > away and proceeds with execution of command B, if it arrived with a
    > good CRC.
    >
    > Therefore, as Matt has stated, it is OK to deliver command B even if the
    > command A segment is still missing.  The urgent bit allows us to do that.
    > One objection to out-of-order delivery uses the aborting of a
    > non-existent command as an example.  But abort is never deterministic.
    > An aborted command may be either non-existent or already completed.
    > This situation is known to all adapters today.
    >
    > I do believe the iSCSI WG should facilitate the implementation of TOE
    > adapters as well as accommodate the traditional TCP implementation.
    > However, in either case, no TCP changes.  (Personally, I would like to
    > see some changes.  But I learned my lessons earlier on this. :-))
    >
    > Y.P. Cheng, CTO, ConnectCom Solutions Corp.
    >
    >
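
The framing argument in the quoted message can be illustrated with a toy
sketch.  Everything here is an assumption made for illustration: the 4-byte
length prefix is not the iSCSI header format, and the per-segment `marker`
offset is a hypothetical stand-in for the proposed use of the urgent pointer,
not a model of TCP's actual urgent semantics.  The point is only that a
receiver holding a segment that follows a lost one cannot locate a PDU header
in it without some boundary hint.

```python
import struct


def build_stream(payloads):
    """Concatenate length-prefixed toy PDUs; also return each PDU's offset."""
    stream, starts = b"", []
    for p in payloads:
        starts.append(len(stream))
        stream += struct.pack(">I", len(p)) + p
    return stream, starts


def segment(stream, starts, size):
    """Split into (seq, marker, data); marker is the offset within the
    segment of the first PDU header that begins there, or None."""
    segs = []
    for seq in range(0, len(stream), size):
        data = stream[seq:seq + size]
        inside = [s - seq for s in starts if seq <= s < seq + len(data)]
        segs.append((seq, inside[0] if inside else None, data))
    return segs


def parse_from(data, offset):
    """Return payloads of the PDUs that lie wholly inside data, starting at
    offset; stop at the first incomplete header or payload."""
    out = []
    while offset + 4 <= len(data):
        (n,) = struct.unpack_from(">I", data, offset)
        if offset + 4 + n > len(data):
            break
        out.append(data[offset + 4:offset + 4 + n])
        offset += 4 + n
    return out


stream, starts = build_stream([b"A" * 10, b"BB", b"CCC", b"D" * 6])
segs = segment(stream, starts, 16)

# Pretend the first segment was dropped and only the second has arrived.
seq, marker, data = segs[1]
recovered = parse_from(data, marker)
print(seq, marker, recovered)
```

With the marker, the complete PDU inside the second segment can be handed up
before the first segment is retransmitted.  A PDU that continues into a later
segment still has to wait for that segment, and a segment with no marker at
all has to be buffered until the stream in front of it is complete.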
    
    


Last updated: Tue Sep 04 01:06:26 2001