
    RE: a vote for asymmetric connections in a session



    
    
    Dear Mr. Cheng,
    
    I have some trouble with your note, as it contains many items: some that
    were repeatedly discussed and already agreed upon, and some with smaller
    or larger misunderstandings (like the duplex issue - the links are in
    fact used in duplex mode even for iSCSI, as R2T and data can flow on the
    same links, and outbound and inbound data can be used simultaneously with
    different commands; deadlocks are not caused by execution speed, or the
    lack of it; an NT miniport serves a port driver, which in turn serves a
    class driver; UDP is not more efficient than TCP - although it has a
    better-matching datagram model, it lacks reliable delivery and congestion
    control; etc.).
    
    I will try to summarize your subject-line position for my records (and
    our list's) - and please correct me if I am wrong:
    
    - you are against the asymmetric model, as it requires more work to
      execute a SCSI command than the symmetric model.
    
    Regards,
    Julo
    
    "Y P Cheng" <ycheng@advansys.com> on 11/09/2000 02:13:02
    
    Please respond to "Y P Cheng" <ycheng@advansys.com>
    
    To:   John Hufferd/San Jose/IBM@IBMUS, Julian Satran/Haifa/IBM@IBMIL,
          black_david@emc.com
    cc:   ips@ece.cmu.edu
    Subject:  RE: a vote for asymmetric connections in a session
    
    
    
    
    (I apologize for this long response.  However, I hope it is worth your
    reading.)
    John Wrote:
    >I think I understood what you said in the context
    >of the Symmetric model, but could you please take
    >me through how this would occur in the Asymmetric
    >when you have at least two connections?
    Juliano Wrote:
    >A note of caution. The most serious dead-lock situation
    >we are aware of stems from a mix of RTT (or should we
    >call it R2T to accommodate Doug Ottis?) and unsolicited
    >immediate data. If channels are full with unsolicited
    >data and the target requests something else - that something
    >else will not get through. This dead-lock, as far as I can
    >tell exists in all transports. A target should be able to
    >detect it and iSCSI has provided for the target
    >to be able to drop data and reclaim them later with R2T.
    
    An asymmetric connection, while doable, makes NO sense with the
    technologies used in today's NIC adapters.  Before people flame me for
    such a statement, I will support my position with my understanding of
    today's NICs, their drivers, and the iSCSI protocol.  I don't know enough
    about mainframe NIC design.  Therefore, in a different context, my
    position could be wrong.  However, I would welcome teaching on mainframe
    NIC designs from someone.  Beyond the position on asymmetric connections,
    I also take the position that failover is a function of a protocol and
    could be incorporated in iSCSI.  Load balancing is a function of a NIC
    adapter and its driver.  Detection of an incomplete SCSI session due to
    traffic congestion or lack of resources is the responsibility of a SCSI
    initiator, not a SCSI target.  Here are the reasons for my positions:
    
    To understand my arguments, one must understand the context of my
    analysis.  Let's start with an understanding of the terms used herein.

    "Transport Connection" -- It is a unique pair of endpoints (IP address,
    port number), one sending and one receiving.  A SCSI initiator may have
    many connections to send commands to different SCSI targets, which in
    turn have many connections to different initiators to receive commands.
    The SOCKET system call returns a handle and a data structure which
    stores an IP address and port number.  The BIND and CONNECT system calls
    duplicate the socket structure and return a unique port number.  (Note,
    this is a software port number.  Later, we will mention the hardware
    ports on a NIC adapter.)  By duplicating the sockets and their handles,
    an initiator or target supports multiple connections.  Often, people
    mistake the SCSI transport connection -- a socket -- for a TCP
    connection.  In fact, UDP is a better connection protocol for iSCSI, as
    I shall argue later.
    
    "Server Client Protocol" -- A SCSI target is a server which enters a
    passive state listening for incoming SCSI commands.  It does so by the
    SOCKET and BIND system calls, which establish a receiving endpoint.  A
    SCSI initiator is a client which enters an active state to send SCSI
    commands.  It does so by the SOCKET and CONNECT system calls, which
    establish a sending endpoint.  The domain name to IP address conversion
    in BIND and CONNECT is provided by an IP name server or the Address
    Resolution Protocol (ARP).
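    
    The passive and active roles described above can be sketched with
    Python's standard socket module.  This is only an illustration under
    stated assumptions: it uses TCP over the loopback address, the payloads
    b"READ(10)" and b"STATUS GOOD for ..." are made-up placeholders rather
    than iSCSI messages, and with TCP the passive side also needs LISTEN and
    ACCEPT in addition to SOCKET and BIND.

```python
import socket
import threading


def run_target(listener: socket.socket) -> None:
    """Passive side: accept one connection and answer one request."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)
        # Placeholder reply, not a real iSCSI response.
        conn.sendall(b"STATUS GOOD for " + request)


def demo() -> bytes:
    # Target: SOCKET + BIND (+ LISTEN for TCP) create the receiving endpoint.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=run_target, args=(listener,))
    worker.start()

    # Initiator: SOCKET + CONNECT create the sending endpoint.
    initiator = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    initiator.connect(("127.0.0.1", port))
    initiator.sendall(b"READ(10)")

    # Read until the target closes the connection.
    chunks = []
    while True:
        data = initiator.recv(1024)
        if not data:
            break
        chunks.append(data)

    initiator.close()
    worker.join()
    listener.close()
    return b"".join(chunks)
```

    Calling demo() runs both roles in one process; in practice the target
    and the initiator would of course be separate machines.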
    
    "Peer to Peer" -- A SCSI target may become an initiator to start third
    party SCSI commands.  Acting as an initiator, it does so by starting a
    new transport connection to another SCSI target.  An iSCSI endpoint is
    either sending or receiving, but never both.  Therefore, a SCSI storage
    device uses one connection to receive commands and another connection
    to send third-party copy commands.  If the sending initiator can also
    act as a target, then we have the appearance of peer-to-peer with two
    transport connections.  Note, SCSI is never a full-duplex protocol.
    
    "Multi-path Connection" -- If a SCSI target can be reached by more than one
    IP address, the CONNECT system call on a SCSI initiator returns a list
    of addresses in the socket data structure.  This list will be used for
    load balancing and failover recovery.
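    
    How an initiator might walk such an address list for failover can be
    sketched as follows.  The function name connect_first_reachable and the
    shape of the paths argument are assumptions made for illustration, and
    the sketch assumes the list of (host, port) pairs has already been
    obtained, for example from name resolution.

```python
import socket


def connect_first_reachable(paths, timeout=1.0):
    """Try each (host, port) path in order.

    Returns (connected_socket, path) for the first path that accepts the
    connection; raises ConnectionError if none of them does.
    """
    last_error = None
    for path in paths:
        try:
            sock = socket.create_connection(path, timeout=timeout)
            return sock, path
        except OSError as exc:  # refused, unreachable, timed out, ...
            last_error = exc
    raise ConnectionError(f"no path reachable: {last_error}")
```

    A load-balancing variant could rotate the starting index of the list
    per request instead of always starting with the first path.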
    
    "SCSI Session" -- It is a stateless transaction between an initiator and a
    target.  The session has a request and response relationship.  The request
    is a SCSI command and the response is data-transfer-and-status.  SCSI
    commands like mode select and sense and iSCSI messages like security and
    authentication create state information.  But they are uninteresting in
    the scope of this discussion.
    
    "iSCSI driver" -- A NIC adapter driver supports one or more NIC adapters
    which in turn support iSCSI protocols.  For a legacy NIC adapter like
    old Ethernet cards, the driver must build the iSCSI messages.  For a new
    NIC adapter like the fibre channel adapters or even gigabit Ethernet,
    the driver simply sends a SCSI request to the adapter, which in turn
    builds the iSCSI messages.  The new NIC adapter can accept a few hundred
    or even thousands of SCSI requests.  The iSCSI driver is a miniport
    driver -- in Windows/NT terminology -- running under the SCSI class
    driver, which sends SCSI requests to SCSI devices.  Needless to say,
    application software or file system code sends requests to SCSI devices.
    
    "iSCSI NIC adapter" -- A NIC adapter with one or more functional interfaces
    and one or more ports connecting to the IP gateways executes the iSCSI
    requests and responses.  It sends requests for a SCSI initiator and
    receives them for a SCSI target.  Therefore, a NIC adapter runs in
    initiator mode, in target mode, or in both.  A multi-function NIC
    adapter can accept FCP requests from one functional interface, iSCSI
    from another, and even VI from a third.  A dual-channel adapter will
    have two ports connecting to two different physical paths, say, one to
    an intranet and another to the Internet.  A NIC adapter has transmit and
    receive buffers for outgoing and incoming SCSI messages and data.  When
    the receive buffer is full, incoming messages will be dropped and lost.
    
    Now, here is my argument for why asymmetric connections make NO sense in
    the context of the NIC adapter technologies that I understand.  For an
    asymmetric connection, if the iSCSI driver is running on old legacy NIC
    adapters, it must send the SCSI command to one adapter and set up the
    data transfer on another.  While with great difficulty one may make
    these two adapters talk to each other to coordinate the command and data
    sequences, the newer NIC adapters execute hundreds or even thousands of
    SCSI requests and responses "atomically."  Therefore, there is no
    deadlock problem between processing commands and data in the context of
    either a SCSI initiator or a target.  Furthermore, even with NIC
    adapters built with the latest technology, having two functional
    interfaces accepting command and data requests separately, there is
    nothing gained because the SCSI requests are executed atomically by the
    adapters.  In the era of a NIC adapter that executes a whole iSCSI
    request in 25 microseconds, it does not make sense to have two NIC
    adapters split the command and data processing, with the coordination
    itself taking much more time.
    
    For the problem of lost SCSI messages due to traffic congestion, the
    loss must be detected by the sender who times out the responses, in this
    case a SCSI initiator.  The congestion problem cannot be managed by the
    BB credit used in FCP.  For an end-to-end connection to a switch or that
    of an arbitrated loop, one can use BB credit to manage the traffic.
    But there is no way to manage that in an Ethernet connection because of
    the collision avoidance protocol.  In addition, the gateway can lose
    packets too.  When an initiator is in New York and a target in Los
    Angeles, one can't afford a zero initial BB credit due to the long
    latency.  With a non-zero initial BB credit, hundreds of initiators
    around the world may send requests at the same time.  Therefore, iSCSI
    must recover smoothly from traffic jams and loss of packets.  Only the
    initiator sends requests, and the target only returns responses.
    Therefore, it is very easy for the initiator to detect the loss of
    messages by setting proper timeout values.  A target must accept at
    least one request from an initiator; it must manage its resource
    allocation with RTT.
    
    Once a timeout on a SCSI request is detected by an initiator, the
    microcode on the NIC adapter is quite capable of sending the request
    again, even on another path for failover recovery -- if the adapter has
    a second port to reach the same target.  If not, the NIC iSCSI driver
    can try another path.  In resending the request, yes, the issue of
    duplicated requests must be managed.  However, this is a well-understood
    problem when retry is allowed in a protocol.  Notice, I never said the
    SCSI target will initiate a retry.  If necessary, the target always
    sends a status message requesting the initiator to retry.
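    
    The retry-with-duplicates issue mentioned above can be illustrated with
    a small in-memory simulation.  Everything here is hypothetical: the
    class names Target and LossyPath, the function issue, the integer task
    tag, and the "GOOD:..." status strings are invented for the sketch and
    are not taken from the iSCSI draft.  A lost response is modelled by
    returning None, which stands in for a timeout at the initiator.

```python
class Target:
    """Remembers completed task tags so a retried request runs only once."""

    def __init__(self):
        self.completed = {}  # task tag -> status already returned
        self.executions = 0  # how many times a command was really executed

    def handle(self, tag, command):
        if tag in self.completed:  # duplicate caused by a retry
            return self.completed[tag]
        self.executions += 1
        status = f"GOOD:{command}"
        self.completed[tag] = status
        return status


class LossyPath:
    """Loses the response of its first `drops` deliveries."""

    def __init__(self, target, drops):
        self.target = target
        self.drops = drops

    def send(self, tag, command):
        status = self.target.handle(tag, command)
        if self.drops > 0:
            self.drops -= 1
            return None  # response lost; the initiator sees a timeout
        return status


def issue(paths, tag, command, max_attempts=4):
    """Initiator side: retry on timeout, rotating over the available paths."""
    for attempt in range(max_attempts):
        path = paths[attempt % len(paths)]
        status = path.send(tag, command)
        if status is not None:
            return status, attempt + 1
    raise TimeoutError(f"task {tag} failed after {max_attempts} attempts")
```

    The design choice shown is that the initiator alone drives the retry,
    while the target only needs enough memory of finished tags to answer a
    repeated request without executing it again.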
    
    Similar to retry, for load balancing, the NIC adapter microcode and the
    iSCSI driver of an initiator are quite capable of selecting a different
    port or NIC adapter to send a SCSI request, as long as the adapter or
    the driver is made aware of the multiple paths in the socket data
    structure, which was filled in at the time of making the connection.  To
    keep the design simple, the target does not, should not, and must not
    take on the responsibility of load balancing.
    
    On striping the data transfers across multiple connections, I do believe
    we are using four 2.5 gigabit MAC chips to get the 10 gigabit fibre
    channel, Ethernet, and InfiniBand connections.  In fact, the 12x option
    of InfiniBand stripes the data on 12 MAC chips to get a three gigabyte
    per second data rate.  Striping data across multiple NIC adapters would
    be too difficult for the poor adapter designers to do.
    
    On using UDP instead of TCP for iSCSI, I am having trouble with TCP
    because it is stream based.  The READ and WRITE system calls are
    extremely inefficient for block-oriented SCSI data transfer.  On the
    other hand, the UDP datagram, using the SEND and RECEIVE system calls,
    is better suited for iSCSI.  In fact, I believe NFS is built on UDP.
    The request and response relationship between SCSI initiator and target
    makes the connectionless UDP protocol possible.  FCP is implemented
    using the class 3 fibre channel protocol which is designed for
    datagrams.
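    
    The stream-versus-block concern raised above is commonly handled by
    framing: each block is preceded by its length, so the receiver can
    recover message boundaries from a byte stream.  The sketch below shows
    that general technique only; the 4-byte big-endian length header and
    the function names are assumptions for illustration and do not
    reproduce the iSCSI message format.

```python
import socket
import struct


def send_block(sock: socket.socket, payload: bytes) -> None:
    """Send one block, prefixed with its length as a 4-byte big-endian int."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket, looping over short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed in the middle of a block")
        buf.extend(chunk)
    return bytes(buf)


def recv_block(sock: socket.socket) -> bytes:
    """Receive one length-prefixed block and return its payload."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

    With this in place, two blocks written back to back on one stream are
    read back as two separate blocks, which is the property a datagram
    service would otherwise provide.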
    
    Finally, comments on the resources used by the initiator and target.  A
    SCSI initiator has self-regulating resource allocation.  When there are
    no resources to start new processes to initiate new SCSI requests, the
    SCSI requests cease.  For each SCSI request, the required resources are
    pre-allocated, waiting for responses from a target.  A SCSI target
    receives requests from everyone on the net.  While it must have room to
    accept new SCSI requests -- which can be done at login by specifying the
    queue depth -- it needs RTT (R2T) to control the buffer space for data
    transfers.  This position has already been expressed by many storage
    controller people.  I need not repeat the position here.
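    
    The way a target can use R2T to bound its buffer use can be sketched as
    a simple calculation.  The function name r2t_grants and its parameters
    are hypothetical; the sketch assumes the target asks for one buffer's
    worth of write data at a time and frees the buffer before issuing the
    next request.

```python
def r2t_grants(total_bytes: int, buffer_bytes: int):
    """Return the (offset, length) pairs a target would request for one write.

    Each grant is at most buffer_bytes long, so the initiator never sends
    more data than the target has room to hold.
    """
    if buffer_bytes <= 0:
        raise ValueError("buffer_bytes must be positive")
    grants = []
    offset = 0
    while offset < total_bytes:
        length = min(buffer_bytes, total_bytes - offset)
        grants.append((offset, length))
        offset += length
    return grants
```

    For a 10-byte write and a 4-byte buffer this yields three grants of 4,
    4, and 2 bytes, so the pace of the transfer is set by the target.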
    
    Y.P. Cheng, CTO, ConnectCom Solutions Corp.
    
    
    
    
    



Last updated: Tue Sep 04 01:07:25 2001