    single vs multiple channels for iSCSI commands



    
    
    
    
    Proposal to support a single Control Channel with multiple Data Channels
    in the iSCSI protocol.
    
    by Kalman Meth
    27 June 2000
    
    In our discussions on the iSCSI protocol, we concluded that we needed
    to send data over multiple channels in order to make the best use of
    the available network resources. We were also inclined to have all of
    the channels behave symmetrically, which simplifies the protocol
    because no channel has to be treated differently from the others.
    This allows vendors to introduce uniform iSCSI NICs for all of the
    network connections that will be exploited by iSCSI.
    
    We decided to allow commands to be sent over any of the multiple
    connections, with each command's data and status sent on the same
    channel that was used to issue the command.
    Passing commands over multiple channels introduced the complication of
    servicing the commands at the receiving end in the order in which they
    were issued. A further complication arose when one of the connections
    failed: how do we determine which command was lost on the broken
    connection, and what actions are required to recover from the failed
    connection? The solution we found to these problems (introducing a
    Command Reference Number and placing the commands back in order at the
    receiving end) introduced flow control problems of its own, such as
    maintaining a window on commands to ensure that we don't overrun the
    reference count, and ensuring that we don't block all of the channels
    just because one channel failed and its lost command causes the command
    queue on the target to fill up (while we wait for the lost command to
    arrive).
    
    I would like us to go back and consider a variation of the model we
    originally proposed, with one Control Channel and multiple Data
    Channels. Some of the ideas below came up during our discussions and
    also apply to the symmetric model.
    
    Session establishment: as in the existing draft.
    Naming: as in the existing draft, with adjustments from the design
    discussions.
    Security: as decided in the design discussions.
         (0) none
         (1) challenge/response
         (2) IPsec or SSL
    
    
    
    Normal case:
    
    An iSCSI session between an initiator and a target consists of a
    number of TCP connections. Each TCP connection between initiator and
    target requires an iSCSI login. The first established connection of a
    session between initiator and target (numbered 0) is the Control
    Connection (also called the Control Channel).
    Subsequent connections between the same initiator and target can be
    added to an existing session at the initiator's request during login.
    These connections are numbered 1, 2, 3, etc., and are called Data
    Connections (also called Data Channels).
    An initiator may establish several sessions with the same target, each
    session having its own Control Channel and its own set of Data Channels.
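
    As a minimal sketch of the connection numbering above, assuming a
    hypothetical in-memory `Session` class (none of these names come from
    the draft), each login appends the next connection number, and
    connection 0 is always the Control Channel:

```python
from dataclasses import dataclass, field
from typing import List

CONTROL_CID = 0  # the first connection of a session is the Control Channel


@dataclass
class Session:
    """Hypothetical model of one iSCSI session in the proposed scheme."""
    initiator: str
    target: str
    connections: List[int] = field(default_factory=list)

    def login_connection(self) -> int:
        """Record a new logged-in TCP connection and return its number."""
        cid = len(self.connections)  # 0 for the first login, then 1, 2, 3, ...
        self.connections.append(cid)
        return cid

    def control_channel(self) -> int:
        if not self.connections:
            raise RuntimeError("no connection has logged in yet")
        return CONTROL_CID

    def data_channels(self) -> List[int]:
        return [c for c in self.connections if c != CONTROL_CID]


# Two sessions with the same target each get their own Control Channel.
a = Session("initiator-1", "target-1")
b = Session("initiator-1", "target-1")
for _ in range(4):
    a.login_connection()
b.login_connection()
print(a.control_channel(), a.data_channels())  # 0 [1, 2, 3]
print(b.control_channel(), b.data_channels())  # 0 []
```

    A session with only connection 0 is the permitted single-connection
    case: it has a Control Channel and no Data Channels.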
    
    All SCSI commands and task management messages will go over the Control
    Connection. Order is maintained within a single session by virtue of all
    commands going through the same TCP connection.
    The iSCSI packets for RTT and Data may go over any of the channels.
    iSCSI Login must be performed on each of the connections.
    iSCSI Ping may be performed over any of the connections.
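
    The routing rule above can be sketched as a small lookup, assuming
    hypothetical message-type names. Commands are confined to connection
    0, so a single TCP byte stream carries them in issue order, while RTT,
    Data and Ping may use any connection of the session:

```python
from enum import Enum, auto
from typing import List

CONTROL_CID = 0


class Msg(Enum):
    # Hypothetical labels for the message kinds named in the proposal.
    SCSI_COMMAND = auto()
    TASK_MGMT = auto()
    RTT = auto()
    DATA = auto()
    PING = auto()


def allowed_channels(msg: Msg, connections: List[int]) -> List[int]:
    """Connection numbers on which `msg` may be sent in this session."""
    if msg in (Msg.SCSI_COMMAND, Msg.TASK_MGMT):
        # One TCP connection for all commands keeps them in order.
        return [CONTROL_CID]
    # RTT, Data and Ping may travel over any of the session's connections.
    return list(connections)


print(allowed_channels(Msg.SCSI_COMMAND, [0, 1, 2]))  # [0]
print(allowed_channels(Msg.DATA, [0, 1, 2]))          # [0, 1, 2]
```

    Login is left out of the table because it is not routed: it is
    performed once on each connection as that connection is set up.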
    
    It is recommended that large data transfers be performed on the Data
    Channels (rather than the Control Channel) so as to ensure that the
    Control Channel is always free. It is permissible, however, to
    establish a single connection and perform all iSCSI operations on that
    single channel.
    
    On a READ or WRITE command, the initiator specifies on which channel it
    expects to perform the data transfer. This gives the initiator and
    target a chance to set up buffers for DMA ahead of time.
    Once a data transfer for a particular SCSI command begins on a
    particular Data Channel, all subsequent data that is transferred for the
    same SCSI command is to be transferred over the same Data Channel.
    In the RTT, the target confirms the channel on which it expects the
    data transfer. An RTT is sent over the same channel as the expected
    data transfer (as specified by the initiator).
    If the target decides (for whatever reason) that it wants to receive the
    data transfer on another channel, it sends the RTT over the Control
    Channel with an indication of which Data Channel it wants to use.
    It is understood that this may impose a performance cost on the
    initiator, which must now move the data transfer to another Data
    Channel (possibly on another NIC, thus requiring DMA to be set up all
    over again). A target will usually change the connection for a data
    transfer only if it has some problem with the originally specified
    connection (an unresponsive connection, an inability to handle a large
    amount of data on that connection, etc.).
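
    A minimal sketch of the channel bookkeeping for one command's data,
    assuming a hypothetical `DataChannelMap` keyed by Initiator Task Tag:
    the initiator names a channel when issuing READ/WRITE, the target's RTT
    normally goes out on that same channel, and a redirect sends the RTT on
    the Control Channel and re-pins the transfer to the new Data Channel.

```python
from typing import Dict, Optional, Tuple

CONTROL_CID = 0


class DataChannelMap:
    """Hypothetical per-session record of which channel each command's data uses."""

    def __init__(self) -> None:
        self._pinned: Dict[int, int] = {}  # initiator task tag -> connection number

    def expect(self, task_tag: int, cid: int) -> None:
        """Initiator specifies the data channel when it issues READ or WRITE."""
        self._pinned[task_tag] = cid

    def target_rtt(self, task_tag: int, wanted: Optional[int] = None) -> Tuple[int, int]:
        """Return (channel the RTT is sent on, channel the data must use)."""
        current = self._pinned[task_tag]
        if wanted is None or wanted == current:
            return current, current  # RTT travels on the expected data channel
        self._pinned[task_tag] = wanted  # target moves the transfer elsewhere
        return CONTROL_CID, wanted       # ...and announces it on the Control Channel

    def check_data(self, task_tag: int, cid: int) -> None:
        """All data for a command must stay on the channel it is pinned to."""
        expected = self._pinned[task_tag]
        if cid != expected:
            raise ValueError(f"task {task_tag}: data on channel {cid}, expected {expected}")


m = DataChannelMap()
m.expect(task_tag=7, cid=2)
print(m.target_rtt(7))     # (2, 2)
print(m.target_rtt(7, 3))  # (0, 3): redirected via the Control Channel
m.check_data(7, 3)         # accepted; channel 2 would now raise ValueError
```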
    
    Commands may be sent with immediate data (on the Control Channel) if the
    immediate data is small (say, less than 8K), thereby avoiding the need
    to later match up the data with the corresponding command. A bit in the
    iSCSI command header indicates that immediate data is present.
    An initiator may also send unsolicited data (without an RTT) over the
    Data Channels, if the initiator and target have agreed (during login
    on the Control Channel) not to use RTT.
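
    One possible initiator policy for choosing among the three ways of
    delivering WRITE data can be sketched as follows. The 8K limit is the
    "say less than 8K" figure from the text, used only as an illustrative
    default, and `FLAG_IMMEDIATE` is a hypothetical bit position, since the
    proposal does not assign one.

```python
from typing import Tuple

IMMEDIATE_LIMIT = 8 * 1024  # illustrative threshold ("say less than 8K")
FLAG_IMMEDIATE = 0x01       # hypothetical command-header bit for immediate data


def plan_write(length: int, rtt_disabled: bool) -> Tuple[str, int]:
    """Return (delivery mode, command header flags) for a WRITE of `length` bytes."""
    if length < IMMEDIATE_LIMIT:
        # Data rides with the command on the Control Channel; no later matching needed.
        return "immediate", FLAG_IMMEDIATE
    if rtt_disabled:
        # Login agreed not to use RTT: push the data on a Data Channel right away.
        return "unsolicited", 0
    # Otherwise wait for the target's RTT before sending any data.
    return "wait_for_rtt", 0


print(plan_write(4096, rtt_disabled=False))   # ('immediate', 1)
print(plan_write(65536, rtt_disabled=True))   # ('unsolicited', 0)
print(plan_write(65536, rtt_disabled=False))  # ('wait_for_rtt', 0)
```

    The proposal makes immediate data optional ("may be sent"), so an
    initiator could just as well skip it for small writes.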
    
    The initiator and target may renegotiate the use (or non-use) of RTT
    between commands, using an iSCSI Text command.
    The initiator sends the request to the target and does not send any
    other commands to the target until the target has responded.
    The change takes effect starting with the first command sent after the
    target's response.
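
    A minimal initiator-side sketch of this rule, assuming a hypothetical
    `RttNegotiator` class and assuming that the target's Text response
    carries the RTT setting both sides will use:

```python
from typing import Optional


class RttNegotiator:
    """Hypothetical initiator state for renegotiating RTT use with a Text command."""

    def __init__(self, use_rtt: bool) -> None:
        self.use_rtt = use_rtt
        self._pending: Optional[bool] = None  # requested value while awaiting a reply

    def request_change(self, use_rtt: bool) -> None:
        if self._pending is not None:
            raise RuntimeError("a renegotiation is already outstanding")
        self._pending = use_rtt  # the Text command is now on its way to the target

    def can_send_command(self) -> bool:
        # No other commands go to the target until it has responded.
        return self._pending is None

    def on_text_response(self, agreed_use_rtt: bool) -> None:
        if self._pending is None:
            raise RuntimeError("unexpected Text response")
        self._pending = None
        self.use_rtt = agreed_use_rtt  # applies from the next command onward


n = RttNegotiator(use_rtt=True)
n.request_change(False)
print(n.can_send_command())  # False: commands are held back
n.on_text_response(False)
print(n.can_send_command(), n.use_rtt)  # True False
```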
    
    The status of a READ command is sent with the last data packet,
    allowing hardware implementations to raise a single interrupt when
    the entire data transfer has completed.
    Similarly, a flag in a data packet sent from initiator to target
    marks the last data buffer of an unsolicited WRITE operation.
    If the initiator sends unsolicited data for a WRITE operation
    (i.e., without an RTT) over one of the Data Channels, the data may
    arrive before the command arrives on the Control Channel. It is also
    possible that the target will not have enough buffers to receive the
    unsolicited data. The target has the option of placing the unsolicited
    data in reserve buffers or of discarding it completely. If the target
    discards the data, it will later issue an RTT instructing the
    initiator to resend the data.
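
    The target's options for unsolicited WRITE data can be sketched as
    below, assuming a hypothetical fixed pool of reserve buffers counted
    per data segment, and assuming (a simplification not stated in the
    proposal) that once any segment of a command is dropped, all of that
    command's buffered data is dropped so that a single RTT can request it
    again.

```python
from typing import Dict, List, Set, Tuple


class UnsolicitedReceiver:
    """Hypothetical target-side handling of unsolicited WRITE data."""

    def __init__(self, reserve_buffers: int) -> None:
        self.free_reserve = reserve_buffers
        self.held: Dict[int, List[bytes]] = {}  # task tag -> buffered segments
        self.discarded: Set[int] = set()        # tags whose data must be re-requested

    def on_data(self, task_tag: int, segment: bytes) -> str:
        """Data may arrive on a Data Channel before its command arrives."""
        if task_tag in self.discarded:
            return "discarded"
        if self.free_reserve > 0:
            self.free_reserve -= 1
            self.held.setdefault(task_tag, []).append(segment)
            return "held"
        # Out of reserve buffers: drop everything buffered for this command.
        self.free_reserve += len(self.held.pop(task_tag, []))
        self.discarded.add(task_tag)
        return "discarded"

    def on_command(self, task_tag: int) -> Tuple[str, List[bytes]]:
        """Called when the WRITE command itself arrives on the Control Channel."""
        if task_tag in self.discarded:
            self.discarded.remove(task_tag)
            return "send_rtt", []  # ask the initiator to resend the data
        data = self.held.pop(task_tag, [])
        self.free_reserve += len(data)  # data handed off; buffers return to the pool
        return "use_data", data


r = UnsolicitedReceiver(reserve_buffers=2)
r.on_data(1, b"a")
r.on_data(1, b"b")
print(r.on_data(2, b"c"))  # discarded
print(r.on_command(1))     # ('use_data', [b'a', b'b'])
print(r.on_command(2))     # ('send_rtt', [])
```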
    
    
    Multiple iSCSI NICs:
    
    One argument for the symmetric model was that it allows identical
    iSCSI NICs to handle all iSCSI connections. In the symmetric model,
    since all channels look alike, all of the (identical) NICs can be
    fully utilized.

    We argue that even in the model with one Control Connection and many
    Data Connections, we can still utilize the NICs to their maximum.
    
    The main operations to be implemented by iSCSI NICs will be sending
    data packets and RTTs. Data Channels can be spread across these iSCSI
    NICs. The less frequent iSCSI operations (especially recovery) can be
    performed in software in a device driver.
    Note also that a Control Channel and a Data Channel can go over the
    same wire (NIC) even though they are different TCP connections.
    To handle additional iSCSI operations in hardware, vendors can
    introduce more capable NICs that also handle some of the other iSCSI
    operations.
    
    A target may use one NIC to handle the Control Channel from one
    initiator, and another NIC to handle the Control Channel from another
    initiator. Thus, even if every NIC can handle the entire iSCSI
    operation set, the NICs can still be utilized to the maximum by using
    each NIC for the Control Channel of a different session. Similarly, if
    an initiator has devices on several targets, it can use each NIC to
    handle the Control Connection of a different session.
    An initiator can also open multiple sessions with the same target,
    using a different NIC for the Control Channel of each session.
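
    One simple way to realize this spreading is a round-robin assignment
    of each session's Control Channel to a NIC. A minimal sketch, assuming
    hypothetical session and NIC names (the proposal does not prescribe an
    assignment policy):

```python
from itertools import cycle
from typing import Dict, List


def assign_control_nics(session_ids: List[str], nics: List[str]) -> Dict[str, str]:
    """Map each session's Control Channel to a NIC, cycling through the NICs."""
    if not nics:
        raise ValueError("need at least one NIC")
    # zip() stops when session_ids runs out; cycle() repeats the NIC list.
    return dict(zip(session_ids, cycle(nics)))


print(assign_control_nics(["s1", "s2", "s3"], ["nic0", "nic1"]))
# {'s1': 'nic0', 's2': 'nic1', 's3': 'nic0'}
```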
    
    Recovery:
    
    An initiator must hold on to data it has sent via a WRITE operation
    until it has received the status for the corresponding command.
    Even if the initiator sends immediate data (on the Control Channel) or
    unsolicited data (on one of the Data Channels), the target may discard
    the data if it does not have the resources to handle the data at that
    instant. The target may then request with an RTT that the data be
    resent.
    A target need not keep a copy of the data buffers it has sent if
    such data can be regenerated from the storage device.
    However, the target must retain the status information until it has
    been acknowledged by the initiator. The initiator sends Status Ack
    information (a new iSCSI message type) over the Control Channel.
    If strict ordering between commands is needed (such as a read and a
    write of the same device), then the application must perform the
    proper synchronization by not issuing the second command until it has
    received the status of the first command (as in linked commands).
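
    The two retention rules can be sketched as a pair of tables, assuming
    hypothetical class names and using the Initiator Task Tag as the key:
    the initiator drops WRITE data only when status arrives, and the
    target drops status only when a Status Ack arrives.

```python
from typing import Dict


class InitiatorWriteBuffers:
    """Hypothetical: WRITE data kept until the command's status is received."""

    def __init__(self) -> None:
        self._held: Dict[int, bytes] = {}

    def sent(self, task_tag: int, data: bytes) -> None:
        self._held[task_tag] = data  # kept even if sent as immediate/unsolicited

    def resend(self, task_tag: int) -> bytes:
        return self._held[task_tag]  # the target asked again with an RTT

    def on_status(self, task_tag: int) -> None:
        self._held.pop(task_tag, None)  # status received: safe to release


class TargetStatusTable:
    """Hypothetical: status kept until the initiator's Status Ack."""

    def __init__(self) -> None:
        self._unacked: Dict[int, str] = {}

    def complete(self, task_tag: int, status: str) -> None:
        self._unacked[task_tag] = status

    def on_status_ack(self, task_tag: int) -> None:
        self._unacked.pop(task_tag, None)

    def pending(self) -> Dict[int, str]:
        return dict(self._unacked)


buf, table = InitiatorWriteBuffers(), TargetStatusTable()
buf.sent(5, b"block")
table.complete(5, "GOOD")
buf.on_status(5)        # the initiator may now free the data...
print(table.pending())  # {5: 'GOOD'}: ...but the target still holds the status
table.on_status_ack(5)
print(table.pending())  # {}
```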
    
    If a connection seems to have stopped functioning, either the
    initiator or the target may issue an iSCSI Ping command to determine
    whether the connection is still alive. (A bit in the Ping header
    indicates which side initiated the ping operation.) If the Ping
    operation times out, it may be assumed that the connection is not
    functioning properly. When a Ping operation fails, the connection
    should be closed immediately.
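
    A minimal sketch of the ping timeout rule, assuming a hypothetical
    `PingMonitor` driven by caller-supplied timestamps (in seconds) so the
    timing is explicit; a real implementation would read a clock and
    close the TCP socket.

```python
from typing import Dict


class PingMonitor:
    """Hypothetical per-connection liveness check using iSCSI Ping."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.outstanding: Dict[int, float] = {}  # ping id -> time sent
        self.closed = False
        self._next_id = 0

    def send_ping(self, now: float) -> int:
        ping_id = self._next_id
        self._next_id += 1
        self.outstanding[ping_id] = now
        return ping_id

    def on_reply(self, ping_id: int) -> None:
        self.outstanding.pop(ping_id, None)

    def poll(self, now: float) -> bool:
        """Return True once the connection has been closed because a ping timed out."""
        if any(now - sent > self.timeout_s for sent in self.outstanding.values()):
            self.closed = True  # a failed Ping means: close the connection immediately
        return self.closed


mon = PingMonitor(timeout_s=2.0)
p = mon.send_ping(now=0.0)
mon.on_reply(p)
print(mon.poll(now=5.0))   # False: the ping was answered
mon.send_ping(now=10.0)
print(mon.poll(now=12.5))  # True: no reply within 2 s
```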
    
    Note: It is not required to support iSCSI level recovery.
    It is sufficient for the initiator to report failure for the commands
    that did not complete and let the upper layer protocol handle the
    recovery.
    In this case, all channels of the session should be closed, all data
    structures should be cleaned up, and a new session may be established
    between the initiator and target.
    
    There is an advanced recovery mechanism that MAY be implemented by
    the initiator and target, as described below.
    
    Data that was sent over a failed Data Connection will have to be
    resent over another Data Connection.
    On a WRITE operation, the target will eventually issue an RTT over the
    Control Connection to inform the initiator as to which other Data
    Connection to use. (Is it OK to wait for the target to figure out that
    the connection is down? Can the initiator somehow bring this to the
    target's attention?)
    On a READ operation, the initiator will indicate to the target which
    data it wants resent from the failed data transfer. This is done
    using an RTT (sent from the initiator to the target over the Control
    Channel) requesting that the data from some previous READ operation
    be resent. (This is the only case in which an RTT is sent from the
    initiator to the target.) (Since the status of that READ operation
    did not arrive at the initiator and was never acknowledged, the
    target will have kept the relevant information about the
    corresponding command.)
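
    To decide which data to request again, the initiator can compare what
    it received against the READ's expected length. A minimal sketch,
    assuming (beyond what the proposal specifies) that a re-request names
    byte ranges as (offset, length) pairs, with a hypothetical dict
    standing in for the initiator-to-target RTT:

```python
from typing import Dict, List, Tuple


def missing_ranges(expected_len: int, received: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Gaps (offset, length) in a READ's data, given received (offset, length) segments."""
    gaps: List[Tuple[int, int]] = []
    pos = 0
    for off, length in sorted(received):
        if off > pos:
            gaps.append((pos, off - pos))
        pos = max(pos, off + length)  # tolerate overlapping segments
    if pos < expected_len:
        gaps.append((pos, expected_len - pos))
    return gaps


def read_recovery_rtts(task_tag: int, expected_len: int,
                       received: List[Tuple[int, int]]) -> List[Dict[str, int]]:
    """One hypothetical initiator-to-target RTT per gap, sent on the Control Channel (0)."""
    return [{"task_tag": task_tag, "offset": off, "length": n, "channel": 0}
            for off, n in missing_ranges(expected_len, received)]


print(missing_ranges(100, [(0, 30), (50, 20)]))  # [(30, 20), (70, 30)]
```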
    
    If the Control Channel stops working properly (again, as determined by
    a timeout on an iSCSI Ping operation), then the initiator must know
    which commands made it to the target and were not lost, which commands
    were completed but had their status lost, and which commands never
    made it to the target.
    
    Upon setting up a new session (by establishing a Control Channel), the
    initiator may specify whether it is in fact starting a new session or
    taking over an existing session. When taking over an existing
    session, the initiator must specify the identifiers of the session to
    be taken over.
    The target then stops transmitting on the old Control Channel, and
    transfers all of the old session resources to the new Control Channel.
    
    The target returns to the initiator the Initiator Task Tag of the last
    command it received on the old Control Channel.
    
    For each command that was sent to the target before the specified
    Initiator Task Tag, the initiator queries the target (with a new iSCSI
    Query command) about the status of that command. (The status
    information for those commands will not have been discarded, since the
    target never received an Ack for them from the initiator.)
    Some of the commands may have had incomplete data transfers (indicated
    by a special iSCSI status code), and the target and initiator will
    re-issue RTTs to recover the data for those commands. Once the
    initiator has received (and acknowledged) the status of all pending
    commands, it sends an iSCSI Sync message to the target to inform it
    that they are back in sync, and that all commands before the specified
    Initiator Task Tag have been satisfactorily accounted for.
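
    The initiator's bookkeeping after a takeover can be sketched as
    follows, assuming a hypothetical `query_status` callable standing in
    for the new Query command and hypothetical status strings. The
    initiator's own record of issue order locates the returned tag, and
    commands issued after it never reached the target. The sketch also
    assumes that the command carrying the returned tag itself reached the
    target.

```python
from typing import Callable, Dict, List, Tuple


def reconcile(sent_tags: List[int], last_received_tag: int,
              query_status: Callable[[int], str]) -> Tuple[Dict[int, str], List[int], List[int]]:
    """Return (status of each command that reached the target,
    tags needing RTT recovery, tags that never reached the target)."""
    idx = sent_tags.index(last_received_tag)  # position in issue order, not tag value
    reached = sent_tags[: idx + 1]
    never_arrived = sent_tags[idx + 1:]
    statuses = {tag: query_status(tag) for tag in reached}
    needs_rtt = [tag for tag, s in statuses.items() if s == "incomplete_data"]
    return statuses, needs_rtt, never_arrived


known = {10: "complete", 11: "incomplete_data", 12: "complete"}
statuses, needs_rtt, lost = reconcile([10, 11, 12, 13], 12, known.__getitem__)
print(needs_rtt, lost)  # [11] [13]
# Once every status has been received and acknowledged, the initiator sends Sync.
```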
    
    
    
    
    

