
    Comments to Comments!



    
    
    
    Jack,
    
    Thanks for your attention and detailed comments. I sincerely hope that we
    can all work together to get to a better standard.
    
    And here are our thoughts as expressed by several authors:
    
    
    --- start forwarded message by harwood, jack ---
    > From: "harwood, jack" <harwood_jack@emc.com>
    > To: ips@ece.cmu.edu
    > Subject: Comments on the current iSCSI draft
    > Date: Fri, 10 Mar 2000 20:26:05 -0500
    ...
    > Architectural
    > -------------
    > * There is an issue with the separation of the Control and Data Channel.
    > NAT (address translation), firewall, or load balancing products will not
    > support iSCSI without changes which in turn is a barrier to adoption for
    > large networks.  If the goal is to provide interleaving of control
    > commands with large data transfers we feel this can be accomplished in
    > other ways.
    >    - Use smaller data frames to allow better interleaving of control and
    >    data on a single connection
    >    - Use multiple connections between the same source and destination
    >    pair where each connection is independent of other connections
    >    (i.e., data/control are combined on each connection).
    > Separation of control and data also adds new failure modes where one
    > channel closes but the other does not.
    
    
    True, separating data from control introduces some new problems that could
    be avoided if we interleave. We briefly considered such a design aiming at
    one TCP connection per LUN.  But this is inordinately expensive.
    
    If we multiplex LUNs (as we do in the current draft), keeping to a short
    TCP frame will leave us open to all sorts of troubles (possible deadlocks)
    due to the limited TCP window and our lack of control over the data source
    and sink. By separating the control and data streams we could resort to
    selective resets to get out of trouble, while with a common connection we
    might have to resort to radical means (e.g., closing connections).
    
    In addition, in a "permissive" environment (like a video server) we might
    require CRC on the control connection while leaving the data connections up
    to the user.
    
    It is a bit more difficult to implement but worth the trouble.
    
    > * The use of DNS addressing in the protocol as described in sections
    > 3.13, Open Data Connection, and section 3.17, Third Party Copy, will
    > force all parties to depend on DNS in order for the protocol to work.
    > While system and network administrators should be free to make this
    > choice (and invest the effort in making DNS suitably robust), this
    > protocol design should NOT be based on the assumption that DNS is a
    > robust highly available service. The protocol should be based on IP
    > addresses.
    
    It is true that the system recommends using DNS.  However, the
    administrator is free to choose names such as "123.45.67.89" and the
    initiators and targets will interpret that as IPv4 (or IPv6) as necessary.
    
    It was felt that we should be completely independent of IP addresses
    because of firewall and IP masquerading issues with setting up new TCP
    connections. IP addresses /can/ be used, but only in dotted decimal
    notation.
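
    As a rough illustration of the point above, an initiator or target
    could decide whether a configured name is a literal address or a DNS
    name along these lines. This is only a sketch: the helper name
    classify_name is hypothetical and nothing like it is defined in the
    draft.

```python
import ipaddress


def classify_name(name: str) -> str:
    """Return 'ipv4', 'ipv6' or 'dns' for a configured name.

    Hypothetical helper: anything that does not parse as a literal
    IP address is assumed to be a DNS name to be resolved later.
    """
    try:
        addr = ipaddress.ip_address(name)
    except ValueError:
        return "dns"
    return "ipv4" if addr.version == 4 else "ipv6"
```

    With this sketch, "123.45.67.89" is classified as an IPv4 literal and
    never touches DNS, while a host name falls through to the resolver.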
    
    Note that no addresses need be provided for simple systems, and all
    key:value pairs can be safely ignored by the target.
    
    > Conceptual
    > ----------
    > * The iSCSI protocol requires a strong authentication mechanism. In its
    > current form, without an implementation and corresponding specification,
    > it is impossible to write an interoperable authentication implementation
    > from the document as it stands, hence at least one strong authentication
    > mechanism must be mapped onto the protocol, possibly in a separate
    > document or documents.
    
    Correct.  We decided to make a flexible framework for authentication,
    rather than specify a particular method.  Specific authentication schemes
    could be described in other documents.
    
    We briefly considered (and are not outright rejecting) other schemes, most
    notably the one used in SST (SCSI over ST), in which the connection goes
    through 3 stages: Idle, Authenticating, Active. One bit in the login
    indicates whether authentication is required and puts the state machine
    in either the Authenticating stage or the Active stage. The standard does
    not address how you go from Authenticating to Active. This design enables
    non-authenticating machines to interoperate and leaves the whole
    authentication process open to other standards. We felt that we have to
    have a minimal authentication specified, at least to avoid "good faith"
    mistakes, but we are open to discussing this in the working group at some
    length.
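
    The three-stage scheme described above could be sketched roughly as
    follows. The names ConnState and on_login are illustrative only, and
    the transition from Authenticating to Active is deliberately left out,
    since that standard does not define it.

```python
from enum import Enum


class ConnState(Enum):
    IDLE = "idle"
    AUTHENTICATING = "authenticating"
    ACTIVE = "active"


def on_login(state: ConnState, auth_required: bool) -> ConnState:
    """Apply a login to a connection in the Idle stage.

    The single auth_required bit selects the next stage. How a
    connection leaves AUTHENTICATING is not modelled here.
    """
    if state is not ConnState.IDLE:
        raise ValueError("login is only valid from the Idle stage")
    return ConnState.AUTHENTICATING if auth_required else ConnState.ACTIVE
```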
    
    > * The parameter negotiation, described in sections 3.9-12, is very
    > general. The free-form text/value format will cost code to parse and may
    > not be justified.
    
    We designed the system so that any TEXT command that gets no response is
    considered not supported.  On targets or initiators where text:value is
    too complex, a set of defaults should be chosen and no TEXT commands
    supported.  For targets, MODE SELECT can set SCSI-like things.  The
    TEXT command covers network-like things.
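
    A minimal sketch of that rule, assuming one key:value pair per line
    (the draft's actual wire format may differ) and using hypothetical
    helper names parse_pairs and settle. Keys that the peer does not
    answer keep their default value.

```python
def parse_pairs(text: str) -> dict:
    """Parse 'key:value' lines; lines without a key and colon are skipped."""
    pairs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            pairs[key.strip()] = value.strip()
    return pairs


def settle(requested: dict, response_text: str, defaults: dict) -> dict:
    """Combine defaults with the peer's answers.

    A requested key that is absent from the response is treated as
    not supported, so its default stays in effect.
    """
    answered = parse_pairs(response_text)
    result = dict(defaults)
    for key in requested:
        if key in answered:
            result[key] = answered[key]
    return result
```

    A peer that supports no TEXT commands at all simply returns nothing,
    and settle then yields the defaults unchanged.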
    
    > * The action of killing all outstanding IOs on a login or operation
    > timeout seems too severe for this process and provides an opening for a
    > denial of service attack.  Also there is no other rationale in the
    > document as to why this semantic is useful.
    
    I assume you are referring to what is written in the section on Error
    Handling (section 4.0). Denial of service is a problem inherent in all
    IP-based protocols, and we cannot completely solve it. The initiator can
    wait a long time before it determines that it has timed out. TCP ensures
    ordered delivery as long as there is a connection. What alternative is
    there other than to clean up completely, once it has been decided that we
    have a connection problem?
    
    
    > * A general mapping of error recovery for iSCSI is needed, i.e. what
    > parts need definition versus what will use TCP error recovery mechanisms.
    
    Did you have a particular situation in mind that iSCSI does not cover?
    
    > * In section 3.17, Third party copy needs a much better explanation about
    > authentication, login and how the entire process works.
    
    Again, this is a framework.  When devices start offering third party
    commands that go beyond the provisions of iSCSI, we will extend it.
    We know about the extended copy commands considered by the SCSI working
    group, and we think we have covered them.
    
    > Specifics
    > ---------
    > * It should be stated specifically in sections 2.4 and 3.8 that iSCSI
    > data segments cannot overlap.
    
    We agree that the iSCSI draft should state that data segments should not
    overlap (and will do this in the next version). However, we would be
    reluctant to require that receiver implementations check for this type of
    error and report it in the status. Is this acceptable?
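
    For a receiver that does choose to check, the test itself is small.
    The sketch below assumes each data segment is described by a
    (buffer offset, length) pair; that representation and the name
    find_overlap are assumptions made for illustration.

```python
def find_overlap(segments):
    """Return the first overlapping pair of (offset, length) segments, or None.

    Segments are sorted by offset; if any two segments overlap, then
    some pair of neighbours in that order overlaps as well.
    """
    ordered = sorted(segments)
    for (off_a, len_a), (off_b, len_b) in zip(ordered, ordered[1:]):
        if off_a + len_a > off_b:
            return (off_a, len_a), (off_b, len_b)
    return None
```

    The cost is the bookkeeping: the receiver has to remember every
    segment seen so far for the command, which is why we hesitate to make
    the check mandatory.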
    
    
    > * The expected data length and flags, i.e. command direction, should be
    > described in the SCB itself and not as separate fields in the SCSI
    > command, see section 3.3.
    
    As stated by SAM, the SCB contains only the number of data blocks, not the
    transfer length. SAM also mandates that the "execution request" include the
    data length, and CAM (as well as other standard software interfaces)
    requires a residual count report with reference to the length. It makes all
    the implementations "more compliant" to include the length.
    For all hardware bridge providers it also makes more sense to have the
    length and direction in a "common" header than to scan SCBs.
    
    
    
    > * Using the task tag and TCP connection 4-tuple (source and destination
    > IP addresses and ports) we should have a fully qualified identifier and
    > should not need LUN number in the response and task management response,
    > see section 3.3 and 3.6.
    
    You are right - it has been on and off so many times! It ended up there to
    make all controls "target-to-initiator" identical. The last reasoning
    behind getting it in was a "proxy LUN" - i.e. the work was done by a "third
    party". If the returned LUN disagrees with the transmitted LUN
    then it may mean that a proxy satisfied the request.  However, we have not
    specified what action should be taken and I cannot at present think of
    anything useful to do with any proxy-LUN information. We (the working
    group, including you hopefully, in its infinite wisdom!) might decide to
    remove it.
    
    > * The LUN number should be embedded in the data for the AEN, see section
    > 3.4.
    
    We do not specify what goes in the data that is sent in an Asynchronous
    Error Notification. I think SAM-2 requires LUN to be specified (as a
    parameter). We want to be independent of whatever data is packaged, and we
    therefore have to specify the LUN in the header.
    
    > * In section 5.1 a recommendation is made to use 8k as the upper limit
    > for small TCP segments.  Depending on the MTU size this recommendation
    > may cause fragmentation.  More detail and analysis are needed to justify
    > this recommendation.
    
    8k is an upper limit. If MTU size is smaller, then a smaller data size
    should be used, as implied by the note to the implementer. 8k is also an
    upper limit for good CRC algorithms (perhaps 8k is too big for this also).
    We welcome a more detailed analysis to provide a better recommendation.
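
    To make the rule concrete, here is one way an implementer might pick
    the data size. It is a sketch under stated assumptions: 8k is taken
    as 8192 bytes, and the default overhead of 40 bytes assumes IPv4 and
    TCP headers without options. The name data_size is hypothetical.

```python
UPPER_LIMIT = 8 * 1024  # the 8k upper limit, assumed to mean 8192 bytes


def data_size(mtu: int, header_overhead: int = 40) -> int:
    """Pick a data size that fits in one packet and stays under the limit.

    header_overhead defaults to 20 bytes of IPv4 header plus 20 bytes
    of TCP header, with no options (an assumption).
    """
    per_packet = mtu - header_overhead
    if per_packet <= 0:
        raise ValueError("MTU is too small to carry any data")
    return min(UPPER_LIMIT, per_packet)
```

    On a link with a 1500-byte MTU this gives 1460 bytes; only when the
    MTU leaves room for more than 8192 bytes does the 8k limit apply.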
    
    > * A standard CRC should be required, see section 6.1.
    
    I agree that a good CRC is a good thing to have. I think that a TCP-CRC
    should be mandated for the control channel. This should be set when
    opening the TCP connection for the control channel. There are cases where
    CRC is not desirable for the data connection, as when transferring
    transient voice or video. Hence there ought to be some kind of negotiation
    as to whether CRC will be used for the data channel (like a parameter for
    open). Let's talk some more about it.
    
    > * The target should not gets its name from the initiator, see section
    > 10.1.
    
    The target can ignore any key:value pairs sent by the initiator, so it need
    not receive its name from the initiator. This feature is useful in case the
    target is actually a front end for many machines and/or disks, in which
    case the initiator can specify which target it really wants to interact
    with.
    
    > * Section 10.3 needs to provide details on how to prevent reply/reuse.
    > Also this text seems to allow passwords in the clear which is not
    > acceptable.
    
    
    The example given is conceptual. You can use encryption if the
    initiator and target can agree on it, or if it is automatically provided by
    the TCP layer. But we are ready to work some more on it.
    
    > * In section 10.5 it states "Once AllowNoRTT has been set to 'yes', it
    > cannot be set back to no".  It should clarify this is for the open
    > connection and closing this connection and opening a new connection will
    > clear this condition.
    
    This was the intention. We will clarify.
    
    > Questions
    > --------
    > * What value does the ability to do an iSCSI ping add to the existing
    > ability to do an ICMP ECHO?  If little or none, this should be omitted,
    > see section 3.15.
    
    This is very valuable.  First, ICMP may be blocked by a firewall.  Second,
    it is very useful to test certain pathological data sets over particular
    networks.  Third, when a TCP link is not being used, no data is sent.  This
    makes it almost impossible to detect if the connection has been broken.
    Having a ping command allows the TCP connection to be tested periodically.
    And it tests more than just the TCP/IP stack - a valuable add-on in many
    settings.
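
    The third point can be illustrated with a small probe that runs over
    the established TCP connection itself. This is only a sketch: it
    assumes the peer echoes the payload back unchanged, and neither the
    payload nor the function name connection_alive reflects the draft's
    actual ping format.

```python
import socket


def connection_alive(sock: socket.socket, payload: bytes = b"ping",
                     timeout: float = 1.0) -> bool:
    """Send payload on an established connection and wait for its echo.

    Returns False on timeout, on a socket error, or if the peer has
    closed the connection. The echo behaviour is an assumption.
    """
    sock.settimeout(timeout)
    try:
        sock.sendall(payload)
        received = b""
        while len(received) < len(payload):
            chunk = sock.recv(len(payload) - len(received))
            if not chunk:
                return False  # peer closed the connection
            received += chunk
    except OSError:  # covers timeouts and broken connections
        return False
    return received == payload
```

    Unlike ICMP ECHO, a probe of this kind exercises the same connection
    and the same path through firewalls that the SCSI traffic uses.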
    
    > TCP-RDMA
    > --------
    > Although the premise of TCP acceleration is quite useful the concept of
    > RDMA does not apply for our application of internet SCSI.  We will handle
    > the moving of data as implementation specific and not as generic design
    > such as RDMA.
    
    As they say - we all live in a free world... I would agree that you have a
    strong case for a controller, but I am not that confident about a general
    purpose host adapter, like a NIC (not SCSI specific).
    
    > --- end forwarded message by harwood, jack ---
    
    Regards,
    Julo
    
    Julian Satran (on behalf of all my colleagues),
    IBM Research at Haifa
    
    
    
    
    



Last updated: Tue Sep 04 01:08:17 2001
6315 messages in chronological order