
    RE: iSCSI: Flow Control



    
    
    Somesh,
    
    I kept quiet on this, but as it risks getting unnecessarily complex, IMHO,
    I can't anymore.
    
    I am not altogether convinced that there is a consensus on flow control.
    Let us reiterate the reasons for wanting command flow control:
    
    - for long latency pipes you want to ship commands and data ahead of
    time to keep the pipes full
    - but you want also to avoid the command queueing situation in which you
    can be forced to drop commands and refill the queue.
    - you want to keep all devices as busy as possible
    
    The last item, as well as the whole SCSI queuing issue, is best taken care
    of at the SCSI layer, as it is the only one that might need to keep
    per-LU state.
    
    For the first two items, except for some artifacts, observe that commands
    are not a significant consumer of either bandwidth or target resources. A
    high number of commands in transit will readily keep the pipes full if they
    are followed by data, and pose no strain on a target where they can be
    queued at the iSCSI layer.
    
    Data will be flow-controlled by the target limits for immediate data and
    the TCP windows, and by simple conservative ordering rules we can avoid
    both deadlock and throwing away data.
    
    What you are suggesting we look into - flow control per connection -
    is, I am afraid, not adding too much.
    
    And last, but not least: if you implement sessions with one connection
    and use multiple sessions, you can flow control every connection, but then
    you have to add a wedge driver to do load distribution.
    
    Regards,
    Julo
    
    
    
    "GUPTA,SOMESH (HP-Cupertino,ex1)" <somesh_gupta@am.exch.hp.com> on
    09/10/2000 03:36:46
    
    Please respond to "GUPTA,SOMESH (HP-Cupertino,ex1)"
          <somesh_gupta@am.exch.hp.com>
    
    To:   IPS@ece.cmu.edu
    cc:    (bcc: Julian Satran/Haifa/IBM)
    Subject:  RE: iSCSI: Flow Control
    
    
    
    
    Hi all,
    
    Assuming that we have consensus, especially on [1] below (minimum
    connections is 1), I think we should try and resolve the flow
    control issue.
    
    It seems to me that there is sufficient consensus that command
    flow control is needed -
    
    [1]   To enable fastest possible flow of commands given the
          capabilities of the target & initiator, and accommodating
          increased latencies of IP networks
    
    [2]   To significantly minimize the queue full condition. And to
          provide a recovery mechanism at the iSCSI level when command
          overflow happens at the target.
    
    [3] Some of the debate seems to be around whether the credit mechanism
        should be static or dynamic.
    
    I believe that static is a subset of
    dynamic (where you never change the value being advertised). I don't
    disagree with Charles when he says that it will take experimentation
    to determine how to best adjust the credit dynamically. However,
    it is important to provide for it in the protocol so that when a
    vendor does figure out how best to adjust the credit, they have a
    protocol mechanism to do so. Even though it is an implementation
    that provides full rate performance, it is the protocol that
    enables it (take the TCP window scaling option, for example).
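
The claim that static credit is a subset of dynamic credit can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the idea that the target derives its credit from a count of free command buffers, and the placeholder rule used by the dynamic policy. None of it comes from the draft.

```python
def static_policy(credit):
    """Hypothetical target policy: always advertise the same credit."""
    def advertise(free_buffers):
        # The target's load is ignored; the advertised value never changes.
        return credit
    return advertise


def dynamic_policy(ceiling):
    """Hypothetical target policy: advertise at most the buffers now free."""
    def advertise(free_buffers):
        # Placeholder rule only; finding a good rule is the open question.
        return max(0, min(ceiling, free_buffers))
    return advertise


def commands_allowed(advertise, free_buffers, outstanding):
    """How many more commands the initiator may send right now."""
    return max(0, advertise(free_buffers) - outstanding)
```

The initiator-side check is the same for both policies, which is the point: a protocol that carries an advertised credit serves a static target without change.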
    
    [4] Another question that comes up is - Should the credit be per
        connection or per session (multiple connections)?
    
    The current draft does provide for a session wide "flow control"
    through MaxCmdRn. I believe that it is better to have flow
    control on a per connection basis. This enables each connection
    (which might be different NICs) to operate independently of
    each other. Having a session wide flow control would cause
    sync points in both the initiator and the target.
    
    Also a smaller field could be used if it is just to indicate
    a credit window.
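
The sync-point concern can be sketched as follows. This is a rough illustration under stated assumptions, not the draft's mechanism: it assumes MaxCmdRn is the highest command number the initiator may send, ignores number wraparound, and uses hypothetical class names (SessionWindow, ConnectionWindow).

```python
import threading


class SessionWindow:
    """One command window shared by every connection in a session."""

    def __init__(self, max_cmd_rn):
        self._lock = threading.Lock()  # the sync point: all connections meet here
        self._next_rn = 1
        self._max_cmd_rn = max_cmd_rn

    def try_allocate(self):
        """Return the next command number, or None if the window is closed."""
        with self._lock:
            if self._next_rn > self._max_cmd_rn:
                return None
            rn = self._next_rn
            self._next_rn += 1
            return rn

    def advance(self, max_cmd_rn):
        """Record a newer MaxCmdRn reported by the target (never moves back)."""
        with self._lock:
            self._max_cmd_rn = max(self._max_cmd_rn, max_cmd_rn)


class ConnectionWindow:
    """A credit window owned by one connection; nothing is shared, so no lock."""

    def __init__(self, credit):
        self.credit = credit
        self.in_flight = 0

    def try_send(self):
        """Consume one credit if available; report whether the send may go."""
        if self.in_flight >= self.credit:
            return False
        self.in_flight += 1
        return True

    def complete(self):
        """Return one credit when a command finishes."""
        self.in_flight -= 1
```

With the session-wide window, every connection (possibly on a different NIC) takes the same lock for each command. With per-connection windows, one connection running out of credit does not touch the others.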
    
    [5] The credit should be a "pretty good effort" and not a "guarantee".
    
    This allows smart targets to overcommit as the number of initiators
    logged in increases (while reducing the credit available to the
    initiators) and increase the credit and reduce overcommitment as
    the number of initiators logged in decreases.
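
One way a target might trade credit against overcommitment is sketched below. The policy is purely hypothetical: it assumes the target splits a fixed pool of command buffers evenly and never advertises less than a small floor, so overcommitment appears only once the fair share falls below that floor. The names and the default floor are illustrative.

```python
def credit_per_initiator(total_buffers, n_initiators, floor=4):
    """Credit advertised to each initiator under the assumed policy."""
    if n_initiators <= 0:
        raise ValueError("need at least one logged-in initiator")
    fair_share = total_buffers // n_initiators
    # Never advertise less than the floor, even if that overcommits buffers.
    return max(fair_share, floor)


def overcommit_ratio(total_buffers, n_initiators, floor=4):
    """Total advertised credit divided by the buffers that actually exist."""
    credit = credit_per_initiator(total_buffers, n_initiators, floor)
    return credit * n_initiators / total_buffers
```

For a pool of 64 buffers, 4 initiators each get a credit of 16 with no overcommitment, while 32 initiators each get the floor of 4, so twice as much credit is advertised as there are buffers.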
    
    Some mechanism is required to recover from the infrequent case where
    command buffers get exhausted and have to be thrown away.
    
    [6] I would recommend that iSCSI provide a way to recover from
    command overflow and also maintain ordering.
    
    The current proposal does not have a drop notification. It has
    an ack mechanism (ExpCmdRn). I think for the purpose of drop
    notification, it is better to be able to indicate the range of
    commands dropped. TCP acks do tell me which commands
    reached the target, and command responses tell me which were processed.
    
    When a target suffers from command exhaustion, it could behave
    in 2 different ways - one is to drop all the commands it receives
    till it detects a retransmission. In this case it would send a drop
    notification of all commands it receives till it starts receiving
    the command from where the drop started.
    
    The other would be to store all the commands it is able to provide
    buffers for and provide NAKs for only those that it has dropped.
    This would be more efficient.
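
The two target behaviors can be compared with a small sketch. It assumes a simplified model that is not part of the proposal: each arriving command is a pair (command number, buffer available), and a drop notification carries inclusive ranges of command numbers. All function names are hypothetical.

```python
def drop_ranges(dropped):
    """Collapse command numbers into inclusive (first, last) ranges."""
    ranges = []
    for rn in sorted(dropped):
        if ranges and rn == ranges[-1][1] + 1:
            # Extend the current range by one command.
            ranges[-1] = (ranges[-1][0], rn)
        else:
            ranges.append((rn, rn))
    return ranges


def drop_all_after_overflow(arrivals):
    """Behavior 1: after the first drop, drop every later command as well."""
    accepted, dropped = [], []
    overflowed = False
    for rn, has_buffer in arrivals:
        if overflowed or not has_buffer:
            overflowed = True
            dropped.append(rn)
        else:
            accepted.append(rn)
    return accepted, drop_ranges(dropped)


def drop_selectively(arrivals):
    """Behavior 2: keep whatever fits and NAK only the commands dropped."""
    accepted = [rn for rn, has_buffer in arrivals if has_buffer]
    dropped = [rn for rn, has_buffer in arrivals if not has_buffer]
    return accepted, drop_ranges(dropped)
```

For arrivals 1 to 5 where only commands 2 and 5 find no buffer, the first behavior accepts command 1 and reports the range (2, 5), while the second accepts 1, 3 and 4 and reports (2, 2) and (5, 5). The second retransmits less but leaves gaps in the accepted sequence, which is where the ordering question comes from.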
    
    In this case, we should also agree on what the semantics of the
    processing of out-of-order commands are. Should they be
    processed only when the gaps are filled? Or can they be processed
    in any order?
    
    [7] There was some discussion of whether we should propose a slow
    start algorithm or a fast start algorithm.
    
    I think we should use a fast start algorithm at this level. At the TCP
    level, the slow start algorithm is important because the two
    ends are unaware of the state of the network and have to probe it.
    At the iSCSI level, the target should be reasonably knowledgeable
    about its own state and be able to provide a credit or
    reduce/increase it per login as the conditions change (hopefully
    with some hysteresis built in).
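
A minimal sketch of fast start with hysteresis follows. It assumes the target tracks the fraction of its command buffers that are free and uses two thresholds; the class name, the thresholds, and the halving and doubling steps are all illustrative choices, not part of any proposal.

```python
class CreditController:
    """Hypothetical target-side credit adjustment with a hysteresis band."""

    def __init__(self, initial, low_water=0.25, high_water=0.75,
                 min_credit=1, max_credit=64):
        if not 0.0 <= low_water < high_water <= 1.0:
            raise ValueError("need 0 <= low_water < high_water <= 1")
        # Fast start: the full initial credit is granted at login, no probing.
        self.credit = initial
        self.low_water = low_water
        self.high_water = high_water
        self.min_credit = min_credit
        self.max_credit = max_credit

    def adjust(self, free_fraction):
        """Return the credit to advertise given the fraction of free buffers."""
        if free_fraction < self.low_water:
            # Buffers are scarce: cut the credit.
            self.credit = max(self.min_credit, self.credit // 2)
        elif free_fraction > self.high_water:
            # Buffers are plentiful: restore the credit.
            self.credit = min(self.max_credit, self.credit * 2)
        # Between the two thresholds the credit is left alone, so small
        # fluctuations in load do not make the advertised value oscillate.
        return self.credit
```

The gap between the two thresholds is the hysteresis: a credit cut at 10% free buffers is not undone when the target recovers to 50%, only when it passes 75%.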
    
    [8] On flow control of immediate data, should we first work out
    the command flow control and then turn our efforts to the
    data flow control?
    
    Once we can agree on some of the basic issues, then it should be
    relatively easy to work out the credit indication/numbering
    details etc.
    
    Somesh
    
    > -----Original Message-----
    > From: Black_David@emc.com [mailto:Black_David@emc.com]
    > Sent: Wednesday, October 04, 2000 5:13 PM
    > To: ips@ece.cmu.edu
    > Subject: iSCSI sessions: Step 2
    >
    >
    > With my WG co-chair hat on, it's time to call
    > consensus on some of this ...
    >
    > Late last week, I sent the "Let's try again" message
    > on iSCSI sessions, and since then I've only seen
    > one thread of comments to it from a combination of
    > Matt Wakeley and Doug Otis.  The important content
    > of that thread is Matt renewing his position that
    > more than one connection ought to be REQUIRED.  Lest
    > this seem like annoyance, Matt deserves credit for
    > being patient with the WG's indirect progress towards
    > consensus that made it necessary for him to renew his
    > objection on multiple occasions.  As I read Matt's
    > email, it looks like a good flow control solution
    > for the single TCP connection iSCSI session case
    > might satisfy him, but the flow control discussion
    > is still ongoing.
    >
    > In any case, I am stating the following two items
    > as WG rough consensus, over Matt's renewed objection
    > in the first case:
    >
    > [1] Multiple TCP connections per iSCSI session
    >    remain OPTIONAL.
    > [2] Multiple TCP connections per iSCSI session
    >    will be specified as part of the base
    >    iSCSI protocol.
    >
    > Given that it's two months after the Pittsburgh meeting
    > I hope the rough consensus will hold on these items;
    > anyone other than Matt should object to me directly;
    > if necessary, I'll (reluctantly) reopen these issues
    > one more time (yes, this is a hint).
    >
    > Moving on to the topic of models for multiple connection
    > sessions, let me start by trying to winnow the approaches
    > to Asymmetric sessions before taking up Asymmetric vs.
    > Symmetric again.  Four approaches to Asymmetric sessions
    > have been discussed.  I have not seen anyone other than
    > Pierre Labat support his Balanced model in which a single
    > stream of control moves from TCP connection to TCP connection
    > within a session. Therefore I believe it is the WG
    > rough consensus that:
    >
    > [3] The Balanced Asymmetric model in which a single
    >    control stream moves from TCP connection to TCP
    >    connection in an iSCSI session will not be pursued.
    >
    > Similarly, I saw no objections to the note at the end of
    > Julian's email, indicating that the Collapsed Asymmetric
    > model in which data is allowed on the command connection
    > even when there are multiple TCP connections in an iSCSI
    > session is technically inferior to both the Pure Asymmetric
    > and Symmetric models. Therefore I believe it is the WG
    > rough consensus that:
    >
    > [4] The Collapsed Asymmetric model in which data is allowed
    >    on the command connection in multiple connection
    >    iSCSI sessions will not be pursued.
    >
    > The Pure Asymmetric model was originally described as
    > requiring two TCP connections per session.  Kalman Meth
    > proposed a modification to it that allowed it to use a
    > single connection for both command and data.  Between
    > Kalman being the originator of the Pure Asymmetric model,
    > lack of objection to his proposal, and rough consensus [2]
    > above, I believe it to be the WG rough consensus that:
    >
    > [5] The Pure Asymmetric model will only be considered
    >    in the modified form that allows an iSCSI session
    >    to contain a single TCP connection on which both
    >    command and data flow.
    >
    > If all five of the above consensuses (consensii?) hold,
    > that would be serious progress.  Objections to these
    > should be sent to the list, except that I would ask
    > Pierre Labat not to object to [3] in the absence of
    > other objections to it.
    >
    > Now comes the hard part - Symmetric vs. modified
    > Pure Asymmetric (modified by [5] above).  There are
    > over 1000 email messages in my mailbox for the ips
    > mailing list for the past two months, and I freely
    > admit to not having reviewed them in detail.  I suggested
    > in the "Let's try again" email that more weight should
    > be given to those working on implementations, especially
    > hardware, and have not seen any objections to that
    > suggestion.  My impression is that the opinion of such
    > people has been in favor of the Symmetric model -
    > Matt Wakeley (Agilent), and Somesh Gupta (HP) come
    > to mind as examples.  I'm not confident that this is
    > the WG consensus, but it appears to me that the
    > WG is headed in that direction.  Please comment on
    > this - the absence of comments/objections will be
    > taken as a sign of agreement.
    >
    > There has been no comment on the error recovery issue
    > since my email.  Given this and the prior statements that
    > TCP solves many of the tape error scenarios that are motivating
    > FCP error recovery, I think the authors of the next version
    > of the iSCSI draft are entitled to use their best technical
    > judgement in determining how much error recovery to specify
    > across multiple TCP connections in an iSCSI session, and
    > the WG will review it when the next version of the draft
    > appears.
    >
    > We might be getting close to the end of the session issues.
    > Carefully considered comments are encouraged, but I'd ask
    > everyone to consider their comments carefully before sending
    > them, given our past experiences with this set of issues.
    >
    > Thanks,
    > --David
    >
    > ---------------------------------------------------
    > David L. Black, Senior Technologist
    > EMC Corporation, 42 South St., Hopkinton, MA  01748
    > +1 (508) 435-1000 x75140     FAX: +1 (508) 497-8500
    > black_david@emc.com       Mobile: +1 (978) 394-7754
    > ---------------------------------------------------
    >
    
    
    
    



Last updated: Tue Sep 04 01:06:46 2001