
    RE: iSCSI: Flow Control



    Somesh,
    
    > Hi all,
    >
    > Assuming that we have consensus, especially on [1] below (minimum
    > connections is 1), I think we should try and resolve the flow
    > control issue.
    >
    > It seems to me that there is sufficient consensus that command
    > flow control is needed -
    >
    > [1]   To enable fastest possible flow of commands given the
    >       capabilities of the target & initiator, and accommodating
    >       increased latencies of IP networks
    
    The credit scheme I have recommended would be carried within each frame,
    and not just within the response PDU, to reduce the latency of control.
    
    > [2]   To significantly minimize the queue full condition. And to
    >       provide a recovery mechanism at the iSCSI level when command
    >       overflow happens at the target.
    
    The initiator should regulate the number of outstanding commands.  This
    regulation will not impact performance at the device level.
    
    > [3] Some of the debate seems to be around whether the credit mechanism
    >     should be static or dynamic.
    
    The credit scheme that I have recommended would be dynamic.
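
As a rough illustration of an initiator that regulates its outstanding
commands against a credit refreshed by every frame it receives, here is a
minimal Python sketch. The class name `CreditWindow`, its methods, and the
idea that each frame carries a single integer credit value are assumptions
made for illustration; neither the draft nor this thread defines them.

```python
from collections import deque


class CreditWindow:
    """Initiator-side limit on outstanding commands (hypothetical sketch)."""

    def __init__(self, initial_credit):
        self.credit = initial_credit   # most recent credit advertised by the target
        self.outstanding = set()       # commands sent but not yet completed
        self.pending = deque()         # commands waiting for credit

    def submit(self, cmd_id):
        """Queue a command and return the commands that may be sent now."""
        self.pending.append(cmd_id)
        return self._drain()

    def on_frame(self, advertised_credit, completed=()):
        """Handle any frame from the target; every frame carries a credit value."""
        self.credit = advertised_credit
        self.outstanding.difference_update(completed)
        return self._drain()

    def _drain(self):
        # Release queued commands only while the outstanding count is below the credit.
        sendable = []
        while self.pending and len(self.outstanding) < self.credit:
            cmd_id = self.pending.popleft()
            self.outstanding.add(cmd_id)
            sendable.append(cmd_id)
        return sendable
```

A static scheme is the special case in which `advertised_credit` never
changes. When the target lowers the credit, nothing already sent is recalled;
the initiator simply stops releasing new commands until completions bring the
outstanding count back under the new value.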
    
    > I believe that static is a subset of
    > dynamic (where you never change the value being advertised). I don't
    > disagree with Charles when he says that it will take experimentation
    > to determine how to best adjust the credit dynamically. However,
    > it is important to provide for it in the protocol so that when a
    > vendor does figure out how best to adjust the credit, they have a
    > protocol mechanism to do so. Even though it is an implementation
    > that provides full rate performance, it is the protocol that
    > enables it (take TCP window scaling option e.g.).
    >
    > [4] Another question that comes up is - Should the credit be per
    >     connection or per session (multiple connections)?
    
    As the transport's primary function is to provide aggregation down to the
    medium, the credit should be applied neither at the connection nor at the
    end point, as it is now.  It should be applied at the medium, as recommended.
    
    > The current draft does provide for a session wide "flow control"
    > through MaxCmdRn. I believe that it is better to have flow
    > control on a per connection basis. This enables each connection
    > (which might be different NICs) to operate independently of
    > each other. Having a session wide flow control would cause
    > sync points in both the initiator and the target.
    >
    > Also a smaller field could be used if it is just to indicate
    > a credit window.
    
    The credit window should not be carried per connection as you suggest.  The
    medium is what needs to be controlled.
    
    > [5] The credit should be a "pretty good effort" and not a "guarantee".
    >
    > This allows smart targets to overcommit as the number of initiators
    > logged in increases (while reducing the credit available to the
    > initiators) and increase the credit and reduce overcommitment as
    > the number of initiators logged in decreases.
    >
    > Some mechanism is required to recover from the infrequent case where
    > command buffers get exhausted and have to be thrown away.
    
    As the credit scheme that I recommended provides the highest resolution of
    control as well as implements a reduction acknowledgement, there should be
    little reason to toss commands or frames.
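
To make the "pretty good effort" idea in [5] concrete, here is a small Python
sketch of how a target might derive the credit it advertises to each
initiator. The function name, the fixed 150% overcommit factor, and the even
split across initiators are all assumptions chosen for illustration; the
thread does not specify any of them.

```python
def advertised_credit(total_buffers, initiators, overcommit_percent=150, min_credit=1):
    """Per-initiator command credit for a target (hypothetical policy).

    The target pretends to own overcommit_percent of its real command
    buffers and splits that evenly, but never promises one initiator more
    than the buffers that actually exist.
    """
    if initiators < 1:
        raise ValueError("at least one initiator must be logged in")
    virtual_buffers = total_buffers * overcommit_percent // 100
    share = virtual_buffers // initiators
    return max(min_credit, min(total_buffers, share))
```

With 64 buffers this gives a single initiator the full 64 (no overcommit),
while two initiators get 48 each, so the sum of advertised credit exceeds the
real buffers. That excess is why a recovery path for the rare overflow is
still needed under such a policy.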
    
    > [6] I would recommend that iSCSI provide a way to recover from
    > command overflow and also maintain ordering.
    >
    > The current proposal does not have a drop notification. It has
    > an ack mechanism (ExpCmdRn). I think for the purpose of drop
    > notification, it is better to be able to indicate the range of
    > commands dropped. TCP acks do tell me which commands
    > reached the target, and command responses tell me which were processed.
    >
    > When a target suffers from command exhaustion, it could behave
    > in 2 different ways - one is to drop all the commands it receives
    > till it detects a retransmission. In this case it would send a drop
    > notification of all commands it receives till it starts receiving
    > the command from where the drop started.
    
    If the initiator restricts commands, then there would never be a drop
    requirement.  In addition, such a limit on outstanding commands does not
    represent a practical constraint on performance.
    
    > The other would be to store all the commands it is able to provide
    > buffers for and provide NAKs for only those that it has dropped.
    > This would be more efficient.
    >
    > In this case, we should also agree on what the semantics of the
    > processing of the out of order commands are. Should they be
    > processed only when the gaps are filled? Or can they be processed
    > in any order?
    
    As TCP does not provide for out of sequence processing, there is little
    concern within this transport.  Only when substantial buffers remain
    would out of sequence processing become useful.  As these buffers should be
    at the device, and as such handling is already defined at the device, no
    further definitions are required.
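
For readers weighing the second target behaviour Somesh describes (keep what
can be buffered, NAK only what is dropped), here is a minimal Python sketch.
It assumes the "process only when the gaps are filled" answer to his
question, and it ignores sequence number wrap-around. The class
`CommandReceiver`, its `receive` method, and the single-number NAK are
hypothetical and are not taken from the draft.

```python
class CommandReceiver:
    """Target-side command intake that preserves ordering (hypothetical sketch)."""

    def __init__(self, buffer_slots, first_seq=0):
        self.expected = first_seq   # next sequence number to hand to the SCSI layer
        self.slots = buffer_slots   # buffers available for out-of-order commands
        self.held = {}              # sequence number -> buffered command

    def receive(self, seq, command):
        """Return (commands ready for processing, sequence number to NAK or None)."""
        if seq < self.expected or seq in self.held:
            return [], None                      # duplicate or retransmission
        if seq != self.expected:
            if len(self.held) >= self.slots:
                return [], seq                   # no buffer left: drop and NAK
            self.held[seq] = command
            return [], None                      # buffered until the gap is filled
        # The expected command arrived: deliver it and everything contiguous after it.
        ready = [command]
        self.expected += 1
        while self.expected in self.held:
            ready.append(self.held.pop(self.expected))
            self.expected += 1
        return ready, None
```

The in-order command is always accepted, even when every slot is occupied,
because it can be delivered immediately and frees the commands queued behind
it. Reporting a range of dropped commands, as suggested in [6], would need the
NAK to carry a start and an end rather than one number.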
    
    > [7] There was some discussion of whether we should propose a slow
    > start algorithm or a fast start algorithm.
    >
    > I think we should use a fast start algorithm at this level. At TCP
    > level, the slow start algorithm is important because the two
    > ends are unaware of the state of the network and have to probe it.
    > At the iSCSI level, the target should be reasonably knowledgeable
    > about its own state and be able to provide a credit or
    > reduce/increase it per login as the conditions change (hopefully
    > with some hysteresis built in).
    
    This is not TCP.  Why use TCP if you wish to modify TCP?  Resist
    re-engineering TCP. On a LAN, this is not a problem and on a WAN, this is a
    required feature of TCP.
    
    > [8] On flow control of immediate data, should we first work out
    > the command flow control and then turn our efforts to the
    > data flow control?
    >
    > Once we can agree on some of the basic issues, then it should be
    > relatively easy to work out the credit indication/numbering
    > details etc.
    
    To adapt to different flow control schemes, the encapsulation should be a
    separate document from flow control, with flow control defined either as a
    separate control PDU or as a prefix within the flow-control draft.
    This would remove the load of having one person define everything and allow
    the control mechanism to change without damaging encapsulation.  I would add
    that service management should also have the same split in documents.
    
    Doug
    
    
    >
    > Somesh
    >
    > > -----Original Message-----
    > > From: Black_David@emc.com [mailto:Black_David@emc.com]
    > > Sent: Wednesday, October 04, 2000 5:13 PM
    > > To: ips@ece.cmu.edu
    > > Subject: iSCSI sessions: Step 2
    > >
    > >
    > > With my WG co-chair hat on, it's time to call
    > > consensus on some of this ...
    > >
    > > Late last week, I sent the "Let's try again" message
    > > on iSCSI sessions, and since then I've only seen
    > > one thread of comments to it from a combination of
    > > Matt Wakeley and Doug Otis.  The important content
    > > of that thread is Matt renewing his position that
    > > more than one connection ought to be REQUIRED.  Lest
    > > this seem like annoyance, Matt deserves credit for
    > > being patient with the WG's indirect progress towards
    > > consensus that made it necessary for him to renew his
    > > objection on multiple occasions.  As I read Matt's
    > > email, it looks like a good flow control solution
    > > for the single TCP connection iSCSI session case
    > > might satisfy him, but the flow control discussion
    > > is still ongoing.
    > >
    > > In any case, I am stating the following two items
    > > as WG rough consensus, over Matt's renewed objection
    > > in the first case:
    > >
    > > [1] Multiple TCP connections per iSCSI session
    > > 	remain OPTIONAL.
    > > [2] Multiple TCP connections per iSCSI session
    > > 	will be specified as part of the base
    > > 	iSCSI protocol.
    > >
    > > Given that it's two months after the Pittsburgh meeting
    > > I hope the rough consensus will hold on these items;
    > > anyone other than Matt should object to me directly,
    > > if necessary, I'll (reluctantly) reopen these issues
    > > one more time (yes, this is a hint).
    > >
    > > Moving on to the topic of models for multiple connection
    > > sessions, let me start by trying to winnow the approaches
    > > to Asymmetric sessions before taking up Asymmetric vs.
    > > Symmetric again.  Four approaches to Asymmetric sessions
    > > have been discussed.  I have not seen anyone other than
    > > Pierre Labat support his Balanced model in which a single
    > > stream of control moves from TCP connection to TCP connection
    > > within a session. Therefore I believe it is the WG
    > > rough consensus that:
    > >
    > > [3] The Balanced Asymmetric model in which a single
    > > 	control stream moves from TCP connection to TCP
    > > 	connection in an iSCSI session will not be pursued.
    > >
    > > Similarly, I saw no objections to the note at the end of
    > > Julian's email, indicating that the Collapsed Asymmetric
    > > model in which data is allowed on the command connection
    > > even when there are multiple TCP connections in an iSCSI
    > > session is technically inferior to both the Pure Asymmetric
    > > and Symmetric models. Therefore I believe it is the WG
    > > rough consensus that:
    > >
    > > [4] The Collapsed Asymmetric model in which data is allowed
    > > 	on the command connection in multiple connection
    > > 	iSCSI sessions will not be pursued.
    > >
    > > The Pure Asymmetric model was originally described as
    > > requiring two TCP connections per session.  Kalman Meth
    > > proposed a modification to it that allowed it to use a
    > > single connection for both command and data.  Between
    > > Kalman being the originator of the Pure Asymmetric model,
    > > lack of objection to his proposal, and rough consensus [2]
    > > above, I believe it to be the WG rough consensus that:
    > >
    > > [5] The Pure Asymmetric model will only be considered
    > > 	in the modified form that allows an iSCSI session
    > > 	to contain a single TCP connection on which both
    > > 	command and data flow.
    > >
    > > If all five of the above consensuses (consensii?) hold,
    > > that would be serious progress.  Objections to these
    > > should be sent to the list, except that I would ask
    > > Pierre Labat not to object to [3] in the absence of
    > > other objections to it.
    > >
    > > Now comes the hard part - Symmetric vs. modified
    > > Pure Asymmetric (modified by [5] above).  There are
    > > over 1000 email messages in my mailbox for the ips
    > > mailing list for the past two months, and I freely
    > > admit to not having reviewed them in detail.  I suggested
    > > in the "Let's try again" email that more weight should
    > > be given to those working on implementations, especially
    > > hardware, and have not seen any objections to that
    > > suggestion.  My impression is that the opinion of such
    > > people has been in favor of the Symmetric model -
    > > Matt Wakeley (Agilent), and Somesh Gupta (HP) come
    > > to mind as examples.  I'm not confident that this is
    > > the WG consensus, but it appears to me that the
    > > WG is headed in that direction.  Please comment on
    > > this - the absence of comments/objections will be
    > > taken as a sign of agreement.
    > >
    > > There has been no comment on the error recovery issue
    > > since my email.  Given this and the prior statements that
    > > TCP solves many of the tape error scenarios that are motivating
    > > FCP error recovery, I think the authors of the next version
    > > of the iSCSI draft are entitled to use their best technical
    > > judgement in determining how much error recovery to specify
    > > across multiple TCP connections in an iSCSI session, and
    > > the WG will review it when the next version of the draft
    > > appears.
    > >
    > > We might be getting close to the end of the session issues.
    > > Carefully considered comments are encouraged, but I'd ask
    > > everyone to consider their comments carefully before sending
    > > them, given our past experiences with this set of issues.
    > >
    > > Thanks,
    > > --David
    > >
    > > ---------------------------------------------------
    > > David L. Black, Senior Technologist
    > > EMC Corporation, 42 South St., Hopkinton, MA  01748
    > > +1 (508) 435-1000 x75140     FAX: +1 (508) 497-8500
    > > black_david@emc.com       Mobile: +1 (978) 394-7754
    > > ---------------------------------------------------
    > >
    >
    
    


Last updated: Tue Sep 04 01:06:45 2001