
    RE: iSCSI: reusing ISID for recovery



    
    Julo,
    
     Comments inline on your last response.
    
    Jim Hafner
    
    
    Julian Satran/Haifa/IBM@IBMIL@ece.cmu.edu on 08/27/2001 09:50:00 am
    
    Sent by:  owner-ips@ece.cmu.edu
    
    
    To:   ips@ece.cmu.edu
    cc:
    Subject:  RE: iSCSI: reusing ISID for recovery
    
    
    
    Jim,
    
    Option A is clearly unacceptable as an initiator may do harm.
    <JLH>
    How can it do harm? It should only do this when it's "lost its way" and so
    doesn't have any state to harm.  If it can't tell that it has a session
    with that ISID live, then having that session killed at the target end
    isn't going to cause any problems.   If it can tell and still blows the
    protocol, then all bets are off anyway.  [As an aside, I believe this is
    the behavior of FC, though admittedly, it's at a lower and less
    sophisticated layer than we are talking about.]  There's a further summary
    of these issues below.
    </JLH>
    
    Recovery is not done by the target in either B or C.
    <JLH>
    Certainly, recovery is done by the target! Perhaps your definition of
    "recovery" is different than mine.  I include "cleanup" in "recovery".
    It's not trying to get back the previous state, so much as get out from
    under the bad state it's in.  So, in both cases, it's supposed to clean up
    dead session/connections.  In C, it's not even clear to the initiator that
    this is going to happen!  In B, it knows it's explicitly asking for this
    cleanup (with the bit set).
    </JLH>
    
    In B an initiator has to add a bit (he may be tempted to put it in as "my
    code is safe"), while C requires the initiator to log out first before he
    can reinitiate the session.
    
    B is an optimization over C - and quite dangerous.
    <JLH>
    If the initiator uses the bit indiscriminately in B, then it's its own fault
    and tough cookies.  So, I don't see how this is "quite dangerous".
    
    I personally think that C is more dangerous, because the behavior is less
    predictable to the initiator.  How can the initiator judge what the target
    thinks of the state of the connections/sessions?  Maybe the target sees the
    connections as dead, even though the initiator doesn't (yet?).
    </JLH>
    
    <JLH>
    Ask why the initiator is taking this action and what state he is in.  I see
    three cases:
    
    1) If the initiator knows he has (or thinks he has) a session that he wants
    to restart, then he can logout and login and this ISID rule enforcement is
    *never* triggered. This is true whether he sees the session as live or
    dead.
    
    2) If the initiator knows he has a session and still proposes to break the
    rule, then tough.  Either he has to work harder (option B) or he gets what
    he gets (option A) in predictable fashion.  He knows what state he's in and
    he knows the consequences of his actions.
    
    3) The only time he should break his side of the protocol (reuse the ISID)
    is if he has lost knowledge of the previous session.  In that case,
    everyone might as well kill that session since context is lost on the
    initiator side anyway.  That means that option A is just fine. Option B
    carries the extra benefit of assisting him in error detection (that he's
    lost his way).   He may not know enough a priori to know he's lost his way
    (which is why I have a weak preference for B, though all of this discussion
    is leaning me more towards A).
    </JLH>
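
The three cases above can be written down as a small decision function. This is only a sketch of the argument, not anything from the draft: the names `InitiatorState` and `choose_action` and the returned labels are hypothetical, and only options A and B are modelled because those are the two the cases discuss.

```python
from enum import Enum, auto


class InitiatorState(Enum):
    """What the initiator believes about an earlier session with this ISID.

    Hypothetical names, one per case in the message above.
    """
    KNOWS_SESSION_WANTS_RESTART = auto()   # case 1
    KNOWS_SESSION_REUSES_ANYWAY = auto()   # case 2
    LOST_SESSION_KNOWLEDGE = auto()        # case 3


def choose_action(state: InitiatorState, option: str) -> str:
    """Return the outcome for a given initiator state under option 'A' or 'B'."""
    if option not in ("A", "B"):
        raise ValueError("this sketch only covers options A and B")
    if state is InitiatorState.KNOWS_SESSION_WANTS_RESTART:
        # Case 1: logout followed by login, so ISID rule enforcement
        # is never triggered, whichever option the target implements.
        return "logout-then-login"
    if option == "A":
        # Cases 2 and 3 under option A: reusing the ISID clears the
        # old session on the target, predictably.
        return "login-clears-old-session"
    # Cases 2 and 3 under option B: the plain login is rejected and the
    # initiator has to repeat it with the bit set.
    return "login-rejected-then-forced-login"
```

Under option B the reject in case 3 doubles as the signal that the initiator has lost its way, which is the error-detection benefit the message mentions.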
    
    
    Julo
    
    
    Jim Hafner/Almaden/IBM@IBMUS@ece.cmu.edu on 27-08-2001 19:22:12
    
    Please respond to Jim Hafner/Almaden/IBM@IBMUS
    
    Sent by:  owner-ips@ece.cmu.edu
    
    
    To:   ips@ece.cmu.edu
    cc:
    Subject:  RE: iSCSI: reusing ISID for recovery
    
    
    
    
    Julo,
    
    Here's the way I look at it.  I think this is an "error detection/recovery"
    issue.
    
    With a rule that the ISID doesn't get reused by an initiator to a given
    target portal group,  it's the obligation of the initiator not to do it.
    If  the initiator does it anyway, then it is acting in protocol error and
    it is the job of the target to enforce that rule.   If the initiator still
    does it anyway, it's probably because it has "lost its way"; it is then in
    an error state (but how does it know this?).  I prefer option B. The target
    enforces the protocol rule only.  An unexpected reject is the "error
    detection" mechanism for the initiator.  The "forced" login is part of the
    error recovery, as driven by the initiator which is doing the error
    recovery.
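
The option B exchange described here (reject as error detection, forced login as recovery) can be illustrated with a toy model. Everything in it is an assumption made for illustration: `ToyTarget`, `recover`, the `force` flag standing in for the proposed bit, and the "accept"/"reject" strings are hypothetical and do not reflect actual iSCSI PDU fields.

```python
class ToyTarget:
    """Toy option-B target that only enforces the ISID rule (hypothetical model)."""

    def __init__(self):
        # (initiator_name, isid) pairs the target currently considers live.
        self.sessions = set()

    def login(self, initiator, isid, force=False):
        key = (initiator, isid)
        if key in self.sessions and not force:
            # Rule enforcement only: the target does not guess whether
            # the initiator needs recovery.
            return "reject"
        # No conflict, or the initiator explicitly asked for cleanup:
        # drop any old session, then create the new one.
        self.sessions.discard(key)
        self.sessions.add(key)
        return "accept"


def recover(target, initiator, isid):
    """Initiator side: an unexpected reject is the error-detection signal."""
    events = [target.login(initiator, isid)]
    if events[0] == "reject":
        # Error recovery is driven by the initiator, with the bit set.
        events.append(target.login(initiator, isid, force=True))
    return events
```

A first `recover(target, "init-1", 0x11)` yields a single accept; repeating it without a logout yields a reject followed by an accept, which is the two-step behavior option B makes explicit.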
    
    I see option A as a simpler mechanism that makes the "explicit" error
    recovery request by the initiator an "implicit" request with no direct
    detection mechanism on the initiator side.   I'd go with this choice if it
    is decided that the only time we'd expect an initiator to make this request
    is after a reboot or other massive reset, where it wouldn't have a clue about
    its previous state.  If there are cases where we think it won't "do the
    right thing" when it could, then I would say that option B is a safer
    mechanism.
    
    In short, it gets the target out of doing error recovery on behalf of the
    initiator without explicit (option B) or implicit (Option A) direction.
    It's not in the business of "guessing" if the initiator needs recovery.
    
    Jim Hafner
    
    
    Julian Satran/Haifa/IBM@IBMIL@ece.cmu.edu on 08/27/2001 07:50:09 am
    
    Sent by:  owner-ips@ece.cmu.edu
    
    
    To:   ips@ece.cmu.edu
    cc:
    Subject:  RE: iSCSI: reusing ISID for recovery
    
    
    
    
    Jim,
    
    It would be interesting to hear your arguments. The function is clearly
    there; the only difference is that an initiator has to perform two steps.
    Why is it important (except for boot) that a living initiator do it in one
    step?
    
    Regards,
    Julo
    
    Jim Hafner/Almaden/IBM@IBMUS@ece.cmu.edu on 27-08-2001 17:16:33
    
    Please respond to Jim Hafner/Almaden/IBM@IBMUS
    
    Sent by:  owner-ips@ece.cmu.edu
    
    
    To:   ips@ece.cmu.edu
    cc:
    Subject:  RE: iSCSI: reusing ISID for recovery
    
    
    
    
    Julo,
    You wrote:
      And I would like to remind you all that we were on this exact thread
      more than 3 months ago (other players).
      I just restated the rationale for an (apparent) newcomer.
    
    On the other hand, I don't recall this issue ever being called for
    consensus.  Opinions were posted and nothing followed up.
    
    I happen to agree with Marjorie.
    
    Jim Hafner
    
    
    Julian Satran/Haifa/IBM@IBMIL@ece.cmu.edu on 08/25/2001 12:22:33 AM
    
    Sent by:  owner-ips@ece.cmu.edu
    
    
    To:   ips@ece.cmu.edu
    cc:
    Subject:  RE: iSCSI: reusing ISID for recovery
    
    
    
    
    It may be another OS running on the same machine or CPU complex or it could
    be an attack.
    In any case if the initiator is up and fine he can as well do logout.
    It is a rare enough event for us not to try to optimize.
    
    The reboot is the only case in which the initiator can't logout and about
    which we care.
    
    And I would like to remind you all that we were on this exact thread more
    than 3 months ago (other players).
    I just restated the rationale for an (apparent) newcomer.
    
    Julo
    
    "KRUEGER,MARJORIE (HP-Roseville,ex1)" <marjorie_krueger@hp.com>@ece.cmu.edu
    on 25-08-2001 03:51:43
    
    Please respond to "KRUEGER,MARJORIE (HP-Roseville,ex1)"
          <marjorie_krueger@hp.com>
    
    Sent by:  owner-ips@ece.cmu.edu
    
    
    To:   ips@ece.cmu.edu
    cc:
    Subject:  RE: iSCSI: reusing ISID for recovery
    
    
    
    I don't see how Option A is prone to "wild closing of sessions".  The
    target is only looking for sessions with this particular initiator and
    closing them if the ISID matches.
    
    If an initiator doesn't have a valid TSID (login w/TSID=0), it means it has
    lost state entirely (reboot) or knows it wants to immediately reset a
    session (NIC failure).  How could there possibly be a case where the
    initiator has an active valid session with the same ISID, but just doesn't
    know about it??  Rejecting the login seems pointless, since obviously the
    initiator either has a bug or intends to quickly reset the session.
    
    The behavior chosen (Option C) will cause the initiator's recovery to be
    delayed while the target NOPs all the connections and waits for them to
    time out - this will only delay the initiator's recovery unnecessarily.  I
    can't help but think this will cause long term problems for the protocol.
    
    Marjorie Krueger
    Networked Storage Architecture
    Networked Storage Solutions Org.
    Hewlett-Packard
    tel: +1 916 785 2656
    fax: +1 916 785 0391
    email: marjorie_krueger@hp.com
    
    > -----Original Message-----
    > From: Julian Satran [mailto:Julian_Satran@il.ibm.com]
    > Sent: Thursday, August 23, 2001 9:39 PM
    > To: ips@ece.cmu.edu
    > Subject: Re: iSCSI: reusing ISID for recovery (Was: RE: iSCSI - Change -
    > Login/Text...)
    >
    >
    >
    > Mallikarjun,
    >
    > On your point 1, that is what is stated today in the draft.
    >
    > On your point 2, option C is the one we took in the draft, after some
    > debate.
    >
    > Option A is prone to "wild closing of sessions" and option B is also
    > relying too much on the good behaviour of the client. It also introduces
    > a "feature" that complicates login/logout.
    >
    > Our position on this is that the initiator should logout the session
    > explicitly if it can (and in this case it can as the target has
    > ascertained that the session is alive).
    >
    > I agree that you may want to update the state diagram to reflect this.
    >
    > Julo
    >
    >
    > "Mallikarjun C." <cbm@rose.hp.com>@ece.cmu.edu on 24-08-2001 01:46:40
    >
    > Please respond to cbm@rose.hp.com
    >
    > Sent by:  owner-ips@ece.cmu.edu
    >
    >
    > To:   ips@ece.cmu.edu
    > cc:
    > Subject:  iSCSI: reusing ISID for recovery (Was: RE: iSCSI - Change -
    >       Login/Text...)
    >
    >
    >
    > Julian and all:
    >
    > This thread mirrors another discussion some of us are
    > having in a different forum.  Following (two bullets 1
    > & 2 below) is what I proposed there, attempting to address
    > two issues -
    >      a) how to recover sessions when target and the initiator
    >            have conflicting views of the same TCP connection(s)?
    >            (Initiator NIC fails, but there's no I/O activity,
    >            and the target doesn't see any connection failure.)
    >      b) More specifically, how to address the above problem
    >            when the initiator *does not want* to re-instate failed
    >            connections since it only implements the mandatory
    >            session recovery?
    >
    > This could add clarity or muddle things up here, though hopefully
    > the former...
    >
    > 1 If login is sent with the same ISID, same TSID, same CID and X-bit,
    >   then it means a failed connection is being re-instated (whether
    >   or not there are multiple connections in the session).  This login
    >   attempt must be done before the connection timeout (transition M1),
    >   or if this is the only connection in the session, also before the
    >   session timeout (transition N6) - to be counted as a connection
    >   reinstatement effort.
    >            o CmdSN counters (CmdSN, ExpCmdSN) are continued.  Initiator
    >              must do command plugging when there's a mismatch
    >              between its CmdSN and ExpCmdSN in the login response.
    >            o Since this is an implicit connection logout, all the active
    >              tasks on the connection are either internally terminated,
    >              or made non-allegiant (based on ErrorRecoveryLevel=x/y,
    >              TBD) for recovery.
    >
    > 2 If login is sent with the same ISID and TSID=0, the session (if it
    >   exists on the target) is being cleared and any active connections
    >   that the target sees must be immediately (at the end of the login
    >   process including any initiator authentication) transport reset.
    >   Initiator may attempt this only after it ascertains a session failure
    >   on its end (ie. all connections entered RECOVERY_START).
    >            o CmdSN counters get reset.  Initiator has to perform the
    >              currently defined session recovery actions.
    >            o All active tasks of the session are internally terminated.
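
Bullets 1 and 2 amount to a classification of an incoming login by its ISID, TSID, CID and X-bit. The sketch below shows one way a target might tell the two apart. It is illustrative only: the `Login` record, the `sessions` table layout and the returned labels are hypothetical, and timeouts (transitions M1 and N6) and authentication are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Login:
    """Fields of a login request relevant to bullets 1 and 2 (hypothetical record)."""
    isid: int
    tsid: int
    cid: int
    x_bit: bool


def classify(login, sessions):
    """Classify a login against the target's view of existing sessions.

    `sessions` maps an ISID to a (tsid, set_of_cids) pair for one initiator;
    this layout is an assumption made for the example.
    """
    existing = sessions.get(login.isid)
    if existing is not None:
        tsid, cids = existing
        if login.tsid == tsid and login.cid in cids and login.x_bit:
            # Bullet 1: same ISID, TSID, CID plus X-bit. The failed
            # connection is reinstated and CmdSN counters continue.
            return "reinstate-connection"
        if login.tsid == 0:
            # Bullet 2: same ISID with TSID=0. The existing session is
            # cleared and CmdSN counters are reset.
            return "clear-session"
    # Anything else (for example a brand-new ISID) is outside the two bullets.
    return "not-covered"
```

For example, with `sessions = {0x11: (7, {1, 2})}`, a login carrying ISID 0x11, TSID 7, CID 1 and the X-bit is treated as a reinstatement, while ISID 0x11 with TSID 0 is treated as a request to clear the session.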
    >
    >
    > Essentially, I was proposing extending the same notion of "implicit
    > logout" of a connection to the session level.  The options that I
    > see are -
    >
    >        A) Should iSCSI let it happen by default as stated above (ie.
    >           same ISID, TSID=0 always wipes out the pre-existing session
    >           on target, since we are mandating it to be used only when
    >           initiator sees a session failure)?
    >        B) Should iSCSI mandate making this intended cleanup explicit
    >           by setting a bit (Say C-bit, for Clear) in the Login Command
    >           PDU to prevent an accidental session cleanup with a buggy
    >           initiator code?
    >        C) Should iSCSI merely state that targets must ascertain
    >           the connection state(s) whenever a new session creation
    >           attempt is made with a known ISID and TSID=0?  (sort of
    >           defeats the intention of the initiator wanting quicker
    >           session recovery since the Login command PDU would have
    >           to idle till target ascertains the connection state(s)).
    >
    > I prefer A, or B.
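
The difference between options A, B and C is easiest to see as target-side policy for a login that arrives with a known ISID and TSID=0. The function below is a hedged sketch, not draft text: `handle_login`, its parameters and the result labels are hypothetical. It assumes, as the replies in this thread do, that option B rejects a login without the C-bit and that option C rejects the login when the old session turns out to be alive.

```python
def handle_login(option, session_exists, c_bit=False, old_session_alive=None):
    """Target-side outcome for a login with a known ISID and TSID=0.

    option            -- "A", "B" or "C", as listed above
    session_exists    -- whether the target already holds a session with this ISID
    c_bit             -- the proposed "Clear" bit (only meaningful for option B)
    old_session_alive -- result of probing the old connections (option C only)
    """
    if option not in ("A", "B", "C"):
        raise ValueError("unknown option: %r" % (option,))
    if not session_exists:
        return "create-session"
    if option == "A":
        # Implicit cleanup: the old session is always wiped.
        return "clear-old-then-create"
    if option == "B":
        # Explicit cleanup: only when the initiator set the bit.
        return "clear-old-then-create" if c_bit else "reject"
    # Option C: the login idles until the target has ascertained the
    # state of the old connections, then the outcome depends on the probe.
    if old_session_alive is None:
        raise ValueError("option C needs the probe result")
    return "reject" if old_session_alive else "clear-old-then-create"
```

Only option C needs the `old_session_alive` input, which is why it makes the login wait on the target's probe of the old connections.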
    >
    > Going with A or B means that the description of transition N3
    > in the session state diagram would have to change to:
    >      Last LOGGED_IN connection had ceased to be LOGGED_IN,
    >         or a Login Command requesting clearing the session (also
    >         with C-bit set, if option B) was received by the target.
    >
    > The transition N7's description would have to be augmented as
    > well to:
    >         Session recovery attempt with an implicit logout,
    >         or connection reinstatement/new CID addition.
    >
    > Comments?
    > --
    > Mallikarjun
    >
    >
    > Mallikarjun Chadalapaka
    > Networked Storage Architecture
    > Network Storage Solutions Organization
    > MS 5668   Hewlett-Packard, Roseville.
    > cbm@rose.hp.com
    >
    >
    > >Stephen,
    > >
    > >That can happen as the target may set up completely new TCP connections
    > >(the old sockets are still there and look OK).
    > >Until the login is progressing he assumes that this is just another
    > >open-session attempt. Then he checks the old session and the session is
    > >dead (initiator has closed the connections).
    > >
    > >The target has to distinguish only between a session that is alive (and
    > >reject the new one) and one that is dead, in which case it can clean up.
    > >
    > >Julo
    > >
    > >"Wheat, Stephen R" <stephen.r.wheat@intel.com> on 23-08-2001 22:50:56
    > >
    > >Please respond to "Wheat, Stephen R" <stephen.r.wheat@intel.com>
    > >
    > >To:   Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
    > >cc:
    > >Subject:  RE: iSCSI - Change - Login/Text commands with the binary stage
    > >          code
    > >
    > >
    > >
    > >Julian,
    > >
    > >I don't understand your answer.  For the scenario given, I would presume
    > >then that the target would reject the new session attempt, as it sees the
    > >previous session still "alive".  What is there to tell the target that
    > >this is any different from when the Initiator is erroneously using the
    > >repetitive session id?
    > >
    > >Thanks,
    > >Stephen
    > >
    > >-----Original Message-----
    > >From: Julian Satran [mailto:Julian_Satran@il.ibm.com]
    > >Sent: Thursday, August 23, 2001 11:15 AM
    > >To: ips@ece.cmu.edu
    > >Subject: Re: iSCSI - Change - Login/Text commands with the binary stage
    > >code
    > >
    > >
    > >Stephen,
    > >
    > >1. If the initiator goes away for a while and reboots and there was no
    > >activity on the connections, the target may see a session alive (I am not
    > >sure that it has to appear on the state diagram but maybe).
    > >
    > >2. Again - I am not sure that the current state diagram includes death of
    > >the initiator.
    > >
    > >Julo
    > >
    > >"Wheat, Stephen R" <stephen.r.wheat@intel.com>@ece.cmu.edu on 23-08-2001
    > >19:58:34
    > >
    > >Please respond to "Wheat, Stephen R" <stephen.r.wheat@intel.com>
    > >
    > >Sent by:  owner-ips@ece.cmu.edu
    > >
    > >
    > >To:   ips@ece.cmu.edu
    > >cc:
    > >Subject:  Re: iSCSI - Change - Login/Text commands with the binary stage
    > >      code
    > >
    > >
    > >
    > >Julian,
    > >
    > >1.3.6 ISID states that the target should check to see if the old session
    > >is still active when a duplicate session is detected.
    > >
    > >I have two questions, the second only if you answer in the affirmative on
    > >the first ;^)
    > >
    > >1. Is there a properly executed sequence of events (i.e., no coding error
    > >on the target side) where the session is not active, but the target hadn't
    > >taken notice of it?  It appears this is a protocol-specified means to work
    > >around a flaw in a target's implementation.  I interpret the state diagram
    > >transitions as being atomic wrt other commands.  I.e., the last logout
    > >would result in the various transitions of the connection/session prior to
    > >the initiator starting the session up again.  And the target would have
    > >completed the transitions prior to handling a new session request.
    > >
    > >2. If you answered (1) in the affirmative, then the word "Active" is not
    > >consistent with the 6.3 Session State Diagram.  Does this mean the target
    > >got lost, due to transport failures of any sort, in its transition from
    > >Q3-to-Q2-to-Q1?  It sounds like the intent is to close the old session if
    > >the session was in Q2 or Q4, presuming if it were in Q1, it would not have
    > >been found as a duplicate.
    > >
    > >Stephen
    > >
Last updated: Tue Sep 04 01:03:53 2001
6315 messages in chronological order