
    Re: iscsi : changes involving tgt portal group tag.



    John Hufferd wrote:
    
    > 
    > 1. Not specifying a *port* in the Login dialogue explicitly
    >     is something I am concerned could cause surprises down
    >     the road.  Given that a Login is meant to establish an I_T
    >     nexus to a port (not to a node), I am rather surprised to see
    >     the opposition simply because the proposal is coming late.
    > [Huff/]
    >     based on my previous note, I do not buy this as a problem, since I do
    >     not think this occurs without manual intervention and a significant
    >     time interval (and most likely a power down).  This means that it would
    >     seem to be a natural thing for the initiator to attempt to rediscover
    >     the connection.  It seems that simple wordage that Jim Hafner has
    >     suggested for the draft meets this issue.
    
    John,
    
    The procedure to re-config a target portal group is specific to each
    product and while it may be reasonable for some product installation
    manuals to recommend that all sessions be terminated and the target be
    taken offline for a re-config, I don't believe the spec should base its
    correctness upon this requirement.
    
    After all, with multi-connection session architecture, iscsi does allow
    for the target to continue to service active session traffic while being
    able to decommission individual NICs and re-assign them to other portal
    groups. Consider also that such a network portal re-assign may only be a
    logical admin operation and does not always require the target to be
    taken offline or powered off.
    
    The iscsi protocol currently specifies neither an async notification nor
    an authentication mechanism that would prevent connections from being
    accidentally established to incorrect portal groups. High-end arrays
    that advertise 24 x 7 support and online re-config capabilities could
    therefore cause initiators to accidentally log into the wrong portal
    group during such re-configs.
    
    This can be solved in 2 steps :
    
    a) Have a new async pdu reason code that says "portal group
    re-configured" which allows currently logged-in initiator sessions to be
    notified and in turn, trigger re-discovery.
    
    b) Send the TPGT as a part of the login and require the tgt port to
    authenticate the port name/identifier upon login. 
    
    I don't see these as major changes in the spec. They will block
    initiators from accidentally logging into the wrong portal groups, which
    needs to be protected against, since it can result in a number of side
    effects. If we want to minimize the changes, perhaps, the TPGT could be
    introduced as a login key, instead of being in the login pdu header,
    thereby, causing no change in the login pdu format.
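
A minimal sketch of steps (a) and (b) above, seen from the target side. Everything named here is an assumption made for illustration: the `TPGT` login key, the `Target` class, the portal addresses and the initiator names do not come from the draft. It only shows the intended behaviour: a re-config yields the list of sessions to notify, and a login carrying a stale tag is refused.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    # Maps each network portal ("ip:port") to its current portal group tag.
    portal_to_tpgt: dict
    # Maps each logged-in initiator name to the portal group tag of its session.
    sessions: dict = field(default_factory=dict)

    def login(self, initiator: str, portal: str, keys: dict) -> str:
        """Step (b): accept only if the initiator's TPGT matches the portal's group."""
        expected = keys.get("TPGT")  # hypothetical login key, not from the draft
        actual = self.portal_to_tpgt.get(portal)
        if actual is None:
            return "reject: unknown portal"
        if expected is None or int(expected) != actual:
            return "reject: portal group mismatch, rediscover"
        self.sessions[initiator] = actual
        return "accept"

    def reassign(self, portal: str, new_tpgt: int) -> list:
        """Online re-config of one portal.

        Step (a): return the initiators with sessions in either affected
        group, i.e. those that would receive the hypothetical
        "portal group re-configured" async event.
        """
        old_tpgt = self.portal_to_tpgt[portal]
        self.portal_to_tpgt[portal] = new_tpgt
        affected = (old_tpgt, new_tpgt)
        return sorted(i for i, tag in self.sessions.items() if tag in affected)
```

With this sketch, an initiator that still believes a portal belongs to group 1 after it has been moved to group 2 is turned away at login instead of silently joining the wrong group.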
    
    > 
    >     One of the reasons that I am concerned about late proposals, is that
    >     the full review of impacts tends not to be done adequately.  All my
    >     experience has shown me that the largest number of errors and retrofits
    >     occur with the last items added to a product, or spec.  In fact I
    >     believe there can be a strong correlation between time of arrival of a
    >     change, and the probability of unforeseen impacts.  So yes, I would
    >     hate to make changes this late for a problem that I am not sure even
    >     exists, and if it does, a rediscovery fixes the problem.
    
    I agree with your risk assessment. However, we do have a correctness
    issue in that the protocol does not authenticate port name/identifier
    upon login and does not have an async notification scheme to existing
    initiators which will prevent accidental [re-]login to incorrect portal
    groups.
    
    To depend on Unit Attentions to solve this problem is insufficient due
    to the following reasons :
    
    a) The "REPORTED LUNS DATA HAS CHANGED" UA can get cleared if the target
    were to be power cycled, prior to I/O activity from the initiator.
    
    b) UAs can get cleared if several other UA conditions cause the target
    to exceed the number of concurrent UAs it can queue and deliver.
    
    c) Requiring that the initiator's legacy SCSI ULP stacks be modified in
    order to react to these UAs to address an iscsi specific problem is not
    a good idea, since iscsi drivers must not require changes in the O.S.
    SCSI ULPs. Further, iscsi driver writers may not control the O.S. SCSI
    ULPs, so the change may not be theirs to make.
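
Points (a) and (b) above can be illustrated with a toy model of a target's pending Unit Attention list. This is only a sketch under stated assumptions: the `UnitAttentionQueue` class is hypothetical, and dropping the oldest entry on overflow is one possible target behaviour, not something the SCSI or iscsi specs mandate.

```python
from collections import deque


class UnitAttentionQueue:
    """Toy model of the UAs a target holds for one initiator (illustrative only)."""

    def __init__(self, capacity: int):
        # Assumption: when the queue is full, the oldest UA is discarded.
        self.pending = deque(maxlen=capacity)

    def post(self, condition: str) -> None:
        """Queue a new UA condition for delivery on the next command."""
        self.pending.append(condition)

    def power_cycle(self) -> None:
        """Assumption: a power cycle loses queued UAs and leaves only a power-on UA."""
        self.pending.clear()
        self.pending.append("POWER ON, RESET, OR BUS DEVICE RESET OCCURRED")

    def next_command(self):
        """Return the UA delivered with the next command, or None if none is pending."""
        return self.pending.popleft() if self.pending else None
```

In both the overflow case and the power-cycle case, the initiator's next command never sees "REPORTED LUNS DATA HAS CHANGED", so it has no trigger to rediscover.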
    
    It is common for all other serial scsi transports (FCP, SRP) to perform
    port name/identifier authentication upon login.
    
    > [\Huff]
    > 
    > 2.  > manual reconfiguration (including a probable power down), that the
    >      Target
    >      > will maintain this key state ..
    >     This and a lot of your other text below dwells on the unlikelihood of
    >     target not maintaining the state - I agree with you.  My point is
    >     *not* that a target would, but the need to design the quickest and
    >     most reliable way to communicate the loss of state back to the
    >     initiator.
    >     I believe addition of TPGT to the Login Request PDU accomplishes that.
    
    > 
    >     [Huff/]
    >     Since I feel this type of thing is rare if a problem at all,
    
    This is debatable, since I can envision a field engineer using the
    portal group re-config as a quick customer site workaround upon
    detecting a bug in the multi-connection session implementation in a
    target, or a bug in the co-operation of multiple network portal types in
    supporting a multi-connection session. 
    
    Without losing the connectivity of the target, it can be converted from
    a (2 x 4) connectivity array to a (1 x 8) connectivity array, causing
    minimal degradation in its performance and no downtime of the customer's
    data.
    (m x n => no. of portal groups  x no. of network portals).
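
As a rough illustration of the (2 x 4) to (1 x 8) conversion, the sketch below regroups eight portals and lists the ones whose portal group tag changed. The helper names, the even-split rule and the portal labels are all assumptions made for the example.

```python
def assign_groups(portals, group_count):
    """Split portals evenly into group_count groups; returns {portal: tag}, tags from 1."""
    if group_count < 1 or len(portals) % group_count != 0:
        raise ValueError("portals must divide evenly into groups")
    per_group = len(portals) // group_count
    return {p: i // per_group + 1 for i, p in enumerate(portals)}


def changed_portals(before, after):
    """Portals whose group tag differs between the two configurations."""
    return [p for p in before if before[p] != after[p]]


portals = [f"portal-{n}" for n in range(8)]  # hypothetical portal labels
before = assign_groups(portals, 2)  # (2 x 4): two groups of four portals
after = assign_groups(portals, 1)   # (1 x 8): one group of eight portals
moved = changed_portals(before, after)
```

Here `moved` holds the four portals that left the second group. An initiator with a stale view of any of them is the one at risk of logging into the wrong portal group while the target stays online.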
    
    Initial implementations of a new protocol are not without their share of
    bugs and it would be a useful feature to not have to bring down the
    target to perform such re-configs.
    
    >     I think
    >     that documentation about not affecting the TPG if state is outstanding,
    >     and a suggestion to the Initiator that if an unusual amount of time
    >     goes by with the Session Down, that a Rediscovery should be done (as if
    >     they would not do that anyway).  So, because of it being rare, if a
    >     problem at all, I am not convinced that the right approach is to
    >     optimize the response time to restart a session that has been down for
    >     a long time anyway.  If it takes an extra discovery, I do not think this
    >     is a problem.
    >     [\Huff]
    
    We seem to be talking about different scenarios here. I have called out
    an issue regarding the re-config of portal groups without requiring any
    down-time in the storage (i.e. no disruption to existing sessions),
    while you seem to be referring to a session that has been down for a
    long time.
    
    Again, I agree that a product installation guide can resolve this issue
    by requiring all initiators to be quiesced and the storage to be taken
    offline for any re-config. However, this limitation should not be
    imposed on a scsi transport protocol for ensuring its correctness and
    should not limit an implementation's capability to provide 24x7 uptime.
    
    Thanks in advance for considering all aspects of this issue.
    
    Regards,
    Santosh
    
    
    
    
    -- 
    ##################################
    Santosh Rao
    Software Design Engineer,
    HP-UX iSCSI Driver Team,
    Hewlett Packard, Cupertino.
    email : santoshr@cup.hp.com
    Phone : 408-447-3751
    ##################################
    



Last updated: Fri Mar 15 23:18:09 2002