    iSCSI: draft review


    • To: ips@ece.cmu.edu
    • Subject: iSCSI: draft review
    • From: Pierre Labat <pierre_labat@hp.com>
    • Date: Tue, 07 Nov 2000 23:27:46 -0800
    • Organization: Hewlett Packard ATM-SISL
    • Sender: owner-ips@ece.cmu.edu

    Julian,
    
    Some comments on the new draft.
    
    Regards,
    
    Pierre
    
    
    
    Retry/restart
    =============
    
    In 1.2.2 Ordering and iSCSI numbering,
       1.2.3 Timers and timeouts,
    
    you talk about "restart" and the "restart bit".
    
    In 1.2.2.1 Command numbering,
       2.1.3 Opcode-specific fields
       2.2.1 Flags & Task Attributes
       4.1 Connection failure
    
    you talk about "retry" and the "retry bit".
    
    
    Are they the same thing?
    If so, could you use either "retry" or "restart" everywhere?
    I looked for the "restart bit" and was unable
    to locate it.
    
    
    
    1.2.2.1 Command numbering
    =========================
    
    Please add this sentence after the explanation of ExpCmdRN:
    
    "A command can be acknowledged only if the TCP connection
    on which it was received, or the TCP connection associated
    with the command by a retry, is valid."
    
    This is to avoid the following kind of scenario:
    
    ExpCmdRN=7
    There are two TCP connections.
    
    a) the initiator sends commands 7 and 10 over connection 1
    
    b) the initiator sends commands 8 and 9 over connection 2
    
    c) the target receives commands 8 and 9 but cannot
       acknowledge them because it is waiting for command 7
    
    d) connection 2 drops on the initiator side; the initiator
       logs out connection 2 and gets the latest ExpCmdRN=7
       from the target
    
    e) the target receives commands 7 and 10 over connection 1
       and advances its ExpCmdRN to 11,
       but the initiator is not yet aware of that.
    
    f) the initiator retries command 8 with the same CmdRN=8
    
    g) the target drops the retry because CmdRN < ExpCmdRN
    
    
    With this rule, the target in step e) would acknowledge
    command 7 (received on the valid connection 1) but would
    have to hold ExpCmdRN at 8 until it receives the retry for 8.
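    The scenario can be replayed with a small model of the target's
    CmdRN bookkeeping. This is only a sketch under simplifying
    assumptions: integer CmdRNs with no wrap-around, and hypothetical
    names (Target, receive, logout, run_scenario). Under the current
    rule the retry of command 8 is dropped; under the proposed rule,
    command 7 is acknowledged but ExpCmdRN stops at 8, the first
    command whose only copy came over the logged-out connection.

```python
# Sketch: how a target advances ExpCmdRN under the current rule and under
# the proposed rule (acknowledge only commands that arrived on a connection
# that is still valid). All names here are hypothetical.

class Target:
    def __init__(self, exp_cmd_rn, connections, require_valid_conn):
        self.exp_cmd_rn = exp_cmd_rn
        self.valid_conns = set(connections)
        self.require_valid_conn = require_valid_conn
        self.pending = {}  # CmdRN -> connection the command arrived on

    def receive(self, cmd_rn, conn):
        """Queue a command; return False if it is dropped as a duplicate."""
        if cmd_rn < self.exp_cmd_rn:
            return False  # step g): CmdRN < ExpCmdRN is taken as a duplicate
        self.pending[cmd_rn] = conn  # a retry replaces the earlier copy
        self._advance()
        return True

    def logout(self, conn):
        self.valid_conns.discard(conn)

    def _advance(self):
        # Acknowledge commands in CmdRN order, stopping at the first gap.
        while self.exp_cmd_rn in self.pending:
            conn = self.pending[self.exp_cmd_rn]
            if self.require_valid_conn and conn not in self.valid_conns:
                break  # hold ExpCmdRN until this command is retried
            del self.pending[self.exp_cmd_rn]
            self.exp_cmd_rn += 1


def run_scenario(require_valid_conn):
    t = Target(exp_cmd_rn=7, connections={1, 2},
               require_valid_conn=require_valid_conn)
    t.receive(8, conn=2)              # c) 8 and 9 arrive before 7
    t.receive(9, conn=2)
    t.logout(2)                       # d) connection 2 is logged out
    t.receive(7, conn=1)              # e) 7 and 10 arrive on connection 1
    t.receive(10, conn=1)
    accepted = t.receive(8, conn=1)   # f) retry of 8 with CmdRN=8
    return t.exp_cmd_rn, accepted


print(run_scenario(require_valid_conn=False))  # (11, False): retry dropped
print(run_scenario(require_valid_conn=True))   # (9, True): retry accepted
```

    After the accepted retry, ExpCmdRN moves to 9 and waits there for
    the retry of command 9, which also came only over connection 2.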
    
    
    1.2.3 Timers and timeouts
    =========================
    
    It seems to me that a timer is missing. This one would be
    handled by the target.
    It is started when no TCP connection of a session remains
    valid, and reset when a new (valid) TCP connection is
    established.
    It defines how long the target must wait before freeing
    up the session once all of its TCP connections are gone.
    
    In some applications (a server farm, for example), the
    initiator may disappear forever (the server is out of
    service) and nobody will ever issue a target reset.
    Hence the target can rely only on itself to free the
    session resources.
    But it should not do so as soon as all the TCP
    connections drop; it needs to wait some time to give
    the initiator a chance to recover (by creating a new
    TCP connection).
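    A minimal sketch of the timer described above, assuming a
    hypothetical SessionTimer class, an arbitrary timeout value, and
    explicit "now" arguments instead of a real clock so the behaviour
    is easy to follow. It is an illustration, not a proposed API.

```python
# Sketch of the proposed target-side session timer. The class name, the
# timeout value and the explicit `now` arguments are assumptions.

class SessionTimer:
    def __init__(self, timeout):
        self.timeout = timeout
        self.connections = set()
        self.deadline = None   # None: the timer is not running
        self.freed = False

    def connection_up(self, cid):
        self.connections.add(cid)
        self.deadline = None   # a valid connection exists again: stop the timer

    def connection_down(self, cid, now):
        self.connections.discard(cid)
        if not self.connections and not self.freed:
            self.deadline = now + self.timeout  # last connection gone: start it

    def poll(self, now):
        """Return True once the session resources have been freed."""
        if self.deadline is not None and now >= self.deadline:
            self.freed = True  # here the target would release tasks and buffers
            self.deadline = None
        return self.freed


timer = SessionTimer(timeout=30)
timer.connection_up(1)
timer.connection_down(1, now=10)   # timer runs until t=40
print(timer.poll(now=20))          # False: the initiator may still recover
timer.connection_up(2)             # recovery: timer stopped
print(timer.poll(now=100))         # False
timer.connection_down(2, now=100)  # initiator gone for good
print(timer.poll(now=130))         # True: session freed
```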
    
    
    About the timers T1, T2 and T3: I think that T2 and T3
    are very CPU-expensive when TCP runs on the host.
    They have to be reset/restarted for each data PDU.
    Ouch...
    
    
    
    2.6  SCSI Task Management Command
    =================================
    
    In the case of an Abort Task, it should be specified that
    the target MUST return "Function Complete" even if it is
    unable to find any trace of the task referenced by the
    "Referenced Task Tag" field of the Abort Task.
    
    This is needed for cases such as the following:
    
    a) a TCP connection drops on the initiator side
    
    b) the initiator (iSCSI layer) doesn't want to retry the command(s)
       but rather to abort them and let the upper layers do whatever
       they think is best to handle the error.
    
       Hence the initiator logs out the failed TCP connection,
       then sends an "Abort Task".
    
       If the original command didn't make it to the target,
       the abort must nevertheless return OK: the target holds
       no resources for this task, and since the logout is done,
       the command cannot be a ghost arriving after the
       "Abort Task". Hence we can consider the command aborted.
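    A sketch of the proposed Abort Task rule on the target side. The
    TaskTable class and the plain response strings are assumptions
    made for illustration; the response codes of the draft are not
    reproduced.

```python
# Sketch of the proposed Abort Task rule on the target. The task table and
# the response string are simplifications, not the draft's encoding.

FUNCTION_COMPLETE = "Function Complete"


class TaskTable:
    def __init__(self):
        self.tasks = {}  # Initiator Task Tag -> per-task state

    def new_command(self, task_tag, state):
        self.tasks[task_tag] = state

    def abort_task(self, referenced_task_tag):
        task = self.tasks.pop(referenced_task_tag, None)
        if task is not None:
            task.clear()  # stand-in for releasing buffers held by the task
        # Proposed rule: report success even when the tag is unknown. After
        # the logout, a command that never arrived can no longer arrive, so
        # it is effectively aborted and holds no target resources.
        return FUNCTION_COMPLETE


table = TaskTable()
table.new_command(0x21, {"buffers": 4})
print(table.abort_task(0x21))  # task found and released
print(table.abort_task(0x22))  # command lost with the connection: still OK
```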
    
    
    
    2.17 Logout Command
    ===================
    
    This logout command is a very good thing;
    it greatly simplifies recovery.
    
    But I don't see how a session limited to one TCP
    connection can use the logout command to recover.
    
    In your previous draft there was a "RecoverCID"
    in the login message. Hence when the second connection
    was opened, the target knew from the beginning that it
    was only replacing a failed connection, so it could
    accept this connection because the total number of
    connections would remain 1.
    
    Now with the new draft, when the initiator does
    a second login, the target doesn't know that it is
    meant to replace a failed connection, hence it may reject
    the login.
    
    Unless the logout message is at the same time a login?
    I see login parameters at the bottom of the header.
    But in that case the session ID would be needed,
    and I don't see it in the header.
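    The admission problem can be sketched as a target-side check for
    a session limited to one connection. Here recover_cid stands for
    the RecoverCID field of the previous draft; the Session class and
    its login method are hypothetical.

```python
# Sketch of target-side login admission for a session limited to one
# connection. `recover_cid` models the previous draft's RecoverCID field;
# class and method names are hypothetical.

class Session:
    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.cids = set()  # connections the target still believes are up

    def login(self, cid, recover_cid=None):
        if recover_cid is not None:
            # Previous draft: the new connection replaces a failed one,
            # so the number of connections does not grow.
            self.cids.discard(recover_cid)
        if len(self.cids) >= self.max_connections:
            return False  # reject: the session is already at its limit
        self.cids.add(cid)
        return True


s = Session(max_connections=1)
s.login(1)                        # first connection accepted
# Connection 1 drops on the initiator side only; the target still counts it.
print(s.login(2))                 # False: new draft, no way to say "replace"
print(s.login(2, recover_cid=1))  # True: previous draft with RecoverCID
```

    Without RecoverCID the initiator cannot even reach the target to
    send a Logout for connection 1, since that would need the second
    connection that the target refuses.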
    
    
    
    
    2.18 Logout Response
    ====================
    
    Adding "ExpCmdRN" and "MaxCmdRN" to the Logout Response
    would speed up recovery.
    When the Logout Response arrives, the initiator needs to know
    the latest value of "ExpCmdRN" to decide whether or not to
    reuse the same CmdRN for the retries.
    It could get "ExpCmdRN" with a NOP, but that wastes a round trip.
    And in some cases it cannot rely on the completion of other
    commands to get "ExpCmdRN".
    Hence it is simpler and faster to put "ExpCmdRN" and "MaxCmdRN"
    in the Logout Response.
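    With the two fields in the Logout Response, the initiator can
    classify its outstanding commands as soon as the response
    arrives. A sketch, assuming a hypothetical LogoutResponse layout
    (only the two proposed fields plus a CID) and integer CmdRNs
    without wrap-around:

```python
# Sketch of the proposal: the Logout Response carries ExpCmdRN and MaxCmdRN,
# so the initiator can classify its outstanding commands without sending a
# NOP first. Field and function names are illustrative, not from the draft.

from dataclasses import dataclass


@dataclass
class LogoutResponse:
    cid: int
    exp_cmd_rn: int   # proposed field
    max_cmd_rn: int   # proposed field


def classify_outstanding(cmd_rns, resp):
    """Split outstanding CmdRNs into (not yet acknowledged, acknowledged)."""
    unacked = [rn for rn in cmd_rns if rn >= resp.exp_cmd_rn]
    acked = [rn for rn in cmd_rns if rn < resp.exp_cmd_rn]
    return unacked, acked


resp = LogoutResponse(cid=2, exp_cmd_rn=7, max_cmd_rn=12)
print(classify_outstanding([5, 6, 8, 9], resp))  # ([8, 9], [5, 6])
```

    The unacknowledged commands (8 and 9) can be retried with their
    original CmdRN; how the acknowledged ones are renumbered is the
    subject of section 4.1.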
    
    
    
    
    4.1 Connection failure
    ======================
    
    Requiring that acknowledged commands (those whose CmdRN
    is less than ExpCmdRN) use a new CmdRN can lead to
    a deadlock such as the one described below.
    
    To avoid this kind of deadlock, the acknowledged commands
    must be retried non-numbered (CmdRN=0).
    
    The sentence:
          -the initiator will reissue all outstanding commands with their
          original Initiator Task Tag and their original CmdRN if they
          are not acknowledged yet or a new CmdRN if they were
          acknowledged; the retry (X) flag in the command PDU will be set
    
    must be changed to:
          -the initiator will reissue all outstanding commands with their
          original Initiator Task Tag and their original CmdRN if they
          are not acknowledged yet or non-numbered if they were
          acknowledged; the retry (X) flag in the command PDU will be set
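    The difference between the two wordings can be shown as a retry
    numbering function. This is a sketch under assumptions: the
    (tag, CmdRN, retry flag) tuple layout and the function name are
    hypothetical, and CmdRN=0 marks a non-numbered command as in the
    proposal.

```python
# Sketch of how the initiator could number its retries after a connection
# failure: the draft's rule versus the proposed one. Names and tuple layout
# are assumptions; CmdRN=0 marks a non-numbered command.

NON_NUMBERED = 0


def reissue(outstanding, exp_cmd_rn, next_cmd_rn, proposed):
    """outstanding: list of (initiator_task_tag, cmd_rn).
    Return (retries, next_cmd_rn), each retry being (tag, cmd_rn, retry_flag)."""
    retries = []
    for tag, cmd_rn in outstanding:
        if cmd_rn >= exp_cmd_rn:
            new_rn = cmd_rn          # not acknowledged: keep the original CmdRN
        elif proposed:
            new_rn = NON_NUMBERED    # acknowledged: retry non-numbered
        else:
            new_rn = next_cmd_rn     # draft rule: consume a fresh CmdRN
            next_cmd_rn += 1
        retries.append((tag, new_rn, True))  # the retry (X) flag is set
    return retries, next_cmd_rn


pending = [(0x10, 5), (0x11, 8)]
print(reissue(pending, exp_cmd_rn=7, next_cmd_rn=9, proposed=False))
# ([(16, 9, True), (17, 8, True)], 10)
print(reissue(pending, exp_cmd_rn=7, next_cmd_rn=9, proposed=True))
# ([(16, 0, True), (17, 8, True)], 9)
```

    Under the proposed wording the retry of an acknowledged command
    consumes no new CmdRN, so it does not need room in the command
    window.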
    
    Deadlock scenario:
    ------------------
    
    A session with one TCP connection:
    
    a) Because the target is experiencing a resource shortage,
       the initiator is command flow controlled.
       All the commands have been acknowledged by the target:
       ExpCmdRN=MaxCmdRN.
       At this point the target cannot handle any more commands.
       Normally the resource shortage vanishes as commands
       complete.
    
    b) But, no luck: just at that time the TCP connection drops
       on the initiator side. The target cannot free up resources
       because the completions are not acknowledged (StatRN).
    
    c) The initiator creates a second TCP connection and logs out
       the failed one.
       At this point the target doesn't free up resources because
       it thinks that the retries will reuse them.
       Hence the target, still short of resources, doesn't increase
       MaxCmdRN.
    
    d) The initiator, which now has to send the retries with new CmdRNs,
       can NOT, because MaxCmdRN has not changed. And on the other
       side the target doesn't want to increase MaxCmdRN.
    
    e) We have a deadlock.
    
    In fact, to recover we don't need extra resources on the target;
    we can simply reuse (retry) the ones already allocated
    for the acknowledged commands.
    However, requiring new CmdRNs to recover is perceived by the
    target as a request for new resources.
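    The deadlock can be reduced to a window check while the target
    keeps MaxCmdRN fixed. A sketch under stated assumptions: as in
    step a), the window is taken to be closed when the next CmdRN
    equals MaxCmdRN (so a numbered command needs CmdRN < MaxCmdRN),
    non-numbered commands (CmdRN=0) bypass the window, and the count
    of four acknowledged commands is an arbitrary example.

```python
# Sketch of the deadlock in steps a) to e): the target keeps MaxCmdRN fixed,
# and we count how many retries of acknowledged commands can be sent.
# Assumption: a numbered command needs CmdRN < MaxCmdRN (window closed when
# ExpCmdRN == MaxCmdRN); non-numbered commands (CmdRN=0) bypass the window.

def can_send(cmd_rn, max_cmd_rn):
    return cmd_rn == 0 or cmd_rn < max_cmd_rn


def retries_sent(num_acked, exp_cmd_rn, max_cmd_rn, proposed):
    """Number of acknowledged commands that can be retried immediately."""
    next_rn = exp_cmd_rn  # all commands acknowledged: next fresh CmdRN
    sent = 0
    for _ in range(num_acked):
        rn = 0 if proposed else next_rn
        if not can_send(rn, max_cmd_rn):
            break  # step d): blocked, and the target will not open the window
        if not proposed:
            next_rn += 1
        sent += 1
    return sent


# Step a): four acknowledged commands and ExpCmdRN == MaxCmdRN == 7.
print(retries_sent(4, exp_cmd_rn=7, max_cmd_rn=7, proposed=False))  # 0
print(retries_sent(4, exp_cmd_rn=7, max_cmd_rn=7, proposed=True))   # 4
```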
    

