
    Re: iSCSI: error recovery



    
    
    Matt,
    
    I am going to use Jim Hafner's scheme to mark my answers - Look at /<JS>...
    <JS>/
    
    Julo
    
    Matt Wakeley <matt_wakeley@agilent.com> on 25/10/2000 02:20:58
    
    Please respond to Matt Wakeley <matt_wakeley@agilent.com>
    
    To:   IPS Reflector <ips@ece.cmu.edu>
    cc:
    Subject:  iSCSI: error recovery
    
    
    
    
    There has been a lot of discussion on how the Status reference numbers are
    (can be) used for error detection and recovery.  There is even a (optional)
    method for numbering the Data PDUs now.
    
    Let's clarify what "errors" we are trying to recover from, and how the
    RNs are meant to be used.  The example is as follows.  An iSCSI session
    has multiple TCP connections over *separate* physical links.  If one of
    the physical links fails, it is desirable to "recover" the SCSI I/Os
    that were occurring on the TCP connection(s) that were established over
    that link.  We should *not* attempt to recover "errors" that are caused
    by data being discarded after it has been delivered from TCP to the
    upper layers (iSCSI, SCSI, whatever).
    
    Now there has already been discussion on how the TCP timeouts are
    (generally) longer than most SCSI command timeouts, so I'm only
    discussing link errors that can be detected fairly quickly.  For
    example, if the physical link gets yanked, the MAC can relatively
    quickly determine the link is down and notify the appropriate
    management entity.
    
    The goal is to have a mechanism for the initiator to determine what
    commands are outstanding on the failed connection.  Likewise, it's
    desirable for the target to retain the data and/or status of I/Os until
    they are acknowledged by the initiator, so that in the event of a link
    failure, the target can "replay" the I/O.
    
    From the ExpCmdRN, the initiator knows which of the commands it sent on
    the failed TCP connection were received by the target and which were
    not.  Any commands received by the target, but not completed (no status
    PDU received before the failure) should be resent on another TCP
    connection with the "retry" bit set.  Any commands not received by the
    target are resent on another TCP connection without the "retry" bit.
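
A minimal sketch of the resend decision described above. It assumes ExpCmdRN is the next command reference number the target expects, so every command numbered below it was received, and it ignores number wraparound. The names `PendingCommand` and `plan_resend` are hypothetical and do not come from the draft.

```python
from dataclasses import dataclass


@dataclass
class PendingCommand:
    cmd_rn: int                    # reference number the initiator assigned
    status_received: bool = False  # True once a status PDU has arrived


def plan_resend(pending, exp_cmd_rn):
    """Return (cmd_rn, retry_bit) pairs for commands to reissue elsewhere.

    Assumption: commands with cmd_rn < exp_cmd_rn reached the target.
    """
    plan = []
    for cmd in sorted(pending, key=lambda c: c.cmd_rn):
        if cmd.status_received:
            continue  # completed before the link failed; nothing to resend
        received_by_target = cmd.cmd_rn < exp_cmd_rn
        # Received but not completed: resend with the "retry" bit set.
        # Never received: resend as a fresh command, "retry" bit clear.
        plan.append((cmd.cmd_rn, received_by_target))
    return plan
```

For example, with commands 5 through 8 outstanding, status seen only for 5, and ExpCmdRN equal to 7, command 6 would go out with the "retry" bit and commands 7 and 8 without it.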
    
    The target keeps the context (status and maybe data) of SCSI I/Os it
    has executed until it has positive acknowledgment from the initiator
    that the I/O is complete at the initiator's end.  This acknowledgment
    is indicated in the ExpStatRN received from the initiator.
    Acknowledged I/Os are then deallocated in the target.
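
The target-side bookkeeping above might look roughly like the following sketch. It assumes ExpStatRN acknowledges every status numbered below it and that numbers do not wrap; `TargetStatusLog` and its methods are hypothetical names used only for illustration.

```python
class TargetStatusLog:
    """Retains I/O context until the initiator's ExpStatRN acknowledges it."""

    def __init__(self, first_stat_rn=1):
        self._retained = {}            # StatRN -> (task tag, status)
        self._next_stat_rn = first_stat_rn

    def complete(self, task_tag, status):
        """Record a finished I/O and return the StatRN assigned to it."""
        stat_rn = self._next_stat_rn
        self._next_stat_rn += 1
        self._retained[stat_rn] = (task_tag, status)
        return stat_rn

    def on_exp_stat_rn(self, exp_stat_rn):
        """Deallocate every context the initiator has acknowledged."""
        for rn in [rn for rn in self._retained if rn < exp_stat_rn]:
            del self._retained[rn]

    def replayable(self):
        """Contexts still held, in StatRN order, available for replay."""
        return [self._retained[rn] for rn in sorted(self._retained)]
```

After a link failure, whatever `replayable()` still returns is what the target could "replay" when the initiator retries the command.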
    
    Now for some issues I have with the (current) iSCSI draft:
    
    In section 2.2.2 it states "As the only cause for long delays in
    responses can be failed connections and received responses free-up
    resources, we felt that score boarding responses at the initiator could
    be accomplished by simple bitmaps and there is no need to flow-control
    responses."
    
    Score boarding, especially with bitmaps, is an operation that can be
    somewhat CPU heavy in the normal "performance path" of the iSCSI layer.
    If the ExpStatRN were local to each TCP connection, rather than global
    across the iSCSI session, then there would be no requirement for score
    boarding.  The initiator would simply increment the StatRN received on
    each connection for use in the ExpStatRN for that connection.
    /<JS>
    I formulated it wrong.  As commands are associated with a connection at
    both initiator and target, we always know what commands to reissue and
    there is no need to flow control responses.  There is no scoreboarding
    involved.
    <JS>/
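
As an illustration of the per-connection ExpStatRN idea raised above, here is a minimal sketch. It assumes that StatRN is numbered separately on each TCP connection and that TCP's in-order delivery means statuses arrive in sequence, so a single counter replaces any bitmap. `ConnectionStatTracker` is a hypothetical name.

```python
class ConnectionStatTracker:
    """One counter per TCP connection; no session-wide scoreboard needed."""

    def __init__(self, first_stat_rn=1):
        self.exp_stat_rn = first_stat_rn

    def on_status(self, stat_rn):
        """Accept the next in-order status and return the new ExpStatRN."""
        if stat_rn != self.exp_stat_rn:
            # Under the in-order assumption this indicates a protocol error.
            raise ValueError(
                f"expected StatRN {self.exp_stat_rn}, got {stat_rn}"
            )
        self.exp_stat_rn = stat_rn + 1
        return self.exp_stat_rn
```

The per-status cost is one comparison and one increment, which is the contrast with bitmap scoreboarding that the original message draws.
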
    From an earlier email: "1.1.1.3   Data PDU numbering
    Incoming Data PDUs MAY be numbered by a target to enable fast recovery
    of long running READ commands. Data PDUs are numbered with DataRN.  NOP
    command PDUs carrying the same Initiator Tag as the Data PDUs are used
    to acknowledge the incoming Data PDUs."
    
    Since the only "error" we are trying to recover from is the very rare
    event that a physical link fails, I fail to see what the benefit is to
    be able to "recover" at the PDU level.  Plus, you'll have to build into
    the protocol a mechanism to request retransmission of particular data
    PDUs.  Let's simplify and just send the command with the "retry" bit
    set.
    /<JS>
    You don't have to build a mechanism to request retransmission.
    I assume that a clever target will keep only unacked data (the whole point
    of data PDU numbering is to lower the amount of data a target has to keep
    for recovery). At command restart it will resend what it has. Obviously a
    target may decide to ignore data acks (especially if it can reread the
    media) and I assume disk targets will do just that and tapes will use the
    acks.
    <JS>/
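
The "keep only unacked data" behaviour in the reply above could be sketched as follows. This assumes an acknowledgment is cumulative (it covers every DataRN up to and including the acknowledged one); how the NOP PDU actually encodes the acknowledgment is not specified here, and `ReadDataBuffer` is a hypothetical name.

```python
from collections import OrderedDict


class ReadDataBuffer:
    """Target-side buffer of Data PDUs for one READ command."""

    def __init__(self):
        self._unacked = OrderedDict()   # DataRN -> payload, in send order

    def send(self, data_rn, payload):
        """Remember a Data PDU until the initiator acknowledges it."""
        self._unacked[data_rn] = payload

    def ack_through(self, data_rn):
        """Drop every PDU up to and including data_rn (assumed cumulative)."""
        for rn in [rn for rn in self._unacked if rn <= data_rn]:
            del self._unacked[rn]

    def resend_on_restart(self):
        """What the target still holds when the command is restarted."""
        return list(self._unacked.items())
```

A target that can simply reread the media would skip this buffer and ignore the acknowledgments, as the reply notes for disks.
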
    Also from an earlier email:
    
    
    > >Mallikarjun,
    > >
    > >Thanks for your comments.
    > >
    > >Initiator scoreboarding is not considered. I will try to emphasize this
    > >even more in the new draft.
    > >The party responsible for reporting length is the target.  As
    > >overlapping ranges are not explicitly forbidden this would be a
    > >harder task than apparent. Reporting counts becomes entirely a
    > >question of faith!
    >
    > I didn't realize that (what FC calls as) data overlay is allowed, FCP
    > requires this initiator capability to be explicitly stated in session
    > establishment (process login).  Is there a particular reason why this
    > is chosen to be allowed by default in iSCSI?
    >
    
    Again, in the interests of simplicity, I request that data overlay be
    forbidden.  Period.  Otherwise, the initiator would have to perform score
    boarding at the byte level to be positively sure that each byte was really
    received.
    /<JS>
    That is an interesting point.  I would argue that in the interest of
    simplicity we will stay neutral.  If we explicitly forbid it then every
    Initiator is bound to check (enforce) it and that is a lot of work.  I
    assume we will want to use SHOULD.  My point about scoreboarding is
    that initiators are not required to check (enforce) the overlap.
    
    <JS>/
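
To make the cost of byte-level scoreboarding concrete, here is a minimal sketch of what an initiator would need in order to detect data overlay and confirm that every byte arrived. It tracks received byte ranges as a sorted list of disjoint half-open intervals; the function names are hypothetical and nothing here is required by the draft.

```python
def record_range(covered, offset, length):
    """Merge [offset, offset + length) into the covered ranges.

    `covered` is a sorted list of disjoint, non-adjacent (start, end) pairs.
    Returns (new_covered, overlapped), where `overlapped` is True if any
    byte in the new range had already been received (data overlay).
    """
    start, end = offset, offset + length
    overlapped = False
    merged = []
    for s, e in covered:
        if e < start or s > end:
            merged.append((s, e))          # neither overlaps nor touches
        else:
            if s < end and start < e:
                overlapped = True          # at least one byte in common
            start, end = min(s, start), max(e, end)
    merged.append((start, end))
    merged.sort()
    return merged, overlapped


def fully_received(covered, total_length):
    """True when the ranges collapse to exactly one span of the transfer."""
    return covered == [(0, total_length)]
```

Running this on every incoming Data PDU is the per-byte accounting that forbidding overlay, or leaving enforcement optional, is meant to avoid.
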
    >
    > Given that the physical number of bytes transferred could be more
    > (data overlay case) or less (command retry case), may I suggest that
    > the discussion about Residual under/overflow (section 3.3.1) and
    > Residual Count (section 3.3.2) make it explicitly clear that it's the
    > status of "logical" # of bytes that is being reported?  That way,
    > initiator implementations can always rely on those fields regardless
    > of the history of the task.
    >
    > I assume you took note of my comments on the need to change wording and
    > payload definition of NOP PDU.
    >
    -Matt Wakeley
    
    
    
    
    
    


