
    RE: iSCSI: error recovery



    Prasenjit,
    
    I agree StatRN is not useful for detecting a failed connection as it is not
    deterministic with respect to time.  I also think it is a mistake to leave
    link status solely within the realm of the OS as the ULP knows when failure
    detection is critical and when the link is idle.  Perhaps indicating two
    levels of timeout for idle and in-use would be helpful in quantifying these
    timeout limits.
    
    Although StatRN would be of little use in detecting a failure, it would be
    helpful in a faster recovery once a failure has been detected.  This would
    be true for any number of connections.  The time for link recovery should be
    considered.  Would it be impractical to limit detection and recovery to less
    than 60 seconds if in-use?  The idle detection could be left to the OS with
    keep-alive recommendations.
    
    Doug
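
Doug's two-level timeout suggestion above could be sketched roughly as follows. This is only an illustration: the function and constant names are hypothetical, and the 60-second figure is the one floated in the message as a question, not an agreed limit.

```python
# Hypothetical sketch of a two-level failure-detection timeout.
# IN_USE_TIMEOUT_S is the 60-second figure raised in the message; it is an
# assumption here, not a value taken from any iSCSI draft.
IN_USE_TIMEOUT_S = 60.0


def failure_deadline(outstanding_commands, last_activity,
                     in_use_timeout=IN_USE_TIMEOUT_S):
    """Return the time by which a link failure should be declared.

    When commands are outstanding (link in use), the upper layer enforces
    its own deadline. When the link is idle, None is returned, meaning
    detection is left to the OS keep-alive mechanism.
    """
    if outstanding_commands > 0:
        return last_activity + in_use_timeout
    return None
```

The point of the split is that only the upper layer knows whether commands are outstanding, so only it can choose the tighter deadline.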
    
    > I think Matt's answer has gone the most about clarifying
    > the need for stat_rn.
    >
    > I was concentrating more on the one connection case
    > (which I think we cannot ignore).
    >
    > Let us assume that there is a facility for "quick" detection
    > of media failures.
    >
    > In this situation, only the initiator can "quickly" detect a failed
    > connection and then build a separate new connection to
    > the target. (The opposite is not true as the initiator
    > is not listening on any port). Moreover, if the target
    > has a limit of one connection per session and does
    > not also "quickly" detect a failed connection, it may think that the
    > old connection is still alive and reject the new connection.
    >
    > In summary, while the stat_rn mechanism is useful
    > for the multiple connections per session model, I find its
    > use to be extremely limited in the one connection per session
    > model.
    >
    > I leave it up to the authors whether they should make stat_rn
    > mandatory for the single connection per session model.
    >
    > Prasenjit
    >
    >    Prasenjit Sarkar
    >    Research Staff Member
    >    IBM Almaden Research
    >    San Jose
    >
    >
    > Matt Wakeley <matt_wakeley@agilent.com>@ece.cmu.edu on 10/24/2000 05:20:58 PM
    >
    > Please respond to Matt Wakeley <matt_wakeley@agilent.com>
    >
    > Sent by:  owner-ips@ece.cmu.edu
    >
    >
    > To:   IPS Reflector <ips@ece.cmu.edu>
    > cc:
    > Subject:  iSCSI: error recovery
    >
    >
    >
    > There has been a lot of discussion on how the Status reference numbers are
    > (or can be) used for error detection and recovery.  There is even an
    > (optional) method for numbering the Data PDUs now.
    >
    > Let's clarify what "errors" we are trying to recover from, and how the RNs
    > are meant to be used.  The example is as follows.  An iSCSI session has
    > multiple TCP connections over *separate* physical links.  If one of the
    > physical links fails, it is desirable to "recover" the SCSI I/Os that were
    > occurring on the TCP connection(s) that were established over that link.
    > We should *not* attempt to recover "errors" that are caused by data being
    > discarded after it has been delivered from TCP to the upper layers (iSCSI,
    > SCSI, whatever).
    >
    > Now there has already been discussion on how the TCP timeouts are
    > (generally) longer than most SCSI command timeouts, so I'm only discussing
    > link errors that can be detected fairly quickly.  For example, if the
    > physical link gets yanked, the MAC can relatively quickly determine the
    > link is down and notify the appropriate management entity.
    >
    > The goal is to have a mechanism for the initiator to determine what
    > commands are outstanding on the failed connection. Likewise, it's
    > desirable for the target to retain the data and/or status of I/Os until
    > they are acknowledged by the initiator, so that in the event of a link
    > failure, the target can "replay" the I/O.
    >
    > From the ExpCmdRN, the initiator knows which of the commands it sent on
    > the failed TCP connection were received by the target and which were not.
    > Any commands received by the target, but not completed (no status PDU
    > received before the failure), should be resent on another TCP connection
    > with the "retry" bit set.  Any commands not received by the target are
    > resent on another TCP connection without the "retry" bit.
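
The resend rule in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions: ExpCmdRN is taken to name the next command the target expects (so lower numbers were received), the counter is assumed not to have wrapped, and the `Command` type and `plan_resend` function are hypothetical names, not part of any draft.

```python
from dataclasses import dataclass


@dataclass
class Command:
    cmd_rn: int            # reference number the initiator gave the command
    status_received: bool  # True once a status PDU arrived for it


def plan_resend(sent_on_failed_connection, exp_cmd_rn):
    """Return (cmd_rn, retry_bit) pairs to resend on a surviving connection.

    exp_cmd_rn is the last ExpCmdRN seen from the target on the failed
    connection. Assumption: it is the next CmdRN the target expects, so any
    command numbered below it reached the target.
    """
    plan = []
    for cmd in sent_on_failed_connection:
        if cmd.status_received:
            continue  # completed before the failure, nothing to resend
        received_by_target = cmd.cmd_rn < exp_cmd_rn
        # Received but not completed: resend with "retry" set.
        # Never received: resend as a fresh command, "retry" clear.
        plan.append((cmd.cmd_rn, received_by_target))
    return plan
```

For example, with commands 5 (completed), 6 and 7 sent and a last ExpCmdRN of 7, command 6 is resent with the "retry" bit and command 7 without it.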
    >
    > The target keeps the context (status and maybe data) of SCSI I/Os it's
    > executed until it has positive acknowledgment from the initiator that the
    > I/O is complete at the initiator's end.  This acknowledgment is indicated
    > in the ExpStatRN received from the initiator.  Acknowledged I/Os are then
    > deallocated in the target.
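
The target-side retention described above might look roughly like this. It is a sketch only: the class and method names are hypothetical, and it assumes ExpStatRN is the next StatRN the initiator expects, so it acknowledges every lower-numbered status.

```python
class TargetStatusLog:
    """Hypothetical holder for I/O context awaiting acknowledgment."""

    def __init__(self, first_stat_rn=1):
        self.next_stat_rn = first_stat_rn
        self.retained = {}  # StatRN -> saved status (and maybe data)

    def complete(self, status):
        """Record a finished I/O and return the StatRN assigned to it."""
        stat_rn = self.next_stat_rn
        self.retained[stat_rn] = status
        self.next_stat_rn += 1
        return stat_rn

    def on_exp_stat_rn(self, exp_stat_rn):
        """Deallocate every I/O the initiator has acknowledged.

        Assumption: all StatRNs below exp_stat_rn are acknowledged.
        """
        for rn in [rn for rn in self.retained if rn < exp_stat_rn]:
            del self.retained[rn]

    def replay(self):
        """Return the unacknowledged statuses, oldest first, for replay."""
        return sorted(self.retained.items())
```

After a link failure, whatever `replay()` returns is what the target would still be able to resend.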
    >
    > Now for some issues I have with the (current) iSCSI draft:
    >
    > In section 2.2.2 it states "As the only cause for long delays in responses
    > can be failed connections and received responses free-up resources, we
    > felt that score boarding responses at the initiator could be accomplished
    > by simple bitmaps and there is no need to flow-control responses."
    >
    > Score boarding, especially with bitmaps, is an operation that can be
    > somewhat CPU heavy in the normal "performance path" of the iSCSI layer. If
    > the ExpStatRN were local to each TCP connection, rather than global across
    > the iSCSI session, then there would be no requirement for score boarding.
    > The initiator would simply increment the StatRN received on each
    > connection for use in the ExpStatRN for that connection.
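
The per-connection alternative proposed above can be sketched as a single counter per TCP connection. The names are hypothetical, and the sketch assumes StatRNs on one connection start at a known value and arrive consecutively, which relies on TCP's in-order delivery.

```python
class ConnectionStatTracker:
    """Hypothetical per-connection ExpStatRN tracker (no bitmap needed)."""

    def __init__(self, first_stat_rn=1):
        self.exp_stat_rn = first_stat_rn

    def on_status(self, stat_rn):
        """Accept the next status PDU and return the new ExpStatRN.

        Because TCP delivers in order on a single connection, a StatRN
        other than the expected one indicates a protocol error, not
        reordering.
        """
        if stat_rn != self.exp_stat_rn:
            raise ValueError(
                f"expected StatRN {self.exp_stat_rn}, got {stat_rn}")
        self.exp_stat_rn = stat_rn + 1
        return self.exp_stat_rn
```

Each connection carries its own tracker, so the normal path is one comparison and one increment rather than a bitmap update across the whole session.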
    >
    > From an earlier email: "1.1.1.3   Data PDU numbering
    > Incoming Data PDUs MAY be numbered by a target to enable fast recovery of
    > long running READ commands. Data PDUs are numbered with DataRN.  NOP
    > command PDUs carrying the same Initiator Tag as the Data PDUs are used to
    > acknowledge the incoming Data PDUs."
    >
    > Since the only "error" we are trying to recover from is the very rare
    > event that a physical link fails, I fail to see what the benefit is of
    > being able to "recover" at the PDU level.  Plus, you'll have to build into
    > the protocol a mechanism to request retransmission of particular data
    > PDUs.  Let's simplify and just send the command with the "retry" bit set.
    >
    > Also from an earlier email:
    >
    >
    > > >Mallikarjun,
    > > >
    > > >Thanks for your comments.
    > > >
    > > >Initiator scoreboarding is not considered. I will try to emphasize this
    > > >even more in the new draft.
    > > >The party responsible for reporting length is the target.  As
    > > >overlapping ranges are not explicitly forbidden this would be a harder
    > > >task than apparent. Reporting counts becomes entirely a question of
    > > >faith!
    > >
    > > I didn't realize that (what FC calls) data overlay is allowed; FCP
    > > requires this initiator capability to be explicitly stated in session
    > > establishment (process login).  Is there a particular reason why this
    > > is chosen to be allowed by default in iSCSI?
    > >
    >
    > Again, in the interests of simplicity, I request that data overlay be
    > forbidden.  Period.  Otherwise, the initiator would have to perform score
    > boarding at the byte level to be positively sure that each byte was really
    > received.
    >
    > >
    > > Given that the physical number of bytes transferred could be more (data
    > > overlay case) or less (command retry case), may I suggest that the
    > > discussion about Residual under/overflow (section 3.3.1) and Residual
    > > Count (section 3.3.2) make it explicitly clear that it's the status of
    > > "logical" # of bytes that is being reported?  That way, initiator
    > > implementations can always rely on those fields regardless of the
    > > history of the task.
    > >
    > > I assume you took note of my comments on the need to change wording and
    > > payload definition of NOP PDU.
    > >
    > -Matt Wakeley
    
    



Last updated: Tue Sep 04 01:06:35 2001