
    RE: iSCSI: remove recovery from transport-layer connection failure(?)




    Comments inline...

    > -----Original Message-----
    > From: Stephen Bailey [mailto:steph@cs.uchicago.edu]
    > Sent: Tuesday, September 26, 2000 10:16 PM
    > To: ips@ece.cmu.edu
    > Subject: Re: iSCSI: remove recovery from transport-layer connection
    > failure(?)
    >
    >
    > > Currently, iSCSI is spec'ed to recover from transport-layer
    > > connection failures.
    > >
    > > The main motivation for this decision was to support tape backup
    > > applications that are quite sensitive to any failures that get
    > > propagated to their layer.
    > >
    > > So, perhaps we can remove the requirement of recovering from
    > > transport-layer connection failures in iSCSI. This would simplify
    > > the protocol somewhat.

    Yes, it would simplify the protocol somewhat, and would also yield a protocol that would not accommodate tapes (or any other sequential device).

    I guess the question to be answered is whether or not iSCSI will be a suitable transport/LLP for all SCSI traffic, as defined in SAM/SAM-2. If so, some means of recovering from transport-layer problems must be built into the spec.

    It may be worth pointing out here that one of the most significant barriers to early widespread market acceptance of SANs has been the inability to back up data over the SAN. The most significant barrier to backup was the lack of support in FC for LLP/transport level recovery of commands. As I have mentioned before, issuing the same command twice to a sequential device corrupts data.

    If, on the other hand, iSCSI is not to be SAM/SAM-2 compliant...

    > >
    > > Thoughts?
    >
    > I'm all for eliminating command recovery.
    >
    > There seem to be several reasons advanced for command recovery.
    >
    > The first seems to be based upon an inappropriate analogy to FCP.
    > Command recovery had to be added to FCP-2 because the FC layer is
    > unreliable.  A single dropped FC frame leads to a failed FCP command.
    > This clearly upsets tape operation even when the link is performing
    > nominally.  In FCP, without command recovery, with some observable
    > frequency, you will get an expected error that leads to complete,
    > irrecoverable failure of a transfer stream.  The other thing that
    > makes FCP-2 command recovery work well is when you are doing a write,
    > which is 90% (maybe it's 99%?) of tape operation, the target can
    > return an early indication of most frame drops, rather than waiting
    > for a timer to expire.
    >
    > TCP's reliability solves this problem in another way.  By the time you
    > get a TCP connection failure, you have already exhausted a set of
    > reliability mechanisms which guarantee, with high certainty, that
    > further data can not be transferred between the two endpoints.
    >
    > `the two endpoints' phrase suggests the other reason advanced for
    > command recovery.  That is, to permit path failover for commands which
    > are not idempotent, such as tape write sequential.  The
    > problem with this, is that it is not clear HOW iSCSI command recovery
    > can actually work properly, given a TCP connection failure indication.
    > It takes a long time for a TCP connection to fail, and by that time,
    > I'm not sure recovery would reasonably be possible.  Perhaps I'm in
    > error on this assumption.  Can a tape guru (Joe from Exabyte?) comment
    > on whether recovery would be possible after many seconds (tens,
    > hundreds) have elapsed?

    Putting my tape guru hat on ;-)

    There are numerous backup applications that set command timeouts in excess of ten minutes. When one considers the sequential nature of tape, one will realize that this is an eminently reasonable thing to do.

     
    > The SCSI layer has never been solely responsible for ensuring reliable
    > backup.  Macro scale things go wrong with tape (run off the end, get
    > eaten, etc..) with relatively high frequency.  A low level backup
    > engine like tar or dump will fail on a SCSI error, and that's OK.
    > There must also be a higher level software component like Amanda,
    > which manages retries, including operator intervention, to ensure
    > reliable backup.

    A mature tape drive, coupled with a mature backup application, is unlikely to run off the end or get eaten. I guess 'relatively high frequency' must be qualified. Most installations will encounter neither of these problems ever in the service life of the system.

    The problem area that I foresee is where commands get lost in transmission. With disks, the command can merely be resent. This is fine, as the associated data goes to/is taken from the same LBA on the media. Whether it was the command or the response that got lost in transmission, data is fine. With sequential devices, if it was truly the command that got lost in transmission, the command can safely be resent. However, if the response gets lost rather than the command, reissuing a WRITE command results in two copies of the data on the tape - one at the LBA the host application expects it to be, and another starting at the next sequential LBA. The host's mapping of what data it thinks is at each LBA is corrupted, leading to its subsequent inability to recover data from that point to the end of the tape.
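
    To make the disk/tape difference concrete, here is a minimal sketch in Python. It is only a toy model, not anything taken from the iSCSI draft or from a real driver: the Disk and Tape classes and the issue() helper are invented for illustration, and "lost response" is simulated with a flag.

```python
class Disk:
    """Random-access device: every WRITE names its own LBA."""
    def __init__(self):
        self.blocks = {}

    def write(self, lba, data):
        self.blocks[lba] = data  # a resend overwrites the same LBA


class Tape:
    """Sequential device: every WRITE lands at the current position."""
    def __init__(self):
        self.blocks = []

    def write(self, data):
        self.blocks.append(data)  # a resend appends a second copy


def issue(execute, response_lost):
    """Run a command; resend it once if its response never arrives."""
    execute()
    if response_lost:
        # The initiator cannot tell that the first attempt succeeded,
        # so it blindly reissues the same command.
        execute()


disk, tape = Disk(), Tape()
for lba, data in enumerate(["A", "B", "C"]):
    lost = (data == "B")  # pretend the response to the second WRITE is lost
    issue(lambda: disk.write(lba, data), lost)
    issue(lambda: tape.write(data), lost)

print(disk.blocks)  # {0: 'A', 1: 'B', 2: 'C'}
print(tape.blocks)  # ['A', 'B', 'B', 'C']
```

    The disk ends up exactly as the host expects. On the tape, the host believes "C" is in the third block, but the third block holds the duplicate "B", and every block after it is shifted by one.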

     
    > It seems like whether iSCSI has a command recovery mechanism should be
    > a function of whether somebody can stand up and say for sure that it
    > solves a real problem.  So far it only seems like it MIGHT solve a
    > problem.  Who can say `this solves MY problem!'?

    Command recovery solves the problem outlined above. It also solves the problem of being a SAM/SAM-2 compliant LLP.
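
    As a rough illustration of what recovery buys a sequential device, the sketch below has the target remember which commands it has already completed and replay the saved status on a resend instead of writing again. This is an assumption made for the example, not the mechanism in the iSCSI draft or in FCP-2; the RecoveringTape class, the integer command tag, and the ("GOOD", position) status tuple are all hypothetical.

```python
class RecoveringTape:
    """Toy sequential device that recognizes a resent command."""
    def __init__(self):
        self.blocks = []
        self.completed = {}  # command tag -> status already returned

    def write(self, tag, data):
        if tag in self.completed:
            # Resend of a command that already ran: replay the old
            # status and leave the medium untouched.
            return self.completed[tag]
        self.blocks.append(data)
        status = ("GOOD", len(self.blocks) - 1)
        self.completed[tag] = status
        return status


tape = RecoveringTape()
tape.write(1, "A")
tape.write(2, "B")          # suppose this response is lost in transit
retry = tape.write(2, "B")  # initiator resends with the same tag
tape.write(3, "C")

print(tape.blocks)  # ['A', 'B', 'C']
print(retry)        # ('GOOD', 1)
```

    The point is only that the resend has to be identifiable as a resend; without that, the target has no choice but to write the data a second time.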

    Joe
    Exabyte

    > Steph




Last updated: Tue Sep 04 01:06:58 2001