
    Re: iSCSI: remove recovery from transport-layer connection failure(?)



    > Currently, iSCSI is spec'ed to recover from transport-layer
    > connection failures.
    > 
    > The main motivation for this decision was to support tape backup
    > applications that are quite sensitive to any failures that get
    > propagated to their layer.
    >
    > So, perhaps we can remove the requirement of recovering from
    > transport-layer connection failures in iSCSI. This would simplify
    > the protocol somewhat.
    > 
    > Thoughts?
    
    I'm all for eliminating command recovery.
    
    There seem to be several reasons advanced for command recovery.
    
    The first seems to be based upon an inappropriate analogy to FCP.
    Command recovery had to be added to FCP-2 because the FC layer is
    unreliable.  A single dropped FC frame leads to a failed FCP command.
    This clearly upsets tape operation even when the link is performing
    nominally.  In FCP, without command recovery, with some observable
    frequency, you will get an expected error that leads to complete,
    irrecoverable failure of a transfer stream.  The other thing that
    makes FCP-2 command recovery work well is that when you are doing a
    write, which is 90% (maybe it's 99%?) of tape operation, the target
    can return an early indication of most frame drops, rather than
    waiting for a timer to expire.
    
    TCP's reliability solves this problem in another way.  By the time you
    get a TCP connection failure, you have already exhausted a set of
    reliability mechanisms which guarantee, with high certainty, that
    further data cannot be transferred between the two endpoints.
    
    The phrase `the two endpoints' suggests the other reason advanced
    for command recovery: to permit path failover for commands which
    are not idempotent, such as sequential tape writes.  The problem
    with this is that it is not clear HOW iSCSI command recovery can
    actually work properly, given a TCP connection failure indication.
    It takes a long time for a TCP connection to fail, and by that time,
    I'm not sure recovery would reasonably be possible.  Perhaps I'm in
    error on this assumption.  Can a tape guru (Joe from Exabyte?) comment
    on whether recovery would be possible after many seconds (tens,
    hundreds) have elapsed?
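
    To put a rough number on `a long time', here is a minimal sketch
    of how retransmission backoff adds up before a sender gives up.
    The initial timeout, the cap, and the retry count are illustrative
    assumptions, not the defaults of any particular TCP stack, and the
    function name is made up for this example.

```python
def total_time_to_fail(initial_rto: float, max_rto: float, retries: int) -> float:
    """Sum the waits of a sender that doubles its timeout after each
    unanswered retransmission, up to a cap, then gives up.

    All three parameters are hypothetical tuning values.
    """
    total = 0.0
    rto = initial_rto
    for _ in range(retries):
        total += rto                  # wait one full timeout with no ACK
        rto = min(rto * 2, max_rto)   # exponential backoff, capped
    return total


if __name__ == "__main__":
    # Assumed values: 1 s initial timeout, 64 s cap, 12 retransmissions.
    print(total_time_to_fail(1.0, 64.0, 12))
```

    With those assumed values the sender spends several hundred
    seconds retrying before it reports a connection failure.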
    
    The SCSI layer has never been solely responsible for ensuring reliable
    backup.  Macro-scale things go wrong with tape (running off the end,
    getting eaten, etc.) with relatively high frequency.  A low-level
    backup engine like tar or dump will fail on a SCSI error, and that's
    OK.  There must also be a higher-level software component like Amanda,
    which manages retries, including operator intervention, to ensure
    reliable backup.
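
    The division of labor described above can be sketched as follows.
    This is only an illustration of the idea, not how Amanda or any
    real backup manager is implemented; BackupError, run_with_retries,
    and the ask_operator callback are hypothetical names.

```python
from typing import Callable


class BackupError(Exception):
    """Raised by the low-level engine on any failure (e.g. a SCSI error)."""


def run_with_retries(
    run_backup: Callable[[], None],
    max_attempts: int,
    ask_operator: Callable[[int, Exception], bool],
) -> int:
    """Run a fail-fast backup job, retrying at the management layer.

    Returns the attempt number that succeeded. Re-raises the last error
    if the attempts run out or the operator declines to retry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            run_backup()          # the low-level engine simply fails on error
            return attempt
        except BackupError as err:
            # Operator intervention point: change the tape, fix the path, etc.
            if attempt == max_attempts or not ask_operator(attempt, err):
                raise
    raise AssertionError("unreachable")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_backup() -> None:
        # Simulated engine: fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise BackupError("simulated SCSI error")

    print(run_with_retries(flaky_backup, 5, lambda attempt, err: True))
```

    The low-level job stays simple and fails fast; the retry policy
    and the decision to involve an operator live entirely in the
    layer above it.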
    
    It seems like whether iSCSI has a command recovery mechanism should be
    a function of whether somebody can stand up and say for sure that it
    solves a real problem.  So far it only seems like it MIGHT solve a
    problem.  Who can say `this solves MY problem!'?
    
    Steph
    


Last updated: Tue Sep 04 01:07:03 2001