
    iSCSI : Digest Error recovery causes data corruption


    • To: IPS Reflector <ips@ece.cmu.edu>
    • Subject: iSCSI : Digest Error recovery causes data corruption
    • From: Santosh Rao <santoshr@cup.hp.com>
    • Date: Mon, 29 Jan 2001 20:28:35 -0800
    • Organization: Hewlett Packard, Cupertino.
    • Sender: owner-ips@ece.cmu.edu

    Julian & All,
    
    Section 5.5 on digest errors states that an initiator MUST "discard
    and re-start" a task when it encounters a header or data digest
    error, provided it can recognize the initiator task tag.
    
    I assume "re-start" here refers to reissuing the command with the
    "retry" bit set. Is that correct?
    
    If so, this error recovery mechanism can lead to data corruption. The
    probability is reduced by the removal of partial data recovery (based
    on DataRN/SN). However, data corruption can still occur as follows:
    
    - Last Data PDU on a READ I/O is returned from target to initiator.
    
    - Initiator detects a header or data digest error on this last Data PDU
       and discards the PDU.
    
    - Initiator re-starts the task (using the "retry" bit).
    
    - Target sends the Status PDU for the previous instance of the
       command. (It is not clear from the spec what the initiator does with
       stale frames that continue to arrive for the previous instance of the
       I/O. For now, I assume the initiator will, by some mechanism,
       discard such frames.)
    
    - When the target receives the "retry" of this command, it believes it
       has already sent back all the data, and so it sends back only Status
       for this "retry".
    
    - Initiator has no count-based checks and so depends on (trusts!)
       the target's status, based on which it reports a successful
       I/O completion to the initiator's SCSI ULP [indicating no residual
       count, since the target thought it had sent all the data].
    
    - SCSI ULP assumes a completed I/O and notifies the application
       [since it depends on the initiator notifying it with an appropriate
       service response on an underflow, which the initiator in this case
       did not detect].
    
    - Application encounters data corruption due to the missing Data
       PDU, which was discarded by the initiator on a digest error
       and was never re-sent by the target, since the target performs
       only partial recovery by sending just the Status.
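
The sequence above can be sketched as a small simulation. This is only an illustrative model, not text from the draft: the `Target` class, the `initiator_read` function, the tuple layout of the PDUs, and the assumption that the target remembers per task tag whether it has sent all data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Hypothetical target that remembers, per task tag, that all data was sent."""
    blocks: list
    data_sent: dict = field(default_factory=dict)

    def handle_read(self, tag, retry=False):
        pdus = []
        # On a retry, a target that believes the data already went out
        # performs partial recovery: it resends only the Status.
        if not (retry and self.data_sent.get(tag)):
            pdus += [("DATA", i, b) for i, b in enumerate(self.blocks)]
            self.data_sent[tag] = True
        # Residual count is 0 because the target thinks all data was delivered.
        pdus.append(("STATUS", "GOOD", 0))
        return pdus


def initiator_read(target, tag, corrupt_index):
    """Hypothetical initiator with no count-based check on received data."""
    buffer = [None] * len(target.blocks)
    digest_error = False
    for kind, a, b in target.handle_read(tag):
        if digest_error:
            continue  # stale PDUs of the first instance are dropped
        if kind == "DATA":
            if a == corrupt_index:
                digest_error = True  # digest check failed: discard this PDU
            else:
                buffer[a] = b
        else:
            return buffer, a, b  # no error: normal completion
    # Re-start the task with the "retry" bit set.
    for kind, a, b in target.handle_read(tag, retry=True):
        if kind == "DATA":
            buffer[a] = b
        else:
            return buffer, a, b  # status is trusted as-is


target = Target([b"blk0", b"blk1", b"blk2"])
buf, status, residual = initiator_read(target, tag=1, corrupt_index=2)
print(buf, status, residual)
```

In this model the initiator ends up with a hole where the last block should be, yet the status it passes up is GOOD with no residual.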
    
    The StatSN-based partial status recovery can lead to dangerous
    corner cases like this one that cause data corruption.
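
For illustration, a count-based check of the kind the initiator currently lacks might look like the sketch below. It is an assumption about one possible check, not something the draft specifies; the function name `service_response` and the response strings are hypothetical.

```python
def service_response(expected_len, accepted_lens, status, residual):
    """Compare the bytes actually accepted with what the target's status implies.

    expected_len  -- transfer length requested by the READ
    accepted_lens -- payload lengths of Data PDUs that passed the digest check
    status        -- status reported by the target (hypothetical string form)
    residual      -- residual count reported by the target
    """
    accepted = sum(accepted_lens)
    if status != "GOOD":
        return "TARGET_FAILURE"
    # The target claims it delivered expected_len - residual bytes; if the
    # initiator accepted fewer, some Data PDU was lost or discarded.
    if accepted != expected_len - residual:
        return "UNDERFLOW_DETECTED"
    return "COMPLETE"


# Three 512-byte Data PDUs expected, the last one discarded on a digest error.
print(service_response(1536, [512, 512], "GOOD", 0))
```

With such a check, the initiator would not have to rely solely on the target's status and could report the underflow to the SCSI ULP instead of a successful completion.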
    
    Regards,
    Santosh
    
    

