Subject: iSCSI : Digest Error recovery causes data corruption
Julian & All,
Section 5.5 on digest errors states that an initiator MUST "discard
and re-start" a task when it encounters a header or data digest
error, provided it can recognize the initiator task tag.
I assume "re-start" here means reissuing the command with the "retry"
bit set. Is that correct?
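To make the rule concrete, here is a minimal sketch of the initiator-side decision for a Data PDU, assuming the data digest is CRC32C (as in the final iSCSI spec, RFC 3720) and that the task tag is still trustworthy. It only covers a data digest error; with a header digest error the tag itself may be damaged, which is exactly why the section adds "provided it can recognize the initiator task tag". `DataPDU`, `handle_data_pdu` and the returned action strings are hypothetical names for illustration, not anything defined by the draft.

```python
from dataclasses import dataclass


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


@dataclass
class DataPDU:
    itt: int          # initiator task tag (assumed readable from the header)
    payload: bytes
    data_digest: int  # digest the target computed over the payload


def handle_data_pdu(pdu: DataPDU, outstanding: set) -> str:
    """Hypothetical initiator policy for one received Data PDU."""
    if crc32c(pdu.payload) == pdu.data_digest:
        return "accept"
    if pdu.itt in outstanding:
        # Digest error on a task we recognize: discard the PDU and
        # reissue the command with the "retry" bit set.
        return "discard_and_retry"
    # Unrecognized tag: not covered by the rule quoted above.
    return "discard"
```

For example, `crc32c(b"123456789")` is the standard CRC32C check value `0xE3069283`, so a PDU carrying that payload and digest is accepted, while a PDU whose payload was altered in flight is discarded and retried.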
If so, this error recovery mechanism can lead to data corruption.
Removing partial data recovery (based on DataRN/SN) reduces the
probability, but data corruption can still occur as follows:
- The target returns the last Data PDU of a READ I/O to the initiator.
- Initiator detects a header or data digest error on this last Data PDU
and discards the PDU.
- Initiator re-starts the task (using the "retry" bit).
- The target sends the Status PDU for the previous instance of the
  command. (The spec does not say what the initiator does with stale
  frames that continue to arrive for the previous instance of the
  I/O. For now, I assume the initiator discards such frames by some
  mechanism.)
- When the target receives the "retry" of this command, it believes it
  has already sent all the data, so it sends back only Status for
  this "retry".
- The initiator has no count-based checks, so it depends on (trusts!)
  the target's status and reports a successful I/O completion to its
  SCSI ULP [with no residual count, since the target believes it sent
  all the data].
- The SCSI ULP assumes the I/O completed and notifies the application
  [since it relies on the initiator to report an underflow with an
  appropriate service response, and in this case the initiator did
  not detect one].
- The application encounters data corruption because of the missing
  Data PDU, which the initiator discarded on a digest error and which
  the target never re-sent, since it performed partial recovery by
  sending only the status.
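The sequence above can be replayed in a small simulation. This is a sketch under stated assumptions: `Target`, `simulate` and `count_check` are hypothetical; the target is modeled as remembering how much READ data it sent per task and, on a "retry", sending only what it believes is missing; stale frames from the first instance are assumed to be discarded, as in the scenario. The `count_check` flag adds a hypothetical initiator-side check comparing bytes actually received against the expected length minus the reported residual, which is the check the scenario says the initiator lacks.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Hypothetical target doing partial recovery on a "retry"."""
    data: bytes
    pdu_size: int
    sent: dict = field(default_factory=dict)  # task tag -> bytes already sent

    def execute(self, itt: int, retry: bool = False) -> list:
        offset = self.sent.get(itt, 0) if retry else 0
        pdus = []
        while offset < len(self.data):
            pdus.append(("data", offset, self.data[offset:offset + self.pdu_size]))
            offset += self.pdu_size
        self.sent[itt] = offset
        residual = max(len(self.data) - offset, 0)  # 0: target thinks all data went out
        pdus.append(("status", "GOOD", residual))
        return pdus


def simulate(count_check: bool) -> tuple:
    """Replay the scenario; return (status reported to the ULP, data intact?)."""
    data = bytes(range(1, 65))      # 64-byte READ, four 16-byte Data PDUs
    target = Target(data, pdu_size=16)
    itt = 0x10
    buf = bytearray(len(data))
    received = 0

    data_pdus = [p for p in target.execute(itt) if p[0] == "data"]
    # All but the last Data PDU arrive intact.
    for _, offset, payload in data_pdus[:-1]:
        buf[offset:offset + len(payload)] = payload
        received += len(payload)
    # The last Data PDU fails its digest check and is discarded; the first
    # instance's Status PDU is stale and (by assumption) discarded too.

    # Re-start with the "retry" bit: the target sends back only Status.
    status, residual = None, None
    for pdu in target.execute(itt, retry=True):
        if pdu[0] == "data":
            _, offset, payload = pdu
            buf[offset:offset + len(payload)] = payload
            received += len(payload)
        else:
            _, status, residual = pdu

    if count_check and received != len(data) - residual:
        status = "UNDERFLOW"
    return status, bytes(buf) == data


print(simulate(count_check=False))  # ('GOOD', False): success reported, data corrupt
print(simulate(count_check=True))   # ('UNDERFLOW', False): the gap is detected
```

Without a count check, the initiator reports GOOD while the last 16 bytes of the buffer were never filled. With the count check, the same run is flagged as an underflow instead of a silent success.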
StatSN-based partial status recovery can lead to dangerous corner
cases like this one, with possible data corruption.
Regards,
Santosh
Santosh Rao
Software Design Engineer, Hewlett Packard (SISL), Cupertino
19420 Homestead Road, M\S 43LN, Cupertino, CA 95014, USA
Tel: 408-447-3751
Email: santoshr@cup.hp.com
Archive last updated: Tue Sep 04 01:05:38 2001