
    Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues



    
    
    Santosh,
    
    By "enforce" I meant enforce in the legal sense, i.e., police.
    If you have a MUST that you are never going to check, you had better find
    a different solution.
    Checking it entails scoreboarding that no other SCSI protocol does or
    needs.
    
    
    Sequencing is simple (and that is what FC does) and lets the target master
    the transfer the way it usually does for all SCSI protocols.
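
A minimal sketch of what "DataSN only as a sequencer" could look like on the initiator side, assuming data PDUs for a command arrive in order on one connection and that the number of data PDUs the target sent is known at status time. The class name `DataSequencer` and the `pdus_sent` parameter are hypothetical and are not taken from any draft.

```python
class DataSequencer:
    """Tracks DataSN for one command; no acks, only gap detection."""

    def __init__(self):
        self.expected = 0   # next DataSN we expect to see
        self.gap = False    # set once any DataSN is skipped

    def on_data_pdu(self, data_sn):
        # A PDU discarded after a digest error never reaches this call,
        # so the next PDU that does arrive exposes the hole.
        if data_sn != self.expected:
            self.gap = True
        self.expected = data_sn + 1

    def complete_at_status(self, pdus_sent):
        # pdus_sent is an assumed count reported with the status;
        # it also catches a dropped final data PDU.
        return not self.gap and self.expected == pdus_sent


seq = DataSequencer()
for sn in (0, 1, 3):        # DataSN 2 was dropped
    seq.on_data_pdu(sn)
print(seq.complete_at_status(4))
```

The initiator keeps one counter and one flag per command, and the target still decides how to deliver the data.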
    
    I feel we have already spent too much time on this single issue :-)
    
    Julo
    
    
    
    Santosh Rao <santoshr@cup.hp.com> on 30/01/2001 10:25:45
    
    Please respond to Santosh Rao <santoshr@cup.hp.com>
    
    To:   ips@ece.cmu.edu (ips)
    cc:
    Subject:  Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
    
    
    
    
    > julian_satran@il.ibm.com wrote:
    > >
    > > Santosh,
    > >
    > > The trouble with forbidding a certain behavior is that you have to
    > > enforce it (i.e.,  check and signal errors for units that do not behave).
    >
    > No you don't have to "enforce" it at all.  Using Santosh's example, if a
    > target broke the rules and performed data overlay, then the initiator will
    > always mark the I/O as bad, and the market forces will take over (customer
    > will buy a compliant target).
    
    Julian,
    
    Removing support for overlapped data xfer's has multiple benefits:
    1)   It provides initiators a reliable way of ensuring
         that the I/O did complete without any underrun.
    
    2)   It simplifies SCSI Assist implementations that no longer
         need to deal with overlapped data xfer conditions.
    
    3)   It is simpler than performing book-keeping
         on DataSN to ensure that all DataSNs have been
         received. (IOW, score-boarding at a DataSN level,
         instead of at a byte level.)
    
    I'd say the count-based solution is preferable, given the above.
    All protocols inherently enforce certain behaviour by mandating
    features (the use of MUST, shall). I don't think that's a
    strong enough reason to reject this proposal.
    
    To summarize :
    o    Dis-allow overlapped data xfer's.
    
    o    Initiators to perform a count check as is done in FC.
    
    o    On detecting an underrun, the command may be retried
         BUT WITHOUT SETTING the "retry" bit. This is
         particularly important because targets that implement
         status recovery may be ignorant of the fact that the
         initiator encountered a digest error [which caused
         the underrun] and so, they just send back a Status
         PDU under the belief that the command is complete,
         whereas, the initiator wants all the Data PDUs
         to be re-sent.
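
The count check and the reason it depends on disallowing overlap can be sketched as follows. This is only an illustration: the function names and the `(offset, length)` representation of received Data PDUs are assumptions, and the returned dictionary is a hypothetical stand-in for whatever the initiator does next.

```python
def bytes_transferred(expected_len, residual):
    # no. of bytes xfer'ed = Expected Data Xfer Length - Basic Residual Count
    return expected_len - residual


def count_check_passes(expected_len, residual, pdus):
    # pdus: list of (offset, length) for the Data PDUs actually received
    received = sum(length for _, length in pdus)
    return received == bytes_transferred(expected_len, residual)


def covered_bytes(pdus):
    # Distinct bytes covered; differs from the plain sum only on overlap.
    covered, end = 0, 0
    for off, length in sorted(pdus):
        start, stop = max(off, end), off + length
        if stop > start:
            covered += stop - start
            end = stop
    return covered


def on_status(expected_len, residual, pdus):
    if count_check_passes(expected_len, residual, pdus):
        return {"action": "complete"}
    # Underrun: reissue as a fresh command, retry bit NOT set,
    # so the target re-sends the Data PDUs and not just the Status PDU.
    return {"action": "reissue", "retry_bit": False}


# One 2048-byte PDU lost to a digest error, no overlap: check fails.
print(on_status(8192, 0, [(0, 2048), (2048, 2048), (6144, 2048)]))

# Overlapped transfer: the count matches although 2048 bytes are missing.
overlapped = [(0, 4096), (2048, 4096)]
print(count_check_passes(8192, 0, overlapped), covered_bytes(overlapped))
```

With overlap allowed, the byte count alone can match while part of the buffer was never written, which is why the count check is only reliable once overlapped transfers are disallowed.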
    
    Regards,
    Santosh
    
    
    >
    >    Besides
    > > - the whole philosophy of the SCSI set of protocols is that the target is
    > > the master and the initiator should let the target decide how to fulfill
    > > the command.   That is why we chose not to impose restrictions above those
    > > imposed by SCSI.  The whole set of issues is also raised only because we
    > > provide also for storage proxies - otherwise a stronger checksum at TCP
    > > level and recovery at TCP level would have done what we wanted and recovery
    > > of the type we are dealing now with would have been done at TCP level.
    > > I am confident that we can reinstate DataSN as a simple means to sequence
    > > (not ack) data packets and considerably simplify recovery.
    >
    > If there is a data digest failure, the iSCSI PDU is discarded, and the test
    > that Santosh describes will fail.  At that point, the command is "retried",
    > and using my example of the retry implementation in the thread "iSCSI: I/O
    > (command) recovery" error recovery is performed.  No need for DataSN...
    >
    > > And do not forget that raising the error up to ULP with a service response
    > > will make the recovery far more expensive (as Prasenjit has already stated)
    > > - far more than current wedge drivers do as these rarely consider commands
    > > in flight and the need to keep order in a target that is not yet aware that
    > > something went wrong.
    >
    > I agree that errors due to the transport should not be propagated to the ULP.
    > In the case of a digest failure, this means that the TCP checksum indicated
    > the segment was good, meaning that a middle box corrupted the TCP segment and
    > sent it out with a "fixed" TCP checksum.
    >
    > The simple iSCSI error recovery using the retry should handle this corner case
    > very well.
    >
    > -Matt
    >
    >
    > >
    > > Julo
    > >
    > > Santosh Rao <santoshr@cup.hp.com> on 28/01/2001 00:07:08
    > >
    > > Please respond to Santosh Rao <santoshr@cup.hp.com>
    > >
    > > To:   Julian Satran/Haifa/IBM@IBMIL
    > > cc:   ips@ece.cmu.edu
    > > Subject:  Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
    > >
    > > Julian,
    > >
    > > The missing Data PDU could be detected if the initiator were to
    > > perform a count check operation upon receiving SCSI Response PDU,
    > > along the lines of :
    > >
    > > no. of bytes xfer'ed =
    > >      (Expected Data Xfer Length) - (Basic Residual Count)
    > >
    > > where,
    > > Expected Data Xfer Length -> as specified in SCSI Command PDU
    > > Basic Residual Count -> as specified in SCSI Response PDU
    > >
    > > However, this is currently not possible due to overlapped data
    > > transfers being allowed by iSCSI. If iSCSI were to dis-allow
    > > overlapping data xfer's and initiators used a count check
    > > [as is done in FC], this would also address the problem.
    > >
    > > Regards,
    > > Santosh
    > >
    > > >
    > > >
    > > >
    > > > If the header is a data header we can hardly trust the ULP to recognize the
    > > > error (it might be unaware of a missing packet).  With data numbering this
    > > > situation could have been discovered at "status time".
    > > > The only thing we could do is restart all commands but this is equivalent
    > > > to a connection restart for all practical purposes.  Dropping data
    > > > numbering might have some more "side-effects" like this.
    > > > As the combination of values - tag, address, offset may still let some
    > > > implementations assume that they have a correct task identifier I don't
    > > > see a point in mandating a recovery behavior and the implementer may
    > > > choose to:
    > > >
    > > > -retry/restart command
    > > > -logout drop and rebuild connection login and restart/retry
    > > > -abort all task sets (practically reset the target!) and report for all
    > > > commands a "delivery system failure" (kick-in the ULP recovery) and if you
    > > > suspect the link quality rebuild it; this latter behavior means also that
    > > > you have to stop delivering anything on any link  to the target to avoid
    > > > out of order execution until you have finished the cleanup - pretty drastic
    > > >
    > > > With data numbering recovery could have stayed within the confines of a
    > > > command even if a header was bad.
    > > > Perhaps we should leave the DataSN only as a sequencer so that at
    > > > status-time the initiator should be able to find if a data packet was
    > > > dropped (no ExpDataSN on a NOP).
    > > >
    > > > Regards,
    > > > Julo
    > > >
    > > >
    > > >
    > > >
    > > > Michael Krause <krause@cup.hp.com> on 27/01/2001 04:59:12
    > > >
    > > > Please respond to Michael Krause <krause@cup.hp.com>
    > > >
    > > > To:   Julian Satran/Haifa/IBM@IBMIL
    > > > cc:   ips@ece.cmu.edu
    > > > Subject:  Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
    > > >
    > > >
    > > >
    > > >
    > > > At 07:40 PM 1/25/2001 +0200, julian_satran@il.ibm.com wrote:
    > > >
    > > >
    > > > >1) The initiator task tag cannot be trusted when a header digest error
    > > > >is seen. What does the phrase "provided it can recognize the initiator
    > > > >task tag" mean ?
    > > > >How can an initiator reliably claim that the initiator task tag is
    > > > >trustworthy ?
    > > > >
    > > > ><js> an initiator may choose to provide some redundancy in the tag itself
    > > > ></js>
    > > >
    > > > I'm aware of some techniques for inserting redundant information in tags
    > > > which limits the potential error exposure when a multi-bit error occurs,
    > > > however these are not fail-safe leading to potential incorrect operation -
    > > > perhaps benign in many cases; perhaps not in others. As such, if a header
    > > > digest error occurs, the PDU should be silently discarded and recovery
    > > > should be left to the ULP.  There is little to no value having two
    > > > mechanisms to solve the same problem.
    > > >
    > > > Mike
    > > >
    > > >
    > > >
    > > >
    > > >
    > >
    > > --
    > > #################################
    > > Santosh Rao
    > > Software Design Engineer,
    > > HP, Cupertino.
    > > email : santoshr@cup.hp.com
    > > Phone : 408-447-3751
    > > #################################
    >
    
    
    --
    #################################
    Santosh Rao
    Software Design Engineer,
    HP, Cupertino.
    email : santoshr@cup.hp.com
    Phone : 408-447-3751
    #################################
    
    
    
    



Last updated: Tue Sep 04 01:05:37 2001