
    RE: iSCSI: more on StatRN



    
    
    I'm aware tape timeouts could exceed 3 minutes or an hour, but tape
    commands are highly causal and there are existing SCSI mechanisms to deal
    with tape error recovery.  Also, there are ways in SCSI to report
    intermediate status, so you really don't need a heartbeat mechanism.
    
    I'm really trying to understand the motivation for stat_rn,
    
    Prasenjit
    
    
       Prasenjit Sarkar
       Research Staff Member
       IBM Almaden Research
       San Jose
    
    
    "Douglas Otis" <dotis@sanlight.net>@ece.cmu.edu on 10/20/2000 09:36:55 AM
    
    Sent by:  owner-ips@ece.cmu.edu
    
    
    To:   Prasenjit Sarkar/Almaden/IBM@IBMUS, "Randall R. Stewart"
          <randall@stewart.chicago.il.us>
    cc:   <ips@ece.cmu.edu>
    Subject:  RE: iSCSI: more on StatRN
    
    
    
    Prasenjit,
    
    The timeouts for streaming devices do range beyond 3 minutes.  There are
    also system parameters that affect how quickly TCP times out.  In other
    words, do not expect either to time out before the other.  With timeouts in
    the 10 minute range, a heartbeat would be desired in the range of tens of
    seconds if there are no other communications.  This should allow a
    reasonably quick response to a network failure after several successive
    failed responses.  In iSCSI speak, it could be an iSCSI version of Echo
    (ping).  SCTP has Heartbeat detection.
    
    Doug
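
A minimal sketch of the heartbeat idea described above, assuming a 20 second probe interval and a threshold of three consecutive unanswered probes. Both numbers, and the names `HeartbeatMonitor` and `on_probe_result`, are illustrative assumptions, not values or identifiers from the iSCSI draft. No network I/O is performed; the caller reports whether each echo was answered.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Tracks consecutive unanswered echo probes on one connection."""

    interval_s: float = 20.0  # assumed probe spacing ("tens of seconds")
    max_missed: int = 3       # assumed threshold ("several successive" failures)
    missed: int = 0

    def on_probe_result(self, answered: bool) -> bool:
        """Record one probe outcome; return True once the link is declared down."""
        if answered:
            self.missed = 0   # any reply resets the failure streak
        else:
            self.missed += 1
        return self.missed >= self.max_missed

    def detection_bound_s(self) -> float:
        """Rough bound on detection delay, ignoring the per-probe reply wait."""
        return self.interval_s * self.max_missed


monitor = HeartbeatMonitor()
outcomes = [monitor.on_probe_result(ok) for ok in (True, False, False, False)]
```

With these assumed settings the connection is declared down after roughly 60 seconds of silence, well inside a command timeout in the 10 minute range.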
    
    > The ballpark figure for SCSI varies, but by 3 minutes you can rest
    > assured that SCSI will give up on a command, and will have probably
    > issued a lun/target reset.
    >
    > I've other arguments against the stat_rn mechanism, but I'll wait till
    > this is resolved,
    >
    > Prasenjit
    >
    >    Prasenjit Sarkar
    >    Research Staff Member
    >    IBM Almaden Research
    >    San Jose
    >
    >
    > "Randall R. Stewart" <randall@stewart.chicago.il.us>@ece.cmu.edu on
    > 10/20/2000 06:01:55 AM
    >
    > Sent by:  owner-ips@ece.cmu.edu
    >
    >
    > To:   Prasenjit Sarkar/Almaden/IBM@IBMUS
    > cc:   ips@ece.cmu.edu
    > Subject:  Re: iSCSI: more on StatRN
    >
    >
    >
    > Prasenjit:
    >
    > Being a transportish geek I don't know what the "failure" time is
    > on SCSI... can you give a ball-park figure?
    >
    > Another thought on this issue: if SCSI retransmits when
    > it times out (I think it does??), this just adds more
    > to the queue of things in TCP that are attempting to be sent.
    >
    > On the TCP failure side, in most cases where I have seen
    > a TCP connection fail, it has taken around 3 minutes
    > or more before the failure was reported...
    >
    > R
    >
    > Prasenjit Sarkar/Almaden/IBM wrote:
    > >
    > > If the time TCP takes to give up on a connection is more than the time
    > > SCSI takes to give up on a command, the stat_rn mechanism would not be
    > > useful.
    > >
    > > While I know the values for certain operating systems, I would like to
    > > hear from people who can assert confidently that the TCP connection
    > > failure time < SCSI command failure time.
    > >
    > > Prasenjit
    > >
    > >    Prasenjit Sarkar
    > >    Research Staff Member
    > >    IBM Almaden Research
    > >    San Jose
    > >
    > > "Mallikarjun C." <cbm@rose.hp.com>@ece.cmu.edu on 10/19/2000 07:40:16 PM
    > >
    > > Please respond to cbm@rose.hp.com
    > >
    > > Sent by:  owner-ips@ece.cmu.edu
    > >
    > > To:   ips@ece.cmu.edu
    > > cc:
    > > Subject:  Re: iSCSI: Question on StatRN usage
    > >
    > > Julian,
    > >
    > > Thanks for the clarifications, I am pleased to understand that
    > > there's no overloading of any reference #s - the usage of new
    > > term "DataRN" in your new draft makes it a lot clearer.
    > >
    > > Some comments.
    > >
    > > >Mallikarjun and Prasenjit,
    > > >
    > > >Sorry for the confusion.
    > > >
    > > >The text is confusing and I have corrected it in the new text. StatRN is
    > > >mandatory (it is the only way we have to ACK status and is not related
    > > >to ordering).
    > >
    > > Even though StatRN itself may not be used by an initiator for ordering
    > > (unless it wants to order completions, for whatever reason), StatRNs
    > > themselves are in a monotonically increasing order.  It is helpful to
    > > state this explicitly.
    > >
    > > >
    > > >As for the data, the intent was to use StatRN to just number data packets
    > > >for a given command (start with whatever you want) and have them acked
    > > >with a NOP with the same task tag (this is important for input data, for
    > > >which we have no other way of acking them). Those numbers are not related
    > > >to the Status numbers. No ordering or recovery is required up to command
    > > >restart.
    > > >I assume that numbers will not wrap unless a target sends more blocks
    > > >than bytes (and it can!) but even then no harm is done.
    > > >At recovery the restarted command will be followed by a NOP with the
    > > >same initiator tag indicating which block is expected. The initiator
    > > >does not have to do any scoreboarding, only keep the counters.
    > > >The target can free resources early and iSCSI can recover even long
    > > >reads.
    > > >For writes evidently R2T does the job but it means that write data can
    > > >be recovered only with R2T.
    > >
    > > This implies that if an iSCSI implementation is counting the # of
    > > bytes transferred in/out during a task, it shall not assume an error if
    > > the count is less than the expected transfer size when the retry bit
    > > was set (this is especially true for writes, where the initiator doesn't
    > > know from which point the target starts issuing R2Ts).  I would suggest
    > > adding this comment as well to enable better interoperability.
    > >
    > > >Should we overload on CmdRN/ExpCmdRN to shorten recovery? I don't see a
    > > >need.
    > >
    > > No, I don't see one either.  My concern was that overloading these RNs for
    > > data would become a scalability bottleneck when a session spans multiple
    > > NICs.  I am glad that it's not what was intended.
    > >
    > > Comments on your next email:
    > >    >The NOP message PDUs are not associated with a task, are meant for
    > >    >immediate delivery, and their only purpose is synchronizing the
    > >    >ordering registers of the target and initiator.
    > >
    > > I would like to point out that NOP PDUs are indeed associated with a
    > > task!  They are associated with the task whose read data they are ack'ing
    > > (given that the DataRN is only task-unique).  Also, I would like to point
    > > out that the current definition of NOP payload does not have an Initiator
    > > Task Tag - it needs to be added.
    > >
    > > Thanks.
    > > --
    > > Mallikarjun
    > > M/S 5601
    > > Networked Storage Architecture
    > > HP Storage Organization
    > > Hewlett-Packard, Roseville.
    > > cbm@rose.hp.com
    > >
    > > phone: (916) 785-5621
    > > fax:   (916) 785-2875
    >
    > --
    > Randall R. Stewart
    > randall@stewart.chicago.il.us or rrs@cisco.com
    > 815-342-5222 (cell) 815-477-2127 (work)
    >
    >
    >
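
The per-task data numbering discussed in the quoted text (data PDUs numbered per command, acknowledged by a NOP carrying the same task tag, with the initiator keeping only counters rather than a scoreboard) can be sketched roughly as follows. This is an illustration under assumptions, not the draft's wire format: the `Nop` tuple, the `InitiatorCounters` class and its method names are hypothetical, and wrap-around of the counter is ignored.

```python
from typing import Dict, NamedTuple


class Nop(NamedTuple):
    """Hypothetical stand-in for a NOP PDU used as a data acknowledgement."""

    initiator_task_tag: int
    expected_data_rn: int


class InitiatorCounters:
    """Keeps one 'next expected DataRN' counter per task, no scoreboard."""

    def __init__(self) -> None:
        self._expected: Dict[int, int] = {}

    def start_task(self, tag: int, first_data_rn: int = 0) -> None:
        # The numbering may "start with whatever you want" for each command.
        self._expected[tag] = first_data_rn

    def on_data(self, tag: int, data_rn: int) -> bool:
        """Accept an in-order data PDU; return False for anything else."""
        if data_rn != self._expected[tag]:
            return False
        self._expected[tag] += 1
        return True

    def ack(self, tag: int) -> Nop:
        """Build the NOP that tells the target which block is expected next."""
        return Nop(tag, self._expected[tag])


counters = InitiatorCounters()
counters.start_task(0x10, first_data_rn=5)
accepted = [counters.on_data(0x10, rn) for rn in (5, 6, 8)]  # block 7 was lost
resume_point = counters.ack(0x10)
```

Here `resume_point` names block 7 for task tag 0x10. In the scheme as quoted, a NOP with that content following a restarted command would tell the target where to resume.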
    
    
    
    
    



Last updated: Tue Sep 04 01:06:36 2001
6315 messages in chronological order