RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"

To: "'someshg@yahoo.com'" <someshg@yahoo.com>, julian_satran@il.ibm.com, ips@ece.cmu.edu
Subject: RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
From: Venkat Rangan <venkat@rhapsodynetworks.com>
Date: Fri, 30 Mar 2001 19:56:33 -0800
Content-Type: text/plain;charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu

Somesh,

Your point is valid for data-SACK. But there is also another
benefit that SACK was supposed to provide - the ability to
fill StatSN holes and acknowledge timely reception of status
by the initiator.

This leads to an obvious question: how is a StatSN hole at the
initiator created in the first place? Since iSCSI Response PDUs
are dominated by the iSCSI header, the most likely cause is a
header digest failure of the iSCSI Response PDU, detected at
the initiator, that escaped the TCP checksum. I am not sure that
the 'Sense Len' portion of an iSCSI Response is large enough to
suffer from the CRC-failure-which-escapes-TCP-checksum error condition.

Some would argue that if there is an iSCSI header digest failure
which escapes TCP Checksum, the entire connection should be reset.
If that is the case, header digest error can only be recovered
at the session level and not at the connection level.

There is another viewpoint that one could keep the connection and
advance to the marker/synch point and recover; in this case, the
StatSN SACK is useful in allowing the target to quickly send the
missing status responses and release its resources associated
with unacknowledged status responses.

When recovering over to a new connection, you still may have holes
in StatSN because you may have received on the old connection
several well-formed Response PDUs after the one that had the digest
failure. When connection-level recovery occurs, if the initiator
throws these away, it will have to request the target to resend all
responses beyond the acknowledged StatSN. Having the SACK reduces
this to resending only those that had errors.
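
To make the difference concrete, here is a minimal sketch in Python of
the two resend sets. The function names and the sample StatSN values
are hypothetical and are not taken from the draft; the sketch also
assumes plain integers and ignores 32-bit sequence number wraparound.

```python
def statsn_holes(acked_up_to, received):
    """StatSN values the initiator is still missing.

    acked_up_to: first StatSN not yet acknowledged (hypothetical name).
    received:    StatSNs of well-formed Response PDUs already received
                 at or beyond acked_up_to.
    Assumption: no sequence number wraparound.
    """
    if not received:
        return []
    got = set(received)
    highest = max(got)
    return [sn for sn in range(acked_up_to, highest + 1) if sn not in got]


def resend_without_sack(acked_up_to, highest_sent):
    """Everything beyond the acknowledged StatSN must be resent."""
    return list(range(acked_up_to, highest_sent + 1))


# Responses 10 and 13 failed the header digest; 11, 12 and 14 arrived intact.
with_sack = statsn_holes(10, [11, 12, 14])
without_sack = resend_without_sack(10, 14)
```

With these sample values the SACK case asks for two responses, while
the case without SACK asks for all five.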

Simple-minded initiators and targets can choose to do only
connection-level or session-level recovery. The Retry bit is then
present only to keep the space of sequence numbers consistent in the
wake of command recovery on a new connection.

Venkat Rangan
Rhapsody Networks Inc.
http://www.rhapsodynetworks.com

-----Original Message-----
From: Somesh Gupta [mailto:someshg@yahoo.com]
Sent: Friday, March 30, 2001 4:26 PM
To: julian_satran@il.ibm.com; ips@ece.cmu.edu
Subject: RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"


Sorry to have been missing for a while. Hope you will
appreciate my being back in action :-). It was a fairly
clear consensus in Orlando that applications broke up
their transfers into reasonably small chunks i.e. they
did not have very long running transfers.

Therefore the consensus was that a command level recovery
mechanism was sufficient instead of an ack/sack for each
data PDU.

The SACK mechanism was a post Orlando invention. Without
an ack mechanism (for every data PDU), the SACK mechanism
just imposes additional burden on either end of the session,
without really much benefit.

The benefit of having SACK is of saving bandwidth in case
the data part of the data PDU failed an integrity check
(but passed TCP checksum). This is a rare enough case that
as a percentage, the bandwidth loss from retransmitting
all the data associated with a read or write command is
very very small.
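
A rough way to check this claim is the first-order estimate below,
written as a Python sketch. The per-PDU failure probability p and the
PDU count n are illustrative assumptions, not measured values, and the
estimate ignores the chance that a retry fails again.

```python
def overhead_command_level(p, n):
    """Expected extra traffic, as a fraction of the command's data,
    when any failed data PDU forces all n PDUs to be resent once.

    p: assumed probability that one data PDU fails the iSCSI digest
       after passing the TCP checksum (illustrative).
    n: assumed number of data PDUs per command (illustrative).
    """
    return 1.0 - (1.0 - p) ** n


def overhead_sack(p):
    """Expected extra traffic fraction when only the failed PDU is resent."""
    return p


# Illustrative numbers only: one undetected error per ten million PDUs,
# 64 data PDUs per command.
p, n = 1e-7, 64
command_level = overhead_command_level(p, n)
sack = overhead_sack(p)
```

For small p the command-level figure is close to n times p, so it
grows with the command size but stays tiny as long as commands are
kept reasonably small.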

In addition, it avoids the complexity of restarting
something from the middle, as compared to from the beginning.

To me it seems that there is significant simplicity (in
implementation, reliability and recovery process) from
having smaller data transfers per command.

I would really like to get rid of the SACK command.

Somesh

> -----Original Message-----
> From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> julian_satran@il.ibm.com
> Sent: Wednesday, March 28, 2001 6:57 AM
> To: ips@ece.cmu.edu
> Subject: RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
>
>
>
>
> Mallikarjun,
>
> Last summer I thought that recovery within a connection should be left to
> TCP. It is simple and could be made available through IPsec (if no new
> option of any form can be added).
>
> Two things killed this:
>
>    The requirement to have a data encapsulation that can pass through
>    application proxies (like a storage router)
>    The "NO WAY" message we got from IESG-Security on a CRC only IPSec
>    header
>
>
> As for the ACK - I am very much in favor of it (it is a no brainer) and
> implementations are in fact allowed to drop even unacked data.
>
> I am bound by the Orlando meeting decision to drop it. Except for the
> regular "oppose everything" crowd, the two vocal opponents were Somesh
> Gupta and Matt Wakeley.
>
> David may want or not to re-open the issue - I am not going to ask for it.
>
> Regards,
> Julo
>
> "Mallikarjun C." <cbm@rose.hp.com> on 28/03/2001 00:45:02
>
> Please respond to cbm@rose.hp.com
>
> To:   Black_David@emc.com
> cc:   Julian Satran/Haifa/IBM@IBMIL, cbm@rose.hp.com, someshg@yahoo.com,
>       steph@cs.uchicago.edu, John Hufferd/San Jose/IBM@IBMUS,
>       ldalleore@snapserver.com, venkat@rhapsodynetworks.com
> Subject:  RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
>
>
>
>
> David and Julian,
>
> I appreciate both your views, and should I say that they're
> along predicted lines :-)
>
> - David's right in saying that the situation is akin to FC's.
>   However, I would like to point out that FC is an unreliable
>   transport, and hence is forced to pick up a lot of the transport
>   baggage (at least in FCP-2, as I understand), in addition
>   to being a SCSI encapsulation layer.  Unfortunately, even with
>   TCP being the "reliable" transport, iSCSI is going along the
>   same lines - ie. transport baggage + SCSI encapsulation.  My
>   point is - if this is indeed a necessary evil, why don't we
>   complete iSCSI's transport functionality by data-ACKs?
>
> - If data SACK is introduced mostly to make up for TCP's shortcomings,
>   we're making its usage (and implementation) drastically less appealing
>   since the only way error recovery algorithms can *rely* on data SACK
>   is when replay is supported (or, "ReplaySupport=yes"  in my proposal),
>   which is extremely expensive.  IOW, we're defining data SACK in the
>   draft and not providing any incentives to implement and use it!
>
> - I submit that since iSCSI is being hailed as the ideal SCSI Transport
>   protocol in its definition so far (and I believe, rightly so - mandating
>   command ordering, bi-di support, SCSI CRN support to name a few examples),
>   the perfectly SCSI-legal R/W interactions that break in other transports
>   *do not* have to break in iSCSI.
>
> - A last idea (may seem radical at this point) in regards to iSCSI
>   being a "full transport". This provides us an opportunity to "cast
>   off" the transport baggage in future when we truly move to a "reliable"
>   transport (perhaps TCP with CRCs/SCTP ?) - if we do a good job of
>   keeping the encapsulation stuff separate from the transport stuff.
>   (Julian, I heard from Randy that ideas similar to this were explored
>   in your Haifa meeting.  And yes, he recalls they were given up since
>   TCP was supposed to be reliable and granularity of recovery was deemed
>   one I/O.)
>
> With that said, may I request David (with his co-chair hat on, :-))
> to add some binding comments/observations on this discussion?
>
> If we decide to leave data SACKs as unattractive to implement, the draft
> should in the least add a statement like - "Note that satisfying all
> possible data SACK requests for a task with an unacknowledged status
> implies implementing the I/O replay buffer on the part of targets."
> --
> Mallikarjun
>
>
> Mallikarjun Chadalapaka
> Networked Storage Architecture
> Network Storage Solutions Organization
> MS 5668   Hewlett-Packard, Roseville.
> cbm@rose.hp.com
>
>
>
>
> >I think Julian's basically right -- I would point
> >out that any case of write after read that breaks
> >over iSCSI will also break over Fibre Channel.
> >On FC, the scenario starts with a frame CRC failure
> >on read data at the Initiator, so applications
> >have to cope and typically do so by enforcing
> >ordering at the app rather than using SCSI task
> >ordering.
> >
> >While SCSI has clever tools like ACA and task
> >ordering that appear to allow dependent operations
> >to be sent to the target concurrently, in practice
> >they don't work and/or aren't used (funny thing,
> >those two reinforce each other ;-) ).  Hence
> >a minimal approach to them is in order:
> >- Make sure the result will interoperate.
> >- Make sure T10 doesn't ding us for leaving something
> >    completely out.
> >- Don't specify anything not needed for the above.
> >
> >My 0.02,
> >--David
> >
> >> -----Original Message-----
> >> From:  julian_satran@il.ibm.com [SMTP:julian_satran@il.ibm.com]
> >> Sent:  Tuesday, March 27, 2001 9:23 AM
> >> To:    cbm@rose.hp.com
> >> Cc:    someshg@yahoo.com; steph@cs.uchicago.edu; hufferd@us.ibm.com;
> >> cbm@rose.hp.com; ldalleore@snapserver.com; Venkat Rangan;
> >> Black_David@emc.com
> >> Subject:    Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
> >>
> >>
> >>
> >> Mallikarjun,
> >>
> >> I commiserate with you at the lack of ack for data but the Orlando meeting
> >> stated - no.  Recall that I kept the number only as a mechanism to detect
> >> missing packets.
> >>
> >> You can achieve the effect you want by keeping around data for a while
> >> (you determine how long and then discard).
> >>
> >> If a SACK comes and you can recover - fine. If not you either reaccess the
> >> media (if you know how) or reject and let the initiator retry.
> >>
> >> You should not worry about R/W conflicts as programs bound to have such
> >> conflicts either:
> >>
> >> 1)can live with them or
> >> 2)protect themselves through some locks and rely on "operation-end-status"
> >> to keep results deterministic.
> >>
> >> Regards,
> >> Julo
> >>
> >>
> >>
> >> "Mallikarjun C." <cbm@rose.hp.com> on 27/03/2001 03:34:16
> >>
> >> Please respond to cbm@rose.hp.com
> >>
> >> To:   cbm@rose.hp.com, someshg@yahoo.com, steph@cs.uchicago.edu, Julian
> >>       Satran/Haifa/IBM@IBMIL, John Hufferd/San Jose/IBM@IBMUS
> >> cc:   Black_David@emc.com
> >> Subject:  iSCSI ERT: data SACK/replay buffer/"semi-transport"
> >>
> >>
> >>
> >>
> >> Hi Error Recovery Team,
> >>
> >> iSCSI can discard PDUs because of digest errors and request
> >> retransmissions using the iSCSI data SACK.  To deal with such
> >> an eventuality, targets that want to support data SACK have
> >> the following options:
> >>
> >> (A) maintain a complete "replay" buffer for the entire I/O since
> >>   a SACK could come anytime before the status is ack'ed by the
> >>   initiator. [ simple, but extremely expensive in memory resources]
> >>
> >> (B) (re-introduce data-ACKs into the draft, and) implement data-ACKs.
> >>   This enables keeping only those I/O buffers that haven't been ack'ed
> >>   by the initiator. IOW, become a real full transport! [ everyone disliked
> >>   it earlier...]
> >>
> >> (C) re-access the medium for data retransmission requests.  Now there
> >>   are 3 sub-cases in this to handle the changed data on the medium in a
> >>   write-after-read scenario.  (SEE NOTE.1 at the bottom on how it is legal.)
> >>      (1) On seeing any write, stall till status is ack'ed for all the
> >>             previous reads (basically drain the pipe). [simple, but incurs
> >>             an additional roundtrip delay for all writes].
> >>      (2) A variation of the above, keep an eye only on the prior
> >>             overlapping reads. [more BW efficient, but complicated to
> >>             resolve the block dependencies in a stream of reads followed
> >>             by writes]
> >>      (3) Document the caveat and leave it up to the applications
> >>             to avoid this case since this leads to data integrity issues.
> >>             [pushing to apps since the transport can't get it right!]
> >>
> >> My first preference is (B), followed by (A), and I suggest we not go
> >> to (C) at all with its inherent dangers.
> >>
> >> Doing (B) naturally completes the transport job that iSCSI has taken
> >> on itself in view of TCP's claimed unreliable checksum.  That is the
> >> right thing to do architecturally instead of being a "semi-transport"!
> >>
> >> Comments?
> >> --
> >> Mallikarjun
> >>
> >>
> >> Mallikarjun Chadalapaka
> >> Networked Storage Architecture
> >> Network Storage Solutions Organization
> >> MS 5668   Hewlett-Packard, Roseville.
> >> cbm@rose.hp.com
> >>
> >>
> >> __________________________________________________________________________
> >> Note.1: A Read followed by a Write (to the same blocks) is perfectly legal
> >>         if SCSI sets the ORDERED task attribute on both the commands AND
> >>         sets the NACA bit to one to indicate that Write shall be executed
> >>         only if the Read did not fail (result in a Check Condition).
> >>
> >>         In the current case, since Read completed just fine from SCSI's
> >>         point of view, SCSI is moving on to execute Write.  Those read buffers
> >>         had been freed up since iSCSI received an ACK at the TCP level, and
> >>         since iSCSI has no other way to have the data ack'ed!
> >>
> >>
> >>
> >>
> >
>
>
>
>

