Re: iSCSI: Markers

To: "John Hufferd" <hufferd@us.ibm.com>, <ips@ece.cmu.edu>
Subject: Re: iSCSI: Markers
From: Stuart Cheshire <cheshire@apple.com>
Date: Fri, 11 Jan 2002 17:39:17 -0800
Content-Type: text/plain; charset="US-ASCII"
Sender: owner-ips@ece.cmu.edu

I'm new to this list, so I should introduce myself.

My name is Stuart Cheshire; I'm the author of Consistent Overhead Byte 
Stuffing (COBS), the framing technique from which COWS derives. I'm not 
working on any iSCSI product, but if COBS can contribute to iSCSI, then 
I'm happy to offer a little of my time, as much as I can spare, to help 
clarify what COWS does and does not do, to help people make an informed 
decision whether or not COWS is the right solution for iSCSI.

----

Assumption: A high-performance receiver is harder than a high-performance 
sender.

This is because the sender is in control. It knows where the data is 
coming from in memory, and where it is going to on the network. The 
sending host knows and can control all aspects of the communication: what 
order iSCSI messages are delivered onto the wire, how big each one is, 
and at what time they are sent. If the sender wants to do some kind of 
housekeeping that prevents it from sending packets for a few 
milliseconds, then it has the option of doing that without terrible 
consequences.

The receiver has a much harder time. It never knows what packet is going 
to arrive next, or how big it will be, or where it will be from, or where 
it will have to go to in memory. Packet loss/corruption/reordering makes 
things even more unpredictable. A receiver doesn't have the luxury of 
being able to not receive packets for a few milliseconds if it is busy 
with something else.

For this reason, it makes sense to see what the sender can do to make the 
receiver's life a little easier. If the receiver could receive each TCP 
segment and process it in isolation, determining where to place it in 
memory solely from information within that TCP segment, without reference 
to data from other TCP segments (which may not have arrived yet), then it 
would be easier to make a high-performance receiver.

What can we do to enable independent segment processing and idempotent 
direct data placement at the receiver?

My first choice would be to add a couple of extra bits to the TCP header; 
a "start of message" bit and an "end of message" bit. The "start of 
message" bit indicates that the first byte of TCP data in the segment is 
also the first byte of an iSCSI message; the "end of message" bit 
indicates that the last byte of TCP data in the segment is also the last 
byte of an iSCSI message. When a receiver receives a TCP segment with 
both bits set, it knows with certainty that it has one (or more) complete 
iSCSI messages in the TCP segment and can immediately decode enough of 
the iSCSI message header(s) to determine where in memory to place the 
data.

Unfortunately, adding extra bits to the TCP header is not viable. From a 
political point of view, trying to change the TCP on-the-wire protocol is 
a non-starter. From a practical point of view, there are too many routers 
and firewalls and similar devices that will throw away TCP packets with 
bits they don't understand.

Given that out-of-band framing using header bits is not possible, the 
alternative is in-band framing using only information in the TCP data 
stream itself.

If we can design our sender to normally send exactly one iSCSI message 
per TCP segment, and we have a way for our receiver to reliably verify 
that the received TCP segment contains exactly one iSCSI message, then 
the receiver can implement idempotent direct data placement for each TCP 
segment as it is received, without reference to state from previous TCP 
segments on that connection (which may not have arrived yet).

The problem left to solve is how the receiver can reliably verify that 
the received TCP segment contains exactly one iSCSI message. It can do 
this by checking to see whether the TCP segment data begins with some 
special marker pattern, as long as it knows that this special marker 
pattern cannot appear anywhere within the body of valid iSCSI message 
data. This necessarily entails processing ("stuffing") the body of the 
iSCSI message to eliminate inadvertent occurrences of the special marker 
pattern before sending, and then reversing this transformation to restore 
the original data after reception.

If the receiver finds that the segment does not begin with the special 
marker pattern, then it knows that the sender segmentation has not been 
maintained (or it is talking to an old TCP sender that doesn't support 
sender segmentation) and it has to fall back to treating the TCP data 
stream as a raw unstructured byte stream, with message boundaries 
indicated by occurrences of the marker pattern. The important thing 
is that the receiver still works correctly, even though the performance 
will be lower.
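
As a rough illustration of the two receive paths just described, here is 
a minimal Python sketch. It is not taken from any iSCSI or COWS draft: the 
marker value, the 32-bit word granularity, and the function names are all 
assumptions made for the example. It also assumes the sender has already 
stuffed the message bodies so that the marker can only appear at the start 
of a message.

```python
# Hypothetical 32-bit marker word, chosen only for this sketch.
MARKER = 0xC0DEFACE

def looks_like_one_message(segment_words):
    """Fast path test: the segment starts with the marker and no second
    marker (i.e. no second message start) appears later in the segment.
    A real receiver would also have to confirm the message is complete,
    for example from a length in the message header (not modelled here)."""
    return (len(segment_words) > 0
            and segment_words[0] == MARKER
            and MARKER not in segment_words[1:])

def split_stream(stream_words):
    """Slow path: treat the data as an unstructured stream of words and
    cut it into messages wherever the marker occurs. Words seen before
    the first marker belong to a message that started earlier and are
    skipped; the last message returned may still be incomplete."""
    messages = []
    current = None
    for word in stream_words:
        if word == MARKER:
            if current is not None:
                messages.append(current)
            current = [word]
        elif current is not None:
            current.append(word)
    if current is not None:
        messages.append(current)
    return messages
```

Both paths recover the same message boundaries; the fast path simply 
avoids depending on words from other segments.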

This prefer-sender-segmentation-but-verify approach is important. If the 
outgoing data is not processed to guarantee that the special marker 
pattern cannot occur, then malicious users might be able to subvert the 
protocol by putting contrived patterns in their data. Remember the days 
when you could make a user's modem hang up by sending them an email 
containing the text "+++ATH"? (Apologies to anyone reading this via modem 
who just had their telephone line hang up.)

Another benefit of using in-band framing like this is that we can deploy 
it immediately using unmodified TCP stacks. In the future we can use 
enhanced sender TCP implementations that take steps to maintain segment 
boundaries, and smart receivers will get a performance boost from that, 
but it is a compatible upgrade that changes only the implementation, not 
the on-the-wire protocol.

Of course, we don't get anything for free. If we want the receiver to be 
able to determine with 100% certainty that it has received a complete 
iSCSI message in one TCP segment, then the sender will have to do some 
work to enable that. This is the cost of COWS. It gives 100% framing 
certainty, but at the cost of checking the outgoing data for inadvertent 
occurrences of the special marker pattern, and eliminating them. There's 
no way for a sender to tell whether the outgoing data contains 
inadvertent occurrences of the special marker pattern if the sender is 
not willing to look at the data.

On the plus side, the cost of COWS encoding is modest compared to some 
alternatives. COWS-encoding adds a little header but otherwise doesn't 
change the size of the outgoing data, ever. No matter how many 
occurrences of the framing marker pattern are found, the encoded output 
length is always exactly the same: the length of the input plus the 
length of the fixed-size framing header (typically two words). If the 
framing marker pattern is chosen to be something that is rare in normal 
(non-malicious) data, then in the common case the encoding step will be a 
read-only operation: scan the data, determine that it contains no framing 
markers, set the COWS header to indicate that the data contains no 
framing markers, and send it.
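
To make the constant-overhead property concrete, here is a simplified 
Python sketch in the spirit of COWS. It is an illustration only, not the 
COWS wire format: the marker value, the two-word header layout (marker 
word plus one link word), and the use of 0 to mean "no further 
occurrences" are assumptions made for the example. Each marker word found 
in the body is overwritten with the distance to the next one, so the 
output is always exactly two words longer than the input.

```python
# Hypothetical 32-bit marker word, chosen only for this sketch. Messages
# are assumed to be far shorter than MARKER words, so a link value can
# never be mistaken for the marker itself.
MARKER = 0xC0DEFACE

def cows_like_encode(words):
    """Return [MARKER, link] + body with every marker word in the body
    replaced by the distance (in words) to the next occurrence, or 0 if
    there is none. The header link points at the first occurrence."""
    out = [MARKER, 0] + list(words)
    link = 1                      # index of the most recent link word
    for i in range(2, len(out)):
        if out[i] == MARKER:
            out[link] = i - link  # previous link now points here
            out[i] = 0            # becomes the new end of the chain
            link = i
    return out

def cows_like_decode(encoded):
    """Reverse cows_like_encode: follow the chain of links and restore
    the marker word at each position visited."""
    if len(encoded) < 2 or encoded[0] != MARKER:
        raise ValueError("message does not start with the marker")
    out = list(encoded)
    pos, step = 1, out[1]
    while step != 0:
        pos += step
        if pos >= len(out):
            raise ValueError("link points past the end of the message")
        step = out[pos]
        out[pos] = MARKER
    return out[2:]
```

When the body contains no marker words, the encoder's loop only reads the 
data and the header link stays 0, which is the read-only common case 
described above.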

In contrast, when using Fixed Interval Markers, if a marker happens to 
fall in the middle of the data you are sending, then it creates a 'hole' 
in the middle of data that used to be contiguous, and the block of 
outgoing data changes size. On the receiving side, the 'hole' created by 
the marker has to be repaired in the process of transferring the data 
into memory. When using Fixed Interval Markers, when a receiver gets a 
TCP segment that contains no marker, it cannot reliably determine what it 
is supposed to do with that segment (where to put it in memory) without 
referring to the state from the previous TCP segments of that connection. I 
don't believe that FIM can provide efficient idempotent direct data 
placement for inbound TCP segments, because you can't rely on any given 
received segment containing a marker via which the receiver can verify 
that the segment contains a complete iSCSI message.
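
The 'hole' effect can be seen in a small Python sketch. This is a 
deliberately simplified model, not the marker scheme from any iSCSI 
draft: the interval, the two-byte placeholder marker, and the function 
names are assumptions made for the example. A marker is inserted after 
every full interval of stream data, so the outgoing block grows, and the 
receiver has to close the gaps again while copying.

```python
def insert_fixed_interval_markers(data, interval, marker):
    """Insert `marker` after every full `interval` bytes of `data`.
    The marker content is a placeholder in this sketch."""
    out = bytearray()
    for start in range(0, len(data), interval):
        chunk = data[start:start + interval]
        out += chunk
        if len(chunk) == interval:
            out += marker         # splits data that used to be contiguous
    return bytes(out)

def strip_fixed_interval_markers(stream, interval, marker_len):
    """Receiver side: copy the payload out, skipping each marker. This
    only works when the receiver knows its position in the stream, which
    is state carried over from earlier segments."""
    out = bytearray()
    pos = 0
    while pos < len(stream):
        out += stream[pos:pos + interval]
        pos += interval + marker_len
    return bytes(out)
```

For example, ten bytes sent with an interval of four and a two-byte 
marker become fourteen bytes on the wire, with markers after the fourth 
and eighth payload bytes.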

In summary:

My first choice would be to modify the TCP protocol to support 
preservation of upper-level message boundaries.

Given that this is not possible, I think COWS provides a good alternative.

Stuart Cheshire <cheshire@apple.com>
 * Wizard Without Portfolio, Apple Computer
 * Chairman, IETF ZEROCONF
 * www.stuartcheshire.org