RE: effect of initializing CRC reg to 1's depends on implementati on? iSCSI

To: "'Mark Bakke'" <mbakke@cisco.com>
Subject: RE: effect of initializing CRC reg to 1's depends on implementati on? iSCSI
From: "CAVANNA,VICENTE V (A-Roseville,ex1)" <vince_cavanna@agilent.com>
Date: Mon, 17 Dec 2001 11:05:39 -0700
Cc: Paul Koning <ni1d@arrl.net>, "CAVANNA,VICENTE V (A-Roseville,ex1)" <vince_cavanna@agilent.com>, ips@ece.cmu.edu, Luben Tuikov <ltuikov@yahoo.com>, "THALER,PAT (A-Roseville,ex1)" <pat_thaler@agilent.com>, "SHEEHY,DAVE (A-Americas,unix1)" <dave_sheehy@agilent.com>
Content-Type: text/plain;charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu
Mark,

I have to admit the iSCSI spec is _not_ ambiguous but only by virtue of
having provided examples whose results can only be obtained by a circuit for
which initializing to 1s is equivalent to initializing to 0s and XORing 1s
with the most significant 32 bits of the message. 

I do not think we should rely on examples to serve as the specification. For
this reason I recommend that we modify the spec in one of 3 ways:

1. explicitly show the reference implementation- what I call the
multiply-divide implementation and which I have sent to the reflector. It is
basically the ethernet reference implementation modified for the iSCSI poly.
2. use Paul's description
3. use Luben's formal description

Please note that what I have been calling the multiply-divide implementation
(a serial implementation) does not restrict you to processing one bit at a
time. The circuit can be easily modified to process n bits at a time without
changing its properties. What I have been calling the divide-only serial
implementation also does not restrict you to processing one bit at a time.
This circuit accomplishes the same division (you have to pre-multiply the
message by x^32) externally but this circuit (and all its parallel variants)
responds differently to initial conditions than the multiply-divide
implementaion.

See my inline comments below for more details.

Vince
|-----Original Message-----
|From: Mark Bakke [mailto:mbakke@cisco.com]
|Sent: Monday, December 17, 2001 6:55 AM
|To: Luben Tuikov
|Cc: Paul Koning; VICENTE V CAVANNA; ips@ece.cmu.edu
|Subject: Re: effect of initializing CRC reg to 1's depends on
|implementati on? iSCSI
|
|
|This is becoming somewhat disturbing, given the comment about
|the Ethernet CRC.  The intent of the document is absolutely,
|positively to keep the CRC identical to Ethernet, with the
|exception of the generator, just as Paul had said.  I'm not
|enough of a mathematician to complain about the definition; if
|there's a good way to fix it so it says what we intend, we
|should do so.  We got around this problem somewhat by simply
|providing a set of examples.  If someone has an implementation
|that does not match the examples, they simply try a few
|variations until they get it right.  This has worked very well
|for everyone I have talked to who is implementing the CRC.
|
|Here is some text that I had suggested adding instead of (or 
|in addition to) the mathematical description a while ago:
|
|
|The generator polynomial selected is evaluated in [Castagnioli93].
|When using the CRC the CRC register must be initialized to all 1s
|(0xFFFFFFFF).  Input bytes are processed with bit 7 being the most

The point I have been trying to make is that initializing the register to 1s
does not necessarily result in the same process as described in the ethernet
spec. Initializing the register to 1s only results in the desired results
(i.e. agrees with the examples in the iSCSI spec) if you are using the
implementation that I have been calling the multiply-divide implementation.
This is the implementation that ethernet had in mind and iSCSI had in mind.
On the other hand this is not the only implementation, and in particular is
not hte implementation that Luben tried at first. Actually Luben tried the
mathematical version of the divide-only implementation. I have since tried
the actual implementation and that is what finally convinced me that the
iSCSI spec is still ambiguous.

|significant bit.  Before transmission, the CRC bits must be
|reflected (bit 31 swapped with bit 0, bit 30 swapped with bit 1,
|etc.), and complemented.  The CRC bytes must be transmitted in
|network order.  Padding bytes, when present, in a segment covered
|by a CRC, should be set to 0 and are included in the CRC.
|
|Running the CRC-32C generator on an input stream that includes a
|valid CRC-32C will result in the constant remainder 0x1c2d19ed,
|(without being reversed and complemented).
|
|
|The above, plus the examples, have seem to work out just fine
|for the implementors.  Getting the math right would be nice, but
|most of us implementors don't look at it anyway, and prefer the
|more practical description.  If we are going to include the math,
|it has to match the examples.

The only reason that the above has worked right for implementors is that
they all use the equivalent of a multiply-divide implementation as was
described in the ethernet spec but that has not been described in any manner
in iSCSI and that I have since recommended that we specify.

I have to admit that by giving examples the iSCSI spec has removed the
ambiguity since nobody will be able to get results that agree with the
examples _unless_ they use the multiply-divide implementation but why not
make it explicit and eliminate the need for implementors to have familiarity
with ethernet and its implementations. Please note that any parallel
implementations (that process 8 or 16 or 32 input bits at once) for ethernet
are equivalent to the serial implementation that is shown as reference in
the ethernet spec.

I can however implement the iSCSi (and ethernet) CRCs using what I call a
divide-only circuit and, if I initialize the circuit to 1's, and process the
examples I will not get the results in the iSCSI spec but I will still get
the magic constant at the receiver! What I will get instead are the same
results as if I had XORed the most significant 32 bits of the messages in
the examples with the magic constant and then processed them.

|
|Anyway, my guess is that is what you are discussing; it was just
|disturbing to think that anything but the Ethernet + new poly
|was being considered, since that has been agreed on for a long
|time, and is being widely implemented.
|
|--
|Mark
|
|
|Luben Tuikov wrote:
|> 
|> --- Paul Koning <ni1d@arrl.net> wrote:
|> > Excerpt of message (sent 14 December 2001) by Luben
|> > > This is actually equivalent to XORing the first
|> > > 32 bits of the message with the magic constant as Vince
|> > > and I have shown.
|> >
|> > Are you saying that you believe the intent of the current
|> > spec is NOT
|> > the same as the Ethernet CRC, i.e., complement the first
|> > 32 bits,
|> > multiply by x^32, then divide by G(x)?
|> 
|> In our profession we cannot talk about beliefs and
|> intentions, more so for documents, rfc, etc.
|> 
|> If you show the current description of the CRC of the draft
|> to a mathematician, they will object to the same thing I've
|> objected.
|> 
|> The reason is that they don't have the preconception about
|> what circuit is being used, and any such higher level
|> notions.
|> 
|> All they would have and all I had was long division in Z_2.
|> (Actually Z_2[x] since we deal with polynomials.)
|> 
|> Then I started from first principles of CRC, polydivision,
|> etc, etc. -- similarly to how Williams proceeds in his
|> paper, but my treatment was purely algebraic.
|> 
|> The same applies to anyone looking at a description of an
|> algorithm. (An ``algorithm'' has a precise definition in
|> Computer Science.)
|> 
|> In its current form the description of the algorithm to
|> compute the CRC in the iSCSI draft is not consistent, just
|> because there is an unspoken assumption that a SMD circuit
|> will be used. We cannot afford to do this in a definition
|> document, unless we refer explicitly to it. We cannot even
|> assume that a circuit will be used...
|> 
|> Please also note that the Ethernet spec CRC algorithm, is
|> an optimized version of a basic
|> prefix-postfix-initialize-the-
|> CRC-register-to-some-constant algoritm. BUT as SMD it
|> operates on NON-AUGMENTED messages. FOR THAT REASON you
|> cannot say that we have to multiply M(x) by x^32!!!
|> 
|> I.e. SMD algoritms like the one in the Ethernet spec,
|> operate on non-augmented messages, while simple Division
|> (D) algorithms operate on augmented messages.
|> 
|> Any D can be optimized to SMD, but the constant has to be
|> changed. Also, prefixing a message is equivalent to loading
|> the CRC register with that prefix (WLG), but this is not so
|> for SMD. (elaborated below)
|> 
|> Further more for any SMD there is an equivalent D,
|> including for the Ethernet spec SMD; and for any D there is
|> an equivalent SMD.
|> 
|> > If that is your interpretation, then we absolutely MUST
|> > fix the spec,
|> > I'm quite sure that the intent of the spec is as I wrote
|> > it, i.e.,
|> > Ethernet except for the generator.
|> 
|> Probably it is. But lets not hasten here.
|> We can further generalize the explanation of the algorithm.
|> 
|> > > If this is changed, then the message in its form:
|> > >   M'(x) = x^32 M(x) + x^(32+n) I(x)
|> > >
|> > > yields to better optimizations as Vince and I have
|> > shown.
|> > > (see his circuits and worksheet4.pdf, which I posted
|> > > yesterday)
|> > >
|> > > I.e. the message is augmented (mult by x^32, aka
|> > postfixed)
|> > > and prefixed by 32 1's (aka CRC=32 1's).
|> >
|> > But that is NOT what the spec currently says.  What you
|> > describe may
|> > be a fine CRC but it is not the one that's in the spec.
|> > Initializing
|> > the CRC register to all 1 is not the same as prefixing 32
|> > 1 bits onto
|> > the message.
|> 
|> It is, in a simple division, e.g. in D. This is clearly and
|> simply explained in the Williams paper. It is an equivalent
|> to a long division.
|> 
|> Now, in D, if I initialize the CRC to all 1's and then do
|> division of M(x) by G(x), so that I catch 0's at the
|> beginning of the message,
|> 
|> (which is equivalent to prefixing the message by 1's of the
|> number of the width of the CRC, and then do just simple
|> division,(CRC=0), since after so many clicks the CRC will
|> be loaded with the prefixing bits, and also 0/G(x) = 0)
|> 
|> then I can find a constant which I can XOR to the first 32
|> bits of the message and then perform the division, still in
|> D, with the same effect. In this particular case that
|> constant is the magic constant. (We are one step closer to
|> SMD...)
|> 
|> (Now we jump to SMD, by means of worksheet4.pdf, Prefixing
|> part.)
|> 
|> Now, in the Ethernet CRC spec, that constant is 32 1's, and
|> XORing 32 1's with the 32 MSb of the message is equivalent
|> to complementing them.
|> 
|> BUT the Ethernet spec uses SMD, not D, so I claim that
|> there is an equivalent D, which for a suitable initial CRC
|> (also equivalent to prefixing the message with it) value
|> will yield IDENTICAL results as Ethernet SMD.
|> 
|> Now using a bit of algebra similar to worksheet4.pdf which
|> I posted a couple of days ago and numerical methods
|> (solving a linear system of 33 unknowns) I have found such
|> a constant:
|> 0x2a26f826 which in simple division, D, will yield results
|> identical to SMD of Ethernet.
|> 
|> So, here is an abstractization of the Ethernet CRC SMD,
|> explained in a simple, simple, simple division only way:
|> 
|> TX:
|> 1. Mutliply the message, M(x), by x^32.
|> 2. Prefix the result of 1. with 0x2a26f826.
|> 3. Find the remainder of the result of 2.
|> 4. Complement that remainder and add it to the
|>    result of 1.
|> 5. Send the result of 4. to the recipient.
|> 
|> RX: (same as steps 1-3 in TX)
|> 1. Mutliply the message by x^32.
|> 2. Prefix the result of 1. with 0x2a26f826.
|> 3. Find the remainder of the result of 2.
|> 
|> The result of 3 should be 0x1c2d19ed (the magic constant).
|> 
|> Of course I've ommitted the mirroring of bytes for clarity
|> and brevity. It can be mentioned on the side, e.g. How to
|> build the message: mirror the bytes == to swapping the bits
|> inside the bytes, ....
|> 
|> Also note that step 4 in TX, adding means exactly that,
|> since we already mult. by x^32, so the remainder will
|> nicely fit in the last 32 0 bits.
|> 
|> Also note that step 2, can be made clearer:
|> let C(x), be the polynomial representation of 0x2a26f826.
|> Let n = deg G(x), k-1 = deg M(x) (there are k bits), then
|> step 2 is:
|> 
|> 2. Add x^n x^k C(x) to the message M(x).
|> 
|> Or we can swap step 1 and 2:
|> 
|> 1. Add x^k C(x) to M(x).
|> 2. Multiply the result of 1. by x^n.
|> 
|> I hope this makes things a bit clearer.
|> 
|> So from this D specification above, I can derive SMD
|> exaclty as it is in the Ethernet spec. I'll include this
|> derivation in the paper -- it will be pure math, but in
|> english it goes like this: First we notice that after 32
|> clicks the CRC register will contain only 32 1's XOR-ed
|> with the message, then we see that we can just load the crc
|> with ones and then kick one byte off the left xor it with
|> the next message bit...
|> 
|> The above equivalence I've tested empirically.
|> 
|> > What does "better optimizations" mean?
|> 
|> See above.
|> 
|> > > In an SMD, one would have to set the CRC to the magic
|> > > constant and then proceed as the algorithm indicates,
|> > > (this is identical to what you'd find in Ethernet, ...
|> >
|> > No it isn't.  The Ethernet spec appendix C, and, more
|> > importantly, the
|> > formal definition of the CRC in the body of the spec,
|> > makes it quite
|> > clear that it isn't.  I don't understand how you arrive
|> > at any other
|> > conclusion.
|> 
|> I didn't mean identical as in the constant. I meant
|> identical in a way of equivalence of algorithms, that one
|> can be derived from the other, etc. See above. Or simpler
|> answer: Math.
|> 
|> -l
|> 
|> =====
|> --
|> 
|> __________________________________________________
|> Do You Yahoo!?
|> Check out Yahoo! Shopping and Yahoo! Auctions for all of
|> your unique holiday gifts! Buy at http://shopping.yahoo.com
|> or bid at http://auctions.yahoo.com
|
|-- 
|Mark A. Bakke
|Cisco Systems
|mbakke@cisco.com
|763.398.1054
|
Follow-Ups:
- RE: effect of initializing CRC reg to 1's depends on implementati on? iSCSI
  - From: Luben Tuikov <ltuikov@yahoo.com>
Prev by Date: RE: Wot I `know' about COWS in hardware
Next by Date: Outboard Tunnel Mode
Prev by thread: RE: Wot I `know' about COWS in hardware
Next by thread: RE: effect of initializing CRC reg to 1's depends on implementati on? iSCSI
Index(es):
- Date
- Thread
Home
Last updated: Wed Dec 19 18:17:43 2001
8151 messages in chronological order