
    RE: iSCSI: Towards a more effective PDU format



    
    	I rather like the new header format proposed here, although
    I would prefer to see a separate digest for the AHS
    segments. I like that the new format allows the next BHS to
    be found if anything except the current BHS is damaged,
    which is not the case with the current format since a
    damaged AHS length causes loss of sync. If a BHS is damaged,
    I think recovery is going to be expensive, and including more
    bytes in the digest covering the BHS increases the
    likelihood of having to perform that recovery even though
    the BHS itself might not be damaged. How expensive is it to
    add a second digest covering all the AHS segments? I'm
    guessing that the second digest could be calculated using
    the first as a starting point. I don't see a downside to
    mandating that any AHS digest be of the same type as
    the BHS digest.
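
The guess about seeding the second digest from the first can be illustrated with a running CRC. This is only a sketch: the thread does not fix a digest algorithm, so CRC-32 from Python's zlib is an assumed stand-in, and header_digests is a hypothetical helper name.

```python
import zlib


def header_digests(bhs, ahs):
    """Return (bhs_digest, ahs_digest), seeding the second from the first."""
    # Digest over the 48-byte BHS alone.
    bhs_digest = zlib.crc32(bhs)
    # Continue the same running CRC over the AHS bytes. No second pass
    # over the BHS is needed, only the extra AHS bytes are processed.
    ahs_digest = zlib.crc32(ahs, bhs_digest)
    return bhs_digest, ahs_digest
```

With a running CRC like this, the second value equals the CRC of the BHS and AHS bytes concatenated, so the added cost is proportional to the AHS length only. It also means the second digest depends on the BHS contents, which may or may not be what is wanted.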
    
    	On a related point, if a BHS is damaged I would prefer to
    see the connection immediately dumped rather than any
    re-sync activity performed. I don't want to get into the
    debate about the merits versus risks of re-syncing but I
    think the fact that it has been talked about for so long
    shows that the risks are not obvious. Either way I think
    there is a more fundamental reason to prefer connection
    dumping. In order to re-sync there must be a data stream; if
    a BHS is damaged during a period of light activity, there may
    be a significant delay before any more data is sent. In the
    limit, the next PDU may be a retry of the damaged PDU by the
    iSCSI or host SCSI layers. Those timeouts are going to be in
    seconds, or more likely tens of seconds, whereas dumping the
    connection will immediately cause the recovery to begin. The
    time to make a new connection will be significantly shorter
    than the worst-case re-sync time. Further, unless we can
    guarantee that re-syncing will always succeed, there will
    have to be a point where we give up and dump the connection
    anyway. Also, even if we do manage to re-sync, we still incur
    a timeout for the damaged PDU to be resent. Again, if the
    connection is dumped, that timeout doesn't need to happen.
    
    	I should point out that when I say dumped here I mean a
    graceful TCP close.
    
    	- Rod Harrison
    
    -----Original Message-----
    From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu] On Behalf Of Robert D. Russell
    Sent: Thursday, March 15, 2001 10:06 PM
    To: julian_satran@il.ibm.com
    Cc: ips@ece.cmu.edu
    Subject: Re: iSCSI: Towards a more effective PDU format
    
    
    Julo:
    
    Even if we give up on the idea of having a second header digest, I still
    believe that some changes should be made to the PDU format proposed in
    version 5 to make it simpler, more efficient to process, and more robust.
    
    I would therefore propose eliminating the current WN Next-Qualifier
    scheme and replacing it as follows (this is just a slight modification of
    my earlier proposal in order to eliminate the second header digest).
    
    1. Every PDU must start with a 48-byte Basic Header (BHS) that can be read
       in a single "read" operation.  The format of this is identical to that
       proposed in section 2.2.4 of version 5, but with the addition of 2
       fields (4 bytes) as described next.
    
    2. Every BHS contains 2 fields at fixed locations:
    
       a) AHS_length, containing the total length of all Additional Header
          Segments (AHS) that follow.  This field is 0 if there are no AHSs.
    
       b) DATA_length, containing the total length of all data that follows
          the AHSs.  This field is 0 if there is no data.
    
    3. If the AHS_length field in the BHS is non-zero, all the AHSs immediately
       follow the BHS in the PDU.  The AHS_length value gives the total number
       of bytes in all the AHSs, allowing a single "read" operation to read
       them all.  (The total header consists of 48+AHS_length bytes.)
    
    4. If header digests are in use, one header digest covering the BHS and
       all the AHSs (if any) immediately follows the last AHS.  This digest
       does NOT require a separate read, since the first read (of the BHS)
       should be for 48+(length-of-header-digest) bytes.  If the AHS_length
       field is zero, there are no more reads.  If the AHS_length field is
       non-zero, exactly that many more bytes need to be read in a second read.
       These additional bytes are appended to those obtained in the first read.
       The header digest will always be the last (length-of-header-digest)
       bytes.
    
    5. If the DATA_length field in the BHS is non-zero, all the data
       immediately follows the header digest.  If data digests are in use,
       a data digest immediately follows the data.
    
    6. To save space within the BHS, the AHS_length field can be restricted to
       a single byte, and can be in units of 4-byte words.  This allows up to
       1020 bytes of AHSs to follow a BHS, which seems to be more than enough
       for the uses foreseen (so far, only extended CDB and bidi-read info).
       Currently, byte 2 is unused (reserved) in all PDU types except Login and
       Login response, where byte 6 is unused.  Therefore, the AHS_length field
       can be added without increasing the BHS size of 48 bytes.
    
    7. Each AHS should start with a word containing a TYPE field and a
       LENGTH field.  The TYPE field should be enumerated rather than
       bit-field encoded, for easier decoding and future expansion.
       The LENGTH field is the number of bytes of additional information
       in this AHS that follows this word.
    
    8. The DATA_length field should be a 4-byte field added to the 44-byte
       BHS of the current section 2.2.4 to give a new BHS of 48 bytes.
       However, since the version 5 44-byte header is always preceded by
       a 4-byte WN Next-Qualifier that would no longer be needed, there is
       no net change in the effective BHS size of 48 bytes.
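
The two-read header scheme described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive layout: AHS_LENGTH_OFFSET follows the suggestion to reuse reserved byte 2, but DATA_LENGTH_OFFSET (byte 44) is purely assumed since the proposal does not say where the new field goes, and read_exact and read_header are hypothetical helper names.

```python
import io
import struct

BHS_LEN = 48               # fixed Basic Header length from the proposal
AHS_LENGTH_OFFSET = 2      # reserved byte the proposal suggests reusing
DATA_LENGTH_OFFSET = 44    # ASSUMPTION: position of the 4-byte DATA_length
MAX_AHS_BYTES = 255 * 4    # one-byte count of 4-byte words gives 1020 bytes


def read_exact(stream, n):
    """Read exactly n bytes or raise if the stream ends early."""
    buf = stream.read(n)
    if len(buf) != n:
        raise EOFError(f"wanted {n} bytes, got {len(buf)}")
    return buf


def read_header(stream, digest_len):
    """Return (bhs, ahs, header_digest, data_length) in at most two reads."""
    # First read: the BHS plus digest_len more bytes.
    first = read_exact(stream, BHS_LEN + digest_len)
    ahs_len = first[AHS_LENGTH_OFFSET] * 4
    (data_len,) = struct.unpack_from(">I", first, DATA_LENGTH_OFFSET)
    # Second read only when AHSs are present; append it to the first read.
    header = first + (read_exact(stream, ahs_len) if ahs_len else b"")
    bhs = header[:BHS_LEN]
    ahs = header[BHS_LEN:BHS_LEN + ahs_len]
    digest = header[BHS_LEN + ahs_len:]   # always the last digest_len bytes
    return bhs, ahs, digest, data_len
```

A receiver could call read_header(io.BytesIO(raw_bytes), 4) in a test harness. On a real TCP socket a single receive call may return fewer bytes than requested, so read_exact would need to loop until the count is satisfied.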
    
    
    Advantages of this proposal over the current WN Next-Qualifier of version 5:
    
    1. Because the receiver gets the AHS_length field from the BHS, it can
       obtain the entire set of ALL AHSs that follow the BHS in a single "read"
       operation (the current WN Next-Qualifier scheme requires a separate
       "read" for EACH additional header segment after the BHS).
    
    2. By limiting the total size of all AHSs to 1020 bytes, a receiver can
       preallocate a fixed-length buffer of (48 + 1020 + size-of-header-digest)
       bytes for header processing (the current WN Next-Qualifier scheme has
       no limit on either total header size or individual AHS sizes).
    
    3. Since each AHS begins with a word containing the TYPE and LENGTH in
       fixed positions, an unknown TYPE (i.e., a type introduced in the
       future that is received by a legacy receiver) does NOT result in
       loss of synchronization during header processing -- the receiver
       knows the length of this AHS in any case, and can just skip over it
       (the current WN Next-Qualifier type determines how the length field
       is to be interpreted -- there is no "general rule", so unknown types
       mean loss of synchronization).
    
    4. The format of the AHS is the familiar Type/Length/Value (T/L/V)
       structure (the current WN Next-Qualifier is T+1/L+1/V, but this does
       not seem to provide any benefit and only adds confusion and
       complexity -- see e-mails from David Black and Barry Reinhold).
    
    5. The AHS TYPE is enumerated (the current WN Next-Qualifier is bit
       encoded, again without seeming to provide any benefit -- see e-mail
       from David Black).
    
    6. Because the BHS always contains a 4-byte DATA_length field, the maximum
       data segment size is 4 Gigabytes (the current WN Next-Qualifier scheme,
       with the long data header removed, limits the data segment size to
       16 Megabytes).
    
    7. The "Header Digest Present" and "Data Digest Present" bits have been
       completely eliminated (the purpose of these in the current WN
       Next-Qualifier scheme was never explained, but they appear to be
       useless -- the presence or absence of digests has to be negotiated,
       and once negotiated, all PDUs must obey the negotiated decision).
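
The point about skipping unknown AHS types can be sketched with a small Type/Length/Value walker. The field widths are assumptions (a 16-bit TYPE followed by a 16-bit LENGTH in the leading word, big-endian, with each AHS padded to a 4-byte boundary), and the type codes in KNOWN_TYPES are hypothetical; none of these are specified in the thread.

```python
import struct

# Hypothetical type codes for the two uses mentioned so far.
KNOWN_TYPES = {1: "extended-cdb", 2: "bidi-read-info"}


def walk_ahs(ahs_bytes):
    """Yield (type, value) for every AHS, whether or not the type is known."""
    offset = 0
    while offset < len(ahs_bytes):
        if offset + 4 > len(ahs_bytes):
            raise ValueError("truncated AHS header word")
        # ASSUMPTION: 16-bit TYPE then 16-bit LENGTH in the first word.
        ahs_type, length = struct.unpack_from(">HH", ahs_bytes, offset)
        start = offset + 4
        end = start + length
        if end > len(ahs_bytes):
            raise ValueError("AHS LENGTH runs past AHS_length")
        yield ahs_type, ahs_bytes[start:end]
        # ASSUMPTION: each AHS is padded out to a 4-byte boundary.
        offset = start + ((length + 3) & ~3)


def known_segments(ahs_bytes):
    """Keep AHSs this receiver understands; unknown types are skipped."""
    return [(KNOWN_TYPES[t], v) for t, v in walk_ahs(ahs_bytes) if t in KNOWN_TYPES]
```

Because LENGTH is interpreted the same way for every TYPE, a legacy receiver stays in step with the header even when it meets a type code it has never seen.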
    
    
    I see no disadvantages of this proposal relative to the current WN scheme.
    
    
    Since there is only one header digest, both schemes suffer the
    disadvantage of using unreliable data to find the header digest, which
    can lead to unnecessary blocking on "reads".
    
    
    Proposed iSCSI PDU Format
    
    
         +------------------------+
         |     required BHS       | > fixed length of 48 bytes
         +------------------------+
         |     optional AHS 1     |\
         | - - - - - - - - - - -  | \
         |     optional AHS 2     |  \
         | - - - - - - - - - - -  |   > total length in AHS_length field in BHS
         |        . . . .         |  /
         | - - - - - - - - - - -  | /
         |     optional AHS n     |/
         +------------------------+
         | optional header digest | -- covers preceding (48 + AHS_length) bytes
         +------------------------+
         |                        |\
         |     optional data      | > total length in DATA_length field in BHS
         |                        |/
         +------------------------+
         |  optional data digest  | -- covers preceding (DATA_length) bytes
         +------------------------+
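
The layout in the diagram can be sketched from the sender's side as follows. Several details are assumptions rather than part of the proposal: CRC-32 from zlib stands in for the unspecified digest algorithm, DATA_LENGTH_OFFSET (byte 44) is an assumed field position, the data digest is omitted when there is no data, and digest and build_pdu are hypothetical names.

```python
import struct
import zlib

BHS_LEN = 48
AHS_LENGTH_OFFSET = 2      # reserved byte suggested in the proposal
DATA_LENGTH_OFFSET = 44    # ASSUMPTION: location of the 4-byte DATA_length


def digest(data):
    """Stand-in 4-byte digest (CRC-32); the algorithm is an assumption."""
    return struct.pack(">I", zlib.crc32(data))


def build_pdu(bhs_body, ahs=b"", data=b"", use_digests=True):
    """Assemble BHS | AHSs | header digest | data | data digest."""
    if len(bhs_body) != BHS_LEN:
        raise ValueError("BHS must be exactly 48 bytes")
    if len(ahs) % 4 or len(ahs) > 1020:
        raise ValueError("AHS total must be a multiple of 4, at most 1020")
    bhs = bytearray(bhs_body)
    bhs[AHS_LENGTH_OFFSET] = len(ahs) // 4      # units of 4-byte words
    struct.pack_into(">I", bhs, DATA_LENGTH_OFFSET, len(data))
    header = bytes(bhs) + ahs
    pdu = header
    if use_digests:
        pdu += digest(header)    # covers the preceding 48 + AHS_length bytes
    pdu += data
    if use_digests and data:
        # ASSUMPTION: no data digest is sent when DATA_length is 0.
        pdu += digest(data)      # covers the preceding DATA_length bytes
    return pdu
```

For example, build_pdu(bytes(48), ahs=bytes(8), data=b"hello") yields 48 + 8 + 4 + 5 + 4 bytes, with the header digest sitting immediately after the last AHS byte.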
    
    
    Thanks,
    
    Bob Russell
    InterOperability Lab
    University of New Hampshire
    rdr@iol.unh.edu
    603-862-3774
    
    
    On Tue, 13 Mar 2001 julian_satran@il.ibm.com wrote:
    
    
    > Bob,
    >
    > Interesting.  This is close to one of the variants I had for IETF-50 (a
    > clear header).
    > The only advantage it has is that you are able to read all the AHSs in
    > one read.
    > The (admittedly academic) disadvantage it has is that you are limited in
    > the size of extensions and have some redundancy.
    > The basic issue that was raised - and there is no simple way out - is that
    > once you have lost a block (BHS in your suggested layout) you are out of
    > synch.  There is no simple way around it (at least not one that can be
    > solved by changing layout) and an added digest (only the code or silicon
    > to account for it) is IMHO not warranted by what you gain.  If you are
    > willing to spend silicon or code on this, then making header failures less
    > probable and recovering if you fail by dropping the connection is a
    > better bet (using the redundancy for a coding gain).  But this involves
    > some complexity too.
    >
    > Julo
    >
    
    



Last updated: Tue Sep 04 01:05:18 2001