
    RE: iSCSI: RE: Framing Discussion



    [ stuff deleted ]
    
    >> Let's suppose that we have an iSCSI TCP connection in which we
    >> have multiple outstanding I/O's. Thus, the byte stream has
    >> interleaved within it commands and data from different I/O's.
    >> When we detect a dropped segment either through normal TCP congestion
    >> or via SACK, how do we map the missing byte block to the
    >> appropriate context? If we keep the segments around, then we
    >> could match the missing segment easily and re-transmit. But that would
    >> require the NIC to implement a BWDP's worth of transmit buffer memory.
    >>
    >> To have the iSCSI TOE re-transmit directly from the buffer cache, it seems
    >> that we would need some sort of context that would allow us to map a byte
    >> window to a specific, meaningful point somewhere in the middle of a CDB
    >> context. Essentially, you need enough context to be able to re-construct
    >> the TCP fifo since the memory in this fifo has since been effectively
    >> re-allocated. Maybe this isn't too hard, but it sure sounds like
    >> a difficult problem for hardware to solve. But, as the software
    >> folks around here keep telling me, "it's just gates" ;-)
    >
    > Yes, multiple I/O's and interleaved data streams require a context manager
    > who maps the missing segment back to its large exchange table to determine
    > how to retransmit the dropped segment.  No, Wayland, I would not do it in
    > hardware.  It is all in microcode.  The microcode size is actually not that
    > big. On the contrary, the exchange table can be a few hundred KB's.  All you
    > need is a very very fast microengine with small number of gates, a true
    > RISC.  Please keep asking the "dumb" questions.  I am mostly impressed by
    > your questions.
    >
    Thanks for the reply. You'll have to let me know which questions don't
    impress you ;-)
    
    Yes, I am assuming that the re-transmit process will be handled in
    firmware/microcode. It's still gates, though; they just happen to be in
    someone's uP core ;-) It still seems like a tough problem in the general
    case.
    
    Let's assume a worst-case scenario. The iSCSI PDU size is greater than the
    TCP MSS and the network MTU, and you are talking through a firewall that is
    re-packaging your TCP stream. No matter how hard you try to send nicely
    aligned PDU's, the firewall is going to take your smaller-than-MSS TCP
    segments and re-package them so that they arrive at the target as full-size
    TCP segments. The target detects a missing segment and keeps acknowledging
    the same left edge of the window, so the sender sees three duplicate ACK's.
    Furthermore, we are using the SACK option in TCP to optimize our performance
    over LFN's, so we are told exactly which blocks are missing.
    Unfortunately, these missing blocks contain fragments of PDU's from different
    I/O's (could be command, could be data). Even worse, since we chose a PDU
    size greater than the MSS, some segments might carry part of a PDU with no
    iSCSI header but with the digest covering the entire PDU.
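
To make the mapping problem concrete, here is a minimal Python sketch. It assumes the sender records, for every PDU it has transmitted, where that PDU starts in the TCP sequence space, its total length, and a hypothetical task tag identifying the I/O it belongs to. Given one missing block reported by SACK, it lists the PDU fragments that block covers. It ignores 32-bit sequence-number wraparound, which real code would have to handle.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class PduRecord(NamedTuple):
    start_seq: int  # TCP sequence number of the PDU's first byte (hypothetical bookkeeping)
    length: int     # total PDU length in bytes: header + data + digest(s)
    task_tag: int   # hypothetical identifier of the I/O this PDU belongs to


def fragments_in_hole(pdus: List[PduRecord], hole_start: int,
                      hole_end: int) -> List[Tuple[int, int, int]]:
    """Return (task_tag, offset_in_pdu_start, offset_in_pdu_end) for every PDU
    overlapping the missing byte range [hole_start, hole_end).

    `pdus` must be sorted by start_seq. Sequence wraparound is ignored.
    """
    starts = [p.start_seq for p in pdus]
    # Index of the last PDU that starts at or before the hole (or 0).
    i = max(bisect_right(starts, hole_start) - 1, 0)
    pieces = []
    while i < len(pdus) and pdus[i].start_seq < hole_end:
        p = pdus[i]
        lo = max(hole_start, p.start_seq)
        hi = min(hole_end, p.start_seq + p.length)
        if lo < hi:  # this PDU really overlaps the hole
            pieces.append((p.task_tag, lo - p.start_seq, hi - p.start_seq))
        i += 1
    return pieces
```

For example, with PDUs `PduRecord(1000, 3000, 7)`, `PduRecord(4000, 600, 9)` and `PduRecord(4600, 5000, 7)`, a single 1460-byte hole at `[3000, 4460)` maps to `[(7, 2000, 3000), (9, 0, 460)]`: the tail of one I/O's PDU (no header, but its digest) followed by the front of another I/O's PDU.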
    
    Yeesh!! Thank goodness all I have to do is drop an embedded processor
    into our chip. I'll let the firmware folks deal with this problem.
    Certainly, this path does not have to be high-performance, since we are going
    into congestion control anyway, but we have to deal with it. We can keep a
    context stack for each open TCP session that maps TCP sequence numbers to
    specific CDB context locations (either the command or an offset within the
    gather list). This stack has to be as deep as the maximum number of
    outstanding (i.e. not ACK'd) TCP segments, which for Randy's example
    (1.25 Gb/s and an RTT of 100 ms) is not too bad (about 8K entries). We can
    recover the contexts needed to rebuild the missing TCP segment and
    reconstruct the entire PDU(s) so that any necessary digests can be
    recalculated. We can then stage this data in memory somewhere and pull out
    the exact TCP block that we need to re-send. Lovely.
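
The sizing and the rebuild-then-slice step can be sketched in Python as below. The assumptions are mine, not the draft's. The 1.25 Gb/s figure is read as Gigabit Ethernet's 1.25 Gbaud line rate, which carries 1 Gb/s of data after 8b/10b encoding, and the MSS is taken as 1460 bytes. `rebuild_pdu` is a hypothetical stand-in for re-reading the header and data from the command context and buffer cache. The data digest is modeled as a CRC32C appended after the data, which is an illustrative layout only and not the current draft's digest definition.

```python
import math


def max_outstanding_segments(line_rate_bps: float, coding_efficiency: float,
                             rtt_s: float, mss_bytes: int) -> int:
    """Bandwidth-delay product expressed in full-size TCP segments,
    i.e. the depth the sequence-number-to-context stack must have."""
    bdp_bytes = line_rate_bps * coding_efficiency * rtt_s / 8
    return math.ceil(bdp_bytes / mss_bytes)


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
    Slow but dependency-free; hardware would use a table or CRC unit."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def rebuild_pdu(header: bytes, data: bytes) -> bytes:
    """Hypothetical PDU image: header + data + 4-byte CRC32C over the data.
    The whole data segment must be re-read to regenerate the digest."""
    return header + data + crc32c(data).to_bytes(4, "little")


def bytes_to_resend(pdu_start_seq: int, pdu_image: bytes,
                    hole_start: int, hole_end: int) -> bytes:
    """Slice the part of a rebuilt PDU that falls inside [hole_start, hole_end)."""
    lo = max(hole_start, pdu_start_seq) - pdu_start_seq
    hi = min(hole_end, pdu_start_seq + len(pdu_image)) - pdu_start_seq
    return pdu_image[lo:hi] if lo < hi else b""
```

Under these assumptions `max_outstanding_segments(1.25e9, 0.8, 0.1, 1460)` gives 8562, consistent with the "about 8K entries" estimate. Note that resending even the last few bytes of a PDU (e.g. `bytes_to_resend(1000, pdu, 1100, 2000)`) still requires regenerating the digest over the entire data segment, which is exactly why the full PDU has to be reconstructed and staged first.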
    
    I'm not saying it's impossible, but I am saying that implementing Fibre
    Channel looks like a walk in the park compared to this stuff.
    
    BTW, regarding the current iSCSI draft: I didn't see a Login/Text key
    for negotiating the iSCSI PDU size. Is it assumed that an iSCSI
    implementation should handle any PDU size?
    
    > Y.P. Cheng, Connectom Solutions.
    >
    -Wayland
    



Last updated: Tue Sep 04 01:06:01 2001