RE: iSCSI: Flow Control
At 04:56 PM 9/21/00 -0700, Jim McGrath wrote:
>While memory may be getting cheaper, latency and transfer rates are getting
>higher. We have gone from 25 m parallel SCSI buses to transcontinental
>TCP/IP connections; from 1 MB/s to 100 MB/s (and greater) transfer rates.
>These combine to make the maximum amount of data in flight that keeps the
>connection full to be growing much faster than memory cost is declining.
>(Exponential growth rates are applied to both memory cost and transmission
>speed; distance also appears to be growing very fast, although perhaps not
>exponentially).
>
>So while your argument works if you keep the fabric size the same and
>increase the transfer rate (as has been the case with the ATA interface -
>buffer costs have declined over the years), it does not work if the fabric
>keeps on growing as well.
>
>If a fabric introduces 1 ms of latency (two orders of magnitude less than
>the worst cases I have heard) at Gbit speed, then we need 100 Kbytes of
>buffer space for a connection. We don't have enough buffer to reserve this
>for all possible connections we could get (Fibre Channel designs could not
>reserve 4 KBytes for a smaller number of potential connections until
>recently).
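The buffer figure quoted above follows from the bandwidth-delay product; a quick sketch of the arithmetic, using the link rate and latency values from the example (the function name is just for illustration):

```python
# Bandwidth-delay product: bytes that must be in flight to keep a link full.
# Values from the example above: 1 Gbit/s link, 1 ms of fabric latency.
def bytes_in_flight(rate_bits_per_sec, latency_sec):
    """Return the per-connection buffer (bytes) needed to keep the pipe full."""
    return rate_bits_per_sec * latency_sec / 8

bdp = bytes_in_flight(1e9, 1e-3)
print(bdp)  # 125000.0 bytes -- on the order of the 100 Kbytes cited above
```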
Something to think about w.r.t. this problem:
RDMA semantics:
Pros:
- Sender only targets memory that it knows is available to use and
thus does not inject more data than the receiver can use. This
mitigates the overflow problem.
- End-to-end ULP ACKs provide an implicit credit scheme for the
associated target resources.
Con:
- One must "slice" up the target resources among a set of senders,
which can create scalability problems depending upon the resources required
per session. This is where SEND semantics have their advantage - one can
use statistical access to absorb bursts with minimal buffer overflow
reserves and combine this with the idea described below.
- RDMA support requires additional buffer access / tracking logic
within the endnode to track the impacted memory. The semantics are not
difficult to implement but it is additional cost within the
implementation. Note: SEND semantics have DMA chain costs as well so the
actual delta in implementation will vary depending upon the amount of
resources one can effectively map /register at a given time.
- For small messages, RDMA does not always provide a cost/benefit
advantage, which is why most implementations support both SEND and RDMA
semantics.
>Jim
>
>PS if we actually are starting to need windows greater than 64 KBytes, is
>this a problem? My understanding is that deployed TCP/IP products do not
>easily support extremely large windows. This argues for spreading a single
>SCSI command across multiple TCP/IP connections for pipelining to overcome
>latency, not for bandwidth.
Large window support is not difficult to implement and is supported in many
endnodes. However, memory even in large endnodes is still limited and
subject to oversubscription, so if a link cannot replenish its buffers
quickly enough, it drops the incoming packet and the transport's
retransmission / congestion management takes over and adjusts the injection
rate.
The question is whether one would like to implement a WRED (weighted random
early detection - used today in routing elements) type of system within an
endnode (server, storage, etc.) whereby it would drop inbound packets when
resources are tight based on some criteria of the inbound packet (IP addr,
QoS, TCP port, etc.). This would allow the endnode to control which
services should have priority when the workload approaches or exceeds the
available buffer resources. This would also allow one to vary the amount
of "emergency" reserve buffers discussed by others without having to
communicate any of this end-to-end or specify it within the architecture
beyond the interface and drop value interpretation.
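A minimal sketch of such an endnode drop policy (the per-class thresholds and the linear drop-probability ramp are assumptions for illustration; real WRED implementations also average the queue depth and differ in detail):

```python
import random

# Sketch of a WRED-style drop decision at an endnode.
# Each traffic class gets its own thresholds, so higher-priority
# connections are dropped later as buffer occupancy climbs.
# All thresholds and probabilities below are illustrative assumptions.
PROFILES = {
    "high":    {"min_fill": 0.80, "max_fill": 0.95, "max_drop_p": 0.10},
    "default": {"min_fill": 0.60, "max_fill": 0.90, "max_drop_p": 0.50},
}

def should_drop(buffer_fill, traffic_class="default", rng=random.random):
    """Return True if an inbound packet should be dropped.

    buffer_fill is the current buffer occupancy in [0, 1]; traffic_class
    stands in for the classification criteria (IP addr, QoS, TCP port).
    """
    p = PROFILES[traffic_class]
    if buffer_fill < p["min_fill"]:
        return False                 # plenty of room: never drop
    if buffer_fill >= p["max_fill"]:
        return True                  # resources exhausted: always drop
    # In between, drop probability ramps linearly up to max_drop_p.
    span = p["max_fill"] - p["min_fill"]
    drop_p = p["max_drop_p"] * (buffer_fill - p["min_fill"]) / span
    return rng() < drop_p
```

Varying the "emergency" reserve then amounts to tuning `min_fill`/`max_fill` per class locally, with nothing to communicate end-to-end.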
I believe there is value in creating policy interfaces to communicate
whether a given connection has any special policies associated with it;
one such policy could be where the connection sits in the drop priority
list when circumstances warrant it. The actual policy would be defined
outside of iSCSI (see the previous e-mail discussions about QoS and policy
from this summer for other areas where a policy interface would have
benefit) to keep iSCSI opaque to the upper layer / application
requirements.
Mike