RE: Requirements specification

To: ips@ece.cmu.edu
Subject: RE: Requirements specification
From: julian_satran@il.ibm.com
Date: Tue, 8 Aug 2000 09:08:23 +0300
Content-Disposition: inline
Content-type: text/plain; charset=us-ascii
Sender: owner-ips@ece.cmu.edu


Doug,

Again, the current protocol allows you to do what you want - e.g., build a
"virtual target"/LU if this is the way you think things will scale.

The paradox of decreased performance per drive due to the increase in
recording density is not lost on the storage industry, and the major
techniques through which it attempts to mitigate it are caching and
striping.
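As a reminder of how striping spreads one logical address space over
several drives, here is a minimal RAID-0 style address mapping. It is an
illustrative sketch only: the function name, the fixed stripe unit and the
round-robin placement are assumptions, not anything specified in this thread.

```python
def stripe_location(lba: int, num_drives: int, stripe_unit: int) -> tuple:
    """Map a logical block address to (drive index, block address on drive).

    Hypothetical RAID-0 style layout: blocks are dealt out in fixed-size
    stripe units, round-robin across num_drives drives.
    """
    if num_drives <= 0 or stripe_unit <= 0:
        raise ValueError("num_drives and stripe_unit must be positive")
    stripe_number, offset_in_unit = divmod(lba, stripe_unit)
    drive = stripe_number % num_drives
    # Each full pass over all drives advances one stripe unit on every drive.
    drive_lba = (stripe_number // num_drives) * stripe_unit + offset_in_unit
    return drive, drive_lba


if __name__ == "__main__":
    # 4 drives, 128-block stripe units: consecutive units land on different drives.
    for lba in (0, 127, 128, 256, 384, 512):
        print(lba, stripe_location(lba, num_drives=4, stripe_unit=128))
```

A large sequential transfer therefore touches every drive in turn, which is
how striping recovers aggregate bandwidth that a single slow-seeking drive
cannot deliver.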

The numbers you quote are pure drive numbers. For the drive-to-controller
cache you might use a "lightweight iSCSI" (software only) or some other
mechanism.

From controller to host - once you use one of the boosting techniques
(caching, striping) you will need fast channels. The protocol is very
simple (multiplexing LUs is just another field). You can also use it with
an initiator-LU scheme, but if we settle on that design we can't use it in
larger controllers.

Regards,
Julo

"Douglas Otis" <dotis@sanlight.net> on 07/08/2000 19:14:30

Please respond to "Douglas Otis" <dotis@sanlight.net>

To:   Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
cc:
Subject:  RE: Requirements specification




Julo,

As your architecture is based on using a controller rather than a switch
to aggregate data, you are making choices detrimental to an architecture
that brings the interface closer to the device. This is reflected in the
choices for configuration, authentication and protocol. Your architecture
is very close to a Fibre-Channel gateway. Alter solicitation within
Fibre-Channel, and there is not a significant difference (spoofing
solicitations within the gateway would be one means). As an example, you
require that successfully delivered data be retained for possible later
solicitation with controller-based error recovery. Having two means of
delivering data and of error recovery is just one example of the
complexity added by an inability to scale data handling.

Although read channels and data densities improve at a steady pace, as
they have for the past quarter century, the mechanics of the drives have
not. Today's drives can deliver 320 Mbits/second of data on the outside
cylinders. The physical size of the drive, in conjunction with the number
of heads and disks, has a substantial impact on power and cost in a
competitive market. The cost/volume trend takes us to a single larger
disk, which paradoxically increases access time as read channel data rates
increase. You optimize to take advantage of the burst performance of the
read channel, adding complexity in attempting to time or stage such
transfers through your architectural restrictions, where the device
becomes part of this fabric.

Is it logical to design a system where everything is aimed at taking
advantage of the high momentary data rate offered by the read channel, or
to offer the same throughput using more devices, where each interface's
bandwidth is 'restricted' with respect to these read channel data rates?
The advantage of the latter approach shows up with smaller random traffic.
With more devices, redundancy is easily achieved, and parallel access
improves performance by spreading activity over more devices. In this
case, the switch provides bandwidth aggregation: each device sees only its
own traffic, but the client could see the traffic of hundreds of these
devices. Regardless of the nature of the traffic, performance would be
more uniform and control could be left at the client.

An 8 ms access + latency figure in the high-cost drives restricts the
number of 'independent' operations averaging 64 KB to 100 per second, or
52 Mbit per second. Forgoing the peak data rate, such an architecture of
'restricted' drives would scale, whereas the controller-based approach
does not and is vulnerable. An independent nexus at the LUN is the only
design that offers the required scaling and configuration flexibility.
Switch and client aggregation makes sense in cost, performance, capacity,
reliability, and scalability. Protocol overhead should be addressed in the
protocol itself and not by means of controller aggregation. There are
substantial improvements to be made in the protocol area without the use
of intervening controllers.
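The random-I/O figures above can be checked with a few lines of
arithmetic. This sketch takes the quoted 100 operations per second as
given and assumes "64k byte" means 64 KiB (65,536 bytes) and "Mbit" means
10**6 bits; the aggregation loop at the end is an illustration, not a
figure from the message.

```python
OPS_PER_SECOND = 100        # independent operations per drive, as quoted
BYTES_PER_OP = 64 * 1024    # assumed: 64 KiB average transfer
READ_CHANNEL_MBIT = 320     # outer-cylinder media rate, as quoted


def random_io_mbit_per_s(ops_per_second: float, bytes_per_op: int) -> float:
    """Sustained throughput of small random operations, in 10**6 bits/s."""
    return ops_per_second * bytes_per_op * 8 / 1e6


per_drive = random_io_mbit_per_s(OPS_PER_SECOND, BYTES_PER_OP)
print(f"per drive: {per_drive:.1f} Mbit/s")                              # 52.4 Mbit/s
print(f"fraction of read channel: {per_drive / READ_CHANNEL_MBIT:.0%}")  # 16%

# Spreading the same kind of workload over many such drives behind a switch:
for drives in (10, 100):
    print(f"{drives} drives: {drives * per_drive / 1000:.2f} Gbit/s aggregate")
```

Under these assumptions a drive doing random 64 KB work uses only about a
sixth of its read-channel rate, which is the gap the 'restricted' interface
argument relies on.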

Doug

-----Original Message-----
From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
julian_satran@il.ibm.com
Sent: Saturday, August 05, 2000 4:46 AM
To: ips@ece.cmu.edu
Subject: RE: Requirements specification




Doug,

The current architecture is good for the whole spectrum.
If you intend to use it for a disk drive, you can do so and fill with 0
the fields you are not interested in. You don't have to implement the
functions that are intended for controllers.
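To make "fill with 0 the fields you are not interested in" concrete, the
sketch below packs a header in which the LU is just one field. The layout
(1-byte opcode, 3 reserved bytes, 8-byte LUN, 4-byte task tag, 4-byte data
length, big-endian) and the opcode value are hypothetical, chosen for
illustration; they are not the layout of the draft under discussion.

```python
import struct

# Hypothetical 20-byte header layout, for illustration only.
HEADER_FORMAT = ">B3xQII"   # opcode, 3 pad bytes, LUN, task tag, data length
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 20 bytes


def pack_header(opcode: int, task_tag: int, data_length: int, lun: int = 0) -> bytes:
    """Build a header; a single-LU device simply leaves the LUN field at 0."""
    return struct.pack(HEADER_FORMAT, opcode, lun, task_tag, data_length)


def unpack_header(raw: bytes) -> dict:
    """Decode the hypothetical header; a controller would dispatch on 'lun'."""
    opcode, lun, task_tag, data_length = struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE])
    return {"opcode": opcode, "lun": lun, "task_tag": task_tag, "data_length": data_length}


if __name__ == "__main__":
    drive_pdu = pack_header(0x01, task_tag=7, data_length=512)          # LUN left at 0
    controller_pdu = pack_header(0x01, task_tag=8, data_length=512, lun=42)
    print(unpack_header(drive_pdu), unpack_header(controller_pdu))
```

A drive can ignore the field entirely, while a controller uses it to
multiplex many LUs over the same session.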

The controller/drive scaling controversy is certainly outside the scope of
iSCSI.


Regards,
Julo

"Douglas Otis" <dotis@sanlight.net> on 04/08/2000 19:20:40

Please respond to "Douglas Otis" <dotis@sanlight.net>

To:   Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
cc:
Subject:  RE: Requirements specification




Julo,

Your comments are based on several assumptions reflecting your present
architecture. Your implementation is done at the controller rather than at
a device. You also assume authentication is done at the controller. Each
LUN could belong to a different authority and be an independent (virtual)
device managed through LDAP. If you bring the interface to the device, you
can obtain the required scaling that is otherwise difficult to achieve at
the controller, as in your architecture. By combining everything into a
single connection, you do not improve reliability, scalability,
availability or fault tolerance.

Doug
-----Original Message-----
From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
julian_satran@il.ibm.com
Sent: Thursday, August 03, 2000 7:37 PM
To: ips@ece.cmu.edu
Subject: Re: Requirements specification




David,

The one additional requirement is availability/fault-tolerance.

Your arguments about performance are valid. However, I doubt that there
will be enough incentives - beyond price - to develop things for high-end
controllers and servers.

Enabling multiple connections brings those applications the required
performance without any serious implications for the rest of the "family"
(as I outlined in Pittsburgh, controllers and servers that don't need
multiple connections per session don't have to implement them).

Storage traffic requirements will always exceed those of many other
applications.

As for the "one-connection-per-LU" approach, we covered this solution in
long discussions and even in several full-fledged implementations - as it
is compellingly simple. However, the resource consumption is unjustifiably
high, and the security problems are even worse than in the current draft
(the LUs "viewed" by an initiator depend on who he says he is).
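The resource argument is easiest to see by counting transport connections.
The sketch below compares the two designs; the initiator, target and LU
counts are hypothetical inputs, and it assumes every initiator sees every
LU, which overstates the per-LU case when LUs are partitioned among hosts.

```python
def connections_per_lu(initiators: int, lus_visible: int) -> int:
    """One connection (and session) for every initiator/LU pair."""
    return initiators * lus_visible


def connections_per_target(initiators: int, targets: int, conns_per_session: int = 1) -> int:
    """One session per initiator/target pair, optionally with several connections."""
    return initiators * targets * conns_per_session


if __name__ == "__main__":
    # Hypothetical installation: 50 hosts, 2 controllers with 1000 LUs each.
    initiators, targets, lus_per_target = 50, 2, 1000
    print(connections_per_lu(initiators, targets * lus_per_target))          # 100000
    print(connections_per_target(initiators, targets, conns_per_session=4))  # 400
```

Each of those connections carries TCP state at both ends, which is where
the per-LU design's cost shows up on a large controller.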

Regards,
Julo



David Robinson <David.Robinson@EBay.Sun.COM> on 04/08/2000 02:43:11

Please respond to David Robinson <David.Robinson@EBay.Sun.COM>

To:   ips@ece.cmu.edu
cc:    (bcc: Julian Satran/Haifa/IBM)
Subject:  Requirements specification




To further elaborate on my comments in Pittsburgh on multiple
connections per link and connections per LUN vs per target.

The current requirements specify that the protocol must support
multiple connections per session. So far the only justification
for this that I have clearly heard is performance: current and future
systems will demand bandwidth that will require aggregation. Is there
any other reason for multiple connections?

My challenge to this requirement is that it is fundamentally a link
and transport layer issue that is being exposed to the session layer
due to a perception that current link/transport implementations are not
adequate to meet perceived demand. The key question here is whether this
is a "physics" issue that can't be solved with better implementations,
or just bad implementations. I am leaning towards the latter. I expect
that if this protocol is a success, a number of highly tuned adapters
using tricks such as hardware assist will be developed. Those doing
the development will have direct control over the quality of the
implementation. Furthermore, the performance-critical environments
are likely to be local in nature, so pressure to create the necessary
switches and routers will also exist.

The advantage of limiting a session to a single connection should be
simpler connection management and error handling. From the earliest
drafts we have already seen restrictions of individual
command/data/status sequences to a single connection to better handle
ordering issues. I foresee further restrictions possibly being
required to cover the handling of lost connections when sequences are
received out of order across multiple connections. Similarly, Steve's
comments on security management of multiple connections are of concern.
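The restriction that a command's command/data/status sequence stays on one
connection can be sketched as a small bookkeeping class. Everything here
is hypothetical (class and method names, round-robin placement, integer
connection ids); it only shows the extra state a multi-connection session
carries, and how a lost connection strands in-flight commands that then
need their own recovery path.

```python
from itertools import count


class Session:
    """Hypothetical multi-connection session with per-command allegiance."""

    def __init__(self, connection_ids):
        self.connection_ids = list(connection_ids)
        self._tags = count(1)
        self._rr = 0
        self._allegiance = {}  # task tag -> connection the command started on

    def start_command(self):
        """Assign a new task tag and pick a connection round-robin."""
        if not self.connection_ids:
            raise RuntimeError("session has no live connections")
        tag = next(self._tags)
        conn = self.connection_ids[self._rr % len(self.connection_ids)]
        self._rr += 1
        self._allegiance[tag] = conn
        return tag, conn

    def connection_for(self, tag):
        """Data and status for a command must use its original connection."""
        return self._allegiance[tag]

    def complete(self, tag):
        del self._allegiance[tag]

    def connection_lost(self, conn):
        """Drop a connection and return the tags whose sequences it stranded."""
        stranded = sorted(t for t, c in self._allegiance.items() if c == conn)
        for t in stranded:
            del self._allegiance[t]
        self.connection_ids.remove(conn)
        return stranded
```

With a single connection per session, the allegiance table and the
stranded-command case disappear, which is the simplification argued for
above.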

The second area that I brought up was the requirement of one session
per initiator-target pair instead of one per LUN (i.e., SEP). I am
willing to accept the design constraint that a single target must
address 10,000 LUNs, which can be done with a connection per LUN.
However, I don't think statements about scaling much higher, into the
range where 64K port limitations appear, are reasonable. Given the
bandwidth available on today's and near-future drives, which will easily
exceed 100 MBps, I can't imagine designing and deploying storage systems
with over 10,000 LUNs but only one network adapter. Even with 10+ Gbps
networks this will be a horrible throughput bottleneck, one that will
get worse as storage adapters appear to be gaining bandwidth faster than
networks. Therefore, requiring more than 10,000 doesn't seem necessary.
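The bottleneck argument reduces to simple arithmetic. This sketch assumes
100 MB/s per LUN in decimal megabytes, every LUN streaming at once, and a
single 10 Gbit/s adapter; the function names are illustrative.

```python
def oversubscription(luns: int, mbyte_per_s_per_lun: float, adapter_gbit_per_s: float) -> float:
    """Ratio of total drive bandwidth to adapter bandwidth."""
    demand_gbit = luns * mbyte_per_s_per_lun * 8 / 1000
    return demand_gbit / adapter_gbit_per_s


def drives_to_saturate(mbyte_per_s_per_lun: float, adapter_gbit_per_s: float) -> float:
    """How many streaming drives fill one adapter."""
    return adapter_gbit_per_s * 1000 / (mbyte_per_s_per_lun * 8)


if __name__ == "__main__":
    print(oversubscription(10_000, 100, 10))  # 800.0
    print(drives_to_saturate(100, 10))        # 12.5
```

Under these assumptions about a dozen streaming drives already fill one
10 Gbit/s adapter, so a 10,000-LUN box behind it is oversubscribed by
roughly 800 to 1.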

From the performance perspective, a connection per LUN also makes sense.
SCSI command flows are already being constrained to a single connection
in the current proposal for ordering reasons, so the number of
concurrent outstanding requests per LUN is manageable. The concurrency
sought through multiple connections per session in the existing draft
will occur naturally with a connection per LUN. As each TCP connection
is a unique flow, existing link-layer hardware that tries to preserve
ordering based on a "flow" (likely IP/port pairs) will give the desired
performance properties. Both my objections and the requirements for
multiple connections I question above then become moot.
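Flow-based link selection can be sketched as hashing the connection's
address/port 4-tuple onto one of several links. The SHA-256 hash, the
addresses and the target port 5000 are arbitrary stand-ins, not any
particular vendor's scheme; the point is that all packets of one TCP
connection land on the same link (preserving order), while many per-LUN
connections spread across links.

```python
import hashlib
from collections import Counter


def pick_link(src_ip: str, src_port: int, dst_ip: str, dst_port: int, num_links: int) -> int:
    """Deterministically map a TCP 4-tuple to a link index."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_links


if __name__ == "__main__":
    TARGET_PORT = 5000  # arbitrary, hypothetical target port
    # 1000 per-LUN connections from one initiator, each with its own source port.
    usage = Counter(
        pick_link("10.0.0.1", port, "10.0.0.2", TARGET_PORT, num_links=4)
        for port in range(40000, 41000)
    )
    print(sorted(usage.items()))
```

A single multiplexed connection, by contrast, always hashes to one link,
which is why that design needs multiple connections to aggregate bandwidth.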

From a connection management, command ordering, and error recovery
perspective, things should also get simpler. Ordering is obviously
maintained, and the sender can now recover from connection errors
based on a smaller context and possibly use TCP-layer information
to determine which responses were received (ACK windows?).
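The "ACK windows?" idea can be sketched as comparing each response's end
sequence number with the peer's cumulative ACK. This assumes the sender
can read that ACK value (how it is obtained is left open here) and ignores
32-bit sequence wraparound. Note that an ACK only proves the bytes reached
the peer's TCP stack, not that the initiator processed them.

```python
def acknowledged_pdus(pdus, initial_seq: int, cumulative_ack: int):
    """Split responses into fully acknowledged and still-pending lists.

    pdus: list of (name, length) sent back-to-back starting at initial_seq.
    Sequence-number wraparound is ignored in this sketch.
    """
    acked, pending = [], []
    offset = initial_seq
    for name, length in pdus:
        end = offset + length  # sequence number just past this PDU
        (acked if end <= cumulative_ack else pending).append(name)
        offset = end
    return acked, pending


if __name__ == "__main__":
    sent = [("status-1", 48), ("status-2", 48), ("data-3", 512)]
    print(acknowledged_pdus(sent, initial_seq=1000, cumulative_ack=1096))
```

After a connection failure, only the pending entries would need to be
resent or reconciled, which is the smaller recovery context described above.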

To summarize I would like to see the requirements changed to reflect
a maximum of 64K LUNs per IP node, require only one transport layer
connection per session, and define a session to be an initiator/LUN
pair.
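The proposed rules can be written down as a small session table. This is a
hypothetical sketch (class and method names invented for illustration),
reading "64K" as 65,536: a session is keyed by an initiator/LUN pair,
holds exactly one transport connection, and LUN numbers are bounded per
IP node.

```python
MAX_LUNS_PER_NODE = 64 * 1024  # reading "64K" as 65,536


class NodeSessionTable:
    """Hypothetical per-node table: one connection per (initiator, LUN) session."""

    def __init__(self):
        self.sessions = {}  # (initiator_name, lun) -> connection id

    def can_open(self, initiator: str, lun: int) -> bool:
        return 0 <= lun < MAX_LUNS_PER_NODE and (initiator, lun) not in self.sessions

    def open_session(self, initiator: str, lun: int, connection_id: int) -> None:
        if not self.can_open(initiator, lun):
            raise ValueError(f"cannot open session for {initiator!r}, LUN {lun}")
        self.sessions[(initiator, lun)] = connection_id

    def close_session(self, initiator: str, lun: int) -> None:
        self.sessions.pop((initiator, lun), None)
```

Because each session has exactly one connection, there is no per-command
connection allegiance to track, which is the simplification the proposal
is after.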

     -David