
    RE: Requirements specification



    
    
    Doug,
    
    Again the current protocol allows you to do what you want - e.g., build a
    "virtual target"/LU if this is the way you think things will scale.
    
    The paradox of decreased performance per drive due to the increase in
    recording density is not lost on the storage industry, and the major
    techniques through which it attempts to mitigate it are caching and
    striping.
    
    The numbers you quote are pure drive numbers. For the drive-to-controller
    cache you might use a "lightweight iSCSI" (software only) or some other
    mechanism.
    
    From controller to host - once you use one of the boosting techniques
    (caching, striping) you will need fast channels.  The protocol is very
    simple (multiplexing LUs is just another field).  You can also use it with
    an initiator-LU scheme, but if we settle for that design we can't use it
    in larger controllers.
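To illustrate the remark that multiplexing LUs is just another field, here is a minimal sketch. It is not taken from the draft: the `Command` record, its `lun`, `opcode` and `tag` fields, and the `demultiplex` helper are hypothetical names chosen for illustration. It shows the command stream of one shared connection being split into per-LU queues, with arrival order preserved within each LU.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Command:
    lun: int      # hypothetical LU selector carried in every command
    opcode: str   # illustrative only, not a real SCSI opcode encoding
    tag: int      # hypothetical per-command identifier


def demultiplex(commands):
    """Group commands from one shared connection into per-LU queues.

    Arrival order is preserved inside each queue.
    """
    queues = defaultdict(list)
    for cmd in commands:
        queues[cmd.lun].append(cmd)
    return dict(queues)


# One connection carrying traffic for two LUs.
stream = [
    Command(lun=0, opcode="READ", tag=1),
    Command(lun=3, opcode="WRITE", tag=2),
    Command(lun=0, opcode="READ", tag=3),
]
per_lu = demultiplex(stream)
```

An initiator-LU scheme would instead open one connection per LU and drop the `lun` field, which is the trade-off under discussion in this thread.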
    
    Regards,
    Julo
    
    "Douglas Otis" <dotis@sanlight.net> on 07/08/2000 19:14:30
    
    Please respond to "Douglas Otis" <dotis@sanlight.net>
    
    To:   Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
    cc:
    Subject:  RE: Requirements specification
    
    
    
    
    Julo,
    
    As your architecture is based on using a controller to aggregate data
    rather than a switch, you are making choices detrimental to an architecture
    that brings the interface closer to the device.  This is reflected in
    choices for configuration, authentication and protocol.  Your architecture
    is very close to a Fibre-Channel gateway.  Alter solicitation within
    Fibre-Channel, and there is not a significant difference (spoofing
    solicitations within the gateway would be one means).  As an example, you
    require that data successfully delivered be retained for possible later
    solicitation with a controller-based error recovery.  Having two means of
    delivering data and error recovery is just one example of added complexity
    due to an inability to scale data handling.
    
    Although read-channels and data densities improve at a steady pace, as they
    have for the past quarter century, mechanics of the drives have not.
    Today's drives can deliver 320 Mbits/second of data on the outside
    cylinders.  The physical size of the drive in conjunction with the number
    of heads and disks all have substantial impact in a competitive market
    with respect to power and cost.  The cost/volume trend takes us to a single
    larger disk which paradoxically increases access time as read channel data
    rates increase.  You optimize to take advantage of the burst performance of
    the read channel with added complexities attempting to time or stage such
    transfers through your architectural restrictions where the device becomes
    part of this fabric.
    
    Is it more logical to design a system where everything is aimed at taking
    advantage of the high momentary data rate offered by the read channel, or
    one that offers the same throughput using more devices, where each
    interface's bandwidth is 'restricted' with respect to these read channel
    data rates?  The advantage of the latter approach is found with respect to
    smaller random traffic.  With more devices, redundancy is easily achieved
    and parallel access offers a means of performance improvement by spreading
    activity over more devices.  In this case, the switch provides bandwidth
    aggregation and each device would see only its own traffic, but the client
    could see the traffic of hundreds of these devices.  Regardless of the
    nature of the traffic, performance would be more uniform and control could
    be left at the client.
    
    An 8 ms access + latency figure in the high-cost drives restricts the
    number of 'independent' operations that average 64 Kbytes to 100 per
    second, or 52 Mbit per second.  Forgoing the peak data rate, such an
    architecture of 'restricted' drives would scale, whereas the
    controller-based approach does not and is vulnerable.  An independent nexus
    at the LUN is the only design that offers the required scaling and
    configuration flexibility.  Switch and client aggregation makes sense in
    cost, performance, capacity, reliability, and scalability.  Protocol
    overhead should be addressed in the protocol itself and not by means of
    controller aggregation.  There are substantial improvements to be made in
    the protocol area without the use of intervening controllers.
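As a rough check of the random-I/O figures quoted above, the following sketch recomputes them. It assumes that "64k byte" means 65,536 bytes and that each operation pays the full 8 ms plus the media transfer time at the 320 Mbit/s outer-cylinder rate; the function and constant names are illustrative only.

```python
# Back-of-the-envelope check of the random-I/O figures quoted in the message.
ACCESS_PLUS_LATENCY_S = 8e-3     # 8 ms access + latency (from the message)
MEDIA_RATE_BITS_PER_S = 320e6    # 320 Mbit/s outer-cylinder rate (from the message)
IO_SIZE_BYTES = 64 * 1024        # assumption: "64k byte" = 65,536 bytes


def random_io_throughput(access_s, media_rate_bps, io_bytes):
    """Return (operations per second, bits per second) for independent I/Os."""
    io_bits = io_bytes * 8
    transfer_s = io_bits / media_rate_bps       # time spent moving the data
    ops_per_s = 1.0 / (access_s + transfer_s)   # one positioning delay per I/O
    return ops_per_s, ops_per_s * io_bits


ops_per_s, bits_per_s = random_io_throughput(
    ACCESS_PLUS_LATENCY_S, MEDIA_RATE_BITS_PER_S, IO_SIZE_BYTES
)

# Throughput at the rounded figure of 100 operations per second.
rounded_bits_per_s = 100 * IO_SIZE_BYTES * 8

print(f"{ops_per_s:.1f} ops/s, {bits_per_s / 1e6:.1f} Mbit/s")
print(f"at 100 ops/s: {rounded_bits_per_s / 1e6:.1f} Mbit/s")
```

Under these assumptions the drive sustains a little over 100 operations per second, and 100 operations of 64 Kbytes correspond to roughly 52 Mbit/s, about a sixth of the 320 Mbit/s peak rate.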
    
    Doug
    
    -----Original Message-----
    From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    julian_satran@il.ibm.com
    Sent: Saturday, August 05, 2000 4:46 AM
    To: ips@ece.cmu.edu
    Subject: RE: Requirements specification
    
    
    
    
    Doug,
    
    The current architecture is good for the whole spectrum.
    If you are intent on using it for a disk drive you can do so and fill the
    fields you are not interested in with 0. You don't have to implement the
    functions that are intended for controllers.
    
    The controller/drive scaling controversy is certainly outside the scope of
    iSCSI.
    
    
    Regards,
    Julo
    
    "Douglas Otis" <dotis@sanlight.net> on 04/08/2000 19:20:40
    
    Please respond to "Douglas Otis" <dotis@sanlight.net>
    
    To:   Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
    cc:
    Subject:  RE: Requirements specification
    
    
    
    
    Julo,
    
    Your comments are based on several assumptions reflecting your present
    architecture.  Your implementation is done at the controller rather than a
    device.  You also assume authentication is done at the controller.  Each
    LUN could belong to a different authority and be an independent (virtual)
    device managed through LDAP.  If you bring the interface to the device, you
    can obtain the required scaling that is otherwise difficult at the
    controller, as with your architecture.  By combining everything into a
    single connection, you do not improve reliability, scalability,
    availability or fault tolerance.
    
    Doug
    -----Original Message-----
    From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    julian_satran@il.ibm.com
    Sent: Thursday, August 03, 2000 7:37 PM
    To: ips@ece.cmu.edu
    Subject: Re: Requirements specification
    
    
    
    
    David,
    
    The one additional requirement is availability/fault-tolerance.
    
    Your arguments about performance are valid. However I doubt that there will
    be enough incentives - beyond price - to develop things for high end
    controllers and servers.
    
    Enabling multiple connections brings those applications the performance
    required without any serious implications to the rest of the "family" (as I
    outlined in Pittsburgh, controllers and servers that don't need multiple
    connections per session don't have to implement them).
    
    Storage traffic requirements will always exceed those of many other
    applications.
    
    As for the "one-connection-per-LU" we covered this solution in long
    discussions and even several full-fledged implementations - as it is
    compellingly simple. However the resource consumption is unjustifiably high
    and the security problems are even worse (the LUs "viewed" by an initiator
    depend on who he says he is) than in the current draft.
    
    Regards,
    Julo
    
    
    
    David Robinson <David.Robinson@EBay.Sun.COM> on 04/08/2000 02:43:11
    
    Please respond to David Robinson <David.Robinson@EBay.Sun.COM>
    
    To:   ips@ece.cmu.edu
    cc:    (bcc: Julian Satran/Haifa/IBM)
    Subject:  Requirements specification
    
    
    
    
    To further elaborate on my comments in Pittsburgh on multiple
    connections per link and connections per LUN vs per target.
    
    The current requirements specify that the protocol must support
    multiple connections per session.  So far the only justification
    for this that I have clearly heard is performance, current and future
    systems will demand bandwidth that will require aggregation. Is there
    any other reason for multiple connections?
    
    My challenge to this requirement is that it is fundamentally a link
    and transport layer issue that is being exposed to the session layer
    due to a perception that current link/transport implementations are not
    adequate to meet perceived demand.  The key question here is if this
    is a "physics" issue that can't be solved with better implementations
    or just bad implementations? I am leaning towards the latter. I expect
    that if this protocol is a success, a number of highly tuned adapters
    using tricks such as hardware assist will be developed.  Those doing
    the development will have direct control over the quality of the
    implementation.  Furthermore, the performance critical environments
    are likely to be local in nature so pressure to create necessary
    switches and routers will also exist.
    
    The advantages of limiting a single connection per session should be
    a simplification in the connection management and error handling.  From
    the earliest drafts we have already seen restrictions of individual
    command/data/status sequences to a single connection to better handle
    ordering issues. I foresee further restrictions possibly being
    required to cover handling of lost connections when sequences are
    received out of order across multiple connections. Similarly, Steve's
    comments on security management of multiple connections are of concern.
    
    The second area that I brought up was the requirement of one session
    per initiator/target pair instead of one per LUN (i.e. SEP). I am willing
    to accept the design constraint that a single target must address
    10,000 LUNs which can be done with a connection per LUN. However,
    statements of scaling much higher into the areas where 64K port
    limitations appear I think is not reasonable.  Given the bandwidth
    available on today's and near future drives that will easily
    exceed 100MBps I can't imagine designing and deploying storage systems
    with over 10,000 LUNs but only one network adapter.  Even with 10+ Gbps
    networks this will be a horrible throughput bottleneck that will
    get worse as storage adapters appear to be gaining bandwidth faster than
    networks. Therefore requiring greater than 10,000 doesn't seem necessary.
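The bottleneck argument above can be made concrete with a short calculation. This is a sketch under stated assumptions: "100MBps" is read as 100 megabytes per second per drive, each LUN is backed by one such drive, and the single adapter runs at 10 Gbit/s. The names are illustrative.

```python
LUNS = 10_000                    # design constraint discussed in the message
DRIVE_RATE_BYTES_PER_S = 100e6   # assumption: "100MBps" = 100 megabytes/s
LINK_RATE_BITS_PER_S = 10e9      # a single 10 Gbit/s network adapter


def oversubscription(luns, drive_bytes_per_s, link_bits_per_s):
    """Ratio of aggregate drive bandwidth to the bandwidth of one link."""
    aggregate_bits_per_s = luns * drive_bytes_per_s * 8
    return aggregate_bits_per_s / link_bits_per_s


def drives_to_saturate(drive_bytes_per_s, link_bits_per_s):
    """Number of drives streaming at full rate needed to fill the link."""
    return link_bits_per_s / (drive_bytes_per_s * 8)


ratio = oversubscription(LUNS, DRIVE_RATE_BYTES_PER_S, LINK_RATE_BITS_PER_S)
saturating = drives_to_saturate(DRIVE_RATE_BYTES_PER_S, LINK_RATE_BITS_PER_S)

print(f"aggregate drive bandwidth is {ratio:.0f}x the link rate")
print(f"{saturating:.1f} drives at full rate fill the link")
```

With these numbers the drives could in aggregate offer several hundred times what the link carries, and little more than a dozen streaming drives already fill it, which is the sense in which 10,000 LUNs behind one adapter is a throughput bottleneck.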
    
    From the performance perspective, a connection per LUN also makes sense.
    SCSI command flows are already being constrained to a single connection
    in the current proposal for ordering reasons, so the number of
    concurrent outstanding requests per LUN is a manageable number. The
    concurrency desired by multiple connections per session in the
    existing draft will naturally occur with a connection per LUN.  As
    each TCP connection is a unique flow, existing link layer hardware
    that tries to preserve ordering based on a "flow" (likely IP/port pairs)
    will give the desired performance properties. Both my objections and
    the requirements for multiple connections I question above become moot.
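The flow-based behaviour described above can be sketched as follows. This is an illustration only, not how any particular switch or adapter works: the CRC32-based hash, the addresses, the port numbers and the `pick_link` helper are all assumptions. The point is that every packet of one TCP connection maps to the same link, so ordering within a LUN's connection is preserved, while connections for different LUNs may land on different links.

```python
import zlib


def pick_link(src_ip, src_port, dst_ip, dst_port, n_links):
    """Map a TCP flow (address/port 4-tuple) to one of n_links.

    Hypothetical distribution function: the same flow always yields the
    same link index, which keeps that flow's packets in order.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_links


N_LINKS = 4

# One connection per LUN: each LUN gets its own source port (placeholder values).
flows = [("10.0.0.1", 40000 + lun, "10.0.0.2", 5000) for lun in range(64)]
assignment = [pick_link(*flow, n_links=N_LINKS) for flow in flows]

# Repeating the computation gives the same link for every flow.
assignment_again = [pick_link(*flow, n_links=N_LINKS) for flow in flows]
```

With a single connection per session there is only one flow, so a scheme like this cannot spread it; with a connection per LUN the spreading comes from the number of LUNs in use.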
    
    From a connection management, command ordering, and error recovery
    perspective things should also get simpler.  Ordering is obviously
    maintained and the sender can now recover from connection errors
    based on a smaller context and possibly use TCP layer information
    to determine what responses were received (ACK windows?).
    
    To summarize I would like to see the requirements changed to reflect
    a maximum of 64K LUNs per IP node, require only one transport layer
    connection per session, and define a session to be an initiator/LUN
    pair.
    
         -David
    
    
    
    
    
    
    
    
    
    
    
    



Last updated: Tue Sep 04 01:07:56 2001