
    minutes of iSCSI meeting 19 June 2000



    
    
    
    
    
    iSCSI design team meeting
    Monday, 19 June 2000
    Haifa, Israel
    
    Attendees:
    AA   Alain Azagury (IBM)
    JD   John Dowdy (IBM)
    SDG  Steve De Grate (NuSpeed)
    RH   Randy Haagens (HP)
    GH   Gabi Hecht (Gadzooks)
    JH   John Hufferd (IBM)
    SL   Steve Legg (IBM)
    JM   John Matze (Veritas)
    KM   Kalman Meth (IBM)
    NN   Nelson Nahum (Storage)
    LDO  Luciano Dalle Ore (Quantum)
    CS   Costa Sapuntzakis (Cisco)
    JS   Julian Satran (IBM)
    MS   Mark Shifardt (NuSpeed)
    MT   Meir Toledano (IBM)
    MW   Matt Wakeley (Agilent)
    EZ   Efri Zeidner (SanGate)
    
    
    Disclaimer: Rough paraphrase of some of what was said. Some comments may be
     incorrectly attributed.
    
    
    
    Comments on proposed agenda:
    
    JS: Ed Gardner wrote to add action items and produce schedule for next
    month.
    Paul missed his connection, so we'll push off discussion of security until
    tomorrow.
    We'll discuss Error Recovery today instead.
    JH: When will we discuss Discovery?
    JS: After the Pittsburgh IETF meeting.
    
    JS: Can Luciano please provide us with all that is written on Security.
    LDO: Will send out what he has.
    
    
    
    
    
    Overview of Requirements draft:
    
    RH: We first have an Applicability Statement.
    
    Discussion on Applicability Section, paragraph by paragraph:
    
    JD: Is iSCSI a "mapping" or an "encapsulation?"
    JS: It is a mapping. SAM defines an RPC model. It is somewhat abstract.
    It is not simply a command that can be unwrapped and delivered.
    
    JM: SCSI is sector based rather than block based. Applicability statement
    uses the term "block."
    Are we abstracting that out? There is also no mention of tapes and
    other devices in the applicability statement.
    RH: Yes, the semantics of tapes and other devices were meant to be included.
    JH: Tapes are relevant. We don't want to assume it is a controller. Can
    have remote tape.
    RH: One of the things wrong with FC is that they are totally disk drive
    oriented.
    If we keep in mind the model of connecting to large SCSI controllers, we
    won't tie ourselves to too narrow an applicability.
    MW: Whatever can be done over SCSI, we want to do over iSCSI. It doesn't
    matter if you use the term "block" or "sector."
    RH: We'll adjust the language.
    
    CS: Applicability section should be aimed at people who are not convinced
    of the advantages of iSCSI.
    For example, people who cannot imagine placing storage directly on the
    network.
    CS: We should also explain why we choose SCSI for accessing the devices.
    Why not some other block storage protocol?
    JS: Because it is ubiquitous. And this is already said in the Applicability
     statement.
    CS: In the IETF, anybody can get up and say they don't like it. We have to
    justify it.
    We need to simply add a paragraph that there is a large installed base that
     uses SCSI and we want to leverage this base.
    This will defuse the argument against iSCSI.
    JS: Also SCSI is a living protocol.
    LDO: Also mention the timeframe since customers are already asking for it,
    and we can't spend the time to invent new protocols.
    RH: The first paragraph of the Applicability statement already hints at
    these things.
    JH: Let's add another sentence (at xxx location) that explicitly says what
    Costa raised to deflect the objections.
    
    JS: In applications section, can add clusters (in addition to consolidation
    and pooling).
    RH: Do we have to include desktop? Will iSCSI take over IDE disk interface?
    Several: No.
    EZ: Yes, we do. This is what will take over for the local bus. Similar to
    the Infiniband idea.
    JH: Isn't this included in "Local storage access?"
    
    CS: Should include in applications section: shared DVD players, CD burners,
     etc.
    JH: Scanners?
    JS: We should then also add something about QoS.
    JH: Can we simply put a bullet mentioning these things in the Applicability
     statement, and then not discuss it further?
    RH: We are putting SCSI over xxxx. We should therefore support all that SCSI
     supports. We are not out to support everything.
    Just once we support SCSI, we should aim to support all that is supported
    by SCSI.
    We'll add some language that this protocol aims to support the various SCSI
     command sets.
    How successful we'll be depends on how well layered things are.
    
    JS: Under topology, simply reference LAN. Delete reference to Ethernet. We
    aren't limiting ourselves to any particular technology.
    
    LdO: Could also add storage over general internet using encryption.
    JH: Isn't that then a VPN?
    LdO: The IETF is about general connectivity over the internet. So this is
    an important point.
    RH: We'll adjust language to "Private and Public networks .."
    
    CS: TCP adaptive retransmission is not limited to local area.
    RH: The point is that even in the LAN, there are advantages of TCP for
    error recovery over others (like FC).
    JS: We should state that explicitly.
    JH: This might poke the FC people in the eye. Do we want to say this?
    RH: The way it is written, the point is made without poking in the eye.
    CS: With an Ethernet switch, there can be congestion even in a LAN. So TCP
    is advantageous there also.
    LdO: We should say that we want something that works and get it going fast.
    We therefore have to use what we have today: SCSI and TCP. This will
    deflect most of the dissenters.
    
    "The full realization ..."
    CS: Are we saying that this can't be done in software?
    JH: The "full realization ..." Without hardware support, iSCSI will never
    get into servers.
    JM&KM: Let's say "While iSCSI can be implemented totally in software, the
    FULL realization will involve ....."
    
    What will go on these new NICs? Discussion.
    
    A key goal is to not require modifications to existing protocols.
    AA: iSCSI also enables device sharing.
    Won't this have an effect on T10 and existing SCSI protocols, since we now
    enable a totally new application of their protocols?
    Shouldn't we say something here about the possible effect on these
    protocols?
    RH: We'll add a few sentences about the possible stresses on these protocols
     as iSCSI develops.
    JD: Such stresses already exist from video and other things that affect
    the evolution of TCP.
    RH: We won't add requirements to these protocols, but we might push them to
     some new features.
    
    Paragraph on security: separate networks for storage traffic.
    RH: Perhaps can add a firewall to allow only the storage traffic through to
     ensure the security.
    JS: We have to address on our own the security needs of iSCSI (as required
    by IETF).
    
                 Enterprise LAN
                 -----------------------------------
                                             |
                                           ----- Storage Management Firewall
                                             |
                 -----------------------------------
                  Storage LAN
    
    Must ensure that IP packets cannot be routed to the Storage LAN.
    Routing will have to be turned off to disable any packets from getting to
    the Storage LAN except through the Storage Management Firewall.
    
    JM: FC is unroutable and therefore gets this security.
    For IP we'll have to do something to prevent routing.
    
    RH: In the requirements sections, there are contributions from others
    included. They will be noted in the references.
    
    
    
    
    Overview of iSCSI draft:
    
    JS: initiator, target, TCP connections, session, <command, data, status>
    affinity to a single connection,
    why several connections per session, evolution of  proposals for multiple
    channels, some special iSCSI messages
    (Login, Ping, asynchronous event, task management, text).
    
    JH: How can an initiator figure out how many connections to use?
    CS: This may be an implementation issue. We just have to provide the
    infrastructure to do it.
    JH: Every scenario raised requires a simulation to determine what is
    optimal. Is there a better way to determine an almost-optimal number of
    connections?
    
    Clarification of multiple tcp connections per single iSCSI session. Can
    have multiple active tasks per iSCSI session.
    A separate iSCSI session defines a separate initiator. Picture on board.
    
    
                 x     x     x     active tasks
                  \    |    /
                   \   |   /
                    \  |  /
                     \ | /
    ------------------------------ iSCSI layer
                       x           iSCSI session
                     / | \
                    /  |  \
                   |   |   |
                   |   |   |       tcp connection group
                   |   |   |       same or different IP address
                   |   |   |
                    \  |  /
                     \ | /
                       x           iSCSI session
    ------------------------------ iSCSI layer
                     / | \
                    /  |  \
                   |   |   |       device servers
    
    
    
    We need a fuller discussion of iSCSI sessions in the document.
    
    JM: CDBs can have tacked on to them some vendor specific data (parameters),
     which then messes up our assumption of fixed sized headers.
    MW: Should not have any data sent in command phase.
    CS: In parallel SCSI, the parameters get sent in the data phase.
    MW: Therefore, the parameters should be sent only in an iSCSI data phase and
     not in a command phase.
    
    We now have 2 ways of sending parameters: either tacked on to a command or
    as data. This may cause confusion.
    
    What is the maximum length of CDB? Do we limit it artificially? We don't
    want to parse the CDB itself to determine how long it is.
    
    CS&RH: If we get rid of the parameters and have only a CDB, then we are OK.
     We then insist that the parameters get sent in a data phase.
    But then we have to perform another read operation to get the parameters.
    MW: What about WRITE without RTT? It would be nice to have the data
    appended to the command.
    
    JS: The Length field can be split.
    JM: Have an offset field to specify where the data begins.
    
    MW: Need a version number.
    Several: Put it only in Login. No need to have it in each packet.
    
    
    Lunch
    
    
    
    RH: Discussion on RTT. Don't want more than one round-trip delay.
    How long is max SCSI CDB?
    What is the "right" way to communicate command parameters?
    If data will follow command without RTT, can we include it with the command
     packet?
    It would be nice if we could avoid requiring a non-RTT WRITE to have to go
    in a separate command and data packet,
    since this would cause complications on the receiving end to connect the
    packets back together.
    JM: Add another length field in the header to state where the data starts.
    Further discussion. How many fields must we delineate?
    We'll come back to this tomorrow.
    Seems to be consensus that there are 3 fields: iSCSI header, variable
    length CDB (including CDB extension), data.
    
    
    (1)
             D | C | H   ----->
                       <----- H | S
    
    (2)
             D | H         C | H    ----->
                       <----- H | S
    
    
    (3)
             C | H ----->
                       <----- H | RTT
             D | H ----->
                       <----- H | S
    
    RH: We want to enable (1). (2) causes problems for the target to implement
    since it has to match up Initiator tags
    between command and data, with other commands possibly interleaved.
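    
    The tag matching that pattern (2) would force on the target can be
    sketched roughly as follows. This is an illustration only: the class and
    method names are hypothetical, the command packet is assumed to arrive
    before its data packet, and none of it is taken from the draft.

```python
class TagMatcher:
    """Hypothetical target-side table pairing command packets with data packets."""

    def __init__(self):
        # Initiator tag -> CDB of a command still waiting for its data packet.
        self.waiting = {}

    def on_command(self, tag, cdb):
        """Remember a command until the data packet with the same tag arrives."""
        if tag in self.waiting:
            raise ValueError("duplicate initiator tag %r" % tag)
        self.waiting[tag] = cdb

    def on_data(self, tag, data):
        """Pair a data packet with its command; other commands may be interleaved."""
        if tag not in self.waiting:
            raise KeyError("data for unknown initiator tag %r" % tag)
        return self.waiting.pop(tag), data


matcher = TagMatcher()
matcher.on_command(7, b"WRITE-A")
matcher.on_command(9, b"WRITE-B")      # a second command interleaves
pair_b = matcher.on_data(9, b"bbbb")   # data may come back in a different order
pair_a = matcher.on_data(7, b"aaaa")
```

    With pattern (1) no such table is needed, because command and data share
    one packet.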
    
    
    MW: Either send all data with header or send all data in separate iSCSI
    Data packets.
    
    Have a bit to indicate that we have immediate data.
    Discussion of fields in iSCSI Command packet. Picture on board of packet
    header.
    Has variable length CDB possibly extending beyond byte 40, followed by
    immediate data.
    Also have an "I" bit/flag to indicate that we have immediate data.
    Have the other fields currently specified in SCSI command header:
    "Length" field at byte 4, "Expected" field at byte 20, CDB begins at byte
    24.
    
    Costa's proposal for specifying lengths.
    
    if (I == 1) { /* immediate bit is set */
        length(CDB) = Length - Expected + 16
        length(immediate data) = Expected
    }
    if (I == 0) { /* no immediate data */
        length(CDB) = Length - 24
        length(immediate data) = 0
    }
    
    The meanings of Length and Expected are essentially unchanged from what is
    currently written in the draft.
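    
    As a rough sketch, the proposal can be written as a small function. The
    function name and the consistency check are assumptions added for
    illustration. The constants 16 and 24 are copied verbatim from the
    formulas as recorded here, and the sketch does not try to reconcile them
    with the draft.

```python
from typing import Tuple


def split_lengths(length: int, expected: int, immediate: bool) -> Tuple[int, int]:
    """Return (cdb_length, immediate_data_length) from the header fields.

    `length` and `expected` stand for the "Length" and "Expected" header
    fields; `immediate` stands for the "I" bit. The arithmetic follows the
    formulas recorded in the minutes, unchanged.
    """
    if immediate:
        cdb_len = length - expected + 16
        data_len = expected
    else:
        cdb_len = length - 24
        data_len = 0
    if cdb_len < 0:
        # Assumption: a negative CDB length means the fields conflict.
        raise ValueError("Length and Expected are inconsistent")
    return cdb_len, data_len


no_immediate = split_lengths(length=40, expected=0, immediate=False)
with_immediate = split_lengths(length=512, expected=512, immediate=True)
```

    Deriving both lengths from fields already in the header avoids adding a
    separate CDB length field that could conflict with them.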
    
    Should we also send the CDB length explicitly in the header?
    
    What must be in a header?
    CS: (1) Must contain all necessary information. (2) Should allow simple
    implementation.
    (3) Should be as short as possible. There are tradeoffs between these.
    JS: We also don't want to have multiple fields that may conflict with one
    another, thereby requiring consistency checks.
    RH: There are also some symmetry considerations to have consistent headers.
    
    We'll come back to all this tomorrow or Wednesday.
    
    
    
    
    Discussion of error recovery:
    
    JS: We should not attempt to re-do a failed SCSI command.
    We should report that the command may have started and we should report to
    the best of our ability what happened and
    what is the current state. What should we do if an iSCSI connection breaks?
    RH: We should differentiate between what happens at the SCSI layer and what
     happens at the iSCSI layer.
    Let SCSI do its own recovery. Let's concentrate on being a good transport.
    i.e. What do we do when an iSCSI connection fails?
    RH: One possibility (1) is to simply let everything die by timeout; the
    upper layer then forces session cancellation from above and cleanup,
    and then create new sessions. This removes from iSCSI almost all
    responsibility. (We would have to add a means to cancel an existing
    session.)
    JH: Why do we have to blow away the entire session? Can't we just deal with
     the commands on the broken connection?
    KM&CS: Commands are sequenced. So a failure on one connection will block
    the execution of commands on another connection,
    thereby causing backup on the entire session.
    KM: This is also dependent on whether sequencing is across the entire
    session (RH's view) or is per LUN (JS's view).
    LdO: Even more basically, if we have a single connection for the session,
    and the connection fails, do we want to try to recover?
    MT: Instead of simply letting the session hang, we can detect that the
    session is broken, and we can let the upper layer know.
    It can then cancel a task, etc and try to start recovery.
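    
    The blocking that KM and CS describe can be illustrated with a small
    sketch. It assumes session-wide sequencing (RH's view) and an explicit
    per-command sequence number. The class and its names are hypothetical
    and not part of the draft.

```python
class SessionSequencer:
    """Hands commands to the SCSI layer strictly in session-wide order."""

    def __init__(self):
        self.next_seq = 1   # next sequence number allowed to execute
        self.held = {}      # seq -> command that arrived ahead of its turn
        self.executed = []  # commands released for execution, in order

    def receive(self, seq, command):
        """Accept a command from any connection; release whatever is now in order."""
        self.held[seq] = command
        while self.next_seq in self.held:
            self.executed.append(self.held.pop(self.next_seq))
            self.next_seq += 1


# Commands 1 and 3 travel on connection A, command 2 on connection B.
# Connection B fails before command 2 is delivered.
session = SessionSequencer()
session.receive(1, "cmd-1 via A")
session.receive(3, "cmd-3 via A")
# cmd-3 arrived on a healthy connection but is held until cmd-2 shows up.
```

    With per-LUN sequencing (JS's view) the same stall would be confined to
    commands for the affected LUN.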
    
    This is possibility (2): hang and notify.
    Another possibility (3) iSCSI recovers from TCP errors. The session stays
    alive as long as one connection of the session still exists.
    How do we recover from a command on the failed connection?
    Most extreme possibility (4) iSCSI session will stay up no matter what (by
    some magic).
    
    Discussion. Arguments. What do we want to do?
    
    MW: FC has methods to determine what commands have actually been delivered
    and then continue from that point.
    
    JS: Let's have a minimum action. All commands that we know about that went
    over a failed connection,
    we can purge from the target, and inform the upper layer. The upper layer
    can then reset a task set, a LUN, or the target.
    CS: Please write up details of proposal and we'll discuss it tomorrow or
    Wednesday.
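    
    Pending that write-up, the minimum action can be sketched as below. The
    sketch assumes each command is tied to the single connection it was sent
    on. The names are hypothetical, and which side keeps the table and how
    the notification reaches the upper layer were not settled in the
    discussion.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SessionState:
    """Hypothetical per-session record of which connection carries each task."""

    # Callback standing in for "inform the upper layer".
    notify_upper_layer: Callable[[List[int]], None]
    # Task tag -> id of the connection the command went over.
    task_connection: Dict[int, str] = field(default_factory=dict)

    def command_sent(self, tag: int, conn_id: str) -> None:
        self.task_connection[tag] = conn_id

    def connection_failed(self, conn_id: str) -> List[int]:
        """Purge every known task on the failed connection and report them."""
        purged = sorted(t for t, c in self.task_connection.items() if c == conn_id)
        for tag in purged:
            del self.task_connection[tag]
        self.notify_upper_layer(purged)
        return purged


reports: List[List[int]] = []
state = SessionState(notify_upper_layer=reports.append)
state.command_sent(1, "A")
state.command_sent(2, "B")
state.command_sent(3, "A")
purged = state.connection_failed("A")
```

    Tasks on the surviving connection are left alone. Whether to reset a
    task set, a LUN, or the target is left to the upper layer.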
    
    Additional discussion. Can target clean up all of its state when a session
    fails?
    Do we want to try to recover session level failures?
    We can make an attempt to allow application to recover by reporting it,
    etc.
    LdO: Do we want to be more reliable than TCP? Why do we think we can do
    better?
    
    LdO: Hang silently (1) and hang and notify (2) are the same as far as iSCSI
     is concerned. They are different only with regard to implementation.
    RH: SAM seems to imply a notification, but this is not explicitly stated.
    
    When a target sees an error, should it completely clean up all state? or
    wait for the initiator to tell it what it should do?
    
    If a session died with some pending commands, should the iSCSI layer try to
     re-establish the session transparently to the application?
    
    What state is cleaned up on the target when the session fails? Can cancel
    all outstanding tasks.
    SDG: In their prototype, they abort all pending tasks in the SCSI devices.
    iSCSI layer cleans up all state.
    The initiator must then check the device and see where it is up to and what
     the state is. The application layer can do all the necessary recovery
    operations.
    iSCSI simply reports failed commands, and the upper level performs its
    recovery operations.
    
    CS: In order to perform recovery at the iSCSI level, you'll have to save a
    lot of information (at the target) in order to be able to recover in the
    case of failure.
    It may take 3-5 minutes to know that a TCP connection failed. That can be a
     lot of information to hold on to.
    
    Should we add timeout mechanisms to iSCSI to detect failed connections?
    
    CS: Timeouts should be at the highest level possible. If the application
    already has a timeout mechanism, we need not add our own.
    
    RH: We expect TCP connections to not fail very often; certainly less often
    than FC. TCP may be even more reliable than SCSI.
    
    MW: If one physical link drops and we still have other links, do we want to
     abort the entire job, or continue transparently
    to the application with degraded performance? If we want the session to be
    able to carry on, then we must define the
    recovery mechanism. As in FC, the initiator can query the target as to what
     state it arrived at, and then continue from
    that point. There are applications that would entirely fail if we report an
     error on some command
    (like backup to tape which would rewind the tape and eject).
    JS: If the data was not acknowledged on the target, then the data was saved
     somewhere in the TCP layer.
    There is a way to recover a TCP flow without going to an upper layer (IP
    takeover, TCP splicing).
    MW: But then you need a way for iSCSI to get the lost information from the
    TCP layer. And what do you do if TCP is implemented in hardware?
    JS: This would now impose a requirement on another Working Group.
    JS: The right layer for this recovery is at the TCP layer rather than at
    the iSCSI layer.
    We are also not likely to do this recovery any better than TCP can.
    
    Let's have someone write up the different possibilities and then discuss
    further.
    
    RH: TCP already handles most of the problems that FC experiences that
    instigated FCP-2:
    dropped packets, disconnected wire for short periods, congestion, etc.
    The cases where TCP actually fails will be very rare so that we can claim
    that the QoS demanded by SAM is achieved,
    and we can fail the command on those exceptional cases where TCP fails, and
     perform hang and notify.
    JS: We should still write up a page or so of exactly what the target does
    to clean up in such a case.
    
    Conclusion: At a minimum we must support hang and notify.
    We still have a question as to whether TCP can be considered reliable
    enough to satisfy the QoS transport SAM requirement.
    
    JM: If necessary, the IP connections will be made more reliable using
    hardware.
    CS: as in IP telephony. Still we can't do anything against a link that is
    physically cut.
    
    
    
    
    
    Laundry List:
    
    Need better discussion of iSCSI session in the document.
    RTT used to communicate X_ID.
    Version # - associate to a session.
    How long is max SCSI CDB?
    What is the "right" way to communicate command parameters?
    If data will follow command without RTT, can we include it with the command
     packet?
    
    
    
    Action items:
    JS and CS will write up a page on details for error recovery for Wednesday.
    CS will present some thoughts on how to perform session recovery.
    We will all think a little more about the length fields in the message
    headers before deciding on Wednesday.
    LdO will send us whatever he has in writing on security.
    
    

