minutes of iSCSI meeting 19 June 2000

To: ips@ece.cmu.edu, scsi-tcp@external.cisco.com
Subject: minutes of iSCSI meeting 19 June 2000
From: meth@il.ibm.com
Date: Mon, 19 Jun 2000 19:17:31 +0300
Delivery-Date: Mon Jun 19 12:21:50 2000
Sender: owner-ips@ece.cmu.edu





iSCSI design team meeting
Monday, 19 June 2000
Haifa, Israel

Attendees:
AA   Alaan Azagury (IBM)
JD   John Dowdy (IBM)
SDG  Steve De Grate (NuSpeed)
RH   Randy Haagens (HP)
GH   Gabi Hecht (Gadzooks)
JH   John Hufferd (IBM)
SL   Steve Legg (IBM)
JM   John Matze (Veritas)
KM   Kalman Meth (IBM)
NN   Nelson Nahum (Storage)
LDO  Luciano Dalle Ore (Quantum)
CS   Costa Sapuntzakis (Cisco)
JS   Julian Satran (IBM)
MS   Mark Shifardt (NuSpeed)
MT   Meir Toledano (IBM)
MW   Matt Wakeley (Agilent)
EZ   Efri Zeidner (SanGate)


Disclaimer: Rough paraphrase of some of what was said. Some comments may be
 incorrectly attributed.



Comments on proposed agenda:

JS: Ed Gardner wrote to add action items and produce schedule for next
month.
Paul missed his connection, so we'll push off discussion of security until
tomorrow.
We'll discuss Error Recovery today instead.
JH: When will we discuss Discovery?
JS: After the Pittsburgh IETF meeting.

JS: Can Luciano please provide us with all that is written on Security?
LDO: Will send out what he has.





Overview of Requirements draft:

RH: We first have an Applicability Statement.

Discussion on Applicability Section, paragraph by paragraph:

JD: Is iSCSI a "mapping" or an "encapsulation?"
JS: It is a mapping. SAM defines an RPC model. It is somewhat abstract.
It is not simply a command that can be unwrapped and delivered.

JM: SCSI is sector based rather than block based. Applicability statement
uses the term "block."
Are we abstracting that out? There is also no mention of tapes and
other devices in the applicability statement.
RH: Yes, tape and other device semantics were meant to be included.
JH: Tapes are relevant. We don't want to assume it is a controller. Can
have remote tape.
RH: One of the things wrong with FC is that they are totally disk drive
oriented.
If we keep in mind the model of connecting to large SCSI controllers, we
won't tie ourselves to too narrow an applicability.
MW: Whatever can be done over SCSI, we want to do over iSCSI. It doesn't
matter if you use the term "block" or "sector."
RH: We'll adjust the language.

CS: Applicability section should be aimed at people who are not convinced
of the advantages of iSCSI.
For example, people who cannot imagine placing storage directly on the
network.
CS: We should also explain why we choose SCSI for accessing the devices.
Why not some other block storage protocol?
JS: Because it is ubiquitous. And this is already said in the Applicability
 statement.
CS: In the IETF, anybody can get up and say they don't like it. We have to
justify it.
We need to simply add a paragraph that there is a large installed base that
 uses SCSI and we want to leverage this base.
This will defuse the argument against iSCSI.
JS: Also SCSI is a living protocol.
LDO: Also mention the timeframe since customers are already asking for it,
and we can't spend the time to invent new protocols.
RH: The first paragraph of the Applicability statement already hints at
these things.
JH: Let's add another sentence (at xxx location) that explicitly says what
Costa raised to deflect the objections.

JS: In applications section, can add clusters (in addition to consolidation
and pooling).
RH: Do we have to include desktop? Will iSCSI take over IDE disk interface?
Several: No.
EZ: Yes, we do. This is what will take over for the local bus. Similar to
the Infiniband idea.
JH: Isn't this included in "Local storage access?"

CS: Should include in applications section: shared DVD players, CD burners,
 etc.
JH: Scanners?
JS: We should then also add something about QoS.
JH: Can we simply put a bullet mentioning these things in the Applicability
 statement, and then not discuss it further.
RH: We are putting SCSI over xxxx. We should therefore support all that SCSI
 supports. We are not out to support everything.
Just once we support SCSI, we should aim to support all that is supported
by SCSI.
We'll add some language that this protocol aims to support the various SCSI
 command sets.
How successful we'll be depends on how well layered things are.

JS: Under topology, simply reference LAN. Delete reference to Ethernet. We
aren't limiting ourselves to any particular technology.

LDO: Could also add storage over general internet using encryption.
JH: Isn't that then a VPN?
LDO: The IETF is about general connectivity over the internet. So this is
an important point.
RH: We'll adjust language to "Private and Public networks .."

CS: TCP adaptive retransmission is not limited to local area.
RH: The point is that even in the LAN, there are advantages of TCP for
error recovery over others (like FC).
JS: We should state that explicitly.
JH: This might poke the FC people in the eye. Do we want to say this?
RH: The way it is written, the point is made without poking in the eye.
CS: With an Ethernet switch, there can be congestion even in a LAN. So TCP
is advantageous there also.
LDO: We should say that we want something that works and get it going fast.
We therefore have to use what we have today: SCSI and TCP. This will
deflect most of the dissenters.

"The full realization ..."
CS: Are we saying that this can't be done in software?
JH: It says the "full" realization. Without hardware support, iSCSI will
never get into servers.
JM&KM: Let's say "While iSCSI can be implemented totally in software, the
FULL realization will involve ....."

What will go on these new NICs? Discussion.

A key goal is to not require modifications to existing protocols.
AA: iSCSI also enables device sharing.
Won't this have an effect on T10 and existing SCSI protocols, since we now
enable a totally new application of their protocols?
Shouldn't we say something here about the possible effect on these
protocols?
RH: We'll add a few sentences about the possible stresses on these protocols
 as iSCSI develops.
JD: Such stresses already exist from video and other things that affect
the evolution of TCP.
RH: We won't add requirements to these protocols, but we might push them to
 some new features.

Paragraph on security: separate networks for storage traffic.
RH: Perhaps can add a firewall to allow only the storage traffic through to
 ensure the security.
JS: We have to address on our own the security needs of iSCSI (as required
by IETF).

             Enterprise LAN
             -----------------------------------
                                         |
                                       ----- Storage Management Firewall
                                         |
             -----------------------------------
              Storage LAN

Must ensure that IP packets cannot be routed to the Storage LAN.
Routing will have to be turned off to disable any packets from getting to
the Storage LAN except through the Storage Management Firewall.

JM: FC is unroutable and therefore gets this security for free.
For IP we'll have to do something to prevent routing.

RH: In the requirements sections, there are contributions from others
included. They will be noted in the references.




Overview of iSCSI draft:

JS: initiator, target, TCP connections, session, <command, data, status>
affinity to a single connection, why several connections per session,
evolution of proposals for multiple channels, some special iSCSI messages
(Login, Ping, asynchronous event, task management, text).

JH: How can an initiator figure out how many connections to use?
CS: This may be an implementation issue. We just have to provide the
infrastructure to do it.
JH: Every scenario raised requires a simulation to determine what is
optimal. Is there a better way to determine an almost-optimal number of
connections?

Clarification of multiple tcp connections per single iSCSI session. Can
have multiple active tasks per iSCSI session.
A separate iSCSI session defines a separate initiator. Picture on board.


             x     x     x     active tasks
              \    |    /
               \   |   /
                \  |  /
                 \ | /
------------------------------ iSCSI layer
                   x           iSCSI session
                 / | \
                /  |  \
               |   |   |
               |   |   |       tcp connection group
               |   |   |       same or different IP address
               |   |   |
                \  |  /
                 \ | /
                   x           iSCSI session
------------------------------ iSCSI layer
                 / | \
                /  |  \
               |   |   |       device servers



We need a fuller discussion of iSCSI sessions in the document.
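
A minimal sketch of the session picture above, written after the meeting as
an illustration only. It assumes one session owns a group of TCP connections
and that each task keeps its <command, data, status> on the single connection
it was started on. The class name, the method names, and the round-robin
choice of connection are all hypothetical; the draft does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Session:
    """One iSCSI session spanning a group of TCP connections (illustrative)."""
    connections: List[str] = field(default_factory=list)  # hypothetical labels
    task_to_conn: Dict[int, int] = field(default_factory=dict)
    next_slot: int = 0

    def start_task(self, tag: int) -> int:
        """Bind a new task to one connection and return that connection's index."""
        if not self.connections:
            raise RuntimeError("session has no TCP connections")
        # Assumption: round-robin placement. Any policy would do, as long as
        # the task stays on the connection it was given.
        index = self.next_slot % len(self.connections)
        self.next_slot += 1
        self.task_to_conn[tag] = index
        return index

    def connection_for(self, tag: int) -> int:
        """Command, data and status for a task all use this connection."""
        return self.task_to_conn[tag]
```

With three connections, four tasks started in a row land on connections
0, 1, 2, 0, and every later lookup for a task returns the same connection.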

JM: CDBs can have tacked on to them some vendor specific data (parameters),
 which then messes up our assumption of fixed sized headers.
MW: Should not have any data sent in command phase.
CS: In parallel SCSI, the parameters get sent in the data phase.
MW: Therefore, the parameters should be sent only in an iSCSI data phase and
 not in a command phase.

We now have 2 ways of sending parameters: either tacked on to a command or
as data. This may cause confusion.

What is the maximum length of CDB? Do we limit it artificially? We don't
want to parse the CDB itself to determine how long it is.

CS&RH: If we get rid of the parameters and have only a CDB, then we are OK.
 We then insist that the parameters get sent in a data phase.
But then we have to perform another read operation to get the parameters.
MW: What about WRITE without RTT? It would be nice to have the data
appended to the command.

JS: The Length field can be split.
JM: Have an offset field to specify where the data begins.

MW: Need a version number.
Several: Put it only in Login. No need to have it in each packet.


Lunch



RH: Discussion on RTT. Don't want more than one round-trip delay.
How long is max SCSI CDB?
What is the "right" way to communicate command parameters?
If data will follow command without RTT, can we include it with the command
 packet?
It would be nice if we could avoid requiring a non-RTT WRITE to have to go
in a separate command and data packet,
since this would cause complications on the receiving end to connect the
packets back together.
JM: Add another length field in the header to state where the data starts.
Further discussion. How many fields must we delineate?
We'll come back to this tomorrow.
Seems to be consensus that there are 3 fields: iSCSI header, variable
length CDB (including CDB extension), data.


(1)
         D | C | H   ----->
                   <----- H | S

(2)
         D | H         C | H    ----->
                   <----- H | S


(3)
         C | H ----->
                   <----- H | RTT
         D | H ----->
                   <----- H | S

RH: We want to enable (1). (2) causes problems for the target to implement
since it has to match up Initiator tags
between command and data, with other commands possibly interleaved.
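
A small sketch of RH's point, added as an illustration and not taken from
any draft. It assumes each packet carries an initiator tag and that write
data arrives either inside the command packet, as in (1), or in a later
data packet, as in (2). The Pdu type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Pdu:
    kind: str           # "command" or "data" (labels are hypothetical)
    tag: int            # initiator task tag
    data: bytes = b""   # write data carried by this PDU, if any


def complete_writes(pdus: List[Pdu]) -> List[Tuple[int, bytes]]:
    """Return (tag, data) pairs in the order each write becomes complete."""
    waiting: Dict[int, Pdu] = {}   # commands still waiting for a data PDU
    done: List[Tuple[int, bytes]] = []
    for pdu in pdus:
        if pdu.kind == "command" and pdu.data:
            # Pattern (1): data arrived with the command, nothing to match.
            done.append((pdu.tag, pdu.data))
        elif pdu.kind == "command":
            # Pattern (2): remember the command until its data shows up.
            waiting[pdu.tag] = pdu
        elif pdu.kind == "data":
            if pdu.tag not in waiting:
                raise KeyError(f"data for unknown tag {pdu.tag}")
            del waiting[pdu.tag]
            done.append((pdu.tag, pdu.data))
        else:
            raise ValueError(f"unknown PDU kind {pdu.kind!r}")
    return done
```

Only pattern (2) needs the waiting table and the tag lookup; with (1) the
target can act on each packet as it arrives.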


MW: Either send all data with header or send all data in separate iSCSI
Data packets.

Have a bit to indicate that we have immediate data.
Discussion of fields in iSCSI Command packet. Picture on board of packet
header.
Has variable length CDB possibly extending beyond byte 40, followed by
immediate data.
Also have an "I" bit/flag to indicate that we have immediate data.
Have the other fields currently specified in SCSI command header:
"Length" field at byte 4, "Expected" field at byte 20, CDB begins at byte
24.

Costa's proposal for specifying lengths.

if (I == 1) { /* immediate bit is set */
    length(CDB) = Length - Expected + 16;
    length(immediate data) = Expected;
}
if (I == 0) { /* no immediate data */
    length(CDB) = Length - 24;
    length(immediate data) = 0;
}

The meaning of Length and Expected are essentially unchanged from what is
currently written in the draft.
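
The proposal above can be sketched as a small function. This is an
illustration only: the constants 16 and 24 are copied verbatim from the
notes, which do not say how Length is measured, so the sketch does not try
to reconcile the two branches. The function name and the negative-length
check are assumptions added here.

```python
from typing import Tuple


def split_lengths(length: int, expected: int, immediate: bool) -> Tuple[int, int]:
    """Return (cdb_length, immediate_data_length) from the header fields.

    length    -- the "Length" field (byte 4 of the command header)
    expected  -- the "Expected" field (byte 20 of the command header)
    immediate -- the proposed "I" bit
    """
    if immediate:
        cdb_length = length - expected + 16
        data_length = expected
    else:
        cdb_length = length - 24
        data_length = 0
    # Assumption: a receiver would reject fields that imply a negative CDB.
    if cdb_length < 0:
        raise ValueError("header fields imply a negative CDB length")
    return cdb_length, data_length
```

The receiver never has to parse the CDB itself to find where the immediate
data starts; both lengths follow from the two header fields and the I bit.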

Should we also send the CDB length explicitly in the header?

What must be in a header?
CS: (1) Must contain all necessary information. (2) Should allow simple
implementation.
(3) Should be as short as possible. There are tradeoffs between these.
JS: We also don't want to have multiple fields that may conflict with one
another, thereby requiring consistency checks.
RH: There are also some symmetry considerations to have consistent headers.

We'll come back to all this tomorrow or Wednesday.




Discussion of error recovery:

JS: We should not attempt to re-do a failed SCSI command.
We should report that the command may have started and we should report to
the best of our ability what happened and
what is the current state. What should we do if an iSCSI connection breaks?
RH: We should differentiate between what happens at the SCSI layer and what
 happens at the iSCSI layer.
Let SCSI do its own recovery. Let's concentrate on being a good transport.
i.e. What do we do when an iSCSI connection fails?
RH: One possibility (1) is to simply let everything die by timeout, the
upper layer then forces session cancellation from above and cleanup,
and then create new sessions. This removes from iSCSI almost all
responsibility. (We would have to add a means to cancel an existing
session.)
JH: Why do we have to blow away the entire session? Can't we just deal with
 the commands on the broken connection.
KM&CS: Commands are sequenced. So a failure on one connection will block
the execution of commands on another connection,
thereby causing backup on the entire session.
KM: This is also dependent on whether sequencing is across the entire
session (RH's view) or is per LUN (JS's view).
LDO: Even more basically, if we have a single connection for the session,
and the connection fails, do we want to try to recover?
MT: Instead of simply letting the session hang, we can detect that the
session is broken, and we can let the upper layer know.
It can then cancel a task, etc and try to start recovery.
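
KM and CS's point about sequenced commands can be illustrated with a short
sketch. It assumes session-wide sequence numbers and a target that releases
commands strictly in order; both the function name and the use of plain
integers as sequence numbers are hypothetical.

```python
from typing import List, Set


def releasable(arrived: Set[int], next_expected: int) -> List[int]:
    """Return the commands the target may execute, in order.

    arrived       -- sequence numbers received so far, over any connection
    next_expected -- the lowest sequence number not yet executed
    """
    ready: List[int] = []
    # Release commands only while there is no gap in the sequence.
    while next_expected in arrived:
        ready.append(next_expected)
        next_expected += 1
    return ready
```

If commands 1 to 6 are spread over two connections and the connection
carrying command 2 fails, the target holds {1, 3, 4, 5, 6} and can release
only command 1. Everything that arrived on the healthy connection stays
blocked behind the gap.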

This is possibility (2): hang and notify.
Another possibility (3) iSCSI recovers from TCP errors. The session stays
alive as long as one connection of the session still exists.
How do we recover from a command on the failed connection?
Most extreme possibility (4) iSCSI session will stay up no matter what (by
some magic).

Discussion. Arguments. What do we want to do?

MW: FC has methods to determine what commands have actually been delivered
and then continue from that point.

JS: Let's have a minimum action. All commands that we know about that went
over a failed connection,
we can purge from the target, and inform the upper layer. The upper layer
can then reset a task set, a LUN, or a target reset.
CS: Please write up details of proposal and we'll discuss it tomorrow or
Wednesday.
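
A rough sketch of the minimum action JS describes, pending the written
proposal. It assumes the target keeps a table mapping each task tag to the
connection it arrived on; the function name and the table layout are
hypothetical.

```python
from typing import Dict, List


def purge_failed_connection(task_to_conn: Dict[int, int], failed: int) -> List[int]:
    """Drop every known task that used the failed connection.

    Returns the purged task tags so they can be reported to the upper
    layer, which then decides on a task set, LUN, or target reset.
    """
    purged = sorted(tag for tag, conn in task_to_conn.items() if conn == failed)
    for tag in purged:
        del task_to_conn[tag]
    return purged
```

Tasks on surviving connections are left untouched; the sketch makes no
attempt to retry or resume the purged commands.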

Additional discussion. Can target clean up all of its state when a session
fails?
Do we want to try to recover session level failures?
We can make an attempt to allow application to recover by reporting it,
etc.
LDO: Do we want to be more reliable than TCP? Why do we think we can do
better?

LDO: Hang silently (1) and hang and notify (2) are the same as far as iSCSI
 is concerned. They are different only with regard to implementation.
RH: SAM seems to imply a notification, but this is not explicitly stated.

When a target sees an error, should it completely clean up all state, or
wait for the initiator to tell it what to do?

If a session died with some pending commands, should the iSCSI layer try to
 re-establish the session transparently to the application?

What state is cleaned up on the target when the session fails? Can cancel
all outstanding tasks.
SDG: In their prototype, they abort all pending tasks in the SCSI devices.
iSCSI layer cleans up all state.
The initiator must then check the device and see where it is up to and what
 the state is. The application layer can do all the necessary recovery
operations.
iSCSI simply reports failed commands, and the upper level performs its
recovery operations.

CS: In order to perform recovery at the iSCSI level, you'll have to save a
lot of information (at the target) in order to be able to recover in the
case of failure.
It may take 3-5 minutes to know that a TCP connection failed. That can be a
 lot of information to hold on to.

Should we add timeout mechanisms to iSCSI to detect failed connections?

CS: Timeouts should be at the highest level possible. If the application
already has a timeout mechanism, we need not add our own.

RH: We expect TCP connections to not fail very often; certainly less often
than FC. TCP may be even more reliable than SCSI.

MW: If one physical link drops and we still have other links, do we want to
 abort the entire job, or continue transparently
to the application with degraded performance? If we want the session to be
able to carry on, then we must define the
recovery mechanism. As in FC, the initiator can query the target as to what
 state it arrived at, and then continue from
that point. There are applications that would entirely fail if we report an
 error on some command
(like backup to tape which would rewind the tape and eject).
JS: If the data was not acknowledged on the target, then the data was saved
 somewhere in the TCP layer.
There is a way to recover a TCP flow without going to an upper layer (IP
takeover, TCP splicing).
MW: But then you need a way for iSCSI to get the lost information from the
TCP layer. And what do you do if TCP is implemented in hardware?
JS: This would now impose a requirement on another Working Group.
JS: The right layer for this recovery is at the TCP layer rather than at
the iSCSI layer.
We are also not likely to do this recovery any better than TCP can.

Let's have someone write up the different possibilities and then discuss
further.

RH: TCP already handles most of the problems that FC experiences and that
instigated FCP-2:
dropped packets, disconnected wire for short periods, congestion, etc.
The cases where TCP actually fails will be very rare so that we can claim
that the QoS demanded by SAM is achieved,
and we can fail the command on those exceptional cases where TCP fails, and
 perform hang and notify.
JS: We should still write up a page or so of exactly what the target does
to clean up in such a case.

Conclusion: At a minimum we must support hang and notify.
We still have a question as to whether TCP can be considered reliable
enough to satisfy the QoS transport SAM requirement.

JM: If necessary, the IP connections will be made more reliable using
hardware.
CS: As in IP telephony. Still, we can't do anything against a link that is
physically cut.





Laundry List:

Need better discussion of iSCSI session in the document.
RTT used to communicate X_ID.
Version # - associate to a session.
How long is max SCSI CDB?
What is the "right" way to communicate command parameters?
If data will follow command without RTT, can we include it with the command
 packet?



Action items:
JS and CS will write up a page on details for error recovery for Wednesday.
CS will present some thoughts on how to perform session recovery.
We will all think a little more about the length fields in the message
headers before deciding on Wednesday.
LDO will send us whatever he has in writing on security.