RE: Please read - proposed WG termination
H.K. Jerry Chu <Jerry.Chu <at> eng.sun.com>
2005-09-01 18:19:23 GMT
[co-chair hat off]
>These performance problems are primarily implementation-specific and have
>little to do with IB technology itself. In addition, nearly all IB
>solutions use a 2KB MTU, not the smallest, to transfer data - no different
Ethernet is adopting jumbo frames to get more firepower. Where is IB's
equivalent of jumbo frames?
>As I and others have raised over the years, the enablement
>of IP over IB to perform well is a local HCA issue not a standards
>issue. Addition of checksum off-load support to the HCA is rather trivial
>and does not require standardization (this is what is done for Ethernet
>today and is non-standard). Addition of large send off-load support is a
>local HCA issue not a standards issue and effectively provides the same
>benefit as connected mode.
Yes, LSO (or TSO as some call it) is relatively easy. But LRO (large receive
offload) is a heck of a lot more difficult. IB connected transports already
have all the silicon to do it. Why not just use it?
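To make the LSO point concrete, here's a toy user-space sketch of what large
send offload boils down to: the stack hands down one big buffer and the
hardware chops it into MTU-sized frames, replicating the header in front of
each chunk. The sizes and names below are purely illustrative, not taken from
any driver:

#include <stdio.h>
#include <string.h>

#define HDR_LEN   40      /* illustrative TCP/IP header size          */
#define LINK_MTU  2044    /* e.g. IPoIB-UD payload with a 2KB IB MTU  */

/* Toy large-send offload: split one large payload into MTU-sized
 * frames, copying the pre-built header in front of each chunk.
 * Real hardware also fixes up lengths, sequence numbers and checksums. */
static int lso_segment(const unsigned char *hdr,
                       const unsigned char *payload, size_t len)
{
    size_t max_data = LINK_MTU - HDR_LEN;
    int segs = 0;

    for (size_t off = 0; off < len; off += max_data) {
        size_t chunk = (len - off < max_data) ? len - off : max_data;
        unsigned char frame[LINK_MTU];

        memcpy(frame, hdr, HDR_LEN);                  /* replicate header */
        memcpy(frame + HDR_LEN, payload + off, chunk);
        (void)frame;   /* transmit(frame, HDR_LEN + chunk) in real life   */
        segs++;
    }
    return segs;
}

int main(void)
{
    unsigned char hdr[HDR_LEN] = {0};
    static unsigned char payload[64 * 1024];

    printf("64KB send -> %d wire frames\n",
           lso_segment(hdr, payload, sizeof(payload)));
    return 0;
}

LRO is the inverse problem: the receiver has to decide on the fly which
arriving frames belong to the same flow and whether merging them is safe,
which is why it is so much harder to do in the NIC than the loop above. An RC
QP, by contrast, already reassembles large messages in hardware.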
>The use of multiple QP to spread work across
>CPU for both send / receive ala the multi-queue support I've worked with
>various Ethernet IHV to get in place is again a local HCA issue (does not
>have to be visible as part of the layer 2 address resolution). One can
>construct a very nice performing IP over IB solution but there hasn't been
>much public progress to implement these de facto capabilities found in
>Ethernet solutions on IB. Getting these into a HCA implementation is a
>heck of a lot easier and faster to do than to develop a standard and
>getting all of the OS changes made (the HCA implementation issues can all
>be done underneath the IP stack just like with Ethernet so no real OS impacts).
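For what it's worth, here is a toy sketch of the kind of flow-to-QP spreading
being described: hash the flow identifiers and use the result to pick one of
several receive QPs, each serviced by a different CPU. The hash and the struct
are made up for illustration (real NICs typically use a Toeplitz hash for
RSS):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_QPS 4                       /* e.g. one receive QP per CPU */

struct flow {
    uint32_t saddr, daddr;              /* IPv4 addresses */
    uint16_t sport, dport;              /* TCP/UDP ports  */
};

/* Toy 4-tuple hash (FNV-1a over the flow fields). */
static uint32_t flow_hash(const struct flow *f)
{
    uint32_t words[3] = { f->saddr, f->daddr,
                          ((uint32_t)f->sport << 16) | f->dport };
    const uint8_t *p = (const uint8_t *)words;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < sizeof(words); i++)
        h = (h ^ p[i]) * 16777619u;
    return h;
}

/* Pick the QP (and hence the CPU) that will service this flow. */
static unsigned pick_qp(const struct flow *f)
{
    return flow_hash(f) % NUM_QPS;
}

int main(void)
{
    struct flow f = { 0x0a000001u, 0x0a000002u, 49152, 80 };

    printf("flow -> QP %u\n", pick_qp(&f));
    return 0;
}

Nothing in a scheme like this needs to show up in the wire format or in layer
2 address resolution; it lives entirely inside the HCA and its driver.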
I don't understand why a large MTU is an issue for the OS (requiring contiguous
physical addresses). Isn't all decent hardware capable of scatter/gather these
days? What's more hairy for the OS stack is the per-destination MTU, and the
different MTU for multicast versus unicast, inherent in IPoIB CM.
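Here is a rough sketch, purely illustrative (the constants and structures are
my own, not from any stack), of what the IP layer ends up having to cope with
under IPoIB-CM: the usable MTU depends on which neighbour you are sending to,
and multicast stays stuck at the datagram-mode MTU even when unicast
connections have negotiated something much larger:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DATAGRAM_MTU  2044U    /* UD mode with a 2KB IB MTU, minus the
                                  4-byte IPoIB encapsulation header    */
#define CM_MAX_MTU    65520U   /* a typical connected-mode upper bound */

/* One entry per neighbour: has a connected-mode MTU been negotiated? */
struct neigh {
    uint32_t ipaddr;
    bool     cm_up;
    uint32_t cm_mtu;
};

/* The MTU the IP layer may use towards this destination.  Multicast
 * (and any neighbour without an RC connection) falls back to the
 * datagram-mode MTU, so one interface ends up with several MTUs. */
static uint32_t path_mtu(const struct neigh *n, bool is_multicast)
{
    if (is_multicast || n == NULL || !n->cm_up)
        return DATAGRAM_MTU;
    return n->cm_mtu < CM_MAX_MTU ? n->cm_mtu : CM_MAX_MTU;
}

int main(void)
{
    struct neigh peer = { .ipaddr = 0x0a000002u, .cm_up = true,
                          .cm_mtu = 65520 };

    printf("unicast to peer: %u bytes\n", path_mtu(&peer, false));
    printf("multicast      : %u bytes\n", path_mtu(NULL, true));
    return 0;
}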
>>For commercial clusters, if IB is used for storage, then you save a network
>>by having fast IP performance and can use the IB network for both. Why use
>>IB and another network for the commercial cluster, when the other network
>>supports similar bandwidth for storage and IP.
>There will always be Ethernet in any cluster so the fabric is there. The
>question is whether it is just for low-bandwidth / management services or
>for applications. For storage, need to separate the discussion into
>whether it is block or file. For block, IB gateways to Fibre Channel, etc.
>can and are being used today quite nicely. Performance is reasonable and
>the ecosystem costs, target availability, customer "pain", etc. are much
>lower than attempting to move to native IB storage. The same applies to
>file based where IB gateways to Ethernet which then attaches to file
>servers works quite nicely. In fact, the original vision of IB was that of
>an I/O fabric to create modular server solutions. The addition of IPC came
>later in the process when it was found to be relatively low cost to
>define. So, IB is successful in the HPC world and slowly entering some
>commercial solutions. To state that its future relies on getting an IP
>over IB RC solution is perhaps blowing it a bit out of proportion. The
>easier path for all is to simply use the techniques I and others have
>advocated for years now and solve the problems within the HCA
>implementation. Much lower costs and will result in delivering a good solution.
>BTW, RNIC / Ethernet solutions implement these techniques today. With the
>arrival of 10 GbE and the lower prices of RNIC and 10 GbE switch ports,
>lower latency switches (competitive enough with IB for commercial and many
>HPC clusters), etc. the success of IB must lie elsewhere and not on an IETF
>spec. This was noted at the recent IEEE Hot Interconnects conference as
>well so it isn't just my opinion.
>>Implementing IPoIB-CM makes IB viable in the HPC cluster and some
>>commercial clusters. Otherwise I don't think it competes economically with
>>other network technologies.
>>Cluster System Performance
>>wombat2 <at> us.ibm.com (845)433-8483
>>Tie. 293-8483 or wombat2 on NOTES
>>"We are not responsible for the world we are born into, only for the world
>>we leave when we die. So we have to accept what has gone before us and work
>>to change the only thing we can -- The Future." William Shatner
>> From: Dror Goldenberg <gdror <at> mellanox.co.il>
>> Sent by: ipoverib-bounces <at> ietf.org
>> Date: 08/30/2005 09:32 AM
>> To: kashyapv <at> us.ltcfwd.linux.ibm.com, "H.K. Jerry Chu" <Jerry.Chu <at> eng.sun.com>,
>>     margaret <at> thingmagic.com, ipoverib <at> ietf.org, Bill_Strahm <at> McAfee.com
>> Subject: RE: [Ipoverib] Please read - proposed WG termination
>> > From: Vivek Kashyap [mailto:kashyapv <at> us.ibm.com]
>> > Sent: Tuesday, August 30, 2005 8:39 AM
>> > On Mon, 29 Aug 2005, H.K. Jerry Chu wrote:
>> > > 1. IPoIB connected mode draft-ietf-ipoib-connected-mode-00.txt
>> > > updated recently
>> > Well, in recent days there has been a discussion going on
>> > based on Dror's input. I also made some updates after some
>> > discussion on OpenIB (not on IETF though). This draft itself
>> > became a working group draft this February after some lively
>> > discussion just before that. It appears to me that it should
>> > be possible to finalise this draft soon enough.
>> > 20th Sept. might be long enough to know one way or the other...
>> > vivek
>>We would like to see IPoIB-CM being finalized in IETF. We see
>>great value in having a standard for connected mode which effectively
>>increases the MTU. We are willing to contribute to the standardization
>>effort. We're also looking at the implementation of IPoIB-CM in Linux.
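As a back-of-the-envelope illustration of what "effectively increases the MTU"
buys (the figures below are just the usual 2KB-IB-MTU datagram number and a
typical connected-mode maximum, not measurements):

#include <stdio.h>

int main(void)
{
    const unsigned transfer = 1024u * 1024u;  /* a 1MB application write    */
    const unsigned ud_mtu   = 2044u;          /* datagram mode, 2KB IB MTU  */
    const unsigned cm_mtu   = 65520u;         /* typical connected-mode MTU */

    /* Every wire packet carries per-packet cost (headers, completions,
     * possibly an interrupt), so fewer, larger packets mean less overhead. */
    printf("datagram mode : %u packets\n", (transfer + ud_mtu - 1) / ud_mtu);
    printf("connected mode: %u packets\n", (transfer + cm_mtu - 1) / cm_mtu);
    return 0;
}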