Date: Tue, 3 Apr 2001 16:20:16 -0500
From: "La Monte H.P. Yarroll" <piggy@em.cig.mot.com>
To: netdev@oss.sgi.com
Subject: Minutes for Linux 2.5 Summit Networking WG

These are the notes I took at the Linux 2.5 Summit meeting of the
Networking WG.

Pardon my liberties, especially if I have misquoted anybody or omitted
some important context.  I spend most of my time at layer 4 in the
networking stack, so sometimes the layer 2 discussions went over my
head.  I might also admit that my choice of beverages was perhaps not
optimal for careful concentration.

Please send corrections to <piggy@baqaqi.chi.il.us> and
<piggy1@email.mot.com>.  If I get corrections before Thursday (4 April 
2001), I will submit corrected minutes to Rik Farrow for inclusion in
the Summit proceedings.

Sat Mar 31 23:15:02 CST 2001

new statistics
-------------------------------------------------
[We believe this is the least controversial.]
DB: "We probably want to support one of the standard MIBs."
Do we agree that we need to expand the set of statistics?   Yes.

We don't distinguish between chips that count collisions and chips
that provide only a collision-occurred bit.  There is also a collision
overflow (16 collisions), but we don't have to worry about that case.

RFC1643 reports multiple collisions and collided packets separately,
but this is not available for all drivers.

There were no strong opinions.

Jamal thinks we should implement whatever the RFCs recommend.

Action Item:  Do we put the base set of statistics in the netdevice
structure or as its own data structure?

There is no performance issue.  It LOOKS like it should be in a
separate structure.

DB: How about we put the log flags (formerly debug level) at the
per-interface level?
AC: "I don't care what you call it.  Just put it somewhere in proc."
DB: "I don't care where it goes, but we should pick a good name."

Bitmapped or simply a level?  If you have more than 32 things to debug
something is WRONG.  Consensus:  Bitmapped is fine.

Suggestion:  The procfs interface is the old interface.  ioctl()
provides the bitmap interface.  The mapping is 0->0, 1 -> 0x0001,
 2 -> 0x0003, 3 -> 0x0007, etc...  (i.e. (1<<level) - 1).
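
The level-to-bitmap mapping above can be sketched as follows.  The
function name is hypothetical; the only thing taken from the
discussion is the formula (1<<level) - 1.  Level 32 is special-cased
because shifting a 32-bit value by 32 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an old-style debug level into the equivalent bitmap:
 * level N enables the N lowest message classes. */
static uint32_t debug_level_to_mask(unsigned int level)
{
    assert(level <= 32);            /* more than 32 bits cannot be represented */
    if (level == 32)
        return UINT32_MAX;          /* avoid the undefined shift 1 << 32 */
    return (UINT32_C(1) << level) - 1;
}
```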

ioctl() for link for drivers & crap (aka ethtool)
-------------------------------------------------
How can I get rid of full-duplex default?  DB: "I guess no, just deprecate it."
jgarzik: "Have a force or not force option or flag and an option
	 which is your media type."
DB: "We want to keep it native to the chip semantics, yet we want a
consistent upper layer."
Does fdx mean disable autonegotiation?  Or does it mean negotiate and
then force fdx?
This looks too hard to finish.  We are moving on.

[Random aside from Avivo:  "Who's read nfsd?  What do you think of the
code quality?"]

We want to get rid of ether= (lilo parameter).
AC: "How do we get rid of the dependencies?"
Paul Gortmaker: ""
AC: "How do you handle ordering?"
AC: "I'm kind of inclined to leave ether=.  It pretty-much works."

[Aside from Larry McVoy.  What is the relative complexity of scaling
to 64 processors and getting nfsd2 working?

Larry breaks nfsd a LOT...
]

DB: "ether= has to stay.  We need option 1 and option 2."
AC: "We can shoot the siocf map."
Consensus: Yes!
Andrew Morton: [Drat this sounded important...]


MMIO problems
1. Drivers which need to flush (pio doesn't need flushing and mmio does)
2. Hardware that just doesn't support mmio (hardware bugs)
3. Some hardware just doesn't even export
jg: We want MMIO to be the default in every case that it makes sense.
DB: I think I already did that.
DB: I don't think we need a general solution for 3.
No action on MMIO vs IO.


Tuning multicast filter settings, and other driver by driver stuff.
It should probably be added as an ioctl().  It is expensive to
reprogram the multicast filter, so if it is changed frequently, or if
you have a very large number of multicast groups, you should probably
turn off the filtering and go promiscuous multicast.
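
A rough sketch of that trade-off, assuming a per-driver policy with
two tunables.  The structure, the thresholds and the function name are
all hypothetical; nothing this specific was agreed on.

```c
#include <assert.h>
#include <stddef.h>

enum mc_mode {
    MC_FILTER,      /* program the hardware multicast filter */
    MC_ALLMULTI     /* accept all multicast, filter in software */
};

/* Hypothetical per-driver tunables. */
struct mc_policy {
    size_t       filter_slots;         /* groups the hardware filter can hold */
    unsigned int max_updates_per_sec;  /* reprogramming rate we tolerate */
};

/* Pick a receive mode from the current group count and update rate. */
static enum mc_mode mc_choose_mode(const struct mc_policy *p, size_t groups,
                                   unsigned int updates_per_sec)
{
    assert(p != NULL);
    if (groups > p->filter_slots)
        return MC_ALLMULTI;            /* filter cannot hold them all */
    if (updates_per_sec > p->max_updates_per_sec)
        return MC_ALLMULTI;            /* reprogramming would cost too much */
    return MC_FILTER;
}
```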
jg: I would prefer that these were all ethtool ioctl()'s.
jg: Let's just push through DB's proper names.
DB: Do we need a magic?
Andi Kleen: "If you are running a zero copy [somethingorother],"
DB: There are cards that support 64bit addresses and they'll do dual
cycles on a 32bit bus.  We should be careful how we do that since it
could affect the typical user at the expense of the few.
Action Item: Andrew Morton should do something about the 64bit DMA
thing.


Link detection
-------------------------------------------------
Jamal Hadi Salim:  There is a link netcarrier on/off.  It used to be in
the netlink socket.

DB: Use the 'running' flag to mark this.  In 2.5 it should call a
function that can be inside an interrupt handler instead of setting
the flag.  It should take the netdev structure as the argument.
Put this in the drivers now and worry about the user space interface
later.
AM: A lot of cards can generate this.  Should this be emulated with
polling?
AM: The motivating application is people who want to run pump
automatically when the link appears.

It is not passed to user space at all.  The code in netcore was taken
out.  We are free to pass it up to user level however we like.
Jamal: I like netdev.

Different topic:  Never reset /proc/net statistics.  RFC1643 and
RFC1213 make this utterly.

It is OK to check only once every 5 seconds or so.  If somebody needs
faster failover, they can buy hardware that isn't broken.  Use the
existing media timer--we'll bump that up to 5 seconds.
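
A minimal sketch of emulating link-change events by polling, assuming
the driver samples its own link status from the media timer and hands
the result to a small state tracker.  The type and function names are
hypothetical; how the event reaches user space is left open, as it was
in the meeting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum link_event { LINK_UNCHANGED, LINK_WENT_UP, LINK_WENT_DOWN };

/* Hypothetical per-device state: the last carrier state we reported. */
struct link_monitor {
    bool carrier;
};

/* Call from the media timer (every 5 seconds or so) with the freshly
 * sampled link state.  Reports an event only when the state changes. */
static enum link_event link_monitor_update(struct link_monitor *m,
                                           bool sampled_up)
{
    assert(m != NULL);
    if (sampled_up == m->carrier)
        return LINK_UNCHANGED;
    m->carrier = sampled_up;
    return sampled_up ? LINK_WENT_UP : LINK_WENT_DOWN;
}
```

A card that can raise a link-change interrupt could call the same
function from its interrupt handler instead of from the timer.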

Perhaps we should make the media timer generic.  Every driver has one
anyway.

1) The media selection timer which we need to call every 5 seconds.
2) Do we still need the watchdog timer?  Yes.  Keep this as is.

Each driver is responsible for starting its own media selection
timer.  This allows the driver to use an on-chip timer instead.

Paul Gortmaker: Is there any need for guidelines on handling the media
timer?
Should it only run when it is open or should it start on load?
Probably on open only.

deltimer deltimer deltimer, jg agrees with DB and AM lost that fight.


Counter lengths
---------------
Received and transmit bytes.  Any reason to go from 32bit to 64bit
counters?  The problem is that 64bit inc is not atomic.  Big deal.
Don't worry about locking.

How about 64bit packet counts?  We think this can roll over in just a
few days.
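
To put rough numbers on the rollover worry, here is a small sketch.
The rates in the note below are illustrative assumptions, not figures
from the meeting, and both helper names are hypothetical.  The second
helper shows how a reader of 32bit counters can still get a correct
difference across a single wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a 32-bit counter wraps at a constant event rate. */
static double counter32_wrap_seconds(double events_per_sec)
{
    assert(events_per_sec > 0.0);
    return 4294967296.0 / events_per_sec;   /* 2^32 events */
}

/* Difference between two samples of a 32-bit counter.  Unsigned
 * arithmetic is modulo 2^32, so this stays correct across one wrap. */
static uint32_t counter32_delta(uint32_t prev, uint32_t now)
{
    return (uint32_t)(now - prev);
}
```

At a sustained 100 Mbit/s (12.5 million bytes per second) a 32bit byte
counter wraps in under six minutes.  A 32bit packet counter at an
assumed 10,000 packets per second lasts about five days.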

User space tools will break.  Not a problem for 2.5.

Test scenarios (jgarzik really wants to cover that)
---------------------------------------------------
jgarzik will:
Send 80% of his test cards to OSDL in Oregon (he has less than 100)
where they have a huge test lab.
DB wants his own personal hotswap PCI.
TTCP [this is Donald Becker's favorite tool], Apache bench,
over-length packets, runts, corrupted packets, media selection,
hardware traffic generators, white hole, out of memory conditions.

Next steps for poll
-------------------------------------------------
We'll do the patch, we'll test it and we'll roll it out.  Jamal will
get the patch to Andrew for him to break.

802.1 p,q  VLAN
-------------------------------------------------
We need to kick [somebody?] on the butt.  It's an ugly interface and we
don't know how to make it better.  There is spanning tree inside the
kernel...

Action Item:  Jamal and Andrew will sit on Ben Greear and some other
person...


set/getsockopt(IP) (for SCTP socket)
-------------------------------------------------
piggy will discuss this with Alexi too...

I will email these notes to netdev@oss.sgi.com.