Date:	Sun, 6 Jun 1999 19:20:39 +0200
From:	Martin Mares <mj@ucw.cz>
To:	Linux Kernel Mailing List <linux-kernel@vger.rutgers.edu>, Linus Torvalds <torvalds@transmeta.com>, davem@redhat.com, dhinds@zen.stanford.edu,
Subject: RFC: Devices, buses and hotplug

Hello, world!

This is a brief summary of my thoughts on Linux device drivers, hotplug,
modules and other related things. Basically, it's a roadmap of all problems
I think we need to solve on our way to hotplug support and, of course,
world domination :-)

Comments (especially from Linus and the other gods) are really welcome.
I'd like to know your opinion before I start writing a more verbose
description and implementing the things.

				Have fun
							Martin


Device hierarchy
~~~~~~~~~~~~~~~~
The kernel should have some central data structure describing all known
devices, buses and their hierarchy.

Basically, most of the device data structure is bus-dependent, but
there is some generic info: a pointer to bus_operations (a struct
containing pointers to common operations like virt_to_bus which need
a bus-dependent implementation), a list of resources assigned to the
device and a reference to the attached driver, if any. Newly written
drivers will not need to allocate resources themselves -- they just
allocate the whole device.
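
The generic part of such a device node could look roughly like the
following C sketch. Only bus_operations and virt_to_bus come from the
text above; every other name here (struct device_node, struct
dev_resource, device_add_child, device_claim) is hypothetical and
serves only to illustrate the idea.

```c
#include <stddef.h>

struct device_node;

/* Operations which need a bus-dependent implementation. */
struct bus_operations {
	unsigned long (*virt_to_bus)(struct device_node *dev, void *virt);
};

/* One resource assigned to a device (I/O range, memory block, ...). */
struct dev_resource {
	unsigned long start, end;	/* inclusive range */
	struct dev_resource *next;
};

/* Hypothetical minimal driver descriptor. */
struct driver {
	const char *name;
};

struct device_node {
	const struct bus_operations *bus_ops;	/* generic info */
	struct dev_resource *resources;		/* assigned resources */
	struct driver *driver;			/* NULL if none attached */
	struct device_node *parent;		/* bus or bridge above us */
	struct device_node *children, *sibling;	/* hierarchy links */
	void *bus_data;				/* bus-dependent part */
};

/* Link a device below its parent in the hierarchy. */
void device_add_child(struct device_node *parent, struct device_node *child)
{
	child->parent = parent;
	child->sibling = parent->children;
	parent->children = child;
}

/*
 * A driver allocates the whole device instead of single resources.
 * Returns 0 on success, -1 if another driver already owns the device.
 */
int device_claim(struct device_node *dev, struct driver *drv)
{
	if (dev->driver)
		return -1;
	dev->driver = drv;
	return 0;
}
```

With this layout a second driver trying to claim an already driven
device simply gets an error, without any per-resource bookkeeping in
the drivers themselves.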


Drivers
~~~~~~~
Driver entry points and parameter passing should be unified -- no more
differences between modules and drivers compiled in the kernel.

Whenever possible, the probing for devices should be done by the
generic code and the drivers should just list the IDs of devices
they want to handle. Also, it should be possible to add devices to
already running drivers, so that we won't need to load the same
module three times when we want to use it for three devices.

Each driver should provide an array of driver tags stored in a special
ELF section. The tags contain information about:

	o  Initialization function. (Will be used mostly for ISA
	   drivers, filesystems et cetera.)  Called when the driver
	   is loaded or initialized during bootup.
	o  Cleanup function.  Called when the driver is going to
	   be unloaded.
	o  Device ID tags: For each device supported, the driver
	   specifies its ID pattern (bus-dependent; for PCI, it's
	   just vendor ID, device ID, the subsystem IDs and
	   class, 0xffff acting as a wildcard) and a function to
	   call when the generic probing code for that bus finds
	   a device matching the pattern. The driver either accepts
	   the device and starts driving it, or rejects it, in which
	   case the probing code continues searching the tag list.
	   [Structure and matching of ID patterns is of course
	    bus-dependent.]
	o  Parameters to be passed.  Basically, a straightforward
	   extension of the current module parameter system, but it
	   will be supported for in-kernel drivers as well.
	o  Driver descriptions and other texts -- will be omitted
	   for in-kernel drivers, but left in case of modules.
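
The matching of PCI device ID tags described in the list above can be
sketched as follows. The structure layout and all names (pci_ident,
pci_id_tag, pci_probe_device, the two demo probe functions) are
hypothetical; treating the class as a 16-bit value so that 0xffff can
act as its wildcard is also an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_TAG_ANY 0xffff	/* wildcard value mentioned in the text */

struct pci_ident {
	uint16_t vendor, device, subvendor, subdevice, class;
};

struct pci_id_tag {
	struct pci_ident pattern;
	/* Called on a match; returns 0 to accept, nonzero to reject. */
	int (*probe)(const struct pci_ident *dev);
};

static int field_matches(uint16_t pattern, uint16_t value)
{
	return pattern == PCI_TAG_ANY || pattern == value;
}

/* Returns 1 if the device fits the tag's ID pattern, 0 otherwise. */
int pci_tag_matches(const struct pci_id_tag *tag, const struct pci_ident *dev)
{
	const struct pci_ident *p = &tag->pattern;

	return field_matches(p->vendor, dev->vendor) &&
	       field_matches(p->device, dev->device) &&
	       field_matches(p->subvendor, dev->subvendor) &&
	       field_matches(p->subdevice, dev->subdevice) &&
	       field_matches(p->class, dev->class);
}

/*
 * Generic probing: offer the device to each matching tag in turn and
 * continue through the list when a driver rejects it. Returns the
 * index of the accepting tag or -1 if nobody wants the device.
 */
int pci_probe_device(const struct pci_id_tag *tags, size_t n,
		     const struct pci_ident *dev)
{
	for (size_t i = 0; i < n; i++)
		if (pci_tag_matches(&tags[i], dev) && tags[i].probe(dev) == 0)
			return (int)i;
	return -1;
}

/* Trivial probe functions, for illustration only. */
int probe_accept(const struct pci_ident *dev) { (void)dev; return 0; }
int probe_reject(const struct pci_ident *dev) { (void)dev; return -1; }
```

A driver which matches by vendor only but then rejects the device does
not end the search; a later, more specific tag still gets its chance.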

Each tag is assigned a priority which controls the order in which
tags of the same type are processed. For most tags, this priority
will be zero, but ISA cards, for example, require ordering of their
autoprobe routines.
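
One way to order tags by priority is a stable sort, so that tags with
equal priority keep their original order. The sketch below assumes
that lower priority values are processed first (the text does not say
which direction applies) and uses hypothetical names throughout.

```c
#include <stddef.h>

enum tag_type { TAG_INIT, TAG_CLEANUP, TAG_DEVICE_ID, TAG_PARAM, TAG_TEXT };

struct driver_tag {
	enum tag_type type;
	int priority;		/* zero for most tags */
	const char *owner;	/* driver the tag belongs to */
};

/*
 * Stable insertion sort by ascending priority. Tags with the same
 * priority stay in the order in which they were found.
 */
void sort_tags_by_priority(struct driver_tag *tags, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		struct driver_tag key = tags[i];
		size_t j = i;

		while (j > 0 && tags[j - 1].priority > key.priority) {
			tags[j] = tags[j - 1];
			j--;
		}
		tags[j] = key;
	}
}
```

The generic code would then walk the sorted array once per tag type,
e.g. calling all TAG_INIT functions of ISA drivers in the required
autoprobe order.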

Devices having non-standard probing requirements can of course
skip all the tag machinery and just list their initialization functions
which will take care of everything.


/proc
~~~~~
Procfs will contain a subtree depicting the device hierarchy. For each
device there will be a directory with generic information files
(device ID, resources allocated, driver attached etc.), bus dependent
files (configuration registers for PCI, USB descriptors etc.),
driver dependent files (possibly also special files for character
or block devices generated by the driver) and finally subdirectories
for devices connected to this device.

One possible way to do this is to have all the buses in the root
(the hierarchy of buses will be shown only by symlinks, and if
there is only one USB bus, it will always be accessible as usb0,
regardless of which controller it's connected to). Hierarchical levels
inside one bus (e.g., PCI buses connected to bridges and so on) won't
be collapsed this way, so that we will get:

	host/i8042/kbd0 -> ../../../kbd0	First keyboard port at i8042
	host/i8042/kbd1 -> ../../../kbd1	Second keyboard port at i8042 (the AUX port)
	pci0/03.0/usb0 -> ../../usb0		The USB controller with its bus #0 known as usb0
	pci0/04.0/01.2/usb0 -> ../../../usb1	USB controller behind PCI-to-PCI bridge with its bus #0
						known as usb1
	pci0/09.0/pcmcia0 -> ../../pcmcia0	PCMCIA controller
	isa0/03E8				Serial port on ISA
	isa0/03E8/kbd0 -> ../../../kbd2		Third keyboard port emulated on serial controller
	usb0/0					A USB device
	usb0/1/2				A USB device behind a hub
	usb1/0					Another USB device on different USB bus


Resource management
~~~~~~~~~~~~~~~~~~~
We need a better resource manager which will not only prevent drivers
from clashing with each other (today we do it for I/O only anyway), but
will also be able to assign free address space regions to newly inserted
cards. This includes keeping a list of assigned I/O addresses, memory
blocks, IRQs and DMA channels.
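
A minimal sketch of the two jobs named above, clash detection and
assignment of a free region, for one address space. The fixed-size
sorted array and all names (resource_map, region_request,
region_alloc) are assumptions made for illustration, not a proposed
interface.

```c
#include <limits.h>
#include <stddef.h>

struct region { unsigned long start, end; };	/* inclusive */

#define MAX_REGIONS 16

struct resource_map {
	struct region used[MAX_REGIONS];  /* sorted, non-overlapping */
	size_t count;
};

/* Reserve [start, end]; returns -1 if it clashes or the map is full. */
int region_request(struct resource_map *map,
		   unsigned long start, unsigned long end)
{
	size_t pos = 0;

	if (start > end || map->count == MAX_REGIONS)
		return -1;
	while (pos < map->count && map->used[pos].end < start)
		pos++;
	if (pos < map->count && map->used[pos].start <= end)
		return -1;			/* overlap */
	for (size_t i = map->count; i > pos; i--)
		map->used[i] = map->used[i - 1];
	map->used[pos] = (struct region){ start, end };
	map->count++;
	return 0;
}

static int align_up(unsigned long x, unsigned long align, unsigned long *out)
{
	unsigned long mask = align - 1;

	if (x > ULONG_MAX - mask)
		return -1;
	*out = (x + mask) & ~mask;
	return 0;
}

/*
 * Find and reserve the lowest free block of `size` bytes inside
 * [lo, hi], aligned to `align` (a power of two). Returns 0 and the
 * start address in *out, or -1 if nothing fits.
 */
int region_alloc(struct resource_map *map, unsigned long lo,
		 unsigned long hi, unsigned long size,
		 unsigned long align, unsigned long *out)
{
	unsigned long cand;

	if (size == 0 || align == 0 || (align & (align - 1)) != 0)
		return -1;
	if (align_up(lo, align, &cand) < 0)
		return -1;
	for (size_t i = 0; i < map->count; i++) {
		const struct region *r = &map->used[i];

		if (r->end < cand)
			continue;	/* entirely below the candidate */
		if (cand <= hi && size - 1 <= hi - cand &&
		    cand + (size - 1) < r->start)
			break;		/* fits into the gap before r */
		if (r->end >= hi || align_up(r->end + 1, align, &cand) < 0)
			return -1;	/* no room left above r */
	}
	if (cand > hi || size - 1 > hi - cand)
		return -1;
	if (region_request(map, cand, cand + (size - 1)) < 0)
		return -1;
	*out = cand;
	return 0;
}
```

An old driver would call something like region_request() with a fixed
address, while the hotplug code would use region_alloc() to place a
newly inserted card.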

Since different buses and different architectures have their specific
requirements for region alignment and ranges, we should make resource
allocation one of the bus_ops and either do it directly in the bus-dependent
code or pass it to the parent bus (possibly changing range, alignment
and flags), stopping at the host bus level where it will be handled
by arch-dependent code.
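
The delegation scheme above can be sketched like this: a bridge-like
bus narrows the request and passes it to its parent, and the host bus
finally satisfies it. All names are hypothetical, and the trivial bump
allocator only stands in for the real arch-dependent code. Overflow
near the top of the address space is not handled in this sketch.

```c
#include <stddef.h>

struct res_request {
	unsigned long min, max;		/* acceptable range, inclusive */
	unsigned long size, align;	/* align must be a power of two */
	unsigned flags;
};

struct bus {
	const char *name;
	struct bus *parent;		/* NULL for the host bus */
	int (*alloc)(struct bus *self, struct res_request req,
		     unsigned long *out);
	unsigned long window_lo, window_hi;	/* range a bridge forwards */
	unsigned long min_align;		/* alignment the bus requires */
	unsigned long next_free;		/* host bus only */
};

/* Entry point: resource allocation as one of the bus operations. */
int bus_alloc_resource(struct bus *bus, struct res_request req,
		       unsigned long *out)
{
	return bus->alloc(bus, req, out);
}

/* Bridge: restrict range and alignment, then ask the parent bus. */
int bridge_alloc(struct bus *self, struct res_request req,
		 unsigned long *out)
{
	if (req.min < self->window_lo)
		req.min = self->window_lo;
	if (req.max > self->window_hi)
		req.max = self->window_hi;
	if (req.align < self->min_align)
		req.align = self->min_align;
	if (!self->parent || req.min > req.max)
		return -1;
	return bus_alloc_resource(self->parent, req, out);
}

/* Host bus: stand-in for the arch-dependent allocator. */
int host_alloc(struct bus *self, struct res_request req,
	       unsigned long *out)
{
	unsigned long start;

	if (req.size == 0 || req.align == 0 ||
	    (req.align & (req.align - 1)) != 0)
		return -1;
	start = self->next_free > req.min ? self->next_free : req.min;
	start = (start + req.align - 1) & ~(req.align - 1);
	if (start > req.max || req.size - 1 > req.max - start)
		return -1;
	self->next_free = start + req.size;
	*out = start;
	return 0;
}
```

A driver or the hotplug code only ever calls bus_alloc_resource() on
the bus the device sits on; how many levels the request travels upward
is invisible to it.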

Unfortunately, this probably requires calling the PNP BIOS to get all the
regions magically occupied by motherboard hardware :-(


Naming of devices
~~~~~~~~~~~~~~~~~
Naming of devices is a hard issue and (as shown during recent talks at
the linux-usb list) no solution is correct -- some of them are too complex
for a simple workstation, some fail on large servers with lots of devices
being swapped in mysterious ways. Therefore it should be a matter of user
choice and as such a userland issue.

On the other hand, the kernel should provide some support for userland
device naming programs -- an ideal form is the bus tree in /proc
introduced above plus some information about relationship between
the devices and their special files (e.g. by including the special
files in the bus tree as well).

Traditional special files in /dev still exist for backward compatibility
and simple setups, but the user is free to name everything in his own
way. As he has all the information about device identity, connections
etc., he can name mice according to their connection, SCSI disks
according to their serial numbers, and let /dev/lp simply be "the only
true printer".


HotPlug
~~~~~~~
With the device architecture outlined above, hotplug support should be
close to trivial:

   (1) Bus-dependent code detects plugging of a new device. It reads the
       device headers, creates a device structure, calls the resource
       manager to assign addresses and adds the device to the
       hierarchy by notifying the device layer.

   (2) Generic device handling code scans tags of all loaded drivers
       and if it finds a driver, it just passes the device to the
       driver and everything is done. [We can also use kmod to notify
       userland about new device appearing which needs to be named.]

   (3) If no driver matches and kmod is enabled, call modprobe to find
       a module driving our new device (it has enough information to find
       it as depmod is aware of the device tags and extracts an ID -> module
       mapping from all modules). The module is inserted, initialized
       and passed the device it should handle.

[time passes, water in the river flows and the user finally decides to unplug
the device.]

   (4) The bus-dependent code receives an unplug notification and sends
       it to the driver. The driver releases the device, bus-dependent
       code removes it and deallocates all of its resources. Done.
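
Steps (2) to (4) above can be sketched in a few lines of C. Device IDs
are reduced to a single number here, and every name
(request_module_for_id, register_driver, the demo driver and the fake
module loader) is hypothetical. In particular, the function pointer
only stands in for the call to modprobe through kmod.

```c
#include <stddef.h>

struct hp_driver;

struct hp_device {
	unsigned id;			/* simplified bus-dependent ID */
	struct hp_driver *driver;	/* NULL while nobody drives it */
};

struct hp_driver {
	const char *name;
	const unsigned *ids;		/* from the device ID tags */
	size_t n_ids;
	int (*probe)(struct hp_device *dev);	/* 0 = accepted */
	void (*release)(struct hp_device *dev);
};

#define MAX_DRIVERS 8
static struct hp_driver *loaded[MAX_DRIVERS];
static size_t n_loaded;

/* Stand-in for modprobe via kmod; returns 0 if a module was loaded. */
int (*request_module_for_id)(unsigned id);

int register_driver(struct hp_driver *drv)
{
	if (n_loaded == MAX_DRIVERS)
		return -1;
	loaded[n_loaded++] = drv;
	return 0;
}

/* Step (2): scan the tags of all loaded drivers. */
static int try_loaded_drivers(struct hp_device *dev)
{
	for (size_t i = 0; i < n_loaded; i++)
		for (size_t j = 0; j < loaded[i]->n_ids; j++)
			if (loaded[i]->ids[j] == dev->id &&
			    loaded[i]->probe(dev) == 0) {
				dev->driver = loaded[i];
				return 0;
			}
	return -1;
}

/* Called by bus-dependent code once the device node exists. */
int device_plugged(struct hp_device *dev)
{
	dev->driver = NULL;
	if (try_loaded_drivers(dev) == 0)
		return 0;
	/* Step (3): no loaded driver matched, try to load a module. */
	if (request_module_for_id && request_module_for_id(dev->id) == 0)
		return try_loaded_drivers(dev);
	return -1;
}

/* Step (4): the driver releases the device on unplug. */
void device_unplugged(struct hp_device *dev)
{
	if (dev->driver && dev->driver->release)
		dev->driver->release(dev);
	dev->driver = NULL;
}

/* Demo driver and fake module loader, for illustration only. */
int demo_released, modprobe_calls;
static const unsigned demo_ids[] = { 0x1234 };
int demo_probe(struct hp_device *dev) { (void)dev; return 0; }
void demo_release(struct hp_device *dev) { (void)dev; demo_released++; }
struct hp_driver demo_driver = { "demo", demo_ids, 1, demo_probe, demo_release };

int fake_modprobe(unsigned id)
{
	modprobe_calls++;
	return id == 0x1234 ? register_driver(&demo_driver) : -1;
}
```

Note that a second device with the same ID is handed to the already
loaded driver without another trip to userland.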


Host Bus
~~~~~~~~
The Host Bus is a virtual bus loosely corresponding to devices on the motherboard
which are not connected to any other bus. Essentially, all the host bridges,
system timers, keyboard controllers and similar strange creatures live here.


ISA
~~~
ISA doesn't fit too well into our framework since the devices have no IDs,
but we can use base addresses instead of IDs. Also, request_region
et cetera can be modified to automatically create device nodes in case
of old drivers.


ISAPnP
~~~~~~
ISAPnP is probably much too complex to be fully handled in the kernel --
assigning the right addresses to all PnP devices is a very hard task and
it's probably NP-complete, so the addresses should be assigned by a userspace
program controlled by a configuration file.

On the other hand, the kernel should know of ISAPnP devices in order to make
it possible for drivers to find their devices and determine their addresses.
We can do it this way: ISAPnP is a separate bus in our hierarchy, and the
kernel is able to enumerate the devices, read their addresses, create device
nodes for them (so that the drivers know everything) and export the PnP
register interface to userland (much as we currently do with PCI configuration
registers). During bootup, the kernel will be able to start using devices
which have been initialized by the BIOS (so that it will be possible to boot
off a PnP SCSI card). Then a userland utility will be run which will configure
the rest and notify the kernel about the changes; the kernel will rescan the
addresses and announce the new devices to the drivers as if they had
just been plugged in.


PCI
~~~
PCI can be adapted to this approach very easily and in a backward-compatible
way. Outside of PCI subsystem implementation details, the drivers never need
to know bus and device numbers -- they can just use the pci_dev structure
as an opaque handle representing the device and pass it to generic PCI services
if they want to read/write configuration registers (we already use this
approach anyway).


PCMCIA and CardBus
~~~~~~~~~~~~~~~~~~
I believe both PCMCIA and CardBus should be treated as separate hotplug
capable buses, but as I'm no PCMCIA guru, I'd like David Hinds to tell
me his opinion on this.


Device access
~~~~~~~~~~~~~
All addresses in device nodes should be expressed as physical addresses,
i.e. those accepted by ioremap(), /dev/mem and similar interfaces. Each bus
should also define phys_to_bus and bus_to_phys translation functions, because
address translation differs from bus to bus even on one architecture. Also,
separate functions for requesting address translation before doing a DMA
and freeing it afterwards should be created to handle IOMMUs correctly.
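
A minimal sketch of per-bus phys_to_bus and bus_to_phys, assuming the
simplest possible case: the bus sees one linear window of physical
memory at a fixed offset. The struct bus_xlate and its fields are
hypothetical; a bus with an IOMMU would need the separate map and
unmap calls mentioned above instead of a fixed offset.

```c
/* One linear translation window of a bus (hypothetical layout). */
struct bus_xlate {
	unsigned long phys_base;	/* start of window, physical */
	unsigned long bus_base;		/* same location as seen on the bus */
	unsigned long size;		/* window length in bytes */
};

/* Translate a physical address; returns -1 if outside the window. */
int phys_to_bus(const struct bus_xlate *b, unsigned long phys,
		unsigned long *bus)
{
	if (phys < b->phys_base || phys - b->phys_base >= b->size)
		return -1;
	*bus = b->bus_base + (phys - b->phys_base);
	return 0;
}

/* Translate a bus address back to a physical address. */
int bus_to_phys(const struct bus_xlate *b, unsigned long bus,
		unsigned long *phys)
{
	if (bus < b->bus_base || bus - b->bus_base >= b->size)
		return -1;
	*phys = b->phys_base + (bus - b->bus_base);
	return 0;
}
```

Since the window is a property of the bus and not of the architecture,
two buses on the same machine can carry different bus_xlate values.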

readb() and similar functions should no longer accept and automatically convert
	ISA addresses. Most drivers should be converted to use ioremap()
	properly; where that is too hard, isa_readb() should be introduced
	and used instead of readb() in non-converted drivers.
readl() et al. should also exist in variants directly specifying the endianness
	of target data.
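
What endianness-specific variants of readl() could look like is shown
in the following sketch. The names readl_le, readl_be and writel_be
are hypothetical. The code assembles the value byte by byte from
ordinary memory so that it gives the same result on any host; a real
implementation would issue a single 32-bit access to the device and
swap bytes where needed.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value, independent of host byte order. */
uint32_t readl_le(const volatile void *addr)
{
	const volatile uint8_t *p = addr;

	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read a 32-bit big-endian value, independent of host byte order. */
uint32_t readl_be(const volatile void *addr)
{
	const volatile uint8_t *p = addr;

	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Write a 32-bit value in big-endian byte order. */
void writel_be(uint32_t value, volatile void *addr)
{
	volatile uint8_t *p = addr;

	p[0] = (uint8_t)(value >> 24);
	p[1] = (uint8_t)(value >> 16);
	p[2] = (uint8_t)(value >> 8);
	p[3] = (uint8_t)value;
}
```

A driver for a big-endian device would then use readl_be() everywhere
and contain no byte-order conditionals of its own.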
