Date: Sun, 6 Jun 1999 19:20:39 +0200
From: Martin Mares <mj@ucw.cz>
To: Linux Kernel Mailing List <linux-kernel@vger.rutgers.edu>,
    Linus Torvalds <torvalds@transmeta.com>, davem@redhat.com,
    dhinds@zen.stanford.edu,
Subject: RFC: Devices, buses and hotplug

Hello, world!

This is a brief summary of my thoughts on Linux device drivers, hotplug,
modules and other related things. Basically, it's a roadmap of all the
problems I think we need to solve on our way to hotplug support and, of
course, world domination :-)

Comments (especially from Linus and the other gods) are really welcome.
I'd like to know your opinion before I start writing a more verbose
description and implementing these things.

				Have fun
					Martin

Device hierarchy
~~~~~~~~~~~~~~~~
The kernel should have a central data structure describing all known
devices, buses and their hierarchy. Most of the device data structure is
bus-dependent, but there is some generic info: a pointer to bus_operations
(a struct containing pointers to common operations like virt_to_bus which
need a bus-dependent implementation), a list of resources assigned to the
device and a reference to the driver attached (if any; newly written
drivers will need no resource allocations themselves -- they just allocate
the whole device).

Drivers
~~~~~~~
Driver entry points and parameter passing should be unified -- no more
differences between modules and drivers compiled into the kernel. Whenever
possible, the probing for devices should be done by the generic code and
the drivers should just list the IDs of devices they want to handle. It
should also be possible to add devices to already running drivers, so that
we won't need to load the same module three times when we want to use it
for three devices.

Each driver should provide an array of driver tags stored in a special ELF
section which contain information about:

   o  Initialization function. (Will be used mostly for ISA drivers,
      filesystems et cetera.)
      Called when the driver is loaded or initialized during bootup.

   o  Cleanup function. Called when the driver is going to be unloaded.

   o  Device ID tags: For each device supported, the driver specifies its
      ID pattern (bus-dependent; for PCI, it's just vendor ID, device ID,
      the subsystem IDs and class, 0xffff acting as a wildcard) and a
      function to call when the generic probing code for that bus finds a
      device matching the pattern. The driver either accepts the device
      and starts driving it, or rejects it, in which case the probing code
      continues searching through the tag list. [Structure and matching of
      ID patterns is of course bus-dependent.]

   o  Parameters to be passed. Basically, a straightforward extension of
      the current module parameter system, but it will be supported for
      in-kernel drivers as well.

   o  Driver descriptions and other texts -- will be omitted for in-kernel
      drivers, but left in case of modules.

Each tag is assigned a priority which controls the order in which tags of
identical type are processed. For most tags, this priority will be zero,
but for example ISA cards require ordering of their autoprobe routines.

Drivers for devices with non-standard probing requirements can of course
skip all the tag machinery and just list their initialization functions,
which will take care of everything.

/proc
~~~~~
Procfs will contain a subtree depicting the device hierarchy. For each
device there will be a directory with generic information files (device
ID, resources allocated, driver attached etc.), bus-dependent files
(configuration registers for PCI, USB descriptors etc.), driver-dependent
files (possibly also special files for character or block devices
generated by the driver) and finally subdirectories for devices connected
to this device.
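The per-device directories described above follow directly from the parent links in the device hierarchy. The following is only a minimal userspace sketch of that idea, not the proposed kernel structure: the struct layout, the field names and the helper device_path() are all assumptions made for illustration, and the three forward-declared types stand in for the generic info listed under "Device hierarchy".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholders for the generic parts named in the text (hypothetical). */
struct bus_operations;          /* bus-dependent ops such as virt_to_bus */
struct resource;                /* resources assigned to the device */
struct driver;                  /* driver attached, if any */

/* Hypothetical generic part of a device node. */
struct device {
    const char *name;                       /* directory name, e.g. "04.0" */
    struct device *parent;                  /* NULL for a bus in the root */
    const struct bus_operations *bus_ops;
    struct resource *resources;
    struct driver *driver;
};

/*
 * Build the directory path of dev relative to the root of the device
 * subtree by walking the parent links (e.g. "pci0/04.0/01.2").
 * Returns the path length, or -1 if buf is too small.
 */
int device_path(const struct device *dev, char *buf, size_t size)
{
    int len = 0;
    size_t n;

    assert(dev != NULL && buf != NULL);
    if (dev->parent) {
        /* Emit the parent's path first, then a separator. */
        len = device_path(dev->parent, buf, size);
        if (len < 0 || (size_t)len + 1 >= size)
            return -1;
        buf[len++] = '/';
    }
    n = strlen(dev->name);
    if ((size_t)len + n >= size)
        return -1;
    memcpy(buf + len, dev->name, n + 1);    /* copies the trailing NUL too */
    return len + (int)n;
}
```

Bus-dependent and driver-dependent files would then simply be created inside the directory this path names.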
One possible way to lay out this subtree is to have all the buses in the
root (the hierarchy of buses will be shown only by symlinks, and if there
is only one USB bus, it will always be accessible as usb0, regardless of
which controller it's connected to). Hierarchical levels inside one bus
(e.g., PCI buses connected to bridges and so on) won't be collapsed this
way, so that we will get:

  host/i8042/kbd0 -> ../../../kbd0        First keyboard port at i8042
  host/i8042/kbd1 -> ../../../kbd1        Second keyboard port at i8042
                                          (the AUX port)
  pci0/03.0/usb0 -> ../../usb0            The USB controller with its bus
                                          #0 known as usb0
  pci0/04.0/01.2/usb0 -> ../../../usb1    USB controller behind PCI-to-PCI
                                          bridge with its bus #0 known as
                                          usb1
  pci0/09.0/pcmcia0 -> ../../pcmcia0      PCMCIA controller
  isa0/03E8                               Serial port on ISA
  isa0/03E8/kbd0 -> ../../../kbd2         Third keyboard port emulated on
                                          serial controller
  usb0/0                                  A USB device
  usb0/1/2                                A USB device behind a hub
  usb1/0                                  Another USB device on different
                                          USB bus

Resource management
~~~~~~~~~~~~~~~~~~~
We need a better resource manager which will not only prevent drivers
from clashing with each other (today we do it for I/O only anyway), but
will also be able to assign free address space regions to newly inserted
cards. This includes keeping a list of assigned I/O addresses, memory
blocks, IRQs and DMA channels. Since different buses and different
architectures have their specific requirements for region alignment and
ranges, we should make resource allocation one of the bus_ops and either
do it directly in the bus-dependent code or pass it to the parent bus
(possibly changing range, alignment and flags), stopping at the host bus
level where it will be handled by arch-dependent code.
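The delegation of allocation requests up the bus tree can be sketched as follows. This is a userspace illustration under stated assumptions, not a proposed interface: struct res_request, the alloc_resource member, the window fields and the bump allocator standing in for the arch-dependent host code are all hypothetical, and flags are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocation request. */
struct res_request {
    unsigned long start, end;   /* acceptable address range, inclusive */
    unsigned long size;         /* bytes needed, must be non-zero */
    unsigned long align;        /* required alignment, a power of two */
};

struct bus;

struct bus_operations {
    /* Returns the assigned base address, or 0 on failure. */
    unsigned long (*alloc_resource)(struct bus *bus,
                                    const struct res_request *req);
};

struct bus {
    const struct bus_operations *ops;
    struct bus *parent;                 /* NULL for the host bus */
    unsigned long win_start, win_end;   /* range this bus can decode */
    unsigned long min_align;            /* alignment this bus insists on */
    unsigned long next_free;            /* host bus only: bump pointer */
};

/* Bridge-like bus: tighten the request to our limits, then ask the parent. */
unsigned long bridge_alloc(struct bus *bus, const struct res_request *req)
{
    struct res_request r = *req;        /* never modify the caller's copy */

    assert(bus->parent != NULL);
    if (r.start < bus->win_start)
        r.start = bus->win_start;
    if (r.end > bus->win_end)
        r.end = bus->win_end;
    if (r.align < bus->min_align)
        r.align = bus->min_align;
    return bus->parent->ops->alloc_resource(bus->parent, &r);
}

/*
 * Host bus: stand-in for the arch-dependent code. A trivial bump
 * allocator; it ignores arithmetic overflow at the top of the address
 * space and never frees anything.
 */
unsigned long host_alloc(struct bus *bus, const struct res_request *req)
{
    unsigned long base = bus->next_free;

    assert(req->size > 0 && req->align > 0);
    if (base < req->start)
        base = req->start;
    base = (base + req->align - 1) & ~(req->align - 1);
    if (base > req->end || req->size - 1 > req->end - base)
        return 0;
    bus->next_free = base + req->size;
    return base;
}

const struct bus_operations bridge_ops = { bridge_alloc };
const struct bus_operations host_ops = { host_alloc };
```

A driver or bus-dependent hotplug code would only ever call the operation of the bus the device sits on; every level above it gets a chance to narrow the request before the host bus picks an address.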
Unfortunately, doing this properly probably requires calling the PnP BIOS
to learn about all the regions magically occupied by motherboard
hardware :-(

Naming of devices
~~~~~~~~~~~~~~~~~
Naming of devices is a hard issue and (as shown during recent talks on
the linux-usb list) no solution is correct -- some of them are too
complex for a simple workstation, some fail on large servers with lots of
devices being swapped in mysterious ways. Therefore it should be a matter
of user choice and as such a userland issue.

On the other hand, the kernel should provide some support for userland
device naming programs -- an ideal form is the bus tree in /proc
introduced above plus some information about the relationship between the
devices and their special files (e.g. by including the special files in
the bus tree as well).

Traditional special files in /dev still exist for backward compatibility
and simple setups, but the user has the possibility to name everything in
his own way. As he has all the information about device identity,
connections etc., he can name mice according to their connection, SCSI
disks according to their serial numbers and let /dev/lp simply be "the
only true printer".

HotPlug
~~~~~~~
With the device architecture outlined above, hotplug support should be
close to trivial:

(1) Bus-dependent code detects plugging of a new device. It reads the
    device headers, creates a device structure, calls the resource
    manager to assign addresses and adds the device to the hierarchy by
    notifying the device layer.

(2) Generic device handling code scans the tags of all loaded drivers and
    if it finds a matching driver, it just passes the device to the
    driver and everything is done. [We can also use kmod to notify
    userland about a new device appearing which needs to be named.]

(3) If no driver matches and kmod is enabled, call modprobe to find a
    module driving our new device (it has enough information to find it,
    as depmod is aware of the device tags and extracts an ID -> module
    mapping from all modules).
    The module is inserted, initialized and passed the device it should
    handle.

    [time passes, water in the river flows and the user finally decides
    to unplug the device.]

(4) The bus-dependent code receives an unplug notification and sends it
    to the driver. The driver releases the device, bus-dependent code
    removes it and deallocates all of its resources. Done.

Host Bus
~~~~~~~~
The Host Bus is a virtual bus loosely corresponding to devices on the
motherboard which are not connected to any other bus. Essentially, all
the host bridges, system timers, keyboard controllers and similar strange
creatures live here.

ISA
~~~
ISA doesn't fit too well into our framework since the devices have no
IDs, but we can use base addresses instead of IDs. Also, request_region
et cetera can be modified to automatically create device nodes in case of
old drivers.

ISAPnP
~~~~~~
ISAPnP is probably much too complex to be fully handled in the kernel --
assigning the right addresses to all PnP devices is a very hard task and
it's probably NP-complete, so the addresses should be assigned by a
userspace program controlled by a configuration file. On the other hand,
the kernel should know of ISAPnP devices in order to make it possible for
drivers to find their devices and determine their addresses.

We can do it this way: ISAPnP is a separate bus in our hierarchy, and the
kernel is able to enumerate the devices, read their addresses, create
device nodes for them (so that the drivers know everything) and export a
PnP register interface to userland (similarly to what we currently do
with PCI configuration registers). During bootup, the kernel will be able
to start using devices which have been initialized by the BIOS (so that
it will be possible to boot off a PnP SCSI card). A userland utility will
then be run which configures the rest and notifies the kernel about the
changes; the kernel will rescan the addresses and announce the new
devices to the drivers as if they were just plugged in.
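The device ID tags from the Drivers section and step (2) of the HotPlug sequence can be made concrete with a small sketch. Everything here is an assumption for illustration: the struct and function names are hypothetical, the pattern is reduced to vendor and device ID (the subsystem IDs and class from the text are omitted), the drivers live in a plain array rather than a special ELF section, and tag priorities are ignored. Only the 0xffff wildcard is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

#define ID_ANY 0xffff           /* wildcard value named in the text */

struct driver;

/* Hypothetical, heavily reduced PCI device node. */
struct pci_device {
    unsigned short vendor, device;
    const struct driver *driver;    /* driver attached, NULL if none */
};

/* Hypothetical device ID tag: a pattern plus the function to call. */
struct pci_id_tag {
    unsigned short vendor, device;          /* ID_ANY matches anything */
    int (*probe)(struct pci_device *dev);   /* returns 0 to accept */
};

struct driver {
    const char *name;
    const struct pci_id_tag *tags;
    size_t ntags;
};

static int field_matches(unsigned short pattern, unsigned short value)
{
    return pattern == ID_ANY || pattern == value;
}

int tag_matches(const struct pci_id_tag *tag, const struct pci_device *dev)
{
    assert(tag != NULL && dev != NULL);
    return field_matches(tag->vendor, dev->vendor) &&
           field_matches(tag->device, dev->device);
}

/*
 * Step (2): offer a new device to every matching tag of every loaded
 * driver until one accepts it. Returns the accepting driver, or NULL,
 * in which case step (3) would ask modprobe for a suitable module.
 */
const struct driver *offer_device(const struct driver *drivers,
                                  size_t ndrivers, struct pci_device *dev)
{
    size_t i, j;

    for (i = 0; i < ndrivers; i++) {
        for (j = 0; j < drivers[i].ntags; j++) {
            const struct pci_id_tag *tag = &drivers[i].tags[j];

            if (tag_matches(tag, dev) && tag->probe(dev) == 0) {
                dev->driver = &drivers[i];
                return &drivers[i];
            }
        }
    }
    return NULL;
}

/* Example probe routines: one always rejects, one always accepts. */
int picky_probe(struct pci_device *dev) { (void)dev; return -1; }
int eager_probe(struct pci_device *dev) { (void)dev; return 0; }
```

A rejection by one driver does not end the search: the scan simply continues with the next tag, which is the behaviour the text asks of the probing code.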
PCI
~~~
PCI can be adapted to this approach very easily and in a
backward-compatible way. Outside of PCI subsystem implementation details,
the drivers never need to know bus and device numbers -- they can just
use the pci_dev structure as an opaque handle representing the device and
pass it to generic PCI services if they want to read/write configuration
registers (we already use this approach anyway).

PCMCIA and CardBus
~~~~~~~~~~~~~~~~~~
I believe both PCMCIA and CardBus should be treated as separate
hotplug-capable buses, but as I'm no PCMCIA guru, I'd like David Hinds to
tell me his opinion on this.

Device access
~~~~~~~~~~~~~
All addresses in device nodes should be expressed as physical addresses,
i.e. those accepted by ioremap(), /dev/mem and similar interfaces. Each
bus should also define phys_to_bus and bus_to_phys translation functions,
because address translation differs from bus to bus even on one
architecture. Separate functions for requesting an address translation
before doing a DMA and freeing it afterwards should also be created to
handle IOMMUs correctly.

readb() and similar functions should no longer accept and automatically
convert ISA addresses. Most drivers should be converted to use ioremap()
properly; where that is too hard, isa_readb() should be introduced and
used instead of readb() in non-converted drivers.

readl() et al. should also exist in variants directly specifying the
endianness of the target data.