From: Keith Owens <kaos@ocs.com.au> To: kbuild-devel@lists.sourceforge.net Subject: [kbuild-devel] Database design for kbuild 2.5 Date: Tue, 01 Jan 2002 14:32:15 +1100 This is a first cut at a single database to hold all the kbuild 2.5 data. Current kbuild 2.5 uses lots of files which is proving to be a bottleneck, especially when every compile step has to read the main list in order to work out where each dependency is and what the config timestamps are set to. The theory is that a single, shared, mmapped database will be faster. We shall see. This file describes the data needed by kbuild 2.5 in a clean design. The current code implements this design but not in a clean fashion, the code and data structures "just grew" as more and more features were added to the mini language. Think of this as "the file to read when Keith gets run over by a bus". The current code uses a single overloaded structure to represent directories, source and target files, this schema separates them. This is only a rough schema and omits most of the implementation details of how the records will be represented in the database. Assume that the database supports unlimited size, variable length records with pointers between records that never become invalid as new records are added. How the large records are represented and how the pointers between records are maintained is a implementation detail which does not affect this schema. For my own sanity (?), only one variable length field can appear in each record and it must be at the end. Variable size arrays or filenames must be last, a record cannot contain multiple variable sized arrays, they must be in separate records. '->' indicates a pointer to another record. Base kbuild 2.5 has the concept of a base target, it is the one that the following commands apply to. The default base target is 'vmlinux', other base targets are things like bootloaders. There is also a reserved base target called 'host' which is used for host compile and link. The use of base targets means that the shorthand commands can be used for almost everything and you get automatic dependency tracking for free. kbuild 2.4 never did correct dependency tracking on bootloaders and host compiles. Record: Number of bases. [] struct { flags has_modules has_arch_head is_host -> topdir -> objects (target list) -> modules (target list) } base Every architecture has at least 2 bases, 'vmlinux' and 'host', it can have more, see the base_target() command. topdir is the first directory in which the base is referenced. objects is the list of objects to be built into the base, in the required order. modules is the list of modules to be compiled for the base, only valid if has_modules is set. Directory The name of a directory, relative to the logical root of the source or object tree. Record: -> parent flags create_objdir (from create_objdir()) link_subdirs (from link_subdirs()) -> extra_aflags (from extra_aflags_all()) -> extra_cflags (from extra_cflags_all()) -> extra_ldflags (from extra_ldflags_all()) -> child list -> target list -> dir_base list relname The child list is the files and directories immediately below this directory, used for directory traversal. target list is all the target files to be created when you say make directory_name. dir_base gives all the bases that appear in this directory and, for each base, lists all the objects and sub-directories to be linked into that base, mainly to define how vmlinux is linked. relname is the relative name of this directory, from its parent. Dir_base Lists one base target referenced in a directory and the list of objects and/or sub-directories that are to be linked into that base, in the required order. The object/directory list is constructed from select(y) and link_subdirs() commands, host* commands are equivalent to selecting against base 'host'. Record: Number of bases referred to in the directory. ] struct { -> base -> object/directory list } dir_base Target file A target file is any generated file, from the smallest object or module all the way up to vmlinux, it also includes generated files such as asm-offsets.h. There is a lot of detail required for a target file. Record: -> parent directory index (always points to object tree) mtime flags nodepend (from nodepend() command) setup (from setup() command) expsyms (from expsyms() command) assembler (.S file) archive (.a file) gen_i (an explicitly requested .i file) gen_s (an explicitly requested .s file) commands (the user has supplied commands to build this file) archive_member (object is linked into an archive) arch_head (arch_head files have special order requirements) selm (summary of select(m) over all bases) sely (summary of select(y) over all bases) rebuild (the target must be rebuilt) -> target_base_select list (see below) -> dependency list (all the input files the target depends on) -> objlink list (from objlink(), only for conglomerates and archives) -> command (the last command used to successfully build the target) -> strip_aflags (from strip_aflags()) -> strip_cflags (from strip_cflags()) -> strip_ldflags (from strip_ldflags()) -> extra_aflags (from extra_aflags()) -> extra_cflags (from extra_cflags()) -> extra_ldflags (from extra_ldflags()) -> link (the conglomerate or archive this object is linked into, if any). relname Target_base_select list A list of bases that have selected this target and, for each base, whether it was selected as y or m. Some files are used by both vmlinux and the bootloader so they can appear in multiple bases. host* commands are equivalent to selecting against base 'host'. When an object is linked into a conglomerate or an archive then the select status for the conglomerate or archive is propagated onto the individual objects, via the objlink list. Record: Number of bases referred to by select commands for this target. [] struct { -> base flags selm sely } target_base_select Target list A list of target files, used to represent the objects and modules in a base and the list of targets to be built in a directory. Record: Number of targets. [] -> target file Command, strip_aflags, strip_cflags, strip_ldflags, extra_aflags, extra_cflags, extra_ldflags. Record: Variable length text string. Source file Record: -> parent directory index (which source/object tree it is in) mtime flags changed -> component list (see below) -> config list (config options that the source refers to) relname Shadow trees allow .prepend and .append so a source file can be constructed from multiple components. The component list points to all the individual components, in the order that they are copied to synthesize the final source file. Child list Record: Number of children. [] -> child Each child entry can point to a directory, a source file or a target file, in any order. Component, config and dependency lists These records have the same structure, they differ in what they should be pointing to. Record: parent index parent mtime Number of entries in list. [] struct { -> source (component), target (config), target (dependency) index mtime } ccd These records record information about the previous successful build and are used to determine if the parent or the list itself needs to be rebuilt. If the parent mtime and index on this build do not match the parent data saved on the last successful build then the parent must be rebuilt. If any of the entries in the array have changed their index or times then the parent must be rebuilt. Rebuilding the parent will regenerate these lists. The actual processing is something like this :- Delete index, mtime, target_base_select, objlink, strip flags, extra flags and link data from all target records. The dependency list and previous command are kept for targets. Delete extra flags, target list and dir_base from all directory records. The child list is kept for directories. Clear all flags in target and directory records. Find all the files, extracting their index (the source or object tree they are in) and their mtime, store this information in the database. Delete all data for source and target files that no longer exist. Retain empty target records as a placeholder, they will probably be recreated. Delete empty directories from the database. New sources are marked as changed, new targets are marked as rebuild. Some source files are constructed from components (.prepend, .append), temporary component lists are built while finding all the files. Compare the existing and temporary component lists for such files. If the constructed source file has changed or the temporary component list does not match the current component list then create a new version of the source file in the object directory. Mark the source file record as changed and update the list of components used. Generate the global makefile from the Makefile.in files, without including the command checks. Run the global makefile through make to build the list of what has been selected, flags etc. Read the select data, flags etc. generated by make and load it into the database. After reconstructing any source files from their components, the source files must be scanned for any config changes since the last build. If the source file has changed since the last build then skip the config check. Otherwise run the config list, the index and mtime entry for each entry in the config list (set on last successful build) is compared against the current index and mtime for that config option. Any difference will cause the source entry to be marked as changed. If the source file is marked as changed by any process, purge the config list, it must be rebuilt. Repeatedly scan the database until no more changes occur. You have now identified all sources that have been directly (user) or indirectly (config) changed. Every target file generated from source has at least one dependency, the source file used to create it, it also depends on all the included files, either directly or indirectly. A conglomerate object or archive depends on the individual objects linked into it. When a target file is being considered, look at the dependency list. If the target has changed index or mtime since the last build (manual fiddling with the target) then mark it as rebuild. If any of the dependencies are marked as changed or rebuild then mark the target as rebuild. Targets that are marked as rebuild are removed from the object directory to force 'make' to recreate the file, in addition their mtime, command and dependency data are cleared in the database. Repeatedly scan the database until no more changes occur. You have now identified all targets that need to be rebuilt. All of the above work is done at the start of the build, it identifies everything affected by source file and config changes and removes the affected targets. After removing out of date targets and identifying what has been selected and the flags etc. to be used, generate the final global makefile, including the checks on changes to the build command. Run make against the global makefile to compile the targets. Each compile step is run by a pre-processor wrapper. While gcc is compiling the code, the pre-processor reads the source in parallel with gcc and extracts config data, creating a new config list for the target. After each successful compile step, the dependency list generated by gcc -MD is read, standardized and copied into the database for the target file. Each source dependency is checked in the database to see if it is up to date, any dependency that is marked as changed is then scanned for config entries and the config list in the dependency record is updated accordingly. The first successful compile that references a dependency will update the config list for that source file and clear the changed flag, later compiles will see that the dependency is up to date and do nothing. The command used to build a target is standardized and stored in the database. The saved command is used to force a rebuild if any of the flags have changed, including run time overrides. The current kbuild 2.5 does the above processing but it is not so clear (assuming that the above explanation was clear). Current kbuild 2.5 uses lots of little files to hold the target to input dependencies and to hold the source to config mappings. These little files are read back during phase 5 which is quite slow, by updating the database after each successful build we remove the need for phase 5. Using a shared database _should_ remove the main bottleneck where every compile step has to read the entire file list and index it before it can be used. _______________________________________________ kbuild-devel mailing list kbuild-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kbuild-devel