![[LWN Logo]](/images/lcorner.png) |
|
![[LWN.net]](/images/Included.png) |
From: Keith Owens <kaos@ocs.com.au>
To: kbuild-devel@lists.sourceforge.net
Subject: [kbuild-devel] Database design for kbuild 2.5
Date: Tue, 01 Jan 2002 14:32:15 +1100
This is a first cut at a single database to hold all the kbuild 2.5
data. Current kbuild 2.5 uses lots of files which is proving to be a
bottleneck, especially when every compile step has to read the main
list in order to work out where each dependency is and what the config
timestamps are set to. The theory is that a single, shared, mmapped
database will be faster. We shall see.
This file describes the data needed by kbuild 2.5 in a clean design.
The current code implements this design but not in a clean fashion, the
code and data structures "just grew" as more and more features were
added to the mini language. Think of this as "the file to read when
Keith gets run over by a bus". The current code uses a single
overloaded structure to represent directories, source and target files,
this schema separates them.
This is only a rough schema and omits most of the implementation
details of how the records will be represented in the database. Assume
that the database supports unlimited size, variable length records with
pointers between records that never become invalid as new records are
added. How the large records are represented and how the pointers
between records are maintained is a implementation detail which does
not affect this schema.
For my own sanity (?), only one variable length field can appear in
each record and it must be at the end. Variable size arrays or
filenames must be last, a record cannot contain multiple variable sized
arrays, they must be in separate records. '->' indicates a pointer to
another record.
Base
kbuild 2.5 has the concept of a base target, it is the one that the
following commands apply to. The default base target is 'vmlinux',
other base targets are things like bootloaders. There is also a
reserved base target called 'host' which is used for host compile and
link. The use of base targets means that the shorthand commands can
be used for almost everything and you get automatic dependency
tracking for free. kbuild 2.4 never did correct dependency tracking
on bootloaders and host compiles.
Record:
Number of bases.
[] struct {
flags
has_modules
has_arch_head
is_host
-> topdir
-> objects (target list)
-> modules (target list)
} base
Every architecture has at least 2 bases, 'vmlinux' and 'host', it can
have more, see the base_target() command. topdir is the first
directory in which the base is referenced. objects is the list of
objects to be built into the base, in the required order. modules is
the list of modules to be compiled for the base, only valid if
has_modules is set.
Directory
The name of a directory, relative to the logical root of the source
or object tree.
Record:
-> parent
flags
create_objdir (from create_objdir())
link_subdirs (from link_subdirs())
-> extra_aflags (from extra_aflags_all())
-> extra_cflags (from extra_cflags_all())
-> extra_ldflags (from extra_ldflags_all())
-> child list
-> target list
-> dir_base list
relname
The child list is the files and directories immediately below this
directory, used for directory traversal. target list is all the
target files to be created when you say make directory_name.
dir_base gives all the bases that appear in this directory and, for
each base, lists all the objects and sub-directories to be linked
into that base, mainly to define how vmlinux is linked. relname is
the relative name of this directory, from its parent.
Dir_base
Lists one base target referenced in a directory and the list of
objects and/or sub-directories that are to be linked into that base,
in the required order. The object/directory list is constructed from
select(y) and link_subdirs() commands, host* commands are equivalent
to selecting against base 'host'.
Record:
Number of bases referred to in the directory.
] struct {
-> base
-> object/directory list
} dir_base
Target file
A target file is any generated file, from the smallest object or
module all the way up to vmlinux, it also includes generated files
such as asm-offsets.h. There is a lot of detail required for a
target file.
Record:
-> parent directory
index (always points to object tree)
mtime
flags
nodepend (from nodepend() command)
setup (from setup() command)
expsyms (from expsyms() command)
assembler (.S file)
archive (.a file)
gen_i (an explicitly requested .i file)
gen_s (an explicitly requested .s file)
commands (the user has supplied commands to build this file)
archive_member (object is linked into an archive)
arch_head (arch_head files have special order requirements)
selm (summary of select(m) over all bases)
sely (summary of select(y) over all bases)
rebuild (the target must be rebuilt)
-> target_base_select list (see below)
-> dependency list (all the input files the target depends on)
-> objlink list (from objlink(), only for conglomerates and archives)
-> command (the last command used to successfully build the target)
-> strip_aflags (from strip_aflags())
-> strip_cflags (from strip_cflags())
-> strip_ldflags (from strip_ldflags())
-> extra_aflags (from extra_aflags())
-> extra_cflags (from extra_cflags())
-> extra_ldflags (from extra_ldflags())
-> link (the conglomerate or archive this object is linked into, if any).
relname
Target_base_select list
A list of bases that have selected this target and, for each base,
whether it was selected as y or m. Some files are used by both
vmlinux and the bootloader so they can appear in multiple bases.
host* commands are equivalent to selecting against base 'host'. When
an object is linked into a conglomerate or an archive then the select
status for the conglomerate or archive is propagated onto the
individual objects, via the objlink list.
Record:
Number of bases referred to by select commands for this target.
[] struct {
-> base
flags
selm
sely
} target_base_select
Target list
A list of target files, used to represent the objects and modules in
a base and the list of targets to be built in a directory.
Record:
Number of targets.
[] -> target file
Command, strip_aflags, strip_cflags, strip_ldflags, extra_aflags,
extra_cflags, extra_ldflags.
Record:
Variable length text string.
Source file
Record:
-> parent directory
index (which source/object tree it is in)
mtime
flags
changed
-> component list (see below)
-> config list (config options that the source refers to)
relname
Shadow trees allow .prepend and .append so a source file can be
constructed from multiple components. The component list points to
all the individual components, in the order that they are copied to
synthesize the final source file.
Child list
Record:
Number of children.
[] -> child
Each child entry can point to a directory, a source file or a target
file, in any order.
Component, config and dependency lists
These records have the same structure, they differ in what they
should be pointing to.
Record:
parent index
parent mtime
Number of entries in list.
[] struct {
-> source (component), target (config), target (dependency)
index
mtime
} ccd
These records record information about the previous successful build
and are used to determine if the parent or the list itself needs to
be rebuilt. If the parent mtime and index on this build do not match
the parent data saved on the last successful build then the parent
must be rebuilt. If any of the entries in the array have changed
their index or times then the parent must be rebuilt. Rebuilding the
parent will regenerate these lists.
The actual processing is something like this :-
Delete index, mtime, target_base_select, objlink, strip flags, extra
flags and link data from all target records. The dependency list and
previous command are kept for targets. Delete extra flags, target list
and dir_base from all directory records. The child list is kept for
directories. Clear all flags in target and directory records.
Find all the files, extracting their index (the source or object tree
they are in) and their mtime, store this information in the database.
Delete all data for source and target files that no longer exist.
Retain empty target records as a placeholder, they will probably be
recreated. Delete empty directories from the database. New sources
are marked as changed, new targets are marked as rebuild.
Some source files are constructed from components (.prepend, .append),
temporary component lists are built while finding all the files.
Compare the existing and temporary component lists for such files. If
the constructed source file has changed or the temporary component list
does not match the current component list then create a new version of
the source file in the object directory. Mark the source file record
as changed and update the list of components used.
Generate the global makefile from the Makefile.in files, without
including the command checks. Run the global makefile through make to
build the list of what has been selected, flags etc. Read the select
data, flags etc. generated by make and load it into the database.
After reconstructing any source files from their components, the source
files must be scanned for any config changes since the last build. If
the source file has changed since the last build then skip the config
check. Otherwise run the config list, the index and mtime entry for
each entry in the config list (set on last successful build) is
compared against the current index and mtime for that config option.
Any difference will cause the source entry to be marked as changed. If
the source file is marked as changed by any process, purge the config
list, it must be rebuilt. Repeatedly scan the database until no more
changes occur. You have now identified all sources that have been
directly (user) or indirectly (config) changed.
Every target file generated from source has at least one dependency,
the source file used to create it, it also depends on all the included
files, either directly or indirectly. A conglomerate object or archive
depends on the individual objects linked into it. When a target file
is being considered, look at the dependency list. If the target has
changed index or mtime since the last build (manual fiddling with the
target) then mark it as rebuild. If any of the dependencies are marked
as changed or rebuild then mark the target as rebuild. Targets that
are marked as rebuild are removed from the object directory to force
'make' to recreate the file, in addition their mtime, command and
dependency data are cleared in the database. Repeatedly scan the
database until no more changes occur. You have now identified all
targets that need to be rebuilt.
All of the above work is done at the start of the build, it identifies
everything affected by source file and config changes and removes the
affected targets. After removing out of date targets and identifying
what has been selected and the flags etc. to be used, generate the
final global makefile, including the checks on changes to the build
command. Run make against the global makefile to compile the targets.
Each compile step is run by a pre-processor wrapper. While gcc is
compiling the code, the pre-processor reads the source in parallel with
gcc and extracts config data, creating a new config list for the
target. After each successful compile step, the dependency list
generated by gcc -MD is read, standardized and copied into the database
for the target file.
Each source dependency is checked in the database to see if it is up to
date, any dependency that is marked as changed is then scanned for
config entries and the config list in the dependency record is updated
accordingly. The first successful compile that references a dependency
will update the config list for that source file and clear the changed
flag, later compiles will see that the dependency is up to date and do
nothing.
The command used to build a target is standardized and stored in the
database. The saved command is used to force a rebuild if any of the
flags have changed, including run time overrides.
The current kbuild 2.5 does the above processing but it is not so clear
(assuming that the above explanation was clear). Current kbuild 2.5
uses lots of little files to hold the target to input dependencies and
to hold the source to config mappings. These little files are read
back during phase 5 which is quite slow, by updating the database after
each successful build we remove the need for phase 5. Using a shared
database _should_ remove the main bottleneck where every compile step
has to read the entire file list and index it before it can be used.
_______________________________________________
kbuild-devel mailing list
kbuild-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kbuild-devel