[kbuild-devel] Database design for kbuild 2.5

From:	 Keith Owens <kaos@ocs.com.au>
To:	 kbuild-devel@lists.sourceforge.net
Subject: [kbuild-devel] Database design for kbuild 2.5
Date:	 Tue, 01 Jan 2002 14:32:15 +1100

This is a first cut at a single database to hold all the kbuild 2.5
data.  Current kbuild 2.5 uses lots of files which is proving to be a
bottleneck, especially when every compile step has to read the main
list in order to work out where each dependency is and what the config
timestamps are set to.  The theory is that a single, shared, mmapped
database will be faster.  We shall see.

This file describes the data needed by kbuild 2.5 in a clean design.
The current code implements this design but not in a clean fashion, the
code and data structures "just grew" as more and more features were
added to the mini language.  Think of this as "the file to read when
Keith gets run over by a bus".  The current code uses a single
overloaded structure to represent directories, source and target files,
this schema separates them.

This is only a rough schema and omits most of the implementation
details of how the records will be represented in the database.  Assume
that the database supports unlimited size, variable length records with
pointers between records that never become invalid as new records are
added.  How the large records are represented and how the pointers
between records are maintained is a implementation detail which does
not affect this schema.

For my own sanity (?), only one variable length field can appear in
each record and it must be at the end.  Variable size arrays or
filenames must be last, a record cannot contain multiple variable sized
arrays, they must be in separate records.  '->' indicates a pointer to
another record.


Base

  kbuild 2.5 has the concept of a base target, it is the one that the
  following commands apply to.  The default base target is 'vmlinux',
  other base targets are things like bootloaders.  There is also a
  reserved base target called 'host' which is used for host compile and
  link.  The use of base targets means that the shorthand commands can
  be used for almost everything and you get automatic dependency
  tracking for free.  kbuild 2.4 never did correct dependency tracking
  on bootloaders and host compiles.

  Record:

    Number of bases.
    [] struct {
	flags
	    has_modules
	    has_arch_head
	    is_host
	-> topdir
	-> objects (target list)
	-> modules (target list)
    } base

  Every architecture has at least 2 bases, 'vmlinux' and 'host', it can
  have more, see the base_target() command.  topdir is the first
  directory in which the base is referenced.  objects is the list of
  objects to be built into the base, in the required order.  modules is
  the list of modules to be compiled for the base, only valid if
  has_modules is set.


Directory

  The name of a directory, relative to the logical root of the source
  or object tree.

  Record:

    -> parent
    flags
	create_objdir (from create_objdir())
	link_subdirs (from link_subdirs())
    -> extra_aflags (from extra_aflags_all())
    -> extra_cflags (from extra_cflags_all())
    -> extra_ldflags (from extra_ldflags_all())
    -> child list
    -> target list
    -> dir_base list
    relname

  The child list is the files and directories immediately below this
  directory, used for directory traversal.  target list is all the
  target files to be created when you say make directory_name.
  dir_base gives all the bases that appear in this directory and, for
  each base, lists all the objects and sub-directories to be linked
  into that base, mainly to define how vmlinux is linked.  relname is
  the relative name of this directory, from its parent.


Dir_base

  Lists one base target referenced in a directory and the list of
  objects and/or sub-directories that are to be linked into that base,
  in the required order.  The object/directory list is constructed from
  select(y) and link_subdirs() commands, host* commands are equivalent
  to selecting against base 'host'.

  Record:

    Number of bases referred to in the directory.
    ] struct {
      -> base
      -> object/directory list
    } dir_base


Target file

  A target file is any generated file, from the smallest object or
  module all the way up to vmlinux, it also includes generated files
  such as asm-offsets.h.  There is a lot of detail required for a
  target file.

  Record:

    -> parent directory
    index (always points to object tree)
    mtime
    flags
	nodepend (from nodepend() command)
	setup (from setup() command)
	expsyms (from expsyms() command)
	assembler (.S file)
	archive (.a file)
	gen_i (an explicitly requested .i file)
	gen_s (an explicitly requested .s file)
	commands (the user has supplied commands to build this file)
	archive_member (object is linked into an archive)
	arch_head (arch_head files have special order requirements)
	selm (summary of select(m) over all bases)
	sely (summary of select(y) over all bases)
	rebuild (the target must be rebuilt)
    -> target_base_select list (see below)
    -> dependency list (all the input files the target depends on)
    -> objlink list (from objlink(), only for conglomerates and archives)
    -> command (the last command used to successfully build the target)
    -> strip_aflags (from strip_aflags())
    -> strip_cflags (from strip_cflags())
    -> strip_ldflags (from strip_ldflags())
    -> extra_aflags (from extra_aflags())
    -> extra_cflags (from extra_cflags())
    -> extra_ldflags (from extra_ldflags())
    -> link (the conglomerate or archive this object is linked into, if any).
    relname


Target_base_select list

  A list of bases that have selected this target and, for each base,
  whether it was selected as y or m.  Some files are used by both
  vmlinux and the bootloader so they can appear in multiple bases.
  host* commands are equivalent to selecting against base 'host'.  When
  an object is linked into a conglomerate or an archive then the select
  status for the conglomerate or archive is propagated onto the
  individual objects, via the objlink list.

  Record:

    Number of bases referred to by select commands for this target.
    [] struct {
	-> base
	flags
	    selm
	    sely
    } target_base_select


Target list

  A list of target files, used to represent the objects and modules in
  a base and the list of targets to be built in a directory.

  Record:

    Number of targets.
    [] -> target file


Command, strip_aflags, strip_cflags, strip_ldflags, extra_aflags,
extra_cflags, extra_ldflags.

  Record:

    Variable length text string.


Source file

  Record:

    -> parent directory
    index (which source/object tree it is in)
    mtime
    flags
	changed
    -> component list (see below)
    -> config list (config options that the source refers to)
    relname

  Shadow trees allow .prepend and .append so a source file can be
  constructed from multiple components.  The component list points to
  all the individual components, in the order that they are copied to
  synthesize the final source file.


Child list

  Record:

    Number of children.
    [] -> child

  Each child entry can point to a directory, a source file or a target
  file, in any order.


Component, config and dependency lists

  These records have the same structure, they differ in what they
  should be pointing to.

  Record:

    parent index
    parent mtime
    Number of entries in list.
    [] struct {
        -> source (component), target (config), target (dependency)
	index
	mtime
    } ccd

  These records record information about the previous successful build
  and are used to determine if the parent or the list itself needs to
  be rebuilt.  If the parent mtime and index on this build do not match
  the parent data saved on the last successful build then the parent
  must be rebuilt.  If any of the entries in the array have changed
  their index or times then the parent must be rebuilt.  Rebuilding the
  parent will regenerate these lists.


The actual processing is something like this :-

Delete index, mtime, target_base_select, objlink, strip flags, extra
flags and link data from all target records.  The dependency list and
previous command are kept for targets.  Delete extra flags, target list
and dir_base from all directory records.  The child list is kept for
directories.  Clear all flags in target and directory records.

Find all the files, extracting their index (the source or object tree
they are in) and their mtime, store this information in the database.
Delete all data for source and target files that no longer exist.
Retain empty target records as a placeholder, they will probably be
recreated.  Delete empty directories from the database.  New sources
are marked as changed, new targets are marked as rebuild.

Some source files are constructed from components (.prepend, .append),
temporary component lists are built while finding all the files.
Compare the existing and temporary component lists for such files.  If
the constructed source file has changed or the temporary component list
does not match the current component list then create a new version of
the source file in the object directory.  Mark the source file record
as changed and update the list of components used.

Generate the global makefile from the Makefile.in files, without
including the command checks.  Run the global makefile through make to
build the list of what has been selected, flags etc.  Read the select
data, flags etc.  generated by make and load it into the database.

After reconstructing any source files from their components, the source
files must be scanned for any config changes since the last build.  If
the source file has changed since the last build then skip the config
check.  Otherwise run the config list, the index and mtime entry for
each entry in the config list (set on last successful build) is
compared against the current index and mtime for that config option.
Any difference will cause the source entry to be marked as changed.  If
the source file is marked as changed by any process, purge the config
list, it must be rebuilt.  Repeatedly scan the database until no more
changes occur.  You have now identified all sources that have been
directly (user) or indirectly (config) changed.

Every target file generated from source has at least one dependency,
the source file used to create it, it also depends on all the included
files, either directly or indirectly.  A conglomerate object or archive
depends on the individual objects linked into it.  When a target file
is being considered, look at the dependency list.  If the target has
changed index or mtime since the last build (manual fiddling with the
target) then mark it as rebuild.  If any of the dependencies are marked
as changed or rebuild then mark the target as rebuild.  Targets that
are marked as rebuild are removed from the object directory to force
'make' to recreate the file, in addition their mtime, command and
dependency data are cleared in the database.  Repeatedly scan the
database until no more changes occur.  You have now identified all
targets that need to be rebuilt.

All of the above work is done at the start of the build, it identifies
everything affected by source file and config changes and removes the
affected targets.  After removing out of date targets and identifying
what has been selected and the flags etc. to be used, generate the
final global makefile, including the checks on changes to the build
command.  Run make against the global makefile to compile the targets.

Each compile step is run by a pre-processor wrapper.  While gcc is
compiling the code, the pre-processor reads the source in parallel with
gcc and extracts config data, creating a new config list for the
target.  After each successful compile step, the dependency list
generated by gcc -MD is read, standardized and copied into the database
for the target file.

Each source dependency is checked in the database to see if it is up to
date, any dependency that is marked as changed is then scanned for
config entries and the config list in the dependency record is updated
accordingly.  The first successful compile that references a dependency
will update the config list for that source file and clear the changed
flag, later compiles will see that the dependency is up to date and do
nothing.

The command used to build a target is standardized and stored in the
database.  The saved command is used to force a rebuild if any of the
flags have changed, including run time overrides.


The current kbuild 2.5 does the above processing but it is not so clear
(assuming that the above explanation was clear).  Current kbuild 2.5
uses lots of little files to hold the target to input dependencies and
to hold the source to config mappings.  These little files are read
back during phase 5 which is quite slow, by updating the database after
each successful build we remove the need for phase 5.  Using a shared
database _should_ remove the main bottleneck where every compile step
has to read the entire file list and index it before it can be used.


_______________________________________________
kbuild-devel mailing list
kbuild-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kbuild-devel