To:	linux-kernel@vger.rutgers.edu
Subject: Proposal: Aegis to manage Linux kernel development
From:	Peter Miller <millerp@canb.auug.org.au>
Date:	Fri, 26 Mar 1999 12:27:53 +0100

Purpose of this Posting

	Recently there have been discussions about how to manage the
	Linux kernel sources, rapidly side-tracking into how CVS isn't
	sufficiently capable to do the job.  These discussions appear in
	numerous places on the Internet, and have even appeared in more
	public forums, such as the recent Linux Expo.

	I would like to suggest a candidate for serious consideration:

		Aegis

	This post is rather long, and I apologize in advance if you feel
	this topic is an inappropriate use of linux-kernel bandwidth.
	While it is a "meta" issue, about management of the kernel
	sources, rather than about the kernel itself, no other forum
	would appear more appropriate.

Summary for the Impatient

	Source management is not enough.  The Linux kernel is more than
	the aggregation of its source files.  A tool which supports the
	software development process for large teams is required.

	Aegis supports large teams and large projects.	Aegis is designed
	around change sets.  Aegis is designed around repository security
	(availability, integrity and confidentiality).	Aegis' distributed
	development uses this existing mature functionality to keep two
	or more repositories synchronized.

	Aegis supports multiple repositories, multiple lines of
	development, multiple distributed copies of repositories,
	disconnected operation, and is security conscious.

	Aegis is licensed under the GNU GPL.

	Aegis is mature software.  It is 8 years old.  It has users all
	around the world.  It is actively being maintained and enhanced.

	Aegis is easy to use.  It -is- big, it -does- have a lot of
	functionality, but the essential process can be learned in less
	than a day.

	Aegis is available from

		http://www.canb.auug.org.au/~millerp/aegis/

	Please download it, plus one of the template projects, to get a
	feel for the environment.  If you would like more information,
	there is also a Reference Manual and User Guide available from
	the same place.

Source Management is not sufficient

	In looking for a better way to manage the Linux kernel sources,
	it is necessary to look beyond the obvious and perennial file
	bashing, to see if there could be a larger picture.

	In writing software, there is one basic underlying activity,
	repeated again and again:

		edit, build, test, check, commit

	Different textbooks and tools will call the various steps
	different things, like

		edit, make, Unit Test, Peer Review, check-in

	and for single-person projects, some of these steps are so
	abbreviated as to be almost invisible, especially when you simply
	jump in and edit the files in the master source directly.

	And the activities are rarely so pure; usually there are
	iterations and backtracking, which also serve to obscure the
	underlying commonality of software development.  The review step,
	in particular, often moves around a great deal.

	For the maintainer of an Internet project, the activities are
	remarkably similar:

		edit: apply an incoming patch,
		build it (also serves to make sure it is consistent with
			itself and the rest of the project),
		test: make sure it works (does the thing right),
		review: make sure it is appropriate (does the right thing),
		commit: yes, I'll accept this

	The term ``source management'' carries with it a focus on the
	source files, but the activities outlined above only talk about
	files indirectly!  Source management alone is not enough.

	Tools like RCS and SCCS concentrate exclusively on single files.
	CVS also concentrates on files, but only at a slightly higher
	level.

Enter the Change Set

	One of the most obvious things about the software development
	process outlined above is that it is about *sets* of files.
	You almost always edit several files to fix a bug or add a new
	feature, you then build them to stitch them together into the
	project, you test them as a set, if there is a review they will
	be reviewed as a set, and you commit them together.

	A project makes progress by applying a series of these change
	sets, so tracking them is the only way to re-create self-
	consistent prior versions of the project.

	Software developers, however, frequently work on several changes
	at once.  Figuring out where one change set stops and another
	starts requires a modicum of discipline.  The fuzziness of the
	boundaries often serves to obscure the underlying presence of
	change sets.

	But are change sets enough?  Change sets are, after all, a way
	of aggregating the right versions of sets of *files*, and the
	software development process above only mentions change sets
	indirectly.

What Could be More than Change Sets?

	For many developers, even those working in large companies and in
	large teams, change sets are the best tool they have.  They work,
	day in and day out, with change sets.  And they get the job done.

	But take a look, for a moment, at what the project maintainer
	does:
		if the patch doesn't apply cleanly, don't accept it
		if the patch doesn't build, don't accept it
		if the patch doesn't test OK, don't accept it
		if the patch doesn't look right, don't accept it
		else commit

	Stepping back a bit, you will notice that these checks apply
	equally to work within a software house.  How often have we all seen
	stuff which was allowed to skip one of the validations, only to
	get yanked and re-fixed later?

	The next step in improving the development process is automating
	the tracking of these steps, to make sure each one has been done.
	Some tools merely beep at you if you skip a step, others make
	the validations mandatory before a commit may occur.  Mandatory
	things usually get developers riled up, and prevent introduction
	of the tools.

	But these validations are done for a purpose: they are there to
	catch stuff-ups *before* they reach the repository.  They exist
	to defend the quality of the product.  They are not arbitrary
	rules, they are just checking that we are doing the things we
	say we are doing already.

	The pay-back for such a tool is to detect such process blunders
	before they introduce defects into the project.  Fixing them
	before they are committed is less effort than fixing them after
	they are released (if we are to believe cumulative experience
	*and* the numerous studies).

	Let's look at the maintainer's role again for a moment.  Those
	first 3 steps (patch, build, test) can be automated.  I would not
	suggest for a moment that the commit should be unconditional!
	Thus, the 4th step, the code review, is the essential work
	of the maintainer.  The pay-back of this is also clear - less
	mindless tedium for the maintainer.
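
	As a rough sketch of that division of labour (in Python, with
	made-up names throughout; this is not how Aegis is implemented),
	the three automated steps can run in order and stop at the first
	failure, while a patch that survives them is only ever queued
	for review, never committed:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Patch:
    """Hypothetical stand-in for an incoming patch and its check results."""
    name: str
    applies: bool
    builds: bool
    passes_tests: bool


# The automated steps, in the order the maintainer would do them by hand.
AUTOMATED_STEPS: List[Tuple[str, Callable[[Patch], bool]]] = [
    ("apply", lambda p: p.applies),
    ("build", lambda p: p.builds),
    ("test", lambda p: p.passes_tests),
]


def triage(patch: Patch) -> str:
    """Run the automated steps in order; stop at the first failure."""
    for label, check in AUTOMATED_STEPS:
        if not check(patch):
            return f"rejected at {label}"
    # Never commit automatically: the review is still a human decision.
    return "awaiting review"


print(triage(Patch("fix-scsi", True, True, True)))
print(triage(Patch("broken", True, False, True)))
```

	In a real setup each check would run the actual patch, build and
	test tools and look at their exit status; the booleans here only
	stand in for those results.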

What ELSE Could be More than Change Sets?

	Most folks are not convinced by any of this.  It's just a crock.
	They can do it perfectly well manually.  They *have* been doing
	it manually for a decade or more - with more flexibility, too!

	Working in a team comes with a number of costs.  The most
	obvious cost is that you need to manage the interactions between
	the developers.  It becomes rapidly obvious that they can't all
	just leap into the source tree and edit on the files directly,
	because pretty much instantly nothing compiles for anyone.
	And the change sets are obfuscated beyond redemption.

	That's what work areas are for - they've been re-invented
	thousands of times, and have been called zillions of different
	names (e.g. sand boxes), but they all do the same thing: Each
	developer gets their own work area, and they leave the master
	source alone.  They do all their work there, and only when they
	are ready to commit do files get modified in the master source.

	Notice the strong correlation between work areas and change
	sets?  Different tools make this correlation weaker or stronger,
	depending on what they are trying to achieve.  The basic concept,
	however, is that change sets have meaning even after the files
	are committed, whereas a work area is where change sets are
	created and reproduced.

	A tool which seeks to do more than just manage files, or
	even change sets, needs to address work areas, too.  This is
	particularly true when one of the validations (build, test or
	review) *fails*.  You don't want the master source polluted.

	Work areas are only half the story though.  Teams almost
	immediately lead to the next problem: file conflicts.  No matter
	how you implement file locking, at some point you have to merge
	the competing edits.  Different tools do this at different points
	in the software development process, but they all do it.

	The tool needs to track file versions in work areas, so you know
	if the file is up-to-date (if someone has committed a competing
	edit ahead of you).  This isn't a big problem, because change sets
	must record file versions anyway.  If the file isn't up-to-date,
	you need a 3-way merge to bring it up-to-date (and you have the
	3 versions - the one copied, the one in the work area, and the
	one most recently committed).  Most tools prevent commit from
	occurring if the file needs to be merged.  (You could prevent
	build and test, too, but that's a bit too officious - there are
	often good reasons for working with outdated sources.)
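
	The bookkeeping can be sketched in a few lines of Python.
	This is an illustration only: the function names are invented,
	versions are assumed to be simple increasing integers, and the
	merge is done at whole-file granularity, where a real tool would
	hand the three versions to a line-by-line merge program.

```python
from typing import Optional


def needs_merge(copied_version: int, baseline_version: int) -> bool:
    """True if a competing edit was committed after the file was copied."""
    return baseline_version > copied_version


def three_way(base: str, mine: str, theirs: str) -> Optional[str]:
    """Whole-file 3-way resolution.

    base   - the version originally copied into the work area
    mine   - the version now in the work area
    theirs - the version most recently committed
    Returns the resolved text, or None if both sides changed the file
    differently and a real (line-level) merge is required.
    """
    if mine == theirs:
        return mine      # both sides ended up identical
    if base == theirs:
        return mine      # only the work area changed
    if base == mine:
        return theirs    # only the repository changed
    return None          # genuine conflict


def may_commit(copied_version: int, baseline_version: int) -> bool:
    """Commit is refused while the file is out of date."""
    return not needs_merge(copied_version, baseline_version)
```

	Note that only the commit is gated on the version check; build
	and test remain available against the outdated copy.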

Software Configuration Management

	``Nuh, uh.  No way!  I've tried BarfCase and it always crashed /
	went far too slowly / harassed me.  Not going there!''

	This is a common reaction to tools which attempt to do more than
	baby-sit files.  On the whole, it's a very reasonable reaction,
	considering what some of them do to you and your system.

	However, SCM is the correct term (in the textbooks, anyway) for
	looking after the process and not just the files.  To look after
	more you need to actually track the progress of change sets as
	they work their way through the process.  Some tools are *very*
	invasive about this, and some are more subtle.

	There are things the SCM tool needs to know to do its job:

	* when a change set is created (this often implies the creation
	of a work area)

	* when a file is added to a change set, so the version can be
	recorded (this often implies a copy into the work area)

	* when files are created or deleted or renamed as part of a
	change set.

	* the results of building the change set (so the tool can warn,
	or raise an error, if a commit is tried against a failed build).

	* the results of testing the change set (so the tool can warn,
	or raise an error, if a commit is tried against a failed test).

	* the results of a review of the change set (so the tool can warn,
	or raise an error, if a commit is tried against a failed review).

	* when the change set is committed or abandoned (i.e. when it
	is finished)

	None of these things are new.  All of us are doing all of
	them already.  Sometimes, some of the steps are pretty short,
	but they are all there.
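
	Taken together, the items above amount to a small record kept
	per change set.  The following Python sketch shows one possible
	shape for it; the field names, state names and the strict
	(error rather than warning) commit policy are all assumptions
	made for illustration, not a description of any particular tool.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ChangeSet:
    """Hypothetical per-change-set record; all names are illustrative."""
    number: int
    files: Dict[str, int] = field(default_factory=dict)  # path -> version copied
    build_ok: Optional[bool] = None    # None means "not run yet"
    test_ok: Optional[bool] = None
    review_ok: Optional[bool] = None
    state: str = "being developed"

    def copy_file(self, path: str, version: int) -> None:
        """Add a file to the change set and remember the version copied."""
        self.files[path] = version
        # New content invalidates any earlier build, test or review result.
        self.build_ok = self.test_ok = self.review_ok = None

    def commit(self) -> None:
        """Refuse the commit unless every validation has passed."""
        results = (("build", self.build_ok),
                   ("test", self.test_ok),
                   ("review", self.review_ok))
        for step, ok in results:
            if ok is not True:
                raise RuntimeError(f"change {self.number}: {step} has not passed")
        self.state = "completed"

    def abandon(self) -> None:
        self.state = "abandoned"
```

	A more lenient tool would print a warning in commit() instead of
	raising; the record it needs to keep is the same either way.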

Distributed Development

	Once you have change sets, you have the basics of distributed
	development.  You can use their information about files and file
	versions to package them up and sling them across the net.

	But what do you do when you are the recipient of a change set?
	There is no way you are going to apply the damn thing to your
	repository sight unseen.  You are going to check it all the ways
	you can: you will build it, you will test it, you will review it,
	and maybe decide to commit it.  You need *process*.

	Even when you are working alone, when you are the only user on
	a single PC, participation in a distributed development project
	is a -team- activity, and you need an SCM tool which is designed
	for working in teams.  Source management alone is not enough.
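
	To make the "package them up" step concrete, here is a minimal
	Python sketch of sending and receiving a change set as text.
	The envelope layout (JSON, base64, a SHA-256 checksum) is
	invented for this example and is not the format any existing
	tool uses; the checksum only guards against accidental damage
	in transit, it does not authenticate the sender.

```python
import base64
import hashlib
import json
from typing import Dict


def pack(description: str, files: Dict[str, bytes]) -> str:
    """Bundle a change set's files into one mailable text message."""
    payload = {
        "description": description,
        "files": {path: base64.b64encode(data).decode("ascii")
                  for path, data in sorted(files.items())},
    }
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return json.dumps({"sha256": digest, "body": body})


def unpack(message: str) -> Dict[str, bytes]:
    """Verify the checksum and return the files for a fresh work area."""
    envelope = json.loads(message)
    body = envelope["body"]
    if hashlib.sha256(body.encode("utf-8")).hexdigest() != envelope["sha256"]:
        raise ValueError("change set was damaged in transit")
    payload = json.loads(body)
    return {path: base64.b64decode(text)
            for path, text in payload["files"].items()}
```

	What unpack() returns is deliberately just a set of files: the
	recipient still has to build, test and review them in a separate
	work area before anything reaches the repository.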


Aegis

	Aegis is a software configuration management system.  It does
	all of the above and more besides, but it delegates as much
	as possible, so as to give you access to the other development
	tools you need...

	* the build step is watched, but what it does, and what tool
	you use to do it, is up to you.  Yes, you can use make.

	* file merges are watched, but what it does, and what tool you
	use to do it, is up to you.

	* the test step is watched, but what it does, and what tool you
	use to do it, is up to you.  It's also optional.

	* the review step is watched, but what it does, and what tool
	you use to do it, is up to you.

	* the commit step is watched, but what it does, and what tool
	you use to do it, is up to you.  Yes, you can use RCS.	Yes,
	you can use SCCS.

	Aegis does all this, but introduces a bare minimum of commands.
	Most of them perform functions developers are already intimately
	familiar with, and the others have an obvious purpose in a process
	like the one described above.  Some of them are described here:

	aenc (new change), aedb (develop begin) are used to create a
	change set, and create its work area.

	aecp (copy files) - analogous to RCS ``co'', used to copy
	files into the change set, and remember the version.

	aeb (build) - used to run the build tool of your choice, and
	wait for the exit status.

	aed (diff) - used to see the differences between the baseline
	and the change set.

	aede (develop end) - used to say the change set is ready for
	review.

	aerpass (review pass) - used to say a change set has passed review.

	The commands are different (e.g. aeb vs make, aecp vs co) but the
	activities are familiar.  Aegis is easy to use - believe it or
	not, you've just seen all of the *routine* commands necessary
	for a developer to submit a change (there are only a couple
	more routine commands for change set integrators, and they are
	often automated).

	One more command...  The aedist command is used to package change
	sets for sending, and unpackage them on receipt.

		aedist -send -change N | mail linus
	
	will take change set N and mail it somewhere.  Easy.  To apply
	it at the other end (I use MH in this example) you simply say

		show | aedist -receive
	
	The change set will be unpacked into a separate work area, be built,
	and be tested (if tests are enabled).  If the change set has no
	problems, it will then stop and wait for review.  Similar things
	can be done with aedist for web servers and clients.

Where to from Here?

	Can Aegis do the job?  I believe that it can, but you should
	not take my word for it!  Download a copy and start playing.
	Get a feel for it.  You can get Aegis from

		http://www.canb.auug.org.au/~millerp/aegis/

	If you would like to read some manuals, there are PostScript
	copies of the User Guide and Reference Manual available for
	download from the same place.

	Once you have Aegis installed, download one of the template
	projects, available from the same place.  These template projects
	get you up and running very quickly.  (They also exercise the
	distributed development functionality to do so: your first taste.)

	In order to have informed discussion of the merits of Aegis,
	it is necessary for a number of people to download Aegis and
	try it out.  And also try out distributing change sets with it.

	Once this has happened, it will be possible to discuss whether
	or not it is suitable for Linux kernel development, and if so,
	how to implement it.

	I look forward to your thoughtful comments and suggestions.

Regards
Peter Miller    E-Mail: millerp@canb.auug.org.au
/\/\*           WWW:    http://www.canb.auug.org.au/~millerp/
Disclaimer: The opinions expressed here are personal and do not necessarily
	reflect the opinion of my employer or the opinions of my colleagues.
