Date:	Wed, 3 Feb 1999 18:16:41 -0500 (EST)
From:	"Theodore Y. Ts'o" <tytso@MIT.EDU>
To:	linux-kernel@vger.rutgers.edu
Subject: [Monty: Kernel interface changes (was Re: cdrecord problems on recent Linux versions)]


This message didn't get sent to the linux-kernel list due to an
addressing typo, so I'm taking the liberty of forwarding it.  The
message here is *important*, and *timely*, especially now that 2.2 has
been released.

I'd hate to see MIT turn away from Linux to NetBSD, but more
importantly, if we keep being gratuitous about changing interfaces, it
will be more than just MIT that will give up on Linux.  

I've been saying that binary compatibility matters for a long time now.
I used to complain that the libc folks didn't seem to care about
compatibility issues, and that the kernel folks were careful about such
things.  I hope that the latter half of that statement is at least true.

Given that folks during the 2.1 release have blithely made
binary-incompatible changes to sysctl (which fortunately Stephen caught),
this point bears repeating.  We have *got* to be careful about keeping
interfaces backwards compatible.  I have delayed making code available,
or delayed asking Linus to merge in some patches, until I was sure that
I had the interface right, so that we wouldn't have this kind of
interface incompatibility.  When I've had to make interface changes,
I've kept old interfaces around and added new ioctls instead of changing
existing ioctls.  I'd suggest that other people try very, very hard to
do the same.
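
To illustrate what "keep the old interface and add a new ioctl" can
look like, here is a minimal userspace sketch in C.  Everything in it
(the DEMO_* command numbers, the demo_info_* structs, demo_ioctl) is
hypothetical and made up for this example; it is not taken from any
real driver.  The point is only that the old command keeps its frozen
layout while new fields arrive under a new command number.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical command numbers, for illustration only. */
#define DEMO_GET_INFO_V1 1   /* original command: layout is frozen */
#define DEMO_GET_INFO_V2 2   /* new command, added next to the old one */

/* Hypothetical argument structs; v2 adds a field instead of changing v1. */
struct demo_info_v1 { int speed; };
struct demo_info_v2 { int speed; int flags; };

/*
 * Stand-in for a driver's ioctl handler (assumed shape, not a real API).
 * Returns 0 on success, -1 for an unknown command.
 */
static int demo_ioctl(int cmd, void *arg)
{
    struct demo_info_v2 cur = { 9600, 0x3 };   /* current internal state */

    assert(arg != NULL);
    switch (cmd) {
    case DEMO_GET_INFO_V1: {
        /* Old binaries still get exactly the bytes they were built for. */
        struct demo_info_v1 old = { cur.speed };
        memcpy(arg, &old, sizeof old);
        return 0;
    }
    case DEMO_GET_INFO_V2:
        /* New binaries opt in to the larger struct explicitly. */
        memcpy(arg, &cur, sizeof cur);
        return 0;
    default:
        return -1;
    }
}
```

A program built against the v1 struct keeps working unchanged, and
only code that asks for DEMO_GET_INFO_V2 ever sees the new field.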

Granted, there will be some rare cases where making incompatible changes
is required.  But they really should be the exception, not the rule.

						- Ted

------- Forwarded Message

From: Monty <xiphmont@MIT.EDU>
To: linuxch-kernel@vger.rutgers.edu, linux-dev@MIT.EDU, jered@MIT.EDU,
        warlord@MIT.EDU, nemo@MIT.EDU, linux-scsi@vger.rutgers.edu,
        cox@idecnet.com, cdwrite@lists.debian.org
Cc: xiphmont@MIT.EDU
Subject: Kernel interface changes (was Re: cdrecord problems on recent Linux versions)
Date: Wed, 03 Feb 1999 16:21:22 EST


(Alan: you're CC:ed because you've been receptive to my flames in the
past and generally know the best place to send them.  I'm not blaming
you for anything this time ;-)

> Hi all,
> 
> it seems that current Linux versions (e.g. SuSE 6.0) give problems
> with inconsistent content in 
> 
>       /usr/include/scsi/sg.h
> 
> and
> 
>       /usr/src/linux/include/scsi/sg.h
> 
> The result is that cdrecord reports "Not enough memory" cannot send
> SCSI command.....

I take it this means that someone, in their infinite wisdom (Jens:
apologies if this was you), decided to gratuitously change yet another
kernel interface (and not mention it to the community beyond burying
it in a changelog) and that every binary package out there that uses
this interface just broke?  Great, great, great...  The sooner we root
out the disaster that is SG, the better.

I guess I better go look, huh. I hope it's not the case that MIT
Athena now needs an i386_linux4 binaries directory :-P We just got
most things built for linux3...

@mantra(BINARY COMPATIBILITY MATTERS)

why?  An example...

For those not paying attention (I'm addressing kernel/distribution
maintainers here), MIT Athena and I believe in modern
client/server.  That means we have 10,000 workstations on MITnet
(clients), and they get their filesystems and software from a handful
of very pumped fileservers.  These days, most of these workstations
are private, and a huge fraction of those are Linux.  They run
everything from Linux 1.0/SlackwareAncient to Redhat5.2 plus every
bleeding edge patch available (Sal finally took his 0.98 based 386
home I think).

Because Linux makes major executable format changes every year or so,
we have to build binaries that work on everything and it's a royal
goddamned bitch and a half when minor changes like this break software
for no good reason.  You might have *thought* that just bumping up the
size of the argument struct to a single ioctl from 2.0.34->2.0.35 was
a minor thing that no one would notice, but we noticed.  We had to
debug the goddamned thing and build new binaries.  Fortunately we
found a route (in this one case) to make the binaries work on
everything.

i386_linux/bin1 is a.out.  Most everything statically linked.  

i386_linux2/bin is ELF/libc5, everything statically linked.  I know,
libc5/ELF was done to make shared objects quick and easy.
Unfortunately, libc5.3 and 5.4 are fundamentally incompatible due to
endianness changes in the /usr/include headers (take a look at string.h
and its lookup tables for example).  A binary built against 5.3
segfaults when run with 5.4 and vice versa.  So, everything is
statically linked.  Thank you Foresight.

i386_linux3/bin is ELF/libc6.  We thought that finally all this would
be behind us.  We're ELF, state of the art libc (sort of), no major
changes coming.  Surprise!!!  How many 'minor interfaces' will change
before we need linux4/bin?  In the meantime, anything using SG is
going to have to be coded to replicate *both* structures and probe for
the possibility of *both* interfaces (in addition to the other three
Linux packet command interfaces also in the kernel).

As an aside, I'm part of a group of kernel developers and *userspace*
developers working to clean this up with minimal pain.  *Please* don't
read all this and immediately start changing things in isolation
again.  If you're interested in helping, write to the linux-scsi list.
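
To make the "replicate both structures and probe" burden concrete,
here is a rough sketch in C of what a userspace program ends up
carrying.  The struct layouts, the demo_submit() stand-in for the
kernel, and the enum names are all invented for illustration; they are
NOT the real sg.h definitions, and a real probe would issue an actual
request and inspect the error it gets back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request layouts; these are NOT the real sg.h structs. */
struct demo_req_old { int len; int result; };
struct demo_req_new { int len; int timeout; int result; };

enum demo_abi { DEMO_ABI_OLD, DEMO_ABI_NEW };

/*
 * Stand-in for a kernel that accepts exactly one layout and rejects
 * the other.  Here the layout is recognized purely by its size.
 */
static int demo_submit(enum demo_abi kernel_abi, const void *req, size_t size)
{
    size_t want = (kernel_abi == DEMO_ABI_NEW) ? sizeof(struct demo_req_new)
                                               : sizeof(struct demo_req_old);
    assert(req != NULL);
    return size == want ? 0 : -1;
}

/*
 * Try the newer layout first, then fall back to the older one.
 * Returns the ABI that worked, or -1 if neither was accepted.
 */
static int demo_probe(enum demo_abi kernel_abi)
{
    struct demo_req_new n = { 0, 0, 0 };
    struct demo_req_old o = { 0, 0 };

    if (demo_submit(kernel_abi, &n, sizeof n) == 0)
        return DEMO_ABI_NEW;
    if (demo_submit(kernel_abi, &o, sizeof o) == 0)
        return DEMO_ABI_OLD;
    return -1;
}
```

Every program that wants one binary to run on both kernels has to
ship both struct definitions and a probe like this, which is exactly
the cost a compatible interface change avoids.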

Let's not continue pushing MIT to NetBSD because of these headaches.
When I said above that Linux was "a huge fraction" of MITnet's private
workstations, I did not mean majority.  It *used* to be a majority,
but somewhere between i386_linux2/bin and i386_linux3/bin, enough
people got fed up with the lurking incompatibilities of the release of
the week that the majority (including most of the active *student*
group developers) bailed to NetBSD and show no signs of coming back.

MIT Information Services/Athena is seriously considering the next
major *official*, supported platform rollout on MITnet to be a free
unix on PCs.  Linux is still in the running here.  I'd like to see
that.  Let's not continue to look so foolish, eh?

I'm going to go look at the sg.h problem now.  I'm praying someone is
confused and it's still all the same, or at least extended in such a
way I can build backward compatible binaries.  Mind the above flame.
*Never* change a kernel interface with the rationale "eh, no one will
notice much" or "oh, they can just rebuild".  Major changes have to
happen now and then to stay current, but every minor change that's
slipped between the major changes multiplies the headaches.  We're no
longer in the world where everyone with a linux box builds everything
on that machine from source.  *BINARY COMPATIBILITY MATTERS*

>I noticed that with the current Debian (and probably other distributions
>as well) that /usr/include/linux, /usr/include/asm, and /usr/include/scsi
>are copies of the equivalent 2.0.36 kernel directories, rather than
>symlinks to the current kernel source.

All the major distributions now package the kernel headers separately
from the kernel source.  And, yes, it's caused minor build/debug
nightmares for those who didn't realize it, but this is a minor evil
as evils go.

Monty

(PS: I'm not MIT Information Systems staff.  I just build software there.)

</flame>

------- End Forwarded Message
