Date: Wed, 3 Feb 1999 18:16:41 -0500 (EST)
From: "Theodore Y. Ts'o" <tytso@MIT.EDU>
To: linux-kernel@vger.rutgers.edu
Subject: [Monty: Kernel interface changes (was Re: cdrecord problems on recent Linux versions)]

This message didn't get sent to the linux-kernel list due to an addressing typo, so I'm taking the liberty of forwarding it.

The message here is *important*, and *timely*, especially now that 2.2 has been released. I'd hate to see MIT turn away from Linux to NetBSD, but more importantly, if we keep being gratuitous about changing interfaces, it will be more than just MIT that will give up on Linux.

I've been saying that binary compatibility matters for a long time now. I used to complain that the libc folks didn't seem to care about compatibility issues, and that the kernel folks were careful about such things. I hope that the latter half of that statement is at least true. Given that folks during the 2.1 release have blithely made binary-incompatible changes with sysctl (which fortunately Stephen caught), this point bears repeating. We have *got* to be careful about keeping interfaces backwards compatible.

I have delayed making code available, or delayed asking Linus to merge in some patches, until I was sure that I had the interface right, so that we wouldn't have this kind of interface incompatibility. When I've had to make interface changes, I've kept old interfaces around and added new ioctls instead of changing existing ioctls. I'd suggest that other people try very, very hard to do the same.

Granted, there will be some rare cases where making incompatible changes is required. But they really should be the exception, not the rule.

						- Ted

------- Forwarded Message

From: Monty <xiphmont@MIT.EDU>
To: linuxch-kernel@vger.rutgers.edu, linux-dev@MIT.EDU, jered@MIT.EDU, warlord@MIT.EDU, nemo@MIT.EDU, linux-scsi@vger.rutgers.edu, cox@idecnet.com, cdwrite@lists.debian.org
Cc: xiphmont@MIT.EDU
Subject: Kernel interface changes (was Re: cdrecord problems on recent Linux versions)
Date: Wed, 03 Feb 1999 16:21:22 EST

(Alan: you're CC:ed because you've been receptive to my flames in the past and generally know the best place to send them. I'm not blaming you for anything this time ;-)

> Hi all,
>
> it seems that current Linux versions (e.g. SuSE 6.0) give problems
> with inconsistent content in
>
> /usr/include/scsi/sg.h
>
> and
>
> /usr/src/linux/include/scsi/sg.h
>
> The result is that cdrecord reports "Not enough memory" cannot send
> SCSI command.....

I take it this means that someone, in their infinite wisdom (Jens: apologies if this was you), decided to gratuitously change yet another kernel interface (and not mention it to the community beyond burying it in a changelog) and that every binary package out there that uses this interface just broke?

Great, great, great... The sooner we root out the disaster that is SG, the better.

I guess I better go look, huh. I hope it's not the case that MIT Athena now needs an i386_linux4 binaries directory :-P We just got most things built for linux3...

@mantra(BINARY COMPATIBILITY MATTERS)

Why? An example...

For those not paying attention (I'm addressing kernel/distribution maintainers here), MIT Athena and I believe in modern client/server. That means we have 10,000 workstations on MITnet (clients), and they get their filesystems and software from a handful of very pumped fileservers. These days, most of these workstations are private, and a huge fraction of those are Linux. They run everything from Linux 1.0/SlackwareAncient to Redhat5.2 plus every bleeding edge patch available (Sal finally took his 0.98 based 386 home I think).

Because Linux makes major executable format changes every year or so, we have to build binaries that work on everything, and it's a royal goddamned bitch and a half when minor changes like this break software for no good reason. You might have *thought* that just bumping up the size of the argument struct to a single ioctl from 2.0.34->2.0.35 was a minor thing that no one would notice, but we noticed. We had to debug the goddamned thing and build new binaries. Fortunately we found a route (in this one case) to make the binaries work on everything.

i386_linux/bin1 is a.out. Most everything statically linked.

i386_linux2/bin is ELF/libc5, everything statically linked. I know, libc5/ELF was done to make shared objects quick and easy. Unfortunately, libc5.3 and 5.4 are fundamentally incompatible due to endianness changes in the /usr/include headers (take a look at string.h and its lookup tables for example). A binary built against 5.3 segfaults when run with 5.4 and vice versa. So, everything is statically linked. Thank you Foresight.

i386_linux3/bin is ELF/libc6. We thought that finally all this would be behind us. We're ELF, state of the art libc (sort of), no major changes coming. Surprise!!! How many 'minor interfaces' will change before we need linux4/bin?

In the meantime, anything using SG is going to have to be coded to replicate *both* structures and probe for the possibility of *both* interfaces (in addition to the other three Linux packet command interfaces also in the kernel). As an aside, I'm part of a group of kernel developers and *userspace* developers working to clean this up with minimal pain. *Please* don't read all this and immediately start changing things in isolation again. If you're interested in helping, write to the linux-scsi list.

Let's not continue pushing MIT to NetBSD because of these headaches. When I said above that Linux was "a huge fraction" of MITnet's private workstations, I did not mean majority.
It *used* to be a majority, but somewhere between i386_linux2/bin and i386_linux3/bin, enough people got fed up with the lurking incompatibilities of the release of the week that the majority (including most of the active *student* group developers) bailed to NetBSD and show no signs of coming back.

MIT Information Services/Athena is seriously considering making the next major *official*, supported platform rollout on MITnet a free unix on PCs. Linux is still in the running here. I'd like to see that. Let's not continue to look so foolish, eh?

I'm going to go look at the sg.h problem now. I'm praying someone is confused and it's still all the same, or at least extended in such a way that I can build backward-compatible binaries.

Mind the above flame. *Never* change a kernel interface with the rationale "eh, no one will notice much" or "oh, they can just rebuild". Major changes have to happen now and then to stay current, but every minor change that's slipped between the major changes multiplies the headaches. We're no longer in the world where everyone with a linux box builds everything on that machine from source.

*BINARY COMPATIBILITY MATTERS*

>I noticed that with the current Debian (and probably other distributions
>as well) that /usr/include/linux, /usr/include/asm, and /usr/include/scsi
>are copies of the equivalent 2.0.36 kernel directories, rather than
>symlinks to the current kernel source.

All the major distributions now package the kernel headers separately from the kernel source. And, yes, it's caused minor build/debug nightmares for those who didn't realize it, but this is a minor evil as evils go.

Monty

(PS: I'm not MIT Information Systems staff. I just build software there.)

</flame>

------- End Forwarded Message
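
Two techniques come up in the thread above: keeping an old ioctl working while adding a new one for an extended argument struct, and having user space probe for both interfaces. A minimal C sketch of both follows, as an illustration only. Every name in it (demo_req_v1, demo_req_v2, DEMO_IOC_SEND_V1, DEMO_IOC_SEND_V2, demo_ioctl, demo_submit) is hypothetical, the struct layouts are not those of the real sg.h, and the "driver" is an ordinary function standing in for a kernel ioctl handler so the sketch can run in user space. Returning -ENOTTY for an unrecognized command is likewise an assumption about the driver; some drivers report -EINVAL instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; this is NOT the real sg.h layout. */

/* Original argument struct: frozen once binaries that use it have shipped. */
struct demo_req_v1 {
    int  cmd_len;
    char cmd[12];
};

/* Extended struct: it gets its own command number instead of replacing v1. */
struct demo_req_v2 {
    int  cmd_len;
    char cmd[12];
    int  timeout_ms;               /* the new field */
};

enum {
    DEMO_IOC_SEND_V1 = 0x4401,     /* old number, old layout, never changes */
    DEMO_IOC_SEND_V2 = 0x4402      /* new number for the new layout */
};

#define DEMO_DEFAULT_TIMEOUT_MS 1000

/* Stand-in for a driver; supports_v2 == 0 models an older kernel. */
struct demo_driver {
    int supports_v2;
    int last_timeout_ms;           /* records what the "driver" actually used */
};

/* Simulated ioctl handler: both commands funnel into one internal request. */
int demo_ioctl(struct demo_driver *drv, unsigned cmd, void *arg)
{
    struct demo_req_v2 req;

    assert(drv != NULL && arg != NULL);

    switch (cmd) {
    case DEMO_IOC_SEND_V1: {
        /* Old callers keep working: convert and fill in a default. */
        const struct demo_req_v1 *old = arg;
        req.cmd_len = old->cmd_len;
        memcpy(req.cmd, old->cmd, sizeof req.cmd);
        req.timeout_ms = DEMO_DEFAULT_TIMEOUT_MS;
        break;
    }
    case DEMO_IOC_SEND_V2:
        if (!drv->supports_v2)
            return -ENOTTY;        /* an older driver does not know this one */
        memcpy(&req, arg, sizeof req);
        break;
    default:
        return -ENOTTY;
    }

    if (req.cmd_len < 1 || req.cmd_len > (int)sizeof req.cmd)
        return -EINVAL;
    drv->last_timeout_ms = req.timeout_ms;
    return 0;
}

/* User-space side: try the new interface first, fall back to the old one. */
int demo_submit(struct demo_driver *drv, const char *cmd, int cmd_len,
                int timeout_ms)
{
    struct demo_req_v2 v2;
    struct demo_req_v1 v1;
    int rc;

    if (cmd_len < 1 || cmd_len > (int)sizeof v2.cmd)
        return -EINVAL;

    memset(&v2, 0, sizeof v2);
    v2.cmd_len = cmd_len;
    memcpy(v2.cmd, cmd, (size_t)cmd_len);
    v2.timeout_ms = timeout_ms;
    rc = demo_ioctl(drv, DEMO_IOC_SEND_V2, &v2);
    if (rc != -ENOTTY)
        return rc;                 /* handled (or failed) by the new path */

    /* The new command is unknown here, so resend using the old layout. */
    memset(&v1, 0, sizeof v1);
    v1.cmd_len = cmd_len;
    memcpy(v1.cmd, cmd, (size_t)cmd_len);
    return demo_ioctl(drv, DEMO_IOC_SEND_V1, &v1);
}
```

In this sketch a binary built against the old struct keeps running on a newer driver, and a binary built against the new struct still runs on an older driver, at the cost of losing the extra field (the timeout silently falls back to the default). Changing the size of demo_req_v1 in place, by contrast, would break every binary already built against it.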