[LWN Logo]

Date:   Thu, 23 Dec 1999 23:07:24 +0100 (CET)
From:   arjan@fenrus.demon.nl (Arjan van de Ven)
To:     zab@zabbo.net (Zach Brown)
Subject: Re: Bloat? (khttpd)

[I respond to Zach's mail, but most of it is in reply to the entire thread]

Hi,

Zach Brown wrote:

> what we _do_ want is a nice API for doing async bulk file->file transfers.
> Thats all the meat of khttpd really is.  We can get this with a bit of
> thinking and proper async io.  This will happen in time.

I really hope so. Linux will benefit from this big time. 

kHTTPd (and any webserver, and in a way NFS/FTP server) is two things

1) A Request-decoder/Startup (header) part
2) Bulk data transfer

kHTTPd is not very good at #2. "stock" Apache is very bad at #1, as it needs 
a lot of syscalls (9 or so, I don't remember exactly) to do this. 

Nobody objects to having #2 inside the kernel. Linux needs that for
performance, for FTP/HTTP/NFS (be it userspace or kernelspace) and 
file->file operations. (In fact sendfile() does this in a synchronous 
way, and is in the 2.2 kernels since ages). For #2, it makes no sence to 
switch to userspace a lot, as userspace effectively would only increment 
some counter.


Webservers usually serve a lot of small files (.html and .gif/.png). For
benchmarks and other file-server like situations, latency counts above all. 

kHTTPd achieves this by _not_ doing the syscalls in #1, by not doing all 
the rare complex stuff (it "bounces" those requests to userspace), and by 
reducing the number of context-switches in the fast path. 

ph[h]httpd, as I understand it, reduces the number of syscalls by using
pre-calculated headers, and reduces the number of context-switches by
"grouping" events into blocks of async-RT signals. (Not to speak of the 
reduced overhead wrt select()/poll() implementations). It also seems to 
"bounce" all non-static requests to a modified Apache. 

I haven't benchmarked phhttpd vs kHTTPd yet, as I have not patched my Apache
for phhttpd yet. (Zach: Can phhttpd run without these modifications, or run 
Apache without phhttpd afterwards?)

The discussion about "bloat" cannot be over the filesize of the
kernel-tarbal, as this overhead is minimal. It also cannot be "Everything 
that can be done in userspace, should be banned from the kernel". Even the 
TCP/IP stack and the VFS would have to be banned. 

It is the common perception, that it is a task of the kernel to give
file-data to "processes" that ask for it. The VFS is there for userspace
programs, kNFSd is there to do this to remote processes over the NFS protocol, 
for the very same reasons (latency) as mentioned above. 

kHTTPd is _one way_ of doing this for remote processes over the HTTP
protocol. HTTP is much cleaner than NFS in many ways, even though the RFC is
175 pages long. kHTTPd could also have been done the Microsoft way, that is
to do everything from interrupt-handlers. That would increase performance,
sure, but it would blast all modularity to bits.  

I am not arguing that all such protocols should be implemented inside the 
kernel (far from that), but there are a few protocols that matter a lot in 
the outside world. HTTP is increasingly important, and not just for 
Benchmarks. GNOME and KDE will provide transparent network-access through 
HTTP, for example. This will change HTTP in the direction of a file-server
protocol, where latency counts.

About the remark from Alexey that kHTTPd isn't modular at all: There is
(was) some miscommunication between Alexey and me, and the issues at hand
will be resolved shortly in a way that I hope is satusfactory for both of
Alexey and me.

I look forward to further improvements in phhttpd and have all confidence 
that in the end, Linux as a whole will become better from discussions like
this (at least from the parts that use real arguments and facts).

Greetings,
   Arjan van de Ven

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/