Date:	Sat, 19 Jun 1999 07:47:38 +0200 (CEST)
From:	Arjan van de Ven <arjan@fenrus.demon.nl>
To:	linux-kernel@vger.rutgers.edu
Subject: kHTTPd: Good or Bad

Hi,

In the last few days, a discussion about kHTTPd has begun. Some find it
bloat, others see a use for it. I have not had time to answer many of the
discussions, but I feel that I need to explain my point of view and the
current status of kHTTPd.


Implementation
--------------
Some argument was made about the "ugly" implementation of kHTTPd. kHTTPd
is currently designed much like Apache: there are X threads blocking in
accept(); one of these threads gets a new connection and handles its
requests exclusively. When the request has been handled, the thread
blocks in accept() again.
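
As a rough userspace illustration of the design described above (this is
not kHTTPd's kernel code), the sketch below starts a fixed number of
threads that all block in accept() on one listening socket. The names
worker and start_workers, the thread count and the canned response are
all assumptions made for the example.

```python
import socket
import threading

# Canned reply so the sketch stays focused on the threading model.
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok"


def worker(listener):
    """Block in accept(), serve one connection exclusively, then loop."""
    while True:
        try:
            conn, _addr = listener.accept()  # every worker sleeps here
        except OSError:
            return  # listener is closed or unusable: stop this worker
        with conn:
            conn.recv(4096)         # may sleep again, waiting for the request
            conn.sendall(RESPONSE)  # this thread alone handles the request


def start_workers(num_threads=4, host="127.0.0.1", port=0):
    """Create one listening socket shared by num_threads accept() loops."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(16)
    threads = [
        threading.Thread(target=worker, args=(listener,), daemon=True)
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    return listener, threads
```

Each connection costs at least two wakeups of a worker (one in accept(),
one in recv()), which is the scheduler load referred to in this section.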

This design is far from optimal: it places a huge burden on the scheduler.
There are some inconveniences in Linux that amplify this effect (for
example, a thread wakes up on an incoming connection, but has to go to
sleep right away while waiting for data).

I am building a version that handles things differently (one thread per
CPU that does all the work), but the current design was easy to implement
as a proof of concept.

Another point was made that the way kHTTPd handles userspace is ugly.
This is certainly true for versions up to 0.1.1, but in the (pre)0.1.2
version this is handled in a much more graceful way.


phhttpd
-------
Alan Cox argued that phhttpd was faster than kHTTPd. I have not seen
any numbers that indicate that, but I have not performed any benchmark
myself either. phhttpd "cheats" though: it caches all files in memory
and doesn't do a lot of things an httpd should, such as sending "Date:",
"Last-Modified:" and "Content-Type:" headers. Some of them might be easy
to build into phhttpd, but the "Date:" header is more complex as it
requires the daemon to send the _current_ date at the time of sending.

kHTTPd doesn't cheat: it sends those headers AND supports
"If-Modified-Since" requests. If someone benchmarks kHTTPd against phhttpd,
this should be recognized.
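
To make the extra work concrete, here is a minimal userspace sketch of
the two pieces mentioned above: formatting an HTTP date and deciding
whether an "If-Modified-Since" request can be answered without sending
the file. It is not taken from kHTTPd; the function names http_date and
not_modified are hypothetical, and the comparison at one-second
granularity is an assumption.

```python
from email.utils import formatdate, parsedate_to_datetime


def http_date(timestamp):
    """Format a Unix timestamp as an HTTP date string (always GMT)."""
    return formatdate(timestamp, usegmt=True)


def not_modified(if_modified_since, file_mtime):
    """Return True when the file is unchanged since the client's copy."""
    try:
        since = parsedate_to_datetime(if_modified_since).timestamp()
    except (TypeError, ValueError):
        return False  # unparsable header: fall back to sending the file
    # HTTP dates carry whole seconds only, so compare at that granularity.
    return int(file_mtime) <= int(since)
```

A server would call http_date() with the current time for every "Date:"
header, which is why that header cannot simply be stored with a cached
response.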

Cache
-----
The point was made by an Apache developer that kHTTPd would do a better
job if it just cached requests and let userspace "command" kHTTPd to
send a certain file.

If this is true, Alan Cox is right and kHTTPd shouldn't be in the kernel,
but the problem with the kernel should be fixed.

I don't think this is true though:
1) There already is a cache in the kernel, the buffercache. Caching things
   twice is insane.
2) Caching the header is a joke, since building one costs only one sprintf
   (at least, from within the kernel) and some good bookkeeping.
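
Point 2 can be illustrated with a userspace analogue of that single
sprintf: one format operation over values the server already has at hand.
This is only a sketch. The template, the build_header name and the choice
of headers (the three named earlier plus an assumed "Content-Length:")
are not taken from kHTTPd.

```python
from email.utils import formatdate

# One template, filled in by a single format operation per response.
HEADER_TEMPLATE = (
    "HTTP/1.0 200 OK\r\n"
    "Date: %s\r\n"
    "Last-Modified: %s\r\n"
    "Content-Type: %s\r\n"
    "Content-Length: %d\r\n"
    "\r\n"
)


def build_header(size, mtime, content_type, now):
    """Build a response header from file metadata and the current time."""
    return HEADER_TEMPLATE % (
        formatdate(now, usegmt=True),    # changes on every request
        formatdate(mtime, usegmt=True),  # from the file's metadata
        content_type,
        size,
    )
```

The "bookkeeping" is then limited to having the size, modification time
and content type of the file available when the request arrives.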


Why in the kernel
-----------------
kHTTPd solves one specific problem. Someone wants to see a file, and
kHTTPd gets that file to this someone. knfsd also does this. The VFS also
does this (it's just that the VFS doesn't use TCP/IP to communicate).

Why are knfsd and the VFS in the kernel? Because it is silly not to do
this. I see no difference between a remote client that asks over NFS if a
specific file has changed and a remote client that asks over HTTP if
a specific file has changed. The same goes for reading the content of
this file.

I find the argument that kHTTPd bloats the kernel and phhttpd doesn't
somewhat strange. kHTTPd requires no kernel patches (it helps if you
export some extra symbols though) and no modifications to the
userspace-client of your choice. phhttpd _does_ require kernel patches and
although the concept can be used by other http-daemons, it means that
those _USERSPACE_ daemons have to be tuned to Linux only. For Apache, this
might work. But there are a lot of other (great) webservers, good at a
specific job. They benefit from kHTTPd without any burden on their
developers, but their architecture prohibits the use of the
phhttpd-tricks.



Greetings, 
  Arjan van de Ven

