From: Alan Schmitt <alan.schmitt@inria.fr> To: lwn@lwn.net Subject: Attn: Development Editor, Latest Caml Weekly News Date: Tue, 11 Dec 2001 15:33:45 +0100 Hello, Here is the latest Caml Weekly News, week 05 to 11 december, 2001. Summary: 1) Code-base size comparison vs Language 2) Unreliable Threading? 3) Ensemble release 1.33 4) embedding Ocaml in an existing application 5) SML -> OCaml ====================================================================== 1) Code-base size comparison vs Language ---------------------------------------------------------------------- David McClain elaborated: I just finished my experiment to reduce the size of a fielded application by recoding in either of Lisp or OCaml. I had early indications that, aside from pure ease of programming in these HLL's, the overall code base would be drastically reduced (5x to 6x). That is certainly true if you count all the source code needed to produce the application, but an honest, impartial, comparison of the lines I actually had to write, of non-reusable, application specific code shows somewhat disappointing results on this basis alone. The application is a system network server that performs recursive prefix mappings of file pathnames, including environment variable substitutions. This is a variation on the system provided by the Sprite experimental OS developed at UCB by John Ousterhout, et. al. in the late 1980's and early 1990's. The existing version was coded in M$ VC++ making heavy use of STL. It is a COM/OLE process server based on M$ ATL. All three versions retain a machine generated ATL wrapper code for this COM/OLE behavior -- I only needed to write a few lines of IDL to produce the basic skeleton, and all three versions use identical stuff here... For the application specific coding, the scores are: Existing App: C/C++ = 1106 LOC Lisp Version: C/C++ = 284 LOC, Lisp = 798 LOC --> Total = 1082 LOC OCaml Version: C/C++ = 284 LOC, Lisp = 58 LOC, OCaml = 453 LOC --> Total = 888 These LOC counts do *not* include blank lines and comment only lines. On the basis of code-base size reduction, these results are nearly a tie. But on the basis of ease of programming, I have to award Lisp first, followed by OCaml, and distantly trailed by C/C++. The reasons for this are: 1. Lisp is a huge langauge with nearly everything you need already built in. But it produces very bulky DLL's -- on the order of 15 MBytes. 2. OCaml is equally terse as Lisp, or even slightly better, but needs a fair amount of additional support routines written, to cover the application needs. Some of this is in C/C++ (very little) but most has to do with providing things like unwind-protect, generalized string handling, generalized list operations. It produces very fast runtime code (not needed here) and quite reasonably sized DLL's -- about 300 KBytes (50x smaller than Lisp!!) 3. C/C++, making heavy use of classes and STL is nearly unreadable, took a long time to program, and is frightening to revisit after some time away from it (1 year or more since original writing). C/C++ retains the capability to utilize Unicode (FWIW -- I don't really need it), but it was written with some embedded bugs that I found only when I was able to remain at the abstract levels permitted by HOL's. Both the Lisp and OCaml versions were written in the course of 2-3 hours. Writing the C/C++ version took the better part of 1 week. Prior to that I had written experimental versions in Lisp and had more than a year of playing with the system to get an understanding of the needed algorithms. I will say that both Lisp and OCaml allowed me to spot some errors in the C/C++ implementation, fix those errors, and add some extra capability (about 20 LOC in both Lisp and OCaml for the extra stuff). I estimate the time needed to go back and refamiliarize myself with STL and the internal architecture of the existing application -- in order to fix the bugs I discovered and add the additional capabilities -- would be several days. I find it remarkable that OCaml has a slight edge on Lisp for terseness of expression. OCaml is a highly expressive syntax and you can say quite a lot in a few keystrokes. Lisp tends to be more wordy, use longer identifiers, and the code is quite a bit sparser for semantic content over a given number of LOC. This is as close as I can come to providing an honest, impartial, comparison of these languages for the purpose of rewriting existing code to be more maintainable, robust, and correct. I definitely think the effort is worthwhile, but not entirely for the reasons I had originally anticipated. ====================================================================== 2) Unreliable Threading? ---------------------------------------------------------------------- David McClain remarked: Earlier, when developing a mixed C/C++ and OCaml program, it looked as thought the OCaml threading code was "unreliable" in that I would get crashes under the MSVC++ debugger of "Invalid Handle" errors. But running the code outside of the debugger works just fine. I now believe the problem has nothing to do with OCaml or its threading code, but rather the way the M$ debugger attempts to walk stack frames looking for contexts to report. Since OCaml threading does its own thing, M$ appears to get lost in this process and ends up crashing on its own -- incorrectly attributing the error to the code under test. All efforts to use OCaml multithreading and Channels for inter-thread coordination HAVE been quite reliable outside of the M$ development environment. ====================================================================== 3) Ensemble release 1.33 ---------------------------------------------------------------------- Ohad Rodeh announced: A final release of Ensemble is now ready. This release includes several interesting libraries for Caml: 1. Cryptographic library interfacing with OpenSSL. 2. A socket library using winsock2 and scatter-gather on Linux, with support for raw C iovectors. 3. A merge tool allowing packaging of Caml libraries. All proprietry dependecy generation and build tools have been discarded. This has two benefits: (1) The license can be BSD-type (2) it is easy to cut-and-paste complete files and libraries. I now have to start working on some "serious" projects, and I no longer have the time to develop the system. I'll keep maintaining it though. As usual the system is available from: www.cs.cornell.edu/Info/Projects/Ensemble. All the best, Ohad. RELEASE_NOTES for Ensemble version 1.33 Author: Ohad Rodeh Last updated: 6/12/2001 CHANGES Removed proprietary build tools. 1. Rewritten all the makefiles, all proprietary build tools have been removed. The makefiles have been simplified, and can be easily read. Only standard tools are now used (ocamlc, ocamlopt, ocamldep, GNU-make, gcc, cl, nmake). 2. A large number of makefile bugs have been removed, makefile code has been considerably reduced. 2. Cleanup up and updated INSTALL.htm. 3. Copyright violations have been fixed, the Ensemble code base contains no CAML code, nor any other code that is not BSD. 4. Work is underway to add IBM to the copyright holders. 5. Many small bug fixes. OCAML COMPILER VERSION We are using version 3.01 for this version. PORTABILITY This version was tested on Linux, WIN2000, and NT4. Light testing was carried out on Sparc/Solaris. CE still does not run on WIN32 (yet). ====================================================================== 4) embedding Ocaml in an existing application ---------------------------------------------------------------------- Basile STARYNKEVITCH wondered: Hello, Some remarks (and wishes) about embedding Ocaml (bytecoded) in an existing application. I just embedded (a single day hack) Ocaml inside the Larbin web crawler (coded in C++ by Sebastien Ailleret) on http://pauillac.inria.fr/~ailleret/prog/larbin/ You should be able to get the patch from my personal web page or else email me either at work <basile.starynkevitch@cea.fr> or at home <basile@starynkevitch.net>. This because I need (for the POESIA project) to scan some part of the Web and get some crude measures of features like: percentage of Web pages using Javascript (about 50%) and mean length of such scripts (about 2Kbytes). So I hacked larbin to add Ocaml inside it. I used ocaml v3.02 or v3.01 (bytecode interpreter). First remark: the ocaml linking mechanism use the undocumented -p flag to dump all the Ocaml symbol table. So I had to hack Larbin's main routine to test for the special -p single argument and then call only ocaml_main_init(argv); and exit(0) in that case. I believe that it would be quite trickly to embed Ocaml inside a significant application which uses the -p flag (see the code in ocaml/byterun/startup.c routine parse_command_line). At least I suggest that the linking mechanism use a less natural convention (such as "-print-ocaml-symbols" as argv[1] of main) and that it could be documented. (and I did not found quickly where Ocaml invokes this magic -p flag). Second remark: for embedding Ocaml inside Larbin I need the -use-runtime flag to the ocamlc compiler. But it seems that the next version (v3.03) does not have it anymore. I hope that it will still be possible to embed OCaml inside an application without having to compile the whole application as a shared library. 3rd remark: why don't caml_main take and changes the main argc,argv arguments (like e.g. gtk_init routine in the Gtk toolkit) so that the application can use the other arguments. Otherwise, I suggest to publicize the parse_command_line routine so an embedding application could parse its own main arguments and pass some others to ocaml before starting the Ocaml interpreter. (imagine you want to embed Ocaml inside an existing Gtk application?). My previous post http://caml.inria.fr/archives/200112/msg00015.html regarding a tiny syntactic extension was related to this [Larbin] patch. I think that novice Ocaml users might be interested in a syntactic sugar to register bound values. (Xavier Leroy posted a long reply to this post that can be found at http://caml.inria.fr/archives/200112/msg00040.html) ====================================================================== 5) SML -> OCaml ---------------------------------------------------------------------- Daniel de Rauglaudre said: > Is there an SML -> OCaml translator? I have some SML programs I'd like > to convert... even something that only partly works would save me a fair > amount of tedium. You can try: camlp4 pa_sml.cmo pr_o.cmo your_file.ml But it is probably incomplete: you may get syntax errors. And even if it is ok, you probably have to change things by hands. You can ask me if you want me to complete of fix some bugs. ====================================================================== Alan Schmitt -- The hacker: someone who figured things out and made something cool happen.