Hello
Here is the latest Caml Weekly News, for the week of 19 to 26 April, 2005.
Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/04/727bc0e114e1e2c5616b96bfb0b38ccb.en.html
Alain Frisch announced:

I'd like to announce an experimental extension of OCaml with built-in support for XML types. It can be seen roughly as a merger between OCaml and CDuce (http://www.cduce.org/). The compiler has been implemented on top of OCaml 3.08.2 and CDuce 0.3.2.

The OCaml+CDuce language is intended to provide a simple way to deal with XML documents in OCaml applications. Thanks to XML types, you get static guarantees about the type of XML documents produced by the application. XML pattern matching is a powerful operation, reminiscent of ML pattern matching but much more powerful. Some facilities are provided to translate automatically from regular ML values to CDuce values and back.

The language might also be useful for non-XML applications: debugging (using ML-to-XML translation), string regular expressions (types and patterns), ...

Documentation is very succinct for the moment:
http://pauillac.inria.fr/~frisch/ocamlcduce/doc/README.cduce
http://pauillac.inria.fr/~frisch/ocamlcduce/doc/
http://pauillac.inria.fr/~frisch/ocamlcduce/tests/

The CDuce documentation might be useful as well:
http://www.cduce.org/documentation

Home page:
http://www.cduce.org/ocaml.html

GODI users can try this extension by adding this line:

GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi

to their etc/godi.conf file, and by forcing a recompilation of the godi-ocaml-src and godi-ocaml packages. They should also build the godi-xml-support library, which features bindings for XML parsers (pxp, expat, xml-light) and a tool dtd2types (to generate type declarations from a DTD).
Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/04/3819cb6ae51a191b001dd68f2439ba5d.en.html
Alain Frisch said:

Another kind of library which would benefit from such effort is XML parsing. I know about pxp, expat, xml-light, ocaml-xmlr, tony, xmllexer, and there might be others. It would be great to have some common interface. An event-driven interface is probably easier to agree upon. There are many points to address (external entities, encodings, namespace processing, ... even if the features are not available in all the parsers).

Anyone interested in this discussion?

Nicolas Cannasse said:
I'm willing also to make XmlLight compatible, as we did for IO :)

Stefano Zacchiroli said and Alain Frisch answered:
> Even if certainly easier to agree upon, event-driven interface for XML
> are harder to program than tree based ones.

Some applications really need stream based processing: loading the XML document into memory is out of the question (because it is huge) and/or processing needs to happen as soon as data is available (e.g. for the Jabber protocol).

> Basic tree operations should not be that hard to agree upon ...

I'm afraid it will be hard. To start with, do we want mutable trees, upward pointers? Do we want to keep locations, namespace declarations, comments, entity references ...? Which whitespace to remove?

Anyway, a tree representation can easily be built on top of an event-driven interface. The difficult part in parsing XML is really lexing. We can try to agree upon one or several standard tree representations, but I believe we should start with an event-driven interface.

Is someone willing to set up a mailing list for this discussion?

Gerd Stolpmann replied:
For a standard representation we should use DOM, simply because lots of XML standards refer to DOM. Of course, that doesn't answer all details.

> Anyway, a tree representation can easily be built on top of an
> event-driven interface. The difficult part in parsing XML is really
> lexing. We can try to agree upon one or several standard tree
> representation, but I believe we should start with an event-driven
> interface.

And it is much simpler.

> Is someone willing to set-up a mailing list for this discussion ?

I have set up a mailing list:

https://gps.dynxs.de/mailman/listinfo/xml-list

I would suggest we wait until Monday before starting the discussion so everybody can sign up who is interested.
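
Editor's illustration: the claim in this thread that a tree representation is easy to layer on top of an event-driven interface can be sketched in a few lines of OCaml. The `event` and `tree` types and the `tree_of_events` function below are hypothetical names chosen for this sketch; they are not taken from pxp, expat, xml-light, or any other parser mentioned above, and they ignore the open questions raised in the thread (locations, namespaces, comments, entity references, whitespace).

```ocaml
(* Hypothetical event type: what an event-driven parser might emit. *)
type event =
  | Start_element of string * (string * string) list
  | End_element of string
  | Char_data of string

(* Hypothetical immutable tree, without upward pointers or locations. *)
type tree =
  | Element of string * (string * string) list * tree list
  | Text of string

exception Malformed of string

(* Fold a list of events into a tree. The stack holds the currently open
   elements, each with its children accumulated in reverse order. *)
let tree_of_events (events : event list) : tree =
  let rec go stack roots events =
    match events, stack with
    | [], [] ->
      (match roots with
       | [ root ] -> root
       | _ -> raise (Malformed "expected exactly one root element"))
    | [], _ :: _ -> raise (Malformed "unclosed element")
    | Start_element (name, attrs) :: rest, _ ->
      go ((name, attrs, []) :: stack) roots rest
    | Char_data s :: rest, (n, a, kids) :: outer ->
      go ((n, a, Text s :: kids) :: outer) roots rest
    | Char_data _ :: rest, [] ->
      (* Character data outside the root element is dropped in this sketch. *)
      go [] roots rest
    | End_element name :: rest, (n, a, kids) :: outer when n = name ->
      let node = Element (n, a, List.rev kids) in
      (match outer with
       | [] -> go [] (node :: roots) rest
       | (pn, pa, pkids) :: outer' ->
         go ((pn, pa, node :: pkids) :: outer') roots rest)
    | End_element name :: _, _ ->
      raise (Malformed ("unexpected end tag: " ^ name))
  in
  go [] [] events
```

The sketch consumes a complete list for simplicity; a streaming consumer would receive the same events one at a time through a callback and would not need to hold the whole document in memory, which is the use case Alain Frisch describes.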
Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/04/471eea17142e68d97d253dcfc2b5050f.en.html
John Carr announced:

I updated my patches for 64 bit SPARC code to work with ocaml 3.08.3:

http://www.mit.edu/~jfc/ocaml-3.08.3-sparc64.tar.gz

There are two changes from the 3.08.1 version:

1. The 64 bit startup code did not allocate a large enough stack frame, causing crashes in garbage collection in some programs due to register window saves overwriting the zero word that terminates the chain of stack frames. If you want to fix this without upgrading, change 176 to 208 in the save statement at asmrun/sparc-sparc64.S line 319.

2. ocaml does not compile on Solaris because otherlibs/graph/.depend contains references to /usr/X11R6; the install script deletes these dependencies.

As before:

This only affects native code, ocamlopt.

Although the patched ocaml recognizes other 64 bit SPARC operating systems, I only have access to Solaris 9.

Floats are still boxed in 64 bit code but are properly aligned, potentially improving performance.

Here are run times for three of the microbenchmarks we discussed on the list recently, from left to right lorentzian 200, sieve 10000000, sort 10000:

        lore   siev   sort
ML 32   6.78   1.52   2.87
ML 64   7.41   1.18   2.72
C 32    2.81   1.93   2.54*
C 64    2.92   3.50

ML 32 = ocamlopt 3.08.3 32 bit version with -march=v8
ML 64 = ocamlopt 3.08.3 64 bit version
C 32 = Sun C++ 5.5 -xO3 -xarch=v8plus except * = gcc 3.3.2
C 64 = Sun C++ 5.5 -xO3 -xarch=v9
Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/04/fdca331aa48cf614e1e0bf2a6e721185.en.html
Eric Cooper asked and Matt Gushee answered:

> Is there any way to do setenv() from within OCaml? I want to set an
> environment variable that will be used by a C library that my OCaml
> program calls. I know it's a simple C stub, but it would be nice if
> it were in Sys along with getenv.

How about Unix.putenv?
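
Editor's illustration: a minimal use of Unix.putenv, assuming the program is linked against the unix library. The variable name MY_C_LIB_MODE is made up for this example; a real program would use whatever name the C library reads.

```ocaml
(* Requires the unix library (e.g. link with unix.cma / unix.cmxa). *)
let () =
  (* Set the variable in the process environment before calling into the
     C library that reads it. MY_C_LIB_MODE is a hypothetical name. *)
  Unix.putenv "MY_C_LIB_MODE" "debug";
  (* Sys.getenv reads the same process environment. *)
  print_endline (Sys.getenv "MY_C_LIB_MODE")
```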
Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/04/ecc19bc04192af25029b34693f9883d8.en.html
David Wake asked and Richard Jones answered:

> I am thinking of writing a N-tiered web application in OCAML. Is
> anyone aware of any OCAML packages that could help in such a project?
> I am particularly interested in:
>
> - application server software
> - source code examples of web applications in OCAML

The usual suspects:

ocamlnet   http://ocamlnet.sourceforge.net/
Xcaml      http://www.asxcaml.org/
mod_caml   http://merjis.com/developers/mod_caml/

(and there are apparently others - witness a recent lengthy thread on this list).
Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/04/bf7a8883b38fabbc1685f0cc9d5a099d.en.html
Christophe Troestler said:

I am not sure what your point is, but the trouble right now is not that there is no CGI library but that there are too many [1]! So let me place a call:

| Would people be interested in setting up a list to discuss a common
| CGI-like interface, i.e. a minimal set of services to be offered
| (in the same vein as what was done for I/O objects, see
| http://ocaml-programming.de/rec/IO-Classes.html). [It should not
| be hurried as for some library authors, this is not the main job.]
| The aim is to make it possible to develop higher level libraries
| (e.g. template libraries) that work whatever the basic interface
| one favors.

> A database interface (MySQL) follows close behind.

There are libraries for many databases as well as a generic one: DBI (http://savannah.nongnu.org/cgi-bin/viewcvs/modcaml/ocamldbi/).

Cheers,
ChriS

---
[1] Among others,
- Maxence Guesdon CGI (http://pauillac.inria.fr/~guesdon/Tools/cgi/)
- CamlGI (http://sourceforge.net/projects/ocaml-cgi/)
- fcgi-ocaml (http://sourceforge.net/projects/tcl-fastcgi/)
- mod_caml (https://savannah.nongnu.org/projects/modcaml/)
- OCamlnet (http://ocamlnet.sourceforge.net/)
- cgi (http://www.lri.fr/~filliatr/ftp/ocaml/cgi/)

Gerd Stolpmann said:
Good idea. However, I think it is too late for such a discussion.

First, it already happened. Do you remember Bedouin? Although this debate was about the general design of web applications, there was also a "branch" targeting the low-level stuff, especially CGI and other connectors. This branch was Ocamlnet.

Second, Ocamlnet exactly defines the "minimal set of services" (besides including several implementations). The interesting point is that it is possible to do implementations outside Ocamlnet by just defining compatible classes. This was a design idea from the very beginning, realized by using classes instead of functors everywhere. Because Ocamlnet has several layers, the developer of a new connector is even free to choose the level of the implementation, often giving one the chance to reuse code.

I am quite astonished at seeing that many CGI implementations. I only knew the implementation of de Rauglaudre and Filliatre, and its limitations were one of the motivations to develop Ocamlnet. Except mod_ocaml, which is somehow a different thing, the other libraries seem to have the same limitations: non-modular design, missing features like upload of large (> 16 MB) files, or internationalization. I don't say Ocamlnet is perfect, but it is a step in the right direction.

Jean-Christophe Filliatre said:
Just to clarify the situation (if needed): I wrote my CGI library for my own purposes and it is not intended to be complete, RFC-compliant, or whatever. Even if it appears in the hump (by the time I put it online there were not so many such libraries), it does not make much sense to compare it today with libraries such as ocamlnet.

Christophe Troestler asked and Gerd Stolpmann answered:
> Are questions welcomed?

Yes, of course. Also ideas for improvements, or just impressions.

> At the time I was not so much interested by web apps -- this is still
> not my main concern but, at times, I have to build some and I like
> both powerful and simple tools. My experience with OCamlNet is that,
> for a newcomer, it is difficult to find one's way through it. The
> library is impressive but, IMO, the interface could be made _simpler_
> and more orthogonal.

This is quite complicated to explain. Ocamlnet exhibits some of the internal complexity to give "power users" more possibilities, for example defining their own connector. Furthermore, it does not try to hide the peculiarities of the various connector protocols. One sees that every CGI request is performed by a new process, and for FastCGI and AJP it is not hidden whether multi-processing or multi-threading is used to parallelize requests.

Of course, this is confusing for beginners, but I don't really see how to improve this without giving up modularity (i.e. every connector has its own entry point).

> For example I am wondering why standard CGI must use [let cgi = new
> std_activation()] while FastCGI requires [Netcgi_fcgi.serv (fun cgi ->
> ...)]. Why can't the callback method be used consistently all over
> the place?

For historical reasons, the CGI connector has a simplified entry point:

  let cgi = new std_activation()

Why does this initialize for CGI? Because the argument ~env is missing, and by default the library tries to take env from the process environment, which initializes for CGI. This simply means that CGI is implemented as the default connector at this level. Internally, the other connectors also create a std_activation object, but with a certain ~env argument, making it different.

If we added the callback method for CGI, it would be simply

  let cgi_serv f = f (new std_activation())

(maybe with added exception handling).

> Additional advantages are that it allows to handle
> exceptions [1], to [#finalize] automatically when the request has been
> dealt with (the user may still want to call [#finalize] manually but
> would not be required to do so) and to [#commit]/[#flush] the output.

Accepted, this would be better.

> Finally, how are we supposed to launch different threads for different
> requests [2]?

Maybe Eric can comment on this.

> About arguments: is the mutability of arguments useful? This makes
> the whole interface more complex for a purpose I can't see.

For example, to help for debugging. The command-line interface uses the mutability of the arguments, too.

> Also, why
> not distinguish simple parameters (for which a method that returns a
> string is sufficient) and file uploads (for which one clearly wants
> more flexibility).

Because this is bullshit. It is not always a good idea to copy bad habits of other libraries - I know that all other libraries treat simple arguments and file arguments differently. However, this is a difference that actually does not exist on the HTTP level. I think it is shortsighted to artificially differentiate between things that are principally the same. For example, what happens when a new HTML feature is defined by W3C that requires a new kind of argument? E.g. a rich text editor whose contents are transported with a new kind of header? W3C will simply represent that argument in a form-encoded request. The point is that OCamlnet can decode and represent all form-encoded requests, no matter whether it is a file, a simple value, or something completely different.

Btw, the uniform representation of arguments can already be very useful now, for example for processing non-web requests.

> Why is there an exception [Std_environment_not_found]? Isn't it the
> role of the library to reject requests with lack of information (and
> log them)? Why bother the user with that? (I don't even think one
> may want to customize the reply to such requests as they are just
> bogus.)

See above: CGI is the default connector, and this exception is raised when the default does not apply.

> I have a few more questions in the same vein but will stop here
> waiting for reactions before bothering everybody even more! :-)

Ok, let's see whether this discussion is fruitful.

Christophe Troestler replied:
> defining their own connector.

I understand one needs to do so to extend the library but can you name other situations? My feeling is that CGI, FCGI, AJP, and test are the more used ones and that a custom connector is seldom needed... so shouldn't the standard connectors share a common standard (of course with a few peculiarities to each) while the function(s) to create new ones are grouped into a separate module. The prng* functions should be in the main module -- an additional [random_sessionid] function (generating e.g. 128 bits random strings) could be useful.

> Furthermore, it does not try to hide the peculiarities of the
> various connector protocols.

The purpose of the various connectors being the same, I believe they should share a common interface whenever possible. It is needlessly inconvenient to have to learn different interfaces for a given concept. Also, whenever possible, I believe names from the standard library should be reused (e.g. establish_server).

> One sees that every CGI request is performed by a new process, and
> for FastCGI and AJP it is not hidden whether multi-processing or
> multi-threading is used to parallelize requests.

It is good to be able to choose. For FCGI however, I was expecting some comments of Eric to understand better how it works (including multiplexing).

> Of course, this is confusing for beginners, but I don't really see
> how to improve this without giving up modularity (i.e. every
> connector has its own entry point).

I am afraid that I am not sure to fully grasp which kind of modularity you have in mind -- certainly because of my lack of experience in web devel. For example, I do not understand why [Netcgi_jserv.server_init] is not just included in [server_loop]. Another reason modularity is good is multithreading (or multiple processes). But there are other ways to handle that than to split into many functions. For example, one can imagine

  val establish_server : ?max_conns:int -> ...
    -> ?fork:((connection -> unit) -> connection -> unit)
    -> (connection -> unit) -> Unix.sockaddr -> unit

(?fork can create a process or a thread). This makes it possible to wrap the function handling the connection (connection -> unit) so that exceptions it raises are dealt with appropriately -- thus for example it seems possible to get rid of the care the user must exercise with [Signal_shutdown]... May you explain situations for which a [establish_server] / [handle_request] modularity is not enough?

> If we added the callback method for CGI, it would be simply

I am not suggesting to simply _add_ one (that would just make the whole interface more confusing) but to rework the interface so that

* all connectors are treated equally (e.g. CGI is nothing special w.r.t. other, conceptually) and the modularity is handled the same way for all of them (short of optional arguments).

* a separate module possesses the material to extend netcgi, e.g. to create specially tailored connectors.

Another thing that seems to be lacking is a uniform way to write in the server log. For CGI it is stderr, FCGI uses special "channel" (not stderr),... This is important e.g. to log nonfatal errors.

> > About arguments: is the mutability of arguments useful? This makes
> > the whole interface more complex for a purpose I can't see.
>
> For example, to help for debugging.

May you explain how? Is it useful to modify the value of a param inside a request handling function, with global effect (i.e. not just for the function scope)? Setting parameters before handling the request is a different matter -- a powerful "test" mode can certainly do this without mutability (exposed).

> The command-line interface uses the mutability of the arguments,

Well, it is fine with me that the function creating the environment can modify it. What I am objecting to is that [cgi_activation] offers functions to mutate them.

> > [Std_environment_not_found]
> See above: CGI is the default connector, and this exception is raised
> when the default does not apply.

But then, if you do not treat CGI in a special way (i.e. have distinct CGI and test connectors) it is not needed. In fact, it is not clear to me why it is good to have [std_environment] and [test_environment] in the interface as, as far as I can tell, they will just be used to implement the associated connectors (i.e. what does this modularity bring you?). [custom_environment] is fine and should be put in the "extension" module.
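
Editor's illustration: the callback-style entry point discussed in this thread (run the user's handler, deal with exceptions, and finalize automatically) can be sketched as below. This is not the Ocamlnet API: the `activation` class is a hypothetical stand-in for a CGI activation object, and `serve`, `argument`, `print`, `output`, and `finalized` are names invented for the sketch. It assumes OCaml 4.08 or later for `Fun.protect`.

```ocaml
(* Hypothetical stand-in for an activation object; NOT Ocamlnet's class. *)
class activation (params : (string * string) list) =
  object
    val out = Buffer.create 64
    val mutable is_final = false
    (* Look up a request argument; raises Not_found if it is absent. *)
    method argument name = List.assoc name params
    method print s = Buffer.add_string out s
    method output = Buffer.contents out
    method finalize = is_final <- true
    method finalized = is_final
  end

(* Callback-style entry point: run the handler, report any exception it
   raises through [log], and finalize whether or not the handler failed. *)
let serve ?(log = prerr_endline) (cgi : activation)
    (handler : activation -> unit) : unit =
  Fun.protect
    ~finally:(fun () -> cgi#finalize)
    (fun () ->
       try handler cgi
       with e -> log ("request failed: " ^ Printexc.to_string e))

let () =
  let cgi = new activation [ ("name", "world") ] in
  serve cgi (fun c -> c#print ("Hello, " ^ c#argument "name"));
  print_endline cgi#output
```

With such a wrapper the handler never has to call finalize itself, and every connector could expose the same `(activation -> unit)` callback shape; how a real connector would construct the activation object is left open here.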
Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).
:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'<1':1
zM
If you know of a better way, please let me know.
If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.
If you also wish to receive it every week by mail, you may subscribe online.