Previous week Up Next week

Hello

Here is the latest Caml Weekly News, for the week of 19 to 26 April, 2005.

  1. Experimental extension of OCaml with XML types
  2. Common XML interface
  3. 64 bit SPARC code generator updated for ocaml 3.08.3
  4. Sys.setenv or equivalent?
  5. N-tiered web application in OCAML
  6. Common CGI interface

Experimental extension of OCaml with XML types

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/04/727bc0e114e1e2c5616b96bfb0b38ccb.en.html

Alain Frisch announced:
I'd like to announce an experimental extension of OCaml with built-in 
support for XML types. It can be seen roughly as a merger between OCaml 
and CDuce (http://www.cduce.org/). The compiler has been implemented on 
top of OCaml 3.08.2 and CDuce 0.3.2.

The OCaml+CDuce language is intended to provide a simple way to deal 
with XML documents in OCaml applications. Thanks to XML types, you get 
static guarantees about the type of XML documents produced by the 
application. XML pattern matching is a powerful operation, reminiscent 
of ML pattern matching but much more powerful. Some facilities are 
provided to translate automatically from regular ML values to CDuce 
values and back.

The language might also be useful for non-XML applications: debugging
(using ML-to-XML translation), string regular expression
(types and patterns), ...

Documentation is very succinct for the moment:
http://pauillac.inria.fr/~frisch/ocamlcduce/doc/README.cduce
http://pauillac.inria.fr/~frisch/ocamlcduce/doc/
http://pauillac.inria.fr/~frisch/ocamlcduce/tests/

The CDuce documentation might be useful as well:
http://www.cduce.org/documentation


Home page: http://www.cduce.org/ocaml.html

GODI users can try this extension by adding this line:
   GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi
to their etc/godi.conf file, and by forcing a recompilation of the 
godi-ocaml-src and godi-ocaml packages. They should also build
the godi-xml-support library, which features bindings for XML parsers 
(pxp,expat,xml-light) and a tool dtd2types (to generates type 
declarations from a DTD).
    

Common XML interface

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/04/3819cb6ae51a191b001dd68f2439ba5d.en.html

Alain Frisch said:
Another kind of library which would benefit from such effort is
XML parsing. I know about pxp, expat, xml-light, ocaml-xmlr, tony, 
xmllexer, and there might be others.

It would be great to have some common interface. An event-driven 
interface is probably easier to agree upon. There are many points to 
address (external entities, encodings, namespace processing, ... even if 
the features are not available in all the parsers).

Anyone interested in this discussion ?
    
Nicolas Cannasse said:
I'm willing also to make XmlLight compatible, as we did for IO :)
    
Stefano Zacchiroli said and Alain Frisch answered:
> Even if certainly easier to agree upon, event-driven interface for XML
> are harder to program than tree based ones.

Some applications really need stream based processing: loading the XML 
document into memory is out of question (because it is huge) and/or 
processing needs to happen as soon as data is available (e.g. for the 
Jabber protocol).

> Basic tree operations should not be that hard to agree upon ...

I'm afraid it will be hard. To start with, do we want mutable trees, 
upward pointers ?  Do we want to keep locations, namespace declarations, 
comments, entity references ... ?  Which whitespace to remove ?

Anyway, a tree representation can easily be built on top of an 
event-driven interface. The difficult part in parsing XML is really 
lexing. We can try to agree upon one or several standard tree 
representation, but I believe we should start with an event-driven 
interface.

Is someone willing to set-up a mailing list for this discussion ?
    
Gerd Stolpmann replied:
For a standard representation we should use DOM, simply because lots of
XML standards refer to DOM. Of course, that doesn't answer all details.

> Anyway, a tree representation can easily be built on top of an 
> event-driven interface. The difficult part in parsing XML is really 
> lexing. We can try to agree upon one or several standard tree 
> representation, but I believe we should start with an event-driven 
> interface.

And it is much simpler.

> Is someone willing to set-up a mailing list for this discussion ?

I have set up a mailing list:

https://gps.dynxs.de/mailman/listinfo/xml-list

I would suggest we wait until Monday before starting the discussion so
everybody can sign up who is interested.
    

64 bit SPARC code generator updated for ocaml 3.08.3

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/04/471eea17142e68d97d253dcfc2b5050f.en.html

John Carr announced:
I updated my patches for 64 bit SPARC code to work with ocaml 3.08.3:
http://www.mit.edu/~jfc/ocaml-3.08.3-sparc64.tar.gz

There are two changes from the 3.08.1 version:
1. The 64 bit startup code did not allocate a large enough stack
   frame, causing crashes in garbage collection in some programs
   due to register window saves overwriting of the zero word that
   terminates the chain of stack frames.  If you want to fix this
   without upgrading, change 176 to 208 in the save statement at
   asmrun/sparc-sparc64.S line 319.
2. ocaml does not compile on Solaris because otherlibs/graph/.depend
   contains references to /usr/X11R6; the install script deletes
   these dependencies.

As before:

This only affects native code, ocamlopt.

Although the patched ocaml recognizes other 64 bit SPARC operating
systems, I only have access to Solaris 9.

Floats are still boxed in 64 bit code but are properly aligned,
potentially improving performance.


Here are run times for three of the microbenchmarks we discussed on
the list recently, from left to right lorentzian 200, sieve 10000000,
sort 10000:

      lore siev sort
ML 32 6.78 1.52 2.87
ML 64 7.41 1.18 2.72
C  32 2.81 1.93 2.54*
C  64 2.92 3.50 

ML 32 = ocamlopt 3.08.3 32 bit version with -march=v8
ML 64 = ocamlopt 3.08.3 64 bit version
C 32 = Sun C++ 5.5 -xO3 -xarch=v8plus except * = gcc 3.3.2
C 64 = Sun C++ 5.5 -xO3 -xarch=v9
    

Sys.setenv or equivalent?

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/04/fdca331aa48cf614e1e0bf2a6e721185.en.html

Eric Cooper asked and Matt Gushee answered:
> Is there any way to do setenv() from within OCaml?  I want to set an
> environment variable that will be used by a C library that my OCaml
> program calls.  I know it's a simple C stub, but it would be nice if
> it were in Sys along with getenv.

How about Unix.putenv?
    

N-tiered web application in OCAML

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/04/ecc19bc04192af25029b34693f9883d8.en.html

David Wake asked and Richard Jones answered:
> I am thinking of writing a N-tiered web application in OCAML.  Is
> anyone aware of any OCAML packages that could help in such a project?
> I am particularly interested in: 
> 
> - application server software
> - source code examples of web applications in OCAML 

The usual suspects:

ocamlnet http://ocamlnet.sourceforge.net/
Xcaml http://www.asxcaml.org/
mod_caml http://merjis.com/developers/mod_caml/

(and there are apparently others - witness a recent lengthy thread on
this list).
    

Common CGI interface

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/04/bf7a8883b38fabbc1685f0cc9d5a099d.en.html

Christophe Troestler said:
I am not sure what is your point but the trouble right now is not that
there are no CGI library but that there are too many [1]!  So let me
place a call:

 | Would people be interested in setting up a list to discuss a common
 | CGI-like interface, i.e. a minimal set of services to be offered
 | (in the same vein to what was done I/O objects, see
 | http://ocaml-programming.de/rec/IO-Classes.html).  [It should not
 | be hurried as for some library authors, this is not the main job.]
 | The aim is to make possible to develop higher level libraries
 | (e.g. template libraries) that work whatever the basic interface
 | one favors.

> Une interface de base de données (MySQL) suit de près derrière.

There are libraries for many databases as well as a generic one: DBI
(http://savannah.nongnu.org/cgi-bin/viewcvs/modcaml/ocamldbi/).

Cheers,
ChriS

---
[1] Among others,
- Maxence Guesdon CGI (http://pauillac.inria.fr/~guesdon/Tools/cgi/)
- CamlGI (http://sourceforge.net/projects/ocaml-cgi/)
- fcgi-ocaml (http://sourceforge.net/projects/tcl-fastcgi/)
- mod_caml (https://savannah.nongnu.org/projects/modcaml/)
- OCamlnet (http://ocamlnet.sourceforge.net/)
- cgi (http://www.lri.fr/~filliatr/ftp/ocaml/cgi/)
    
Gerd Stolpmann said:
Good idea. However, I think it is too late for such a discussion.

First, it already happened. Do you remember Bedouin? Although this
debate was about the general design of web applications, there was also
a "branch" targeting the low-level stuff, especially CGI and other
connectors. This branch was Ocamlnet.

Second, Ocamlnet exactly defines the "minimal set of services" (besides
including several implementations). The interesting point is that it is
possible to do implementations outside Ocamlnet by just defining
compatible classes. This was a design idea from the very beginning,
realized by using classes instead of functors everywhere. Because
Ocamlnet has several layers, the developer of a new connector is even
free to choose the level of the implementation, often giving one the
chance to reuse code.

I am quite astonished at seeing that many CGI implementations. I only
knew the implementation of de Rauglaudre and Filliatre, and its
limitations were one the motivations to develop Ocamlnet. Except
mod_ocaml, which is somehow a different thing, the other libraries seem
to have the same limitations: Non-modular design, missing features like
upload of large (> 16 MB) files, or internationalization. I don't say
Ocamlnet is perfect, but it is a step into the right direction.
    
Jean-Christophe Filliatre said:
Just to clarify the situation (if  needed): I wrote my CGI library for
my own purposes and it  is not intended to be complete, RFC-compliant,
or whatever.   Even if it appears  in the hump  (by the time I  put it
online there were  not so many such libraries), it  does not make much
sense to compare it today with libraries such as ocamlnet.
    
Christophe Troestler asked and Gerd Stolpmann answered:
> Are questions welcomed?

Yes, of course. Also ideas for improvements, or just impressions.

> At the time I was not so much interested by web apps -- this is still
> not my main concern but, at times, I have to build some and I like
> both powerful and simple tools.  My experience with OCamlNet is that,
> for a newcomer, it is difficult to find ones way through it.  The
> library is impressive but, IMO, the interface could be made _simpler_
> and more orthogonal.

This is quite complicated to explain. Ocamlnet exhibits some of the
internal complexity to give "power users" more possibilities, for
example defining their own connector. Furthermore, it does not try to
hide the peculiarities of the various connector protocols. One sees that
every CGI request is performed by a new process, and for FastCGI and AJP
it is not hidden whether multi-processing or multi-threading is used to
parallelize requests.

Of course, this is confusing for beginners, but I don't really see how
to improve this without giving up modularity (i.e. every connector has
its own entry point).

> For example I am wondering why standard CGI must use [let cgi = new
> std_activation()] while FastCGI requires [Netcgi_fcgi.serv (fun cgi ->
> ...)].  Why can't the callback method be used consistently all over
> the place?  

For historical reasons, the CGI connector has a simplified entry point:

let cgi = new std_activation()

Why does this initialize for CGI? Because the argument ~env is missing,
and by default, env is tried to be taken from the process environment to
initialize for CGI. This simply means that on this level it is
implemented that CGI is the default connector.

Internally, the other connectors also create a std_activation object,
but with a certain ~env argument, making it different.

If we added the callback method for CGI, it would be simply

let cgi_serv f = f (new std_activation())

(maybe with added exception handling).

> Additional advantages are that it allows to handle
> exceptions [1], to [#finalize] automatically when the request has been
> dealt with (the user may still want to call [#finalize] manually but
> would not be required to do so) and to [#commit]/[#flush] the output.

Accepted, this would be better.

> Finally, how are we supposed to launch different threads for different
> requests [2]?

Maybe Eric can comment on this.

> About arguments: is the mutability of arguments useful?  This makes
> the whole interface more complex for a purpose I can't see.  

For example, to help for debugging. The command-line interface uses the
mutability of the arguments, too.

> Also, why
> not distinguish simple parameters (for which a method that returns a
> string is sufficient) and file uploads (for which one clearly wants
> more flexibility).

Because this is bullshit. It is not always a good idea to copy bad
habits of other libraries - I know that all other libraries treat simple
arguments and file arguments differently.

However, this is a difference that actually does not exist on the HTTP
level. I think it is shortsighted to artificially differ between things
that are principally the same. For example, what happens when a new HTML
feature is defined by W3C that requires a new kind of argument? E.g. a
rich text editor whose contents are transported with a new kind of
header? W3C will simply represent that argument in a form-encoded
request. The point is that OCamlnet can decode and represent all
form-encoded requests, no matter whether it is a file, a simple value,
or something completely different.

Btw, the uniform representation of arguments can already be very useful
now, for example for processing non-web requests.

> Why is there an exception [Std_environment_not_found]?  Isn't it the
> role of the library to reject requests with lack of information (and
> log them)?  Why bother the user with that?  (I don't even think one
> may want to customize the reply to such requests as they are just
> bogus.)

See above: CGI is the default connector, and this exception is raised
when the default does not apply.

> I have a few more questions in the same vein but will stop here
> waiting for reactions before bothering everybody even more! :-)

Ok, let's see whether this discussion is fruitful.
    
Christophe Troestler replied:
> defining their own connector.

I understand one needs to do so to extend the library but can you name
other situations?  My feeling is that CGI, FCGI, AJP, and test are the
more used ones and that a custom connector is seldom needed...  so
shouldn't the standard connectors share a common standard (of course
with a few peculiarities to each) while the function(s) to create new
ones are grouped into a separate module.  The prng* functions should
be in the main module -- an additional [random_sessionid] function
(generating e.g. 128 bits random strings) could be useful.

> Furthermore, it does not try to hide the peculiarities of the
> various connector protocols.

The purpose of the various connectors being the same, I believe they
should share a common interface whenever possible.  It is needlessly
inconvenient to have to learn different interfaces for a given
concept.  Also, whenever possible, I believe names from the standard
library should be reused (e.g. establish_server).

> One sees that every CGI request is performed by a new process, and
> for FastCGI and AJP it is not hidden whether multi-processing or
> multi-threading is used to parallelize requests.

It is good to be able to choose.  For FCGI however, I was expecting
some comments of Eric to understand better how it works (including
multiplexing).

> Of course, this is confusing for beginners, but I don't really see
> how to improve this without giving up modularity (i.e. every
> connector has its own entry point).

I am afraid that I am not sure to fully grasp which kind of modularity
you have in mind -- certainly because of my lack of experience in web
devel.  For example, I do not understand why
[Netcgi_jserv.server_init] is not just included in [server_loop].
Another reason modularity is good for it multithreading (or multiple
processes).  But there are other ways to handle that than to split
into many functions.  For example, on can imagine

  val establish_server : ?max_conns:int -> ... ->
    ?fork:((connection -> unit) -> connection -> unit) ->
    (connection -> unit) -> Unix.sockaddr -> unit

(?fork can create a process or a thread).  This makes it possible to
wrap the function handling the connection (connection -> unit) so that
exceptions it raises are dealt with appropriately -- thus for example
it seems possible to get rid of the care the user must exercise with
[Signal_shutdown]...

May you explain situations for which a [establish_server] /
[handle_request] modularity is not enough?

> If we added the callback method for CGI, it would be simply

I am not suggesting to simply _add_ one (that would just make the
whole interface more confusing) but to rework the interface so that

* all connectors are treated equally (e.g. CGI is noting special
  w.r.t. other, conceptually) and the modularity is handled the same
  way for all of them (short of optional arguments).

* a separate module possesses the material to extend netcgi, e.g. to
  create specially tailored connectors.

Another thing that seems to be lacking is a uniform way to write in
the server log.  For CGI it is stderr, FCGI uses special "channel"
(not stderr),...  This is important e.g. to log nonfatal errors.

> > About arguments: is the mutability of arguments useful?  This makes
> > the whole interface more complex for a purpose I can't see.  
> 
> For example, to help for debugging.

May you explain how?  Is it useful to modify the value of a param
inside a request handling function, with global effect (i.e. not just
for the function scope)?  Setting parameters before handling the
request is a different matter -- a powerful "test" mode can certainly
do this without mutability (exposed).

> The command-line interface uses the mutability of the arguments,

Well, it is fine with me that the function creating the environment
can modify it.  What I am objecting is that [cgi_activation] offers
functions to mutate them.

> > [Std_environment_not_found]
> See above: CGI is the default connector, and this exception is raised
> when the default does not apply.

But then, if you do not treat CGI in a special way (i.e. have distinct
CGI and test connectors) it is not needed.  In fact, it is not clear
to me why it is good to have [std_environment] and [test_environment]
in the interface as, as far as I can tell, they will just be used to
implement the associated connectors (i.e. what this modularity brings
you?).  [custom_environment] is fine and should be put in the
"extension" module.
    

Using folding to read the cwn in vim 6+

Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'<1':1
zM

If you know of a better way, please let me know.


Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.


Alan Schmitt