Here is the latest Caml Weekly News, for the week of March 17 to 24, 2009.
Archive: http://groups.google.com/group/fa.caml/browse_thread/thread/165323537f036e34#Rémi Dewitte asked and Gerd Stolpmann answered:
> I have used pxp to parse xml and I am happy with it. I'd like now to > produce xml and wonder what are the options to do so (possibly the > simpliest). Maybe not the simplest: Use the PXP preprocessor to create the output tree, and print the tree: http://projects.camlcity.org/projects/dl/pxp-1.2.1/doc/manual/html/ref/Intro_preprocessor.html http://projects.camlcity.org/projects/dl/pxp-1.2.1/doc/manual/html/ref/Pxp_document.document.html#2_WritingdocumentsasXMLtext > > I think I am going to start with the Printf module. I wonder how well > it handles utf8 for example. UTF-8 are just bytes for printf. > And I'll have to write a kind of xml_encode function. I am pretty sure > it has already be done somewhere ! let xml_encode = Netencoding.Html.encode ~in_enc:`Enc_utf8 ~out_enc:`Enc_usascii ~prefer_names:false () That would assume the input is UTF-8 encoded, and the output is ASCII-encoded. You can control which ASCII characters get the special XML representation &...; with the unsafe_chars optional argument. Docs are at http://projects.camlcity.org/projects/dl/ocamlnet-2.2.9/doc/html-main/Netencoding.Html.htmlSylvain Le Gall also replied:
Maybe it is a bit overkilling, but there is also ocamlduce. See there: http://www.cduce.org/ocaml (dev for ocaml 3.11:) http://ocamlduce.forge.ocamlcore.org/ http://git.ocamlcore.org/cgi-bin/gitweb.cgi?p=ocamlduce/ocamlduce.git;a=summary OCamlduce can also be used with Eliom/OCsigen. AFAIK, using ocamlduce can help you to type check your output tree directly within OCaml compiler...Matthieu Wipliez also replied:
Yet another solution is Xmlm by Daniel Bünzli. http://erratique.ch/software/xmlm This is probably the easiest and lightweight solution: Xmlm comes as a single module and its interface, and it's BSD so you can just copy/paste it into your project.Michael Ekstrand then added:
I second the xmlm suggestion. Polling event-based parsing is very slick and maps well into the functional paradigm, and its XML writing support (generating a stream of events identical to those you read) makes generation quite intuitive and reliable.
Archive: http://groups.google.com/group/fa.caml/browse_thread/thread/3e99820fd9b65b57#Stefano Zacchiroli announced:
Hi all, ocaml-http  is looking for a new maintainer. More details have been written on my blog . If you are interested in taking over the maintenance, please mail me in private. Cheers.  http://upsilon.cc/~zack/hacking/software/ocaml-http/  http://upsilon.cc/~zack/blog/posts/2009/03/ocaml-http_is_looking_for_a_new_maintainer/
Archive: http://groups.google.com/group/fa.caml/browse_thread/thread/3249f410fe061579#In this long thread full of technical posts, Andrey Riabushenko said and Xavier Leroy replied:
> I would like to develop LLVM frontend to Ocaml language. LLVM does participate > in GSoC. LLVM do not mind to developed a ocaml frontend as LLVM GSoC project. > I want to discuss details with you before I will make an official proposal to > LLVM. [...] Do authors of ocaml has something to say about the idea? Da. A number of things, actually. 1- I know of at least 3, maybe 4 other projects that want to do something with OCaml and LLVM. Clearly, some coordination between these efforts is needed. 2- If you're applying for funding through a summer of code project, you need to articulate good reasons why you want to combine OCaml and LLVM. "Ocaml is cool, LLVM is cool, so OCaml+LLVM would be extra cool" is not enough. "It will generate PIC-16 code" isn't either. Run-time code generation could be a good enough reason, but you still need to weigh the development cost of the LLVM approach against, for example, hacking the current OCaml code generator so that it emits machine code in memory instead of assembly code. 3- A language implementation like OCaml breaks down in four big parts: 1- Front-end compiler 2- Back-end compiler and code emitter 3- Run-time system 4- OS interface Of these four, the back-end is not the biggest part nor the most complicated part. LLVM gives you part 2, but you still need to consider the other 3 parts. Saying "I'll do 1, 3 and 4 from scratch", Harrop-style, means a 5-year project. To get somewhere within a reasonable amount of time, you really need to reuse large parts of the existing OCaml code base. 4- From a quick look at LLVM specs, the two aspects that appear most problematic w.r.t. Caml are exception handling and GC interface. LLVM seems to implement C++/Java "zero-cost" exceptions, which are known to be quite costly for Caml. (Been there, done that, in the early 1990s.) I regret that LLVM does not provide support for alternate implementations of exceptions, like the C-- intermediate language of Ramsey et al does: http://portal.acm.org/citation.cfm?id=349337 The big issue, however, is GC interface. There are GC techniques that need no support from the back-end: stack maps and conservative collection. Stack maps are very costly in execution time. Conservative GC like the Boehm-Weiser GC is already much better. But for full efficiency, back-end support is required. LLVM documents a minimal interface to produce stack maps and make them available to the GC at run-time: http://llvm.org/docs/GarbageCollection.html The first thing you want to investigate is whether this interface is enough to support an exact, relocating collector such as OCaml's. According to http://www.nabble.com/Garbage-collection-td22219430.html Gordon Henriksen did succeed in interfacing OCaml's GC with LLVM. Maybe there is still some more work to do here, in which case I recommend you look into this first. 6- Here is a schematic of the Caml compiler. (Use a fixed-width font.) | | parsing and preprocessing v Parsetree (untyped AST) | | type inference and checking v Typedtree (type-annotated AST) | | pattern-matching compilation, elimination of modules, classes v Lambda / \ / \ closure conversion, inlining, uncurrying, v \ data representation strategy Bytecode \ \ Cmm | | code generation v Assembly code In my opinion, the simplest approach is to start at Cmm and generate LLVM code from there. Starting at one of the higher-level intermediate forms would entail reimplementing large chunks of the OCaml compiler. Note that Cmm is quite close to the C-- intermediate language mentioned earlier, so there is a lot to learn from Fermin Reig's experience with an OCaml/C-- back-end. 7- To finish, I'll preventively deflect some likely reactions by Jon Harrop: "But you'll be tied to OCaml's data representation strategy!" Right, but 1- implementing you own data representation strategy is a lot of work, especially if it is type-based, and 2- OCaml's strategy is close to optimal for symbolic computing. "But LLVM assembly is typed, so you must have types!" Just use int32 or int64 as your universal type and cast to appropriate pointer types in loads or stores. "But your code will be tainted by OCaml's evil license!" It is trivial to make ocamlopt dump Cmm code in a file or pipe. (The -dcmm debug option already does this.) Then, you can have a separate, untainted program that reads the Cmm code and transforms it. "But shadow stacks are the only way to go for GC interface!" No, it's probably the worst approach performance-wise; even a conservative GC should work better.Jon Harrop replied:
> 3- A language implementation like OCaml breaks down in four big parts: > 1- Front-end compiler > 2- Back-end compiler and code emitter > 3- Run-time system > 4- OS interface > Of these four, the back-end is not the biggest part nor the most > complicated part. LLVM gives you part 2, but you still need to > consider the other 3 parts. Saying "I'll do 1, 3 and 4 from scratch", > Harrop-style, means a 5-year project. On the contrary, my "style" was to provide the features that I value (multicore & FFI) in a usable form (stop-the-world) with the shortest possible development time (i.e. <<6 months to create something useful). Specifically: 1- Front-end compiler: use camlp4 to provide an embedded DSL for high-performance parallel numerics and/or reuse front-ends from existing compilers like OCaml, PolyML, MosML, NekoML to build completely new language implementations. 2- Back-end compiler and code emitter: reuse LLVM. 3- Run-time system: write the simplest possible precise GC and use stop-the-world to apply it to threads that may then run in parallel. 4- OS interface: make it as easy as possible to call C directly. HLVM had solved (2), (3) and (4) after only 3 months of part-time work. I vindicated my style! > 7- To finish, I'll preventively deflect some likely reactions by Jon > Harrop: > > "But you'll be tied to OCaml's data representation strategy!" > Right, but 1- implementing you own data representation strategy is > a lot of work, especially if it is type-based, and Actually I found that easy, not least because I wanted a user-friendly FFI so I just used C's data representation whenever possible. > 2- OCaml's strategy is close to optimal for symbolic computing. Is MLton not several times faster than OCaml for symbolic computing? > "But LLVM assembly is typed, so you must have types!" > Just use int32 or int64 as your universal type and cast to > appropriate pointer types in loads or stores. That is entirely possible and could be useful as an incremental improvement to OCaml's existing bytecode interpreter but it is not a step toward my goals. > "But your code will be tainted by OCaml's evil license!" > It is trivial to make ocamlopt dump Cmm code in a file or pipe. > (The -dcmm debug option already does this.) Then, you can have a > separate, untainted program that reads the Cmm code and transforms it. Again, that is another technically-feasible step away from my goals because OCaml's CMM has already been mangled for its data representation (e.g. 31-bit ints, boxed floats). > "But shadow stacks are the only way to go for GC interface!" > No, it's probably the worst approach performance-wise; even a > conservative GC should work better. Building a state-of-the-art optimized concurrent GC Leroy-style means an infinity-year project. =:-p Seriously though, I think it is essential to get a first working version of a GC that permits parallel threads. HLVM will be useful to a lot of people long before its GC gets optimized.
Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).
If you know of a better way, please let me know.
If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.
If you also wish to receive it every week by mail, you may subscribe online.