Caml Weekly News

Hello

Here is the latest Caml Weekly News, for the week of May 11 to 18, 2010.

OCaml-Java project: 1.4 release
Camomile 0.7.3
OCaml S3 Library?
Phantom types
Binding and evaluation order in module/type/value languages
Other Caml News

OCaml-Java project: 1.4 release

Continuing an old thread, Warren Harris asked and Xavier Clerc replied:

> I'm curious whether there are any notes / pointers regarding the
> completeness of the ocaml-java implementation (couldn't find this on the web
> site). I'm wondering about the feasibility of using it for a moderately large
> ocaml project I've been working on which uses Lwt to perform async I/O. I
> assume that for this to work with ocaml-java, the lowest levels of Lwt would
> need to be adapted to use NIO or threads in order to run on a JVM. Also my
> application is written in a pure functional style, and relies heavily on
> closures, currying, recursion and the ability for the compiler to do tail call
> optimization. I'm concerned that this will not translate well to Java.

Here is some information about the OCaml-Java project
(both current 1.4 release and future 2.0 release).

Regarding compatibility with the original implementation of OCaml:
 - ocamljava is able to compile working versions of ocamlc, ocamllex, menhir
and itself, giving some confidence in its language-compatibility level;
 - all the libraries bundled with the original implementation have been
translated and tested through some unit testing (but thread libraries tests
are only lightweight, labltk support is based on the swank project [1], and
unix is only partially supported);
 - document [2] gives per-primitive compatibility information;
 - the last page of [3] lists the differences between ocamlc/ocamlopt and
ocamljava compilers;
 - third-party libraries (such as Lwt) should be supported out-of-the-box as
long as they do neither rely on exotic features (remember: "Obj.magic is not
part of the OCaml language"), nor on C code (unless you can translate it to
Java code).

Regarding performances in current version:
 - closures are based upon Java reflection (will be based upon JDK 7's method
handles, avoiding access checks on each call);
 - only "explicit self" tail calls are optimized ("self" meaning same
function, and "explicit" meaning "not through a closure");
 - performances are terrible and memory consumption is excessive.
Bottom line: the current version should be regarded as a proof-of-concept.

However, I have spent recent vacation on the upcoming 2.0 version, with some
positive results. The runtime has been partially rewritten, reducing memory
consumption and increasing performances. The Java equivalent of "ocamlrun" is
now within an order of magnitude from the original tool on all tested
programs, and only about 4 times slower on some (e. g. "nbody" from the
language shootout [4]). This seems to indicate that -when updated- the
ocamljava compiler should be able to at least match ocamlc performances.
Notice that is a conservative statement / objective if you consider that Scala
is on par with ocamlopt on some benchmarks (but the Scala language has been
designed from the bottom up to be run on the JVM).

A final word about tail call optimization. As previously said, only "explicit
self" calls are optimized (as far as I know, this is also true for mature
compilers like Scala or Clojure). This will not change in future versions,
unless Sun / Oracle adds support for tail calls in the JVM [5] [6]. I know
that some people will regard a compiler for a functional language with no
general tail call optimization as "broken", but it is clear that OCaml-Java
will not perform program transformations (such as trampolining) to enable tail
call optimization in every cases. My understanding may be incorrect, but it
seems that such transformations result in obfuscated code (hindering HotSpot
JIT compilation) and unpredictable performances.


Some points above may need to be elaborated, but this very message is already
quite long...
Anyway, feel free to ask for further information.

Regards,

Xavier Clerc

[1] http://www.onemoonscientific.com/swank/summary.html
[2] http://cadmium.x9c.fr/distrib/cadmium-compatibility.pdf
[3] http://cafesterol.x9c.fr/distrib/cafesterol.pdf
[4] http://shootout.alioth.debian.org/
[5] http://openjdk.java.net/projects/mlvm/subprojects.html#TailCalls
[6] http://wikis.sun.com/display/mlvm/TailCalls

Camomile 0.7.3

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/91ded3b129ac8323#

Yoriyuki Yamagata announced:

I'm pleased to announce Camomile 0.7.3, a new version of Camomile, a
comprehensive Unicode library for OCaml.
This is a bug fix release. It fixes the following bugs and Camomile now works
on Windows.
    - Aliases of character encodings containing ":" are removed,
    to support Windows platform.
    - Buffering bugs in CharEncoding and OOChannel modules.
    - Tree-merging bugs of ISet and IMap.
    - Locale data are properly loaded by binary channels. (Windows related)
    - "make depend" properly generates .depend file.
    - cpp is no longer required to build from the distribution.
    The license documentation for locales/*.txt files is added.
    (locales/license.html)

  The package is tested on Windows (MinGW-port of OCaml 3.11.0 + Cygwin on
Vista SP1) and Linux (Ubuntu 8.04 + godi version of OCaml 3.11.2).

  You can download the package from https://sourceforge.net/projects/camomile/

  For more information on Camomile, see the project web page 
http://camomile.sourceforge.net/.

  I would appreciate your comments or/and opinions.  In particular, I'd like 
to know whether you can successfully operate Camomile in your platform. I have
complaints on Mac and MinGW environments, and although I believe that the
problems are fixed, I'm too lazy to find spare Mac around and test the package
:-) Also, I would like to hear about a success ( /failure ) story of Camomile.
Do you use Camomile? What for? This is important since I have to convince my
boss to allow me to invest some spare time to Camomile project.

Romain Beauxis replied:

Thanks for your work in camomile !

We have been using and enjoying it with liquidsoap [1] for probably almost 3
years now. We are using camomile to convert implicitely the various metadata
tags into unicode.

I am personally impressed by how good the module works. As far as I can tell
we have had no issues with character encoding since we are using camomile !

Romain

[1]: http://savonet.sf.net/

Sylvain Le Gall said and Yoriyuki Yamagata replied:

> I use camomile as the default implementation for ocaml-gettext:
> http://forge.ocamlcore.org/projects/ocaml-gettext
> 
> Basically I do encoding conversion with it. Projects of mine built with
> ocaml-gettext use camomile as well.
> 
> Regards
> Sylvain
> 
> ps: as asked on the BTS, a way to be able to relocate the .mar file
> would be something great -- this actually prevents me to distribute my
> projects built with camomile. (relocate -> remove the hardcoded path)

Hmm... you should be able to relocate .mar file by providing the location of
.mar files through the functor CamomileLibrary.Main.Make. If you do not link
CamomileLibrary.Default, Camomile won't load the data from the hardcoded path.
If it does, then I regards it as a bug.

Xavier Clerc said:

Camomile is used in Barista [1] to encode / decode / modify UTF8 strings of Java classes.

My opinion of Camomile is absolutely positive:
 - installation is a breeze;
 - API is complete yet straightforward;
 - it meets a need of the OCaml ecosystem.

Not yet updated to 0.7.3 but 0.7.2 used to work like a charm on both MacOS X and Fedora.

Regards,

Xavier Clerc

[1] library (part of OCaml-Java) to load / manipulate / save Java ".class" files - http://barista.x9c.fr/

Dmitry Bely asked and Yoriyuki Yamagata replied:

> How "heavy-weight" is Camomile? I was a bit scared with the size of
> its distribution. Currently I use under Windows the following my own
> simple Unicode-support module (implemented via
> WideCharToMultiByte/MultiByteToWideChar Win32 API functions). Maybe
> it's time to switch to Camomile?

The size of the package is due to mapping tables of character encoding and
localization data. they occupy several mega bytes on the disk but it is
nothing by today's standard. If you still care, you can delete any .mar files
in charmaps, locales, mappings directory. (Deleting source files in these
directory is not recommended, since it could cause a failure of compilation.)
If you delete such files, related encoding and locales do not function, but
other functionality is intact.

As for memory consumption, character mapping tables and localization data are
loaded to the memory when they are required. So if you do not use them, they
are dead on the disk. After being loaded to the memory, they are retained by
weak hashtable, therefore, when they are no longer used, GC releases them.

Some modules load data from database directory during initialization. They
persists the end of the execution, but these data are usually small (mostly
some kilobytes in the marshaled form, only allkey.mar which used in uCol
module is >100k bytes (*)). You can avoid loading them by not linking these
modules.

Finally, the programming style to use Camomile would be a bit heavy, since
Camomile uses functors a lot. In fact, the library itself is a functor which
takes modules containing initialization parameters.

Heaviness aside, Camomile does not provide OS integration, which might be a
problem for you. I tried POSIX integration but abandoned it by the reason
which I no longer remember :-) But, you can use UTF16 module to access wch
array, since Windows wch is a 16-bit integer. (UTF16.t is 16-bit integer array
implemented in bigarray module.

(*) I noticed allkey.mar can be retained by weak pointer as well (currently,
it is held in Lazy.t). I will fix it in the next release, maybe.

OCaml S3 Library?

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/fc851484522fad6c#

Daniel Patterson asked and Jerome Vouillon replied:

> Has anyone written a library to interact with amazon web services
> (specifically, S3)?

I have started writing a library for S3.  A large part of the API is
implemented.  What is missing is mostly dealing with metadata and
error handling.

You can find the current sources here:

  http://www.pps.jussieu.fr/~vouillon/S3.tar.gz

If anyone is interested to contribute, please contact me.  I will then
set up a shared repository.

Phantom types

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/0df560ee78e0f75f#

Thomas Braibant asked and Goswin von Brederlow replied:

> Following on the thread "Subtyping structurally-equivalent records, or
> something like it?", I made some experimentations with phantom types.
> Unfortunately, I turns out that I do not understand the results I got.
>
> Could someone on the list explain why the first piece of code is well
> typed, while the second one is not ?
>
>
> type p1
> type p2
>
> type 'a t = float
> let x : p1 t = 0.0
> let id : p2 t -> p2 t = fun x -> x
> let _ = id x (* type checks with type p2 t*)

This is actualy a problem, at least for me. Because that is a type error
you usualy want to specifically catch through the use of phantom types.
But ocaml knows that 'a t = float and all floats are compatible. So it
accepts all 'a t as the same.

To make the phantom type checking work for you you need to move the
definition of your phantom type into a submodule and make the type
abstract through its signature. Any functions converting from one 'a t
to 'b t also need to be in there. To avoid having to use the submodule
name all the time you can use something like

module M : sig type 'a t = private float val make : float -> 'a t end = struct
 type 'a t = float
 let make f = f
end
include M

# let x : p1 t = make 0.0;;
val x : p1 t = 0.
# let id : p2 t -> p2 t = fun x -> x;;
val id : p2 t -> p2 t = <fun>
# let _ = id x;;
Error: This expression has type p1 t = p1 M.t
      but an expression was expected of type p2 t = p2 M.t

The "private" tells the type system that nobody (outside the module) is
to create a value of that type. Only inside the module, where the type
isn't private can you create one.

> type 'a t = {l: float}
> let x : p1 t = {l = 0.0}
> let id : p2 t -> p2 t = fun x -> x
> let _ = id x (* ill typed *)

Why it works correctly here is explained nicely in the other mailss in
this thread.

Rabih Chaar also replied:

if you define the intermediate type 
type s= {l:float} 
followed by 
type 'a t = s 
everything goes well 

I am unable to give you an explanation about this (the need of the 
intermediay type s). I hope someone can shed some light on this.

Philippe Veber suggested:

It seems that the expressions typecheck when t is a type abbreviation and 
not a type definition. I don't know about the actual typing rules but this 
would be reasonable, I guess.

Dawid Toton also replied:

I think the crucial question is when new record types are born. Here is
my opinion:

The "=" sign in the above type mapping definition is what I would call
"delayed binding". "Early binding" would be equivalent to

type tmp = {lab : float}
type 'a s = tmp

(evaluate the right-hand side first, then define the mapping).

The "early binding" creates only one record type, so lab becomes
ordinary record label.
In the given example of the "delayed binding" the t becomes a machine
producing new record types.
Hence, the identifier l is not an ordinary record label. It is shared by
whole family of record types. We can see it this way:

# type 'a t = { la : float } ;;
type 'a t = { la : float; }
# {la = 0.};;
- : 'a t = {la = 0.}

So OCaml interpreter doesn't know the exact type of the last expression,
but it is clever enough to give it a generalized type.
We can use la to construct records of incompatible types:

# type 'a t = { la : float } ;;
type 'a t = { la : float; }
# let yy = ({la = 0.} : int t) ;;
val yy : int t = {la = 0.}
# let xx = ({la = 0.} : string t);;
val xx : string t = {la = 0.}
# xx = yy;;
Error: This expression has type int t but an expression was expected of type
        string t

I suppose my jargon may be not mainstream, apologies.

Binding and evaluation order in module/type/value languages

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/96dd3e1479c2c6bc#

Dawid Toton asked and Jacques Garrigue replied:

> Thinking about the discussion in the recent thread "Phantom types" [1],
> I have created the following piece of code that aims to demonstrate
> binding and evaluation order that takes effect in all three levels of OCaml.
>
> My question is: what are the precise rules is the case of type language?
> I have impression that it is lazy and memoized evaluation. But this my
> guess looks suspicious.
>
> I don't intend this question to be about inner working of the compiler,
> but about the definition at the conceptual level.

It is not clear that this will work at all.
The semantics of ocaml (contrary to SML) is not defined in terms of
type generativity.

> (* 1. Module language; side effect = create fresh record type; test =
> type equality test *)
>
> module type T = sig type t end
> module R (T:T) = struct type r = {lab : int} end
>
> module TF = struct type t = float end
> module TS = struct type t = string end
> module R1 = R(TF)
> module R2 = R(TF)
> module R3 = R(TS)
>
> let test12 (k : R1.r) (l : R2.r) = (k=l) (* pass => R1.r = R2.r *)
> let test13 (k : R1.r) (l : R3.r) = (k=l) (* pass => R1.r = R3.r *)
>
> (* Conclusion: RHS evaluated at the mapping definition point *)

Wrong: test13 fails.
OCaml functors are applicative, and type equality is the equality of
paths.
Stranger: defining twice TF, or defining an alias for TF, will result
in different types.

module TF' = TF
module R4 = R(TF')
let f (x : R1.r) = (x : R4.r);;
Error: This expression has type R1.r = R(TF).r
      but an expression was expected of type R4.r = R(TF').r

So your comment for the type language applies here:
> (* Conclusion: RHS evaluated some time after the mapping is applied;
> sort of memoization at the conceptual level *)
I'm not sure there is a timing here, but there is certainly some form
of memoization, which happens to be done by name.

> (* 2. Type language; side effect = create fresh record type; test = type
> equality test *)
>
> type 't r = {lab : int}
>
> type tf = float
> type ts = string
> type r1 = tf r
> type r2 = tf r
> type r3 = ts r
>
> let test12 (k : r1) (l : r2) = (k=l) (* pass => r1 = r2 *)
> let test13 (k : r1) (l : r3) = (k=l) (* fail => r1 ≠ r3 *)
>
> (* Conclusion: RHS evaluated some time after the mapping is applied;
> sort of memoization at the conceptual level *)

Lots of people seem to think that generativity of type definitions
gives you phantom types. This is true in Haskell, but not in ocaml.

# let f x = (x : r1 :> r3);;
val f : r1 -> r3 = <fun>

So r1 and r3 are not strictly equal, but they are equivalent from the
point of view of subtyping.
Please do not use this approach for phantom types, except if you want
the ability to forget the argument where desired.
As written in other mails, the right approach for phantom types is to
use private types, as you can control the variance of their
parameters.

Conclusion: in the type language, this is more like you thought at
first for functors, i.e. RHS evaluated at the mapping definition point.
But if the RHS is abstract/private or includes the type parameters,
then they are involved in deciding the equality (structurally, not
through generativity).

> (* 3. Value language; side effect = create fresh int; test = value
> equality test *)
> let r t = Oo.id (object end)
>
> let tf = 0.
> let ts = "A"
> let r1 = r tf
> let r2 = r tf
> let r3 = r ts
>
> let test12 = assert (r1 = r2) (* fail => r1 ≠ r2 *)
> let test13 = assert (r1 = r3) (* fail => r1 ≠ r3 *)
>
> (* Conclusion: RHS evaluated exactly at the point of mapping application *)

Sure, this is an eager language.

> [1] http://groups.google.com/group/fa.caml/browse_thread/thread/0df560ee78e0f75f#

Other Caml News

From the ocamlcore planet blog:

Thanks to Alp Mestan, we now include in the Caml Weekly News the links to the
recent posts from the ocamlcore planet blog at http://planet.ocamlcore.org/.

Asyncore:
  https://forge.ocamlcore.org/projects/asyncore/

Camomile 0.7.3:
  http://caml.inria.fr/cgi-bin/hump.cgi?contrib=85

Froc 0.2:
  http://caml.inria.fr/cgi-bin/hump.cgi?contrib=692

Observable sharing and executable proofs:
  http://alaska-kamtchatka.blogspot.com/2010/05/observable-sharing-and-executable.html

mpfi:
  https://forge.ocamlcore.org/projects/mpfi/

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.

Alan Schmitt