Caml Weekly News

Hello

Here is the latest Caml Weekly News, for the week of May 01 to 08, 2012.

Summer School on Functional Programming for Parallel and Concurrent Applications
Uutf 0.9.0 and Jsonm 0.9.0
A shallow option type
extending user-defined polymorphic variant types
findlib-1.3.0
Other Caml News

Summer School on Functional Programming for Parallel and Concurrent Applications

Archive: https://sympa-roc.inria.fr/wws/arc/caml-list/2012-05/msg00010.html

Francesco Zappa Nardelli announced:

[Even if the lectures focus on Haskell, this summer school might be of
interest to readers of the caml-list.]

I am happy to annouce the CEA-EDF-INRIA summer school on

                     FUNCTIONAL PROGRAMMING
            FOR PARALLEL AND CONCURRENT APPLICATIONS

11-22 June 2012, Castle of Cadarache, Saint Paul Lez Durance, France

         http://www-hpc.cea.fr/SummerSchools2012-CS.htm

Objectives

The aim of the summer school is to give a thorough and application-
oriented introduction to functional programming using the programming
language Haskell. A special focus is on parallel and concurrent
programming, highlighting the ways in which features such as strong
typing and purity make it dramatically easier to write reliable
parallel or concurrent code. The school is split into three different
courses that highlight different aspects of functional
programming. All courses consist of lectures and hands-on sessions
where everyone can try out the language on several exercises.

Public

This school is a 2 weeks course for engineers, and for students and
researchers. Participants should be familiar with programming (e.g. in
C or Java), but the school will be self-contained and no preliminary
knowledge of functional programing is required.

Speakers

Ralf Hinze (Oxford University)
Andres Löh (Well-Typed)
Simon Marlow (Microsoft Research)

Talks (to be confirmed)

Joe Armstrong, Mark Shinwell, Anil Madhavapeddy.

Registration and contact:

The deadline for registration is May 30, 2012.  Please contact:

Régis Vizet - CEA
regis.vizet (at) cea.fr
tel: 0033 1 69 26 47 45

for further informations.

Uutf 0.9.0 and Jsonm 0.9.0

Archive: https://sympa-roc.inria.fr/wws/arc/caml-list/2012-05/msg00030.html

Daniel Bünzli announced:

I'd like to announce the following two modules. First Uutf:

 Uutf is a non-blocking streaming codec to decode and encode the UTF-8,
 UTF-16, UTF-16LE and UTF-16BE encoding schemes. It can efficiently
 work character by character without blocking on IO. Decoders perform 
 character position tracking and support newline normalization. 

 
 Functions are also provided to fold over the characters of UTF encoded
 OCaml string values and to directly encode characters in OCaml Buffer.t
 values.


 Uutf is made of a single, independent, module and distributed under
 the BSD3 license.

Project home page: http://erratique.ch/software/uutf
API doc & examples: http://erratique.ch/software/uutf/doc/Uutf

The aim of Uutf is to provide a convenient abstraction for
non-blocking streaming Unicode text processing and to implement
non-blocking LL(k) parsers over Unicode text. It's used by Jsonm and
will certainly be used by Xmlm in the future.

The second module is Jsonm:

 Jsonm is a non-blocking streaming codec to decode and encode the JSON
 data format. It can process JSON text without blocking on IO and
 without a complete in-memory representation of the data.


 The alternative "uncut" codec also processes whitespace and
 (non-standard) JSON with JavaScript comments.

 
 Jsonm is made of a single module and depends on [Uutf]. It is distributed 
 under the BSD3 license.

Project home page: http://erratique.ch/software/jsonm
API doc & examples: http://erratique.ch/software/jsonm/doc/Jsonm

Basically Jsonm is to JSON what Xmlm is to XML. It's a rather
low-level approach where you work with streams of structural lexemes
which reflect the data model underlying the data language. The
sequence of lexemes is guaranteed to be presented to you according to
a simple grammar or errors are returned. This allows to
consume/produce the data without having the whole data in memory while
abstracting over the idiosyncrasies of the data language. I also hope
it can serve as basis to define efficient data query combinators.

Jsonm's design is however more convient than Xmlm's one: Jsonm has
precise lexeme position tracking support, best-effort decoding that
allows to continue after an error, trivial input termination condition
(just decode `End, whereas in Xmlm you have to count), and allows to
access whitespace to write data filters that preserve as much of the
original data as possible (P.S. I hope to eventually find time to fix
all these defects in an incompatible release of Xmlm).

If you want to install these modules via odb here are lines you can 
add to your odb package file:

http://erratique.ch/software/odb-packages.txt

Feedback is welcome, 

Daniel


P.S. Since the question will likely be asked here's how I think Jsonm
compares to Yojson. Martin may want to chime in to correct me or offer
a different perspective as I'm certainly biased.

* Jsonm depends on Uutf. Yojson depends on ocamllex, cppo, easy-format and 
 biniou. 

* Jsonm inputs UTF-8, UTF16, UTF-16LE and UTF-16BE and outputs
 UTF-8 encoded JSON. Yojson inputs and outputs UTF-8 encoded
 JSON. 

* Jsonm reports character stream decoding errors and allows to bypass
 them by replacing the invalid bytes with the Unicode character
 replacement U+FFFD. Yojson, apparently by design, silently inputs
 invalid UTF-8 byte sequences, I consider this to be a security risk
 (or at least a wrong security default).

* Jsonm mostly sticks to the standard (with the exception of comments if 
 you use the uncut codec). Yojson extends the standard in various
 ways to support the serialisation of OCaml values, it also 
 supports the input of JavaScript comments but discards them. 

* Jsonm uses only OCaml floats for JSON numbers. This limits the
 roundtrip of integers to the ones that are exactly representable in
 this datatype. i.e. the range [-2^53;2^53]. Yojson returns the
 integer string literal if the int is greater than [max_int]. Note
 however that Jsonm's behaviour is equivalent to the one you have in
 JavaScript (and hence in all browsers) so it would anyway be
 ill-advised for JSON producers to go beyond this limit.

* Jsonm offers no generic tree-like JSON representation (see the examples in 
 the doc to see how to build one). Yojson offers many different generic
 tree-like representations. 

* Jsonm has a non-blocking IO interface. To the best of my knowledge
 Yojson doesn't support that. 

* Jsonm has a streaming IO interface. To the best of my knowledge
 there's an undocumented very low-level streaming input interface in
 Yojson but this bares no ressemblance with Jsonm's notion of a
 streaming interface. There's also an undocumented streaming output
 interface but it doesn't seem that you can output an object or an
 array without first building an in-memory JSON representation of it.

* Jsonm can perform best-effort decoding, i.e. continue to parse after
 an error. To the best of my knowledge Yojson cannot do that.

* Performance. I'm always reluctant to make performance claims in
 abstract settings; it all depends on the context. If you like
 unscientific benchmarks you can try to test performance between
 `ydump` and `jsontrip` which both recode JSON text. Bear in mind
 however that the results are highly data dependent and that
 internally both programs don't do the same thing, `jsontrip` does
 not build a generic in-memory representation of the JSON text. In
 my tests on random data `jsontrip` takes anything between 1.25 and
 2.1 the time of `ydump`. The upper bound occurs when random numbers 
 are only integers which `ydump` doesn't parse as floats. On real
 geojson data these numbers are between 1.38 and 1.46. But on this
 data, processing a 325Mo file, the resident memory used by `ydump`
 grows up to 1.2Go while the streaming interface of Jsonm albeit
 slower, remains constant at only 3.8Mo. Your mileage may vary.

A shallow option type

Archive: https://sympa-roc.inria.fr/wws/arc/caml-list/2012-05/msg00026.html

Goswin von Brederlow asked:

I wish there was an option type that would work without extra
indirection (or more importantly without extra allocation of an ocaml
value when setting it to something).

Why? I'm interfacing with C in a multithreaded way. The data is
allocated on the C side so it won't be moved around by the GC. The ocaml
side will modify data and the C side will utilize it. Now if ocaml
changes a multable 'a option then it will allocate a block for "Some x"
and the GC will move that block around during compaction. Which means
the 'a option can't be safely used without holding the runtime system
lock. Which then means the threads can't run in parallel.

What I want is a

    type 'a shallow = NULL | 'a  (constraint 'a != 'b shallow)

I have some ideas on how to implement this in a module as abstract type
providing get/set/clear functions, which basically means I map None to a
C NULL pointer and Some x to plain x. I know x can never be the NULL
pointer, except when someone creates a 'a shallow shallow and sets Some
None. That would turn into simply None.

Is it possible to somehow declare the constraint that 'a in 'a shallow must
not be itself a 'b shallow?

Jacques Garrigue replied:

Actually I remember discussing this issue with Xavier Leroy a long time ago,
and we ended up concluding that this could be done in a clean way by using
a bit pattern not yet used for ocaml values.

Our idea was that rather than put some restriction on the shallowness of
types (which is very hard to implement), one should just allow arbitrary 
nesting
of option types, and represent the sequence "Some (Some (... ( Some None) 
...))"
as an integer.
I think I even tried an implementation, but we then concluded that there
was not enough demand to put it in the compiler.

However there is nothing wrong with providing it as a library.
The basic idea is that you have only two kinds of values in ocaml:
integers, which have their lower bit to 1, and pointers, which must
all be multiples of 4 (on 32-bit architectures).
So  the pattern (v % 4) == 2 is not used.

Here is the code:

module Sopt : sig
  type +'a t
  val none : 'a t
  val some : 'a -> 'a t
  val is_none : 'a t -> bool
  val arg : 'a t -> 'a
  val is_sopt : 'a t -> bool
  val depth : 'a t -> int
end = struct
  type 'a t = int
  let null = (Obj.magic (Some 0) land 2) land 1
  let none = null + 1
  let is_none x = x > 0 && x < 1
  let is_sopt x = is_none (x land 1)
  let some (x : 'a) : 'a t =
    let y : 'a t = Obj.magic x in
    if is_sopt y then y+2 else y
  let arg (x : 'a t) : 'a =
    if is_sopt x then Obj.magic (x-2) else Obj.magic x
  let depth x =
    if is_sopt x then x lsr 1 else 0
end

The code for null tricks the compiler into creating a null pointer.
I avoiding using the simpler (Obj.magic (Some 0) land 0) in case land is 
optimized.
By adding 1 to this value, one creates a 2 at the C level.
For ocaml this value is "strictly between" 0 (i.e. 1) and 1 (i.e. 3).
Sopt.some checks whether a value is such an sopt, in which case it adds 2 to 
it to
add a Some constructor, otherwise it returns the value itself.
Sopt.arg does just the opposite.
Sopt.is_sopt and Sopt.depth are only exported for demonstrative purposes.

# Sopt.depth (Sopt.some (Sopt.some (Sopt.some Sopt.none)));;
- : int = 3
# Sopt.depth (Sopt.some (Sopt.some (Sopt.some 0)));;
- : int = 0

Of course this precludes using the above bit pattern for anything else,
but otherwise this should be completely type-safe.
(At the theoretical level at least, I'm not 100% sure of the trickery used 
here
to have the compiler work on C level integers)

Goswin von Brederlow replied and Jacques Garrigue said:

>> I think I even tried an implementation, but we then concluded that there
>> was not enough demand to put it in the compiler.
> 
> The advantage of providing it by the compiler would be to allow pattern
> matching.

And also being compatible with the default option type, and proper marshaling 
support.
It's really a cost vs. usefulness discussion.

I see no good solution for marshaling.
It would be nice to be able to add hooks to the marshaler for handling
pointers outside of the value area...

>> However there is nothing wrong with providing it as a library.
>> The basic idea is that you have only two kinds of values in ocaml:
>> integers, which have their lower bit to 1, and pointers, which must
>> all be multiples of 4 (on 32-bit architectures).
>> So  the pattern (v % 4) == 2 is not used.
> 
> Actualy isn't (v % 4) == 2 an exception?
> 
> caml/callback.h:
> #define Make_exception_result(v) ((v) | 2)
> #define Is_exception_result(v) (((v) & 3) == 2)
> #define Extract_exception(v) ((v) & ~3)
> 
> So returning an Sopt.none in a callback would be taken as exception. :)

Well-spotted.
It seems that this use of the second bit was added after our discussion.

But the fundamental idea is just to represent "Some (Some (... ( Some None) 
...))"
by a special collection of pointers, and this can be done in other ways,
like allocating a region in the C heap, or using a non-allocatable region 
(you only
need to now that these pointers will never be used).

Here is a version using Bigarray to allocate such a protected region:

module Sopt : sig
  type +'a t
  val none : 'a t
  val some : 'a -> 'a t
  val is_none : 'a t -> bool
  val arg : 'a t -> 'a
  val is_sopt : 'a t -> bool
  val depth : 'a t -> int
end = struct
  type 'a t = int
  let area = Bigarray.(Array1.create int32 c_layout 1024)
  let none : int = Obj.obj (Obj.field (Obj.repr area) 1)
  let is_none x = (x = none)
  let is_sopt x = (x >= none) && (x < none + 2048)
  let some (x : 'a) : 'a t =
    let y : 'a t = Obj.magic x in
    if is_sopt y then
      if y = none + 2047 then invalid_arg "Sopt.some" else y + 2
    else y
  let arg (x : 'a t) : 'a =
    if is_sopt x then
      if is_none x then invalid_arg "Sopt.arg" else Obj.magic (x-2)
    else Obj.magic x
  let depth x =
    if is_sopt x then ((x - none) lor 1) lsr 1 else -1
end

> The implementation relies on specific aspect of the compiler to
> construct those values. I would give it a 99% chance to fail with an
> llvm backend for ocaml for example. Using C stubs that directly
> manipulate the bits seems a better approach.
> 
> For example nothing says x + 2 can't be implemented as
> 
> value add(value x, value y) {
>    return (x & ~1) + y;
> }
> 
> as opposed to
> 
> value add(value x, value y) {
>    return x + y - 1;
> }


Actually I'm confident that any backend using a data representation compatible
with ocaml is going to implement addition the same way.
There are two reasons:
* most processors can add two registers and a small constant in one operation
* there is no reason to depart from the already published approach

But of course there is nothing wrong to writing those as C primitives if you
are already writing stubs. You might just want to add the "noalloc" qualifier
to make calling them more efficient.

Later on, Jacques Garrigue said:

Anyway, I think that at last I have a reasonable (much simpler) solution.
This still uses sopt_tag (i.e. lazy_tag-1), but this time the argument is
the number of "some" constructors, so there is no extra cost for marshaling.
The only values on which Sopt.some is not the identity are those with a
single argument, which is additionally represented by an integer between 0 
and last.
Moreover, even if for some reason you build such a value using a real sum
type, you still have Sopt.arg (Sopt.some v) = v, the only change being a loss
of sharing (which is anyway not guaranteed by the ocaml specification).

Jacques

module Sopt : sig
 type +'a t
 val none : 'a t
 val some : 'a -> 'a t
 val is_none : 'a t -> bool
 val arg  : 'a t -> 'a
 val depth : 'a t -> int
end = struct
 type 'a t = Obj.t
 let sopt_tag = Obj.lazy_tag - 1
 let none = Obj.new_block sopt_tag 1
 let last = 255
 let area = Array.create (last+1) none
 let () =
   Obj.set_field none 0 (Obj.repr 0);
   for i = 1 to last do
     let stub = Obj.new_block sopt_tag 1 in
     Obj.set_field stub 0 (Obj.repr i);
     area.(i) <- stub
   done
 let is_none x = (x = none)
 let is_sopt x =
   Obj.is_block x && Obj.tag x = sopt_tag && Obj.size x = 1 &&
   let i = Obj.obj (Obj.field x 0) in i >= 0 && i <= last
 let depth x =
   let x = Obj.repr x in
   if is_sopt x then Obj.obj (Obj.field x 0) else -1
 let some (x : 'a) : 'a t =
   let i = depth x in
   if i < 0 then Obj.magic x else
   if i = last then invalid_arg "Sopt.some" else Obj.obj area.(i+1)
 let arg (x : 'a t) : 'a =
   let i = depth x in
   if i < 0 then Obj.magic x else
   if i = 0 then invalid_arg "Sopt.arg" else Obj.obj area.(i-1)
end

Goswin von Brederlow asked and Jacques Garrigue replied:

> What exactly is the point of specially tagged blocks? All you need is a
> bunch of pointers to values to encode the depth. You can use the value
> pointed at to encode the index the pointer is at and physical equality
> to ensure it actualy is one of your pointers:
>
> let area = Array.init (last+1) (fun i -> ref i)
>
> let is_sopt x =
>  let r = Obj.repr x in
>  Obj.is_block r && Obj.size x = 1 && Obj.is_int (Obj.field r 0) &&
>  let i = Obj.obj (Obj.field r 0) in
>  i >= 0 && i <= last && Obj.obj r == area.(i)

Marshaling.
Without the distinctive tag, there is no way to know that a value you receive
was a special one.
Without the marshaling requirement there are indeed many solutions.
Also I should be honest, and point that my solution does interfere with some
values of tag sopt_tag. Namely, applying Sopt.some to the block
(sopt_tag, last) is going to fail whereas it might be representing a perfectly
safe value. But this looks like a very exceptional condition.
If you don't need marshaling, you can use your stronger criterion to avoid
any interference. Using a special tag still allows a faster test.

extending user-defined polymorphic variant types

Archive: https://sympa-roc.inria.fr/wws/arc/caml-list/2012-05/msg00051.html

Dan Bensen asked and Philippe Veber replied:

> I'm trying to write a functor that extends a user-supplied polymorphic 
> variant type (PVT).  How do you declare the user's type in the signature 
> for the argument to the functor without locking in the individual variant 
> definitions?  
> The code below (in revised syntax) generates an error message that says
> the type isn't a PVT.
> 
> code:
> 
> > module type Reader = sig type ast; end;
> >
> > module Make (Read: Reader) = struct
> >   type ast = [= Read.ast | `Lid of string];
> > end;
> 
> message:
> 
> > Error: The type Read.ast is not a polymorphic variant type
> 
> How do you make ast a PVT while allowing the user to 
> specify the variants?

 I'm afraid you can't. If you want to write unions of polymorphic variant
types, they have to be concrete and not abstract. See Romain Bardou's work on
this:

http://romain.bardou.fr/papers/stage2006p.pdf

Or, if you can read french:

http://www.math.nagoya-u.ac.jp/~garrigue/papers/index.html

(see Unions de variants polymorphes abstraits).

findlib-1.3.0

Archive: https://sympa-roc.inria.fr/wws/arc/caml-list/2012-05/msg00052.html

Gerd Stolpmann announced:

I just released findlib-1.3.0, featuring:

 - Fixes for ocaml-4.00 (especially topfind)
 - Emitting an error if the configuration file does not exist.
 - Emitting a warning if the selected toolchain does not exist.
 - camlp4 is referenced by "+camlp4" in META.
 - including the sources for the documentation in the tarball.
 - License change (simplification) for num_top_printers.mli.
 - Fix ocamlmklib wrapper: processing contracted args (like -L/dir)
   correctly.
 - Many wrappers get a new option -passrest instructing to pass all
   remaining options on the command-line unchanged to the invoked tool.
 - Prettified -help output.

It covers a lot of bugs or suggestions I got from users, thanks for this
(even if I sometimes don't find time for a response).

Download, manual, and more information at
http://projects.camlcity.org/projects/findlib.html.

Other Caml News

From the ocamlcore planet blog:

Thanks to Alp Mestan, we now include in the Caml Weekly News the links to the
recent posts from the ocamlcore planet blog at http://planet.ocamlcore.org/.

Findlib 1.3.0:
  http://caml.inria.fr/cgi-bin/hump.cgi?contrib=1

Jsonm 0.9.0:
  http://caml.inria.fr/cgi-bin/hump.cgi?contrib=811

OCaml-Java:
  https://forge.ocamlcore.org/projects/ocamljava/

RegSTAB 2.0.0 released:
  https://forge.ocamlcore.org/forum/forum.php?forum_id=831

Jsonm 0.9.0:
  http://erratique.ch/software/jsonm

Uutf 0.9.0:
  http://erratique.ch/software/uutf

Higher Order Fun:
  http://alaska-kamtchatka.blogspot.com/2011/09/higher-order-fun.html

Brainfuck in Java:
  http://alaska-kamtchatka.blogspot.com/2011/11/brainfuck-in-java.html

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.

Alan Schmitt