Hello
Here is the latest Caml Weekly News, for the week of May 01 to 08, 2012.
Archive: https://sympa-roc.inria.fr/wws/arc/caml-list/2012-05/msg00010.html
Francesco Zappa Nardelli announced:[Even if the lectures focus on Haskell, this summer school might be of interest to readers of the caml-list.] I am happy to annouce the CEA-EDF-INRIA summer school on FUNCTIONAL PROGRAMMING FOR PARALLEL AND CONCURRENT APPLICATIONS 11-22 June 2012, Castle of Cadarache, Saint Paul Lez Durance, France http://www-hpc.cea.fr/SummerSchools2012-CS.htm Objectives The aim of the summer school is to give a thorough and application- oriented introduction to functional programming using the programming language Haskell. A special focus is on parallel and concurrent programming, highlighting the ways in which features such as strong typing and purity make it dramatically easier to write reliable parallel or concurrent code. The school is split into three different courses that highlight different aspects of functional programming. All courses consist of lectures and hands-on sessions where everyone can try out the language on several exercises. Public This school is a 2 weeks course for engineers, and for students and researchers. Participants should be familiar with programming (e.g. in C or Java), but the school will be self-contained and no preliminary knowledge of functional programing is required. Speakers Ralf Hinze (Oxford University) Andres Löh (Well-Typed) Simon Marlow (Microsoft Research) Talks (to be confirmed) Joe Armstrong, Mark Shinwell, Anil Madhavapeddy. Registration and contact: The deadline for registration is May 30, 2012. Please contact: Régis Vizet - CEA regis.vizet (at) cea.fr tel: 0033 1 69 26 47 45 for further informations.
Archive: https://sympa-roc.inria.fr/wws/arc/caml-list/2012-05/msg00030.html
Daniel Bünzli announced:I'd like to announce the following two modules. First Uutf: Uutf is a non-blocking streaming codec to decode and encode the UTF-8, UTF-16, UTF-16LE and UTF-16BE encoding schemes. It can efficiently work character by character without blocking on IO. Decoders perform character position tracking and support newline normalization. Functions are also provided to fold over the characters of UTF encoded OCaml string values and to directly encode characters in OCaml Buffer.t values. Uutf is made of a single, independent, module and distributed under the BSD3 license. Project home page: http://erratique.ch/software/uutf API doc & examples: http://erratique.ch/software/uutf/doc/Uutf The aim of Uutf is to provide a convenient abstraction for non-blocking streaming Unicode text processing and to implement non-blocking LL(k) parsers over Unicode text. It's used by Jsonm and will certainly be used by Xmlm in the future. The second module is Jsonm: Jsonm is a non-blocking streaming codec to decode and encode the JSON data format. It can process JSON text without blocking on IO and without a complete in-memory representation of the data. The alternative "uncut" codec also processes whitespace and (non-standard) JSON with JavaScript comments. Jsonm is made of a single module and depends on [Uutf]. It is distributed under the BSD3 license. Project home page: http://erratique.ch/software/jsonm API doc & examples: http://erratique.ch/software/jsonm/doc/Jsonm Basically Jsonm is to JSON what Xmlm is to XML. It's a rather low-level approach where you work with streams of structural lexemes which reflect the data model underlying the data language. The sequence of lexemes is guaranteed to be presented to you according to a simple grammar or errors are returned. This allows to consume/produce the data without having the whole data in memory while abstracting over the idiosyncrasies of the data language. I also hope it can serve as basis to define efficient data query combinators. Jsonm's design is however more convient than Xmlm's one: Jsonm has precise lexeme position tracking support, best-effort decoding that allows to continue after an error, trivial input termination condition (just decode `End, whereas in Xmlm you have to count), and allows to access whitespace to write data filters that preserve as much of the original data as possible (P.S. I hope to eventually find time to fix all these defects in an incompatible release of Xmlm). If you want to install these modules via odb here are lines you can add to your odb package file: http://erratique.ch/software/odb-packages.txt Feedback is welcome, Daniel P.S. Since the question will likely be asked here's how I think Jsonm compares to Yojson. Martin may want to chime in to correct me or offer a different perspective as I'm certainly biased. * Jsonm depends on Uutf. Yojson depends on ocamllex, cppo, easy-format and biniou. * Jsonm inputs UTF-8, UTF16, UTF-16LE and UTF-16BE and outputs UTF-8 encoded JSON. Yojson inputs and outputs UTF-8 encoded JSON. * Jsonm reports character stream decoding errors and allows to bypass them by replacing the invalid bytes with the Unicode character replacement U+FFFD. Yojson, apparently by design, silently inputs invalid UTF-8 byte sequences, I consider this to be a security risk (or at least a wrong security default). * Jsonm mostly sticks to the standard (with the exception of comments if you use the uncut codec). Yojson extends the standard in various ways to support the serialisation of OCaml values, it also supports the input of JavaScript comments but discards them. * Jsonm uses only OCaml floats for JSON numbers. This limits the roundtrip of integers to the ones that are exactly representable in this datatype. i.e. the range [-2^53;2^53]. Yojson returns the integer string literal if the int is greater than [max_int]. Note however that Jsonm's behaviour is equivalent to the one you have in JavaScript (and hence in all browsers) so it would anyway be ill-advised for JSON producers to go beyond this limit. * Jsonm offers no generic tree-like JSON representation (see the examples in the doc to see how to build one). Yojson offers many different generic tree-like representations. * Jsonm has a non-blocking IO interface. To the best of my knowledge Yojson doesn't support that. * Jsonm has a streaming IO interface. To the best of my knowledge there's an undocumented very low-level streaming input interface in Yojson but this bares no ressemblance with Jsonm's notion of a streaming interface. There's also an undocumented streaming output interface but it doesn't seem that you can output an object or an array without first building an in-memory JSON representation of it. * Jsonm can perform best-effort decoding, i.e. continue to parse after an error. To the best of my knowledge Yojson cannot do that. * Performance. I'm always reluctant to make performance claims in abstract settings; it all depends on the context. If you like unscientific benchmarks you can try to test performance between `ydump` and `jsontrip` which both recode JSON text. Bear in mind however that the results are highly data dependent and that internally both programs don't do the same thing, `jsontrip` does not build a generic in-memory representation of the JSON text. In my tests on random data `jsontrip` takes anything between 1.25 and 2.1 the time of `ydump`. The upper bound occurs when random numbers are only integers which `ydump` doesn't parse as floats. On real geojson data these numbers are between 1.38 and 1.46. But on this data, processing a 325Mo file, the resident memory used by `ydump` grows up to 1.2Go while the streaming interface of Jsonm albeit slower, remains constant at only 3.8Mo. Your mileage may vary.
Archive: https://sympa-roc.inria.fr/wws/arc/caml-list/2012-05/msg00026.html
Goswin von Brederlow asked:I wish there was an option type that would work without extra indirection (or more importantly without extra allocation of an ocaml value when setting it to something). Why? I'm interfacing with C in a multithreaded way. The data is allocated on the C side so it won't be moved around by the GC. The ocaml side will modify data and the C side will utilize it. Now if ocaml changes a multable 'a option then it will allocate a block for "Some x" and the GC will move that block around during compaction. Which means the 'a option can't be safely used without holding the runtime system lock. Which then means the threads can't run in parallel. What I want is a type 'a shallow = NULL | 'a (constraint 'a != 'b shallow) I have some ideas on how to implement this in a module as abstract type providing get/set/clear functions, which basically means I map None to a C NULL pointer and Some x to plain x. I know x can never be the NULL pointer, except when someone creates a 'a shallow shallow and sets Some None. That would turn into simply None. Is it possible to somehow declare the constraint that 'a in 'a shallow must not be itself a 'b shallow?Jacques Garrigue replied:
Actually I remember discussing this issue with Xavier Leroy a long time ago, and we ended up concluding that this could be done in a clean way by using a bit pattern not yet used for ocaml values. Our idea was that rather than put some restriction on the shallowness of types (which is very hard to implement), one should just allow arbitrary nesting of option types, and represent the sequence "Some (Some (... ( Some None) ...))" as an integer. I think I even tried an implementation, but we then concluded that there was not enough demand to put it in the compiler. However there is nothing wrong with providing it as a library. The basic idea is that you have only two kinds of values in ocaml: integers, which have their lower bit to 1, and pointers, which must all be multiples of 4 (on 32-bit architectures). So the pattern (v % 4) == 2 is not used. Here is the code: module Sopt : sig type +'a t val none : 'a t val some : 'a -> 'a t val is_none : 'a t -> bool val arg : 'a t -> 'a val is_sopt : 'a t -> bool val depth : 'a t -> int end = struct type 'a t = int let null = (Obj.magic (Some 0) land 2) land 1 let none = null + 1 let is_none x = x > 0 && x < 1 let is_sopt x = is_none (x land 1) let some (x : 'a) : 'a t = let y : 'a t = Obj.magic x in if is_sopt y then y+2 else y let arg (x : 'a t) : 'a = if is_sopt x then Obj.magic (x-2) else Obj.magic x let depth x = if is_sopt x then x lsr 1 else 0 end The code for null tricks the compiler into creating a null pointer. I avoiding using the simpler (Obj.magic (Some 0) land 0) in case land is optimized. By adding 1 to this value, one creates a 2 at the C level. For ocaml this value is "strictly between" 0 (i.e. 1) and 1 (i.e. 3). Sopt.some checks whether a value is such an sopt, in which case it adds 2 to it to add a Some constructor, otherwise it returns the value itself. Sopt.arg does just the opposite. Sopt.is_sopt and Sopt.depth are only exported for demonstrative purposes. # Sopt.depth (Sopt.some (Sopt.some (Sopt.some Sopt.none)));; - : int = 3 # Sopt.depth (Sopt.some (Sopt.some (Sopt.some 0)));; - : int = 0 Of course this precludes using the above bit pattern for anything else, but otherwise this should be completely type-safe. (At the theoretical level at least, I'm not 100% sure of the trickery used here to have the compiler work on C level integers)Goswin von Brederlow replied and Jacques Garrigue said:
>> I think I even tried an implementation, but we then concluded that there >> was not enough demand to put it in the compiler. > > The advantage of providing it by the compiler would be to allow pattern > matching. And also being compatible with the default option type, and proper marshaling support. It's really a cost vs. usefulness discussion. I see no good solution for marshaling. It would be nice to be able to add hooks to the marshaler for handling pointers outside of the value area... >> However there is nothing wrong with providing it as a library. >> The basic idea is that you have only two kinds of values in ocaml: >> integers, which have their lower bit to 1, and pointers, which must >> all be multiples of 4 (on 32-bit architectures). >> So the pattern (v % 4) == 2 is not used. > > Actualy isn't (v % 4) == 2 an exception? > > caml/callback.h: > #define Make_exception_result(v) ((v) | 2) > #define Is_exception_result(v) (((v) & 3) == 2) > #define Extract_exception(v) ((v) & ~3) > > So returning an Sopt.none in a callback would be taken as exception. :) Well-spotted. It seems that this use of the second bit was added after our discussion. But the fundamental idea is just to represent "Some (Some (... ( Some None) ...))" by a special collection of pointers, and this can be done in other ways, like allocating a region in the C heap, or using a non-allocatable region (you only need to now that these pointers will never be used). Here is a version using Bigarray to allocate such a protected region: module Sopt : sig type +'a t val none : 'a t val some : 'a -> 'a t val is_none : 'a t -> bool val arg : 'a t -> 'a val is_sopt : 'a t -> bool val depth : 'a t -> int end = struct type 'a t = int let area = Bigarray.(Array1.create int32 c_layout 1024) let none : int = Obj.obj (Obj.field (Obj.repr area) 1) let is_none x = (x = none) let is_sopt x = (x >= none) && (x < none + 2048) let some (x : 'a) : 'a t = let y : 'a t = Obj.magic x in if is_sopt y then if y = none + 2047 then invalid_arg "Sopt.some" else y + 2 else y let arg (x : 'a t) : 'a = if is_sopt x then if is_none x then invalid_arg "Sopt.arg" else Obj.magic (x-2) else Obj.magic x let depth x = if is_sopt x then ((x - none) lor 1) lsr 1 else -1 end > The implementation relies on specific aspect of the compiler to > construct those values. I would give it a 99% chance to fail with an > llvm backend for ocaml for example. Using C stubs that directly > manipulate the bits seems a better approach. > > For example nothing says x + 2 can't be implemented as > > value add(value x, value y) { > return (x & ~1) + y; > } > > as opposed to > > value add(value x, value y) { > return x + y - 1; > } Actually I'm confident that any backend using a data representation compatible with ocaml is going to implement addition the same way. There are two reasons: * most processors can add two registers and a small constant in one operation * there is no reason to depart from the already published approach But of course there is nothing wrong to writing those as C primitives if you are already writing stubs. You might just want to add the "noalloc" qualifier to make calling them more efficient.Later on, Jacques Garrigue said:
Anyway, I think that at last I have a reasonable (much simpler) solution. This still uses sopt_tag (i.e. lazy_tag-1), but this time the argument is the number of "some" constructors, so there is no extra cost for marshaling. The only values on which Sopt.some is not the identity are those with a single argument, which is additionally represented by an integer between 0 and last. Moreover, even if for some reason you build such a value using a real sum type, you still have Sopt.arg (Sopt.some v) = v, the only change being a loss of sharing (which is anyway not guaranteed by the ocaml specification). Jacques module Sopt : sig type +'a t val none : 'a t val some : 'a -> 'a t val is_none : 'a t -> bool val arg : 'a t -> 'a val depth : 'a t -> int end = struct type 'a t = Obj.t let sopt_tag = Obj.lazy_tag - 1 let none = Obj.new_block sopt_tag 1 let last = 255 let area = Array.create (last+1) none let () = Obj.set_field none 0 (Obj.repr 0); for i = 1 to last do let stub = Obj.new_block sopt_tag 1 in Obj.set_field stub 0 (Obj.repr i); area.(i) <- stub done let is_none x = (x = none) let is_sopt x = Obj.is_block x && Obj.tag x = sopt_tag && Obj.size x = 1 && let i = Obj.obj (Obj.field x 0) in i >= 0 && i <= last let depth x = let x = Obj.repr x in if is_sopt x then Obj.obj (Obj.field x 0) else -1 let some (x : 'a) : 'a t = let i = depth x in if i < 0 then Obj.magic x else if i = last then invalid_arg "Sopt.some" else Obj.obj area.(i+1) let arg (x : 'a t) : 'a = let i = depth x in if i < 0 then Obj.magic x else if i = 0 then invalid_arg "Sopt.arg" else Obj.obj area.(i-1) endGoswin von Brederlow asked and Jacques Garrigue replied:
> What exactly is the point of specially tagged blocks? All you need is a > bunch of pointers to values to encode the depth. You can use the value > pointed at to encode the index the pointer is at and physical equality > to ensure it actualy is one of your pointers: > > let area = Array.init (last+1) (fun i -> ref i) > > let is_sopt x = > let r = Obj.repr x in > Obj.is_block r && Obj.size x = 1 && Obj.is_int (Obj.field r 0) && > let i = Obj.obj (Obj.field r 0) in > i >= 0 && i <= last && Obj.obj r == area.(i) Marshaling. Without the distinctive tag, there is no way to know that a value you receive was a special one. Without the marshaling requirement there are indeed many solutions. Also I should be honest, and point that my solution does interfere with some values of tag sopt_tag. Namely, applying Sopt.some to the block (sopt_tag, last) is going to fail whereas it might be representing a perfectly safe value. But this looks like a very exceptional condition. If you don't need marshaling, you can use your stronger criterion to avoid any interference. Using a special tag still allows a faster test.
Archive: https://sympa-roc.inria.fr/wws/arc/caml-list/2012-05/msg00051.html
Dan Bensen asked and Philippe Veber replied:> I'm trying to write a functor that extends a user-supplied polymorphic > variant type (PVT). How do you declare the user's type in the signature > for the argument to the functor without locking in the individual variant > definitions? > The code below (in revised syntax) generates an error message that says > the type isn't a PVT. > > code: > > > module type Reader = sig type ast; end; > > > > module Make (Read: Reader) = struct > > type ast = [= Read.ast | `Lid of string]; > > end; > > message: > > > Error: The type Read.ast is not a polymorphic variant type > > How do you make ast a PVT while allowing the user to > specify the variants? I'm afraid you can't. If you want to write unions of polymorphic variant types, they have to be concrete and not abstract. See Romain Bardou's work on this: http://romain.bardou.fr/papers/stage2006p.pdf Or, if you can read french: http://www.math.nagoya-u.ac.jp/~garrigue/papers/index.html (see Unions de variants polymorphes abstraits).
Archive: https://sympa-roc.inria.fr/wws/arc/caml-list/2012-05/msg00052.html
Gerd Stolpmann announced:I just released findlib-1.3.0, featuring: - Fixes for ocaml-4.00 (especially topfind) - Emitting an error if the configuration file does not exist. - Emitting a warning if the selected toolchain does not exist. - camlp4 is referenced by "+camlp4" in META. - including the sources for the documentation in the tarball. - License change (simplification) for num_top_printers.mli. - Fix ocamlmklib wrapper: processing contracted args (like -L/dir) correctly. - Many wrappers get a new option -passrest instructing to pass all remaining options on the command-line unchanged to the invoked tool. - Prettified -help output. It covers a lot of bugs or suggestions I got from users, thanks for this (even if I sometimes don't find time for a response). Download, manual, and more information at http://projects.camlcity.org/projects/findlib.html.
Thanks to Alp Mestan, we now include in the Caml Weekly News the links to the recent posts from the ocamlcore planet blog at http://planet.ocamlcore.org/. Findlib 1.3.0: http://caml.inria.fr/cgi-bin/hump.cgi?contrib=1 Jsonm 0.9.0: http://caml.inria.fr/cgi-bin/hump.cgi?contrib=811 OCaml-Java: https://forge.ocamlcore.org/projects/ocamljava/ RegSTAB 2.0.0 released: https://forge.ocamlcore.org/forum/forum.php?forum_id=831 Jsonm 0.9.0: http://erratique.ch/software/jsonm Uutf 0.9.0: http://erratique.ch/software/uutf Higher Order Fun: http://alaska-kamtchatka.blogspot.com/2011/09/higher-order-fun.html Brainfuck in Java: http://alaska-kamtchatka.blogspot.com/2011/11/brainfuck-in-java.html
If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.
If you also wish to receive it every week by mail, you may subscribe online.