Caml Weekly News

Hello

Here is the latest Caml Weekly News, for the week of October 06 to 13, 2009.

Core 0.6.0
camlp5/revised syntax questions
Improving OCaml's choice of type to display
Other Caml News

Core 0.6.0

Archive: http://groups.google.com/group/fa.caml/browse_thread/thread/bfe4469e378897dd#

Ralph Douglass announced:

We are proud to announce the second major release of Core, Jane Street's
alternative to OCaml's standard library.  This release also includes
Core_extended, which adds new functionality such as subcommand style command
line argument handling, a procfs interface, readline support, and more.
Core_extended is used heavily at Jane Street, but not systematically code
reviewed in the same manner as Core.

As was warned in the first release, the interfaces to many modules have
changed, so upgrade with care. Interfaces will continue to change with future
releases.

Core is intended to be used with OCaml 3.11.1.  It will not compile with 3.10.

We have tested the code on Linux (Centos 5), but have only limited experience
with it on other platforms.  It compiles on Mac OS 10.6, but has had almost no
testing on that platform, and hasn't been tested at all on anything else.

You can find the library here:

 http://www.janestreet.com/ocaml

along with three other libraries that you will need to use along with it:
type-conv, sexplib, bin-prot, and fieldslib.  These four libraries provide
macros for generating functions for serializing and deserializing types, and
for folding over records.

In addition, Core depends on Pcre and Res.  Core_extended also depends on Pcre.
You can find them at Markus's website:

 http://www.ocaml.info/home/ocaml_sources.html

If you have any comments or patches, we'd love to hear about it.  Our
blog is a great place for comments:

 http://ocaml.janestreet.com/?q=node/70

and patches should be sent to opensource@janestcapital.com.

All of the released libraries are licensed under the
LGPL-plus-linking-exception that is used by the OCaml standard
library.

Markus Mottl then added:

The new Jane Street Core library is now also available through Godi 
and is tested to build on Linux.

camlp5/revised syntax questions

Archive: http://groups.google.com/group/fa.caml/browse_thread/thread/16e3654fbfda9720#

Aaron Bohannon asked and blue storm replied:

Disclaimer : I have learned camlp4 from the ocaml distribution >=
3.10, wich is different from camlp5 : take my words with a grain of
salt.

Aaron Bohannon wrote:
> >From reading the camlp5 documentation, I've managed to write a syntax
> extension that adds a new expression starting with a distinct keyword,
> and it seems to work fine.  However, if I want to experiment with
> infix notations, things get a little trickier.  I need to specify it's
> precedence and associativity, of course.
>
> So, there is a list of syntactic structures on this page:
> http://pauillac.inria.fr/~ddr/camlp5/doc/htmlc/ast_transi.html
>
> 1) Where can I find the "level names" for each of these syntactic
> constructs, for use with BEFORE, LIKE, etc?  Is that what the
> "Comment" column is for?

The different "level names" are not absolute among all camlp4 grammars
: they're a property of each grammar rule of each grammar. If you want
to "modify" a specific grammar (that is, EXTEND it), you must check
the different levels available in the definition. See the files
meta/pa_r.ml and etc/pa_o.ml in the camlp5 source tree for the
definition of the revised and classical syntax, respectively. Luckily,
the expr rules (the one defining ocaml expressions, wich you seem
interested in) of both syntax mostly share the same levels (the
revised syntax has an additional "where" level for example, but
they're otherwise mostly the same).

There is no explicitely "precedence" in camlp4 parlance, you must use
the levels instead : each precedence level of infix operators has it
own level, usually named after the most representative infix operator
of the level : ':=', '||', '&&', '<'...

> 2) I am confused by the fact that this is a list is for the revised
> syntax.  I think most people (including me) want to modify the
> original syntax.  e.g., imagine that I want to modify the record
> update operator "<-" in the original syntax---I need to refer to it
> somehow, but it doesn't even appear in the list.

While concrete syntaxes for revised and classical syntax are
different, the abstract syntax tree is the same. Camlp4 quotations
works by replacing (using camlp4) the quotation you wrote by the
concrete ocaml AST representation. A nice side-effet of this is that
you can use quotations "in the revised syntax" (the code inside
quotations use the revised syntax) when writing an extension for the
classical syntax. Eg. if you parse "for i = 0 to 10 step 2 do ...
done" and you want to generate the OCaml AST corresponding to "for i =
0 to 10/2 do let i = i * 2 in ... done", you can write <:expr< for i =
0 to 10 / 2 do { let i = i * 2 in ... } >> , it will generate the
corresponding AST and be printed (after processing the source using
the extenion) in whatever syntax the user of your extension is using
(probably the classical one).

This way (using revised syntax inside the quotation of your
extension), your can stay consistent with the camlp5 documentation
(wich describes the quotation in the revised syntax). camlp4 >= 3.10
also has quotations in the classical syntax, but I wouldn't recommend
using them : revised syntax is a less ambiguous syntax wich makes
those things easier.

In your specific case, you can parse whatever syntax you want using
the "<-" operator, then output the corresponding AST using a quotation
in the revised syntax, that is <:expr< a := b >> (instead of "a <-
b"). For a reference, see the related rules in etc/pa_o.ml :

 | ":=" NONA
   [ e1 = SELF; ":="; e2 = expr LEVEL "expr1" ->
     <:expr< $e1$.val := $e2$ >>
   | e1 = SELF; "<-"; e2 = expr LEVEL "expr1" ->
     <:expr< $e1$ := $e2$ >> ]

Later on, Aaron Bohannon asked and blue storm replied:

> Thanks for your detailed reply.  I had a suspicion I would have to
> read the source code to get the all of the necessary documentation.

It is actually possible to pretty-print the grammar rules during
camlp* execution. For example, here is the code I gave to the toplevel
(using ocamlfind/findlib) to print the default "expr" grammar and
levels :

 #use "topfind";;
 #camlp4o;;
 open Camlp4.PreCast;;
 Gram.Entry.print Format.std_formatter Syntax.expr;;

This is probably camlp4-specific, but the printing routine is
documented for camlp5 (
http://pauillac.inria.fr/~ddr/camlp5/doc/htmlc/library.html#b:printing-grammar-entries
) so an equivalent code should work. I however prefer to read the
source code, wich is easier to browse and contains more information
(pretty printing shows the parsing rules, but not the parse action).

> However, I'm still missing some basic point here.
>
> blue storm wrote:
>> The different "level names" are not absolute among all camlp4 grammars
>> : they're a property of each grammar rule of each grammar. If you want
>> to "modify" a specific grammar (that is, EXTEND it), you must check
>> the different levels available in the definition.
>
> Yes, I understand that.  But how do you specify which grammar your
> file is extending?  My file is structured like this:
>
> #load "pa_extend.cmo";
> #load "q_MLast.cmo";
> open Pcaml;
>
> EXTEND
>  GLOBAL: expr;
>  ...
> END;
>
> So where did I specify whether I was extending the original syntax or
> the revised syntax (or some other grammar entirely)?  I suppose I must
> have implicitly chosen the original syntax because my code works fine
> on that.

The syntax extension mechanism is imperative in nature : the EXTEND
statement works on an existing grammar and add/change/delete rules
(camlp5 documentation :
http://pauillac.inria.fr/~ddr/camlp5/doc/htmlc/grammars.html ) : more
precisely, the EXTEND syntax is a camlp4 extension itself, wich gets
desugared to a bare ocaml expression wich modifies the given
Grammar.Entry.t values (in an imperative way).

The revised and classical syntax are designed as syntax extensions
(pa_o.ml pa_r.ml) that extend an empty grammar, wich already contains
some (empty) grammar entries. They first clear every entry of that
grammar (probably to make sure it's really empty), then add by
extension every syntaxic construct of the ocaml language. They get
compiled to pa_o.cmo and pa_r.cmo, wich you can pass to camlp4 to
choose one of the two syntax :
 camlp4 pa_o.cmo my_extension.cmo ...

What happens here is that :
 - camlp4 starts with an empty ocaml grammar
 - you link it to pa_o.cmo, wich gets executed and set up the
classical syntax (by mutation of the (empty) grammar entries)
 - you then add your own extension wich makes additional mutations

In essence, the effect of your extension depends on the side effects
that were done before. If pa_o.cmo or pa_r.cmo was passed as a
parameter, you build upon their syntax rules, but it can be the case
that an additional syntax extension was added before yours, and thus
you're actually working upon slightly modified syntax rules.

camlp4o and camlp4r are just packaged versions of camlp4, wich
respectively "pa_o.cmo" and "pa_r.cmo" implicitly linked.

In general, reasonably local syntax extension tends to work on both
the classical and the revised syntax (because their syntax rules are
quite similar). If your extension depends on one of the syntax, you
should specify it. If your extension tries to delete a rule wich was
not present in the syntax you're extending, you will get a runtime
error (for example, trying to delete the "where"-related rule in the
classical syntax).

> 1) In the parsing rule for the simple dot noation...
>
>      | e1 = SELF; "."; e2 = SELF -> <:expr< $e1$ . $e2$ >> ]
>
> ...why is the field label an "expr"?  This does not agree with the
> OCaml manual, which has a separate syntactic category for "field"
> (http://caml.inria.fr/pub/docs/manual-ocaml/expr.html), nor with my
> intuition about the meaning of the code.

Is suppose this presentation was chosen to make the grammar rules simpler.

Camlp4 parsers are not tied to the documented ocaml grammar. Camlp4
grammars for ocaml (you can use camlp4 to parse other languages,
without necessarily starting from the OCaml grammar) use a
camlp4-specific ocaml AST with then get translated to the specific AST
the OCaml compiler expects (when no camlp4 preprocessing is needed,
the ocaml compiler use its own yacc parser wich directly produces the
ocaml-compiler AST).
There are actually subtle differences in parsing (for example "let id
x = x in id fun _ -> ()" gets rejected by the non-camlp4 parser but
parses fine under camlp4 and camlp5), and I don't think any of them is
"right" : they are all tied to implementation-specific parsing
strategies (weird recursive descent for camlp{4,5} and yacc), and I'm
not sure even the yacc version rigourously respects the documented BNF
grammar.

> 2) Furthermore, as one can see from the ":=" entry above, the entire
> left side of a record update is parsed as its own subexpression.  So
> this means, that in the context of a record update, that subexpression
> has to be interpreted as a reference, but in other contexts, the very
> same expression must be interpreted as a value.  I don't necessarily
> care what kind of magic makes this possible on the back end, but I am
> wondering whether this has any implications for modifying the record
> syntax.

I'm not sure what you mean here, but I'm under the impression that
you're confusing the syntaxic representation of the expression and its
runtime/compile-time semantic. Camlp* knows nothing of the meaning of
the code it produces; the output is an AST wich has no idea of what a
"reference" and a "value" means. The semantic of the given code
depends on the deeper passes of the compiler (for example typing),
wich probably have an internal language of their own, and surely make
the difference between lvalue and rvalues.

Improving OCaml's choice of type to display

Archive: http://groups.google.com/group/fa.caml/browse_thread/thread/360508774c821064#

Yaron Minsky asked:

Anyone who plays around with the Core library that Jane Street just released
can see showcased a rather ugly detail of how Core's design interacts with how
OCaml displays types. Witness:

# Int.of_string;;
- : string -> Core.Std.Int.stringable = <fun>
# Float.of_string;;
- : string -> Core_extended.Std.Float.stringable = <fun>

I'd be much happier if this was rendered in the following equally correct and
more readable form:

# Int.of_string;;
- : string -> Int.t = <fun>
# Float.of_string;;
- : string -> Float.t = <fun>

Or even:

# Int.of_string;;
- : string -> int = <fun>
# Float.of_string;;
- : string -> float = <fun>

And this isn't just an issue in the top-level. The compiler also displays
types in the same difficult to read form. I'm wondering if anyone has some
thoughts as to what we can do to make the compiler make better choices here.
There are two issues to overcome:

- Dropping the module name. I'd love to give the compiler the hint that
  Core.Std. could be dropped from the prefix in a context where that module is
  open. This is what's done with the pervasives module already, I believe, so
  it seems like it should be doable here.

- Choosing shorter names. This one seems harder, but there are various
  different possibilities for what type name to print out, and a reasonable
  heuristic to use might be to pick the shortest one. Part of the reason these
  issues come up is our use of standardized interface components (that's where
  the "stringable" type name comes from). I suspect this one will be hard to
  fix, sadly.

Anyway, we'd be happy with any suggestions on how to improve matters.

After much discussion, Jun Furuse replied:

I have quickly wrote a small patch against 3.11.1 for this feature, to
see what it would be like.

http://sites.google.com/a/furuse.info/jun/hacks/other-small-ocaml-patches

With this patch, path names are printed without opened modules in the
context. It also tries heuristic data type name simplification: if a
variant/record data type is an alias of another data type whose name
is shorter than the original, the printer uses the latter.

For example:

# open Hashtbl;;
# let tbl = Hashtbl.create 10;;
val tbl : ('_a, '_b) t = <abstr>      (* not Hashtbl.t, since Hashtbl is open *)

# type t = int;;
type t = int
# type long_name = int;;
type long_name = int
# (1 : t);;
- : t = 1                     (* t is the shorter than its original type int *)
# (1 : long_name);;
- : int = 1                   (* int is shorter name for long_name. t
is even shorter but not aliased unfortunatelly. *)

I warn you that the patch is very quickly written and not tested well. Enjoy!

Other Caml News

From the ocamlcore planet blog:

Thanks to Alp Mestan, we now include in the Caml Weekly News the links to the
recent posts from the ocamlcore planet blog at http://planet.ocamlcore.org/.
  
ocaml-autoconf 1.1:
  http://upsilon.cc/~zack/blog/posts/2009/10/ocaml-autoconf_1.1/

OBus 1.0rc1 is out!:
  http://forge.ocamlcore.org/forum/forum.php?forum_id=442

new ocaml-autoconf release: 1.1:
  http://forge.ocamlcore.org/forum/forum.php?forum_id=443

Constructive gem: double exponentials:
  http://math.andrej.com/2009/10/12/constructive-gem-double-exponentials/

Optimizing List.map:
  http://ocaml.janestcapital.com/?q=node/71

Core 0.6.0 release:
  http://ocaml.janestcapital.com/?q=node/70

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.

Alan Schmitt