Previous week Up Next week


Here is the latest Caml Weekly News, for the week of November 14 to 21, 2006.

  1. parameterized pattern and F#
  2. Bytecode object files structure
  3. Symbolic derivative macro

parameterized pattern and F#


Continuing the thread from last week, Brian asked and Don Syme answered:
> I just did a quick scan of some F# docs and 
> I saw nothing. What did you have in mind? 

NET type parameters are extensional, i.e. "you can always find out what 
'a is at runtime".  In particular in C# you can just write "typeof(T)", 
and in F# "(type 'a)", in each case getting a System.Type value. 
Supporting exact runtime types was a design decision we made in the 
early design stages for .NET generics. 
As a result all .NET languages that support generics (polymorphism) have 
extensional polymorphism. It gets used heavily in the kind of 
meta-activities we're all familiar with: marshalling, pretty-printing, 
debugging (yes, Visual Studio 2005 shows you the values of "T", except 
when they've been optimized away). It also gets used internally in some 
libraries for adhoc optimization purposes, e.g. generating efficient 
comparison functions for default .NET comparers based on type arguments, 
and the F# matrix library uses it to detect when generic matrices are 
really floating point matrices, hence thunking out to more efficient 
matrix routines.   

There are downsides to extensional polymorphism (e.g. you can wind up 
take up extra registers passing type parameters), but they don't seem to 
bite in practice.  At the last ML Workshop a group at Cambridge 
University recently reported on an experiment to modify core-OCaml to 
pass runtime types, and IIRC saw no significant performance loss. 

FWIW if you're interested I'd also like to mention the huge impact OCaml 
had on the design of .NET generics and C# 2.0, which I've never properly 
described on this list.  It was seeing and experiencing polymorphic 
programming in OCaml and SML that made us persevere with the long and 
torturous process of convincing Microsoft that they should add such an 
"experimental and academic" feature as generics to their flagship 
runtime and languages.  Of course we were in reality just making 1970s 
ideas work in practice, but at least now even Visual Basic has generics.
Brian then asked and Don Syme answered:
[ FWIW let's take discussions about F# off-list, e.g. to hubFS? ] 
> As a mostly Java programmer now, I have to say I'm a bit 
> envious. C# generics look a lot better to me than the Java 5 ones. 

Well, this comparison point certainly helped to persuade Microsoft 
management to do the feature. :-)   
> What I didn't notice while looking at the F# docs was a 
> way to declare a generic function/value, where by "generic" 
> here I mean in the GCaml/CLOS sense, not the Java/Ada sense. 
> Is something like that in F#, or planned? 

Yes and no, though the topic often comes up.  Currently, operators are 
overloaded through a statically-resolved version of Ada-style trait 
constraints, which works well enough in practice. 
Haskell-style type classes or the proposed default parameters for Scala 
are other possible design points. These are a little less compelling 
when you can't redesign the whole .NET library design to take advantage 
of the feature, but still potentially worthwhile.

Bytecode object files structure


Pierre-Etienne Meunier asked and Alain Frisch answered:
> According to the source of Ocaml, there's something called the 
> "cmo_magic_number", systematically written at the beginning of all .cmo 
> files. Does it have a real function for executing the programs, or is it just 
> a way to make sure the file contains ocaml bytecode ? 

It is just a way to make sure that the file contains ocaml bytecode with 
the expected version. 
> After that, I can't understand anything : there vaguely seems to be some 
> information related to linking or so... What is the precise structure of this 
> part ? Is there some kind of a bytecode assembler ? 

The structure is a compilation unit descriptor, described in 
Yann Régis-Gianas added:
The file tools/ in the O'Caml tree may be used to parse the 
object file. This should be a first step to understand the bytecode 
file format.
Pierre-Etienne Meunier then asked and Xavier Clerc answered:
> I'd like to write an assembler, to be able to understand how the vm   
> really 
> works. I've to work on this for a school project (a compiler, I   
> want it to 
> output caml bytecode object files). 

If you are working on a compiler that should output files to be   
executed by the ocaml runtime, it does not seem necessary to handle   
cmo/cmi files as the format of bytecode file should be sufficient to   
code your compiler. Unless you have to link with ocaml modules. 
> I've understood that the data part, after the code itself, was   
> generated using 
> output_value (I didn't know this function before). 

This fonction is used by the Marshal module. It transforms any non- 
abstract value into a chain of bytes. 
The format of marshalling can be understood from the extern_rec   
function of the byterun/extern.c file. 
> What I don't get now are 
> the cu_reloc, cu_primitives and cu_imports fields of the   
> compilation_unit 
> type. 

You should remember that cmo files are parts that will be put   
together (linked) in order to create a bytecode file. 
Given this context : 
        - cu_imports lists the name of imported (used) modules the current   
cmo should be linked with in order to produce a bytecode file (the   
digest of the imported modules is also kept to ensure that you link   
with the same version you compiled against) ; 
        - cu_primitives lists the primitives declared by the current module   
(each 'external f : type1 -> type2 = "primitive" ' will result in a   
"primitive" entry of this list), needed to ensure that all required C   
primitives are provided ; 
        - cu_reloc : as each module is compiled independently, it can   
declare some elements (e.g. global variables) and use them using a 0- 
based index ; thus, when you link several modules together, you have   
to relocate this information to ensure that the first module uses   
indexes from 0 to n, the second module uses indexes from n+1 to n+m   
and so on ... 
Hope this helps, 

Xavier Clerc 

PS : I am working on some documents describing marshalling format,   
bytecode files as well as instruction opcodes. 
I will hopefully release them before xmas but don't hold your breath   
as I don't have much spare time these days. 
In the meantime, you can contact me off-list for any related question.

Symbolic derivative macro


Jon Harrop asked and Bruno De Fraine answered:
> Can someone point me to, or even knock up, a simple camlp4 macro that 
> demonstrates naively but statically computing the symbolic   
> derivative of an 
> OCaml expression? 

Since there hasn't been an answer from anyone more knowledgeable, I'm   
willing to give it a shot. 
One important part of camlp4 is the quotation system. A quotation   
allows a user function to describe how a part of the source file is   
translated into a syntax tree. 

I load an ocaml toplevel with camlp4 and the standard AST quotations.   
Since every AST node is associated with a source code location, and   
the quotations can't find this out by themselves, they require the   
variable "_loc" (previously "loc") to be defined. Since I'm not   
concerned with source locations, I define a dummy value. 

$ ocaml camlp4o.cma q_MLast.cmo 
         Objective Caml version 3.09.2 

         Camlp4 Parsing version 3.09.2 

# let _loc = Token.dummy_loc ;; 
val _loc : Token.flocation = ... 

I can then inspect the AST of some simple OCaml expressions using the   
quotation "expr". As above, I've replaced occurrences of _loc with   
"..." to keep the output readable: 

# <:expr< 1 >> ;; 
- : MLast.expr = 
# <:expr< x+1 >> ;; 
- : MLast.expr = 

To understand this latter AST, note that the expression x+1 is   
equivalent to ((+) x) 1. 

This is a good moment to mention that you can employ antiquotations   
inside quotations to specify parts that you do not want to be   
translated, but rather interpreted directly. Inside expressions you   
can antiquote e.g. expressions and literal values: 

# let i = <:expr< 1 >> in <:expr< $i$ + $i$ >> ;; 
- : MLast.expr = 
# <:expr< $str: Sys.ocaml_version$ >> ;; 
- : MLast.expr = 
# <:expr< $int: string_of_int Sys.word_size$ >> ;; 
- : MLast.expr = 

We can then define our core derivative function as a translation from   
one expression AST to another: 
(Note that I choose to make "_loc" an explicit argument to make the   
function independent from the environment) 

# let rec deriv _loc x = function 
         | MLast.ExInt (_,_) -> <:expr< 0 >> 
         | MLast.ExLid (_,n) -> 
                 let i = if n = x then "1" else "0" in 
                 <:expr< $int:i$ >> 
         | MLast.ExApp (_, 
                 MLast.ExApp (_, 
                         MLast.ExLid (_,"+"), 
                 v) -> 
                 let u' = deriv _loc x u and v' = deriv _loc x v in 
                 <:expr< $u'$ + $v'$ >> 
         | MLast.ExApp (_, 
                 MLast.ExApp (_, 
                         MLast.ExLid (_,"*"), 
                 v) -> 
                 let u' = deriv _loc x u and v' = deriv _loc x v in 
                 <:expr< $u'$ * $v$ + $u$ * $v'$ >> 
         | _ -> failwith "Not implemented" 
val deriv : MLast.loc -> string -> MLast.expr -> MLast.expr = <fun> 

You can see this already makes correct derivatives, although without   
algebraic simplification: 

# deriv _loc "x" <:expr< x*(x+y) >> = <:expr< 1*(x+y) + x*(1+0) >> ;; 
- : bool = true 

I compare to the expected result in the above expression because the   
actual AST expression is hardly readable. However, with a bit of a   
workaround, it is possible to employ a printer to print an expression   
AST in a nice form. I came up with: 
(note that Pcaml is a module that holds references to parsers,   
printers, etc. set by language extensions) 

# #load "pr_o.cmo" ;; 
# let print_expr e = !Pcaml.print_implem [MLast.StExp(_loc,e),_loc] ;; 
val print_expr : MLast.expr -> unit = <fun> 
# print_expr (deriv _loc "x" <:expr< x*(x+y) >>) ;; 
let _ = 1 * (x + y) + x * (1 + 0) 
- : unit = () 

Of course, you don't just want the AST of the derivative expression,   
you also want to evaluate it. 

One way to do this would be to install the "deriv" transformation   
function as a quotation through Quotation.add. This requires the name   
of the quotation as well as how to expand from the source text to a   
expression/pattern AST: 

# Quotation.add ;; 
- : string -> Quotation.expander -> unit = <fun> 

With file "quotation.mli" defining: 

type expander = 
   [ ExStr of bool -> string -> string 
   | ExAst of (string -> MLast.expr * string -> MLast.patt) ] 

A difficulty here is that our transformation requires the original   
expression as an AST while it is provided as a string. So we need to   
invoke the installed parsing functions first: 

# let parse_expr s = 
   Grammar.Entry.parse Pcaml.expr_eoi (Stream.of_string s) ;; 
val parse_expr : string -> MLast.expr = <fun> 
# Quotation.add "deriv_x" (Quotation.ExAst ( 
         (fun s -> deriv _loc "x" (parse_expr s)), 
         (fun _ -> failwith "Not supported"))) ;; 
- : unit = () 

Now we can do: 

# let x = 2 and y = 3 in <:deriv_x< x*(x+y) >> ;; 
- : int = 7 

Alternatively, we can install our symbolic derivative transformation   
in the main grammar of the language using the syntax extension for   
defining extensions: 

# #load "pa_extend.cmo" ;; 
     Pcaml.expr: LEVEL "expr1" [ 
       [ "deriv_x"; e = Pcaml.expr -> deriv _loc "x" e ] 
- : unit = () 

(Note that inside of the action of the grammar rule, the variable   
"_loc" is bound to the source location of the rule; so we don't   
employ the dummy location from the environment.) 

This makes deriv_x a keyword of the language and allows for it to be   
employed inside of expressions: 

# let x = 2 and y = 3 in deriv_x x*(x+y) ;; 
- : int = 7 


When not using the toplevel, you would put the definition of function   
"deriv" as well as the EXTEND statement in a source file named   
"" ("pa" because you influence the parsing phase). It can   
then be compiled as: 

$ ocamlc -c -pp 'camlp4o q_MLast.cmo pa_extend.cmo' -I +camlp4 

(The -pp flag because the syntax uses AST quotations and the EXTEND   
statement; the -I flag because we refer to types and definitions from   
module MLast.) 

To compile a source file that employs the deriv_x keyword, employ the   
preprocessor with our extension loaded: 

$ ocamlc -c -pp 'camlp4o ./pa_deriv.cmo' 

To inspect the result of preprocessing manually, you load an   
appropriate printer, e.g.: 

$ camlp4o ./pa_deriv.cmo pr_o.cmo
Yaron Minsky also answered:
There is this. which is done via a functor rather than camlp4...

Using folding to read the cwn in vim 6+

Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'&lt;1':1

If you know of a better way, please let me know.

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.

Alan Schmitt