Caml Weekly News

Hello

Here is the latest Caml Weekly News, for the week of October 02 to 09, 2007.

Weaktbl: a weak hash table library
Announcing The Decision Procedure Toolkit Version 1.1
Job Announcement
Smoke 2.01 for OCaml 3.09
camlp4: Parsing and transforming class definitions
camlp4: Understanding class_declaration
Wink Technologies release OCaml libraries
camlp4: Creating my own quasiquotations
Camlp5 new release 5.01
camlp4: Where's Loc?
Pretty-printing the OCaml AST from the toplevel
Correct way of programming a CGI script

Weaktbl: a weak hash table library

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/64e3bd4ff4ad7699/966777a90a9da8b6#966777a90a9da8b6

Zheng Li announced:

Follow some advices, I updated Weaktbl to version 0.02. Notable improvements 
include: 

 - the implementation is improved based on the clarified semantics 
 - a new weak stack implementation is used internally (I'm not sure whether 
   it's useful to others so I decide not to expose it through interface) 
 - add new sections to README 

The interface remains the same. Anyone has downloaded the previous version is 
strongly suggested to upgrade. Btw, the code is released under LGPL + linking 
exception. 

Here are the extra sections from the README: 
------------------------------------------- 

..... 

== Semantics == 

Weaktbl have all the same semantics as standard Hashtbl, with the only 
exception: a binding will be automatically garbage collected, if *its key* is 
no longer referenced by (hence unreachable from) the rest part your program. 
Note that 

 - Here, "its key" indicates the exact (physical) key with which the binding 
   was added, not any other keys of its equivalent class even if they may 
   (or may not) be structurally equivalent to this key. 
 - You may use noncollectable values as keys (such as int), but they won't be 
   collected by the Weaktbl (this is the same convention all weak data 
   structures follow). Since the keys won't be collected, according to our 
   principle -- key's aliveness decide its' binding's aliveness, the bindings 
   won't be collected too. You're actually using Weaktbl as a persistent 
   Hashtbl, but that's still not too bad. 

Finally, the polymorphic hash interface of Weaktbl takes ''equal x y = (compare 
x y) = 0'' and ''hash = Hashtbl.hash'' as the default setting, the same as the 
polymorphic interface of standard Hashtbl. 

== Application == 

Weaktbl is useful for many kinds of applications, where standard Hashtbl's 
persistent storage prevents the necessity of automatic garbage 
collection. We'll illustrate this with a few typical examples: 

=== Dict cache === 

Hash table is typically used for quick dictionary lookup. Suppose you're working 
with a huge data set and key->value lookup is the most frequent operations. To 
have all the data into a hash table in memory is out of consideration; but to 
have everything external is way too expensive (e.g. looking up in a large file 
or querying from a external database each time). Fortunately, you can work on 
one small part of the data set at a time, and the range of this small part 
evolving gradually, i.e. some data coming into the range of vision, others 
going out of scope. A typical working mechanics can be written as 

  let query key = 
    try lookup $key in $hashtable (* quick *) 
    with Not_found -> 
      let value = lookup $key in $external_storage in (* slow *) 
      add $key $value to $hashtable; (* we may query it quite often recently *) 
      value 

You'll soon find the problem -- $hashtable here will growing forever! The 
situation is, with the evolving of working range, some keys are out of reach 
and should be collected by the GC, so should the bindings related to them. 
However due to the Hashtbl's persistence, this won't happen. First, the key 
itself is persistently referenced by Hashtbl, it wouldn't likely to be GCed, 
not mention the bindings. For a weak hash table, this is not a problem: 
keys themselves are *weakly* referenced by the table, besides when a key is 
forgotten (GCed) by your program, its binding(s) will be forgotten (GCed) by 
the table too [*]. 

=== User-level GC === 

Suppose you are working on graph-like data structure. Each node is represented 
as a id -> {links: id list; info: huge_data} bindings in a hash table, where 
the links is all the nodes directly reachable from node id, and the info is 
large blocks of information affiliated to the node id. The graph structure 
keeps evolving over time, i.e. new nodes being added, new links being 
established, old links being broken and old nodes being dropped etc. So far 
it's just normal. Image you break a link, i.e. remove some $id_b from the links 
field of $id_a, it's possible this is the only bridge between the sub-graph 
$id_b is located and the outside world, besides there's no active id directly 
points to this sub-graph. In short, the sub-graph is disconnected, obsoleted 
and unreachable. Because the graph data structure keeps evolving, it's very 
important to GC those unreachable parts. How this is done with a standard 
Hashtbl? Whenever we break a line $id_a -> $id_b, we check all the nodes' links 
field to see whether this is the last link to $id_b. If so, we first remove 
$id_b, break each outward link of $id_b and recursively check whether these 
broken links produce more unreachable nodes... With weak hash table, you do 
nothing. When the last link to a sub-graph is broken, it simply means the 
sub-graph is no longer referenced by the rest program, so it's GCed 
automatically.

Announcing The Decision Procedure Toolkit Version 1.1

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/2f4fb90022cf8374/efa073fe7ddd67ad#efa073fe7ddd67ad

Amit Goel, Jim Grundy and Sava Krstic announced:

We are pleased to announce the open source availability of the 
Decision Procedure Toolkit (DPT).  DPT is a modern SMT solver.  This 
release provides a MiniSAT-like DPLL solver, a DPLL(T) style theory 
combination mechanism, and a solver for the theory of Uninterpreted 
Functions (EUF).  The next release will add a linear arithmetic solver 
and a cooperation mechanism for more than one theory. 

DPT is an open source project licensed under the Apache 2.0 license. 
You can download DPT from sourceforge: 

http://sourceforge.net/projects/dpt 

DPT is intended to serve as a platform for experiments in SMT solvers 
and their applications.  Subsequent releases will include features 
like model generation, proof production and interpolants.  We also 
expect to support parametric theories and the HOL logic. 

The DPT design philosophy emphasizes flexible interfaces and 
transparent implementation over raw speed.  DPT is implemented in 
OCaml.  These decisions not withstanding, speed is good, and so we 
have made a reasonable effort to ensure DPT is fast. 

Further announcements about DPT will be made on the dpt-announce mailing 
list.  You can subscribe to the list via the project web site.

Jim Grundy later added:

DPLL = The Davis-Putnam-Logemann-Loveland backtracking algorithm for 
deciding the satisfiability of propositional logic formulae.  Modern SAT 
solvers (mini-SAT, chaff, etc) use cunning variants of DPLL - as does DPT. 

DPLL(T) is an algorithm that combines a DPLL solver with a solver for 
some theory to yield a combined solver.  In the case of DPT, we have 
included a EUF solver.  EUF is the theory of equality of uninterpreted 
functions.    You can pose DPT problems propositional problems with the 
atoms are propositional variables or equations between variables and 
(uninterpreted) function applications. 

DPT is completely implemented in OCaml - even the DPLL solver, and yet 
we get good (read seemingly competitive) from the tool.

Denis Bueno asked and Jim Grundy answered:

> Have you benchmarked against Minisat?  Is DPT a re-implementation of 
> the Minisat paper using OCaml, or is simply a solver in the DPLL 
> framework as opposed to a solver aiming to mimic Minisat?

We have benchmarked against MiniSAT - at little. 
Naturally MiniSAT is faster.  For problems that combine SAT reasoning 
with theory reasoning then the extra SAT performance doesn't all get 
translated into extra combined theory solving performance - hence our 
overall performance is not too shabby. 

Our SAT solver is very much MiniSAT like, but with some extra features 
and a more open API to better facilitate it's use in a collaborative 
solving tool.  The code is very cleanly written (IMHO), commented, and 
heavy with assertions. It may serve as a good starting place for someone 
wishing to learn about how MiniSAT like algorithms work. 

Our SAT performance - on a few selected benchmarks we have tried - is 
about 1/2 - 1/3 of MiniSAT.  If you start playing with the garbage 
collection tuning, which we have yet to experiment much with, you seem 
to be able to get better than 1/3.

Job Announcement

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/eeb092296dc09468/b088e1f1c3db1b45#b088e1f1c3db1b45

Yaron Minsky announced:

I apologize for boring long-time subscribes to this list, but I would 
like to mention yet again that Jane Street is hiring OCaml programmers. 
Over the last few years we have built up a team of more than 20 
functional programmers, and we're looking to hire more.   

A couple of notes: 

  * We have offices in London, New York and Tokyo.  We are especially 
    eager to hire OCaml developers into the Tokyo office. 
  * We are particularly interested in developers who have experience in 
    systems administration and design. 

Here's a quick summary of what Jane Street is all about: 

------------------------------------------------- 

Jane Street is a trading firm that brings a deep understanding of 
trading, a scientific approach, and innovative technology to bear on the 
problem of trading profitably on the world's highly-competitive 
financial markets.  We run a small, nimble operation where technology 
and trading are tightly integrated. 

At Jane Street, there is room to get deeply involved in a number of 
areas at the same time. We are actively looking for people interested in 
software development, system administration, and quantitative 
research--potentially all on the same day. 

The ideal candidate has: 

    * A commitment to the practical. One of the big attractions of our 
      work is the opportunity to apply serious ideas to real-world 
      problems. 
    * Experience with functional programming languages (OCaml, SML, 
      Scheme, Haskell, Lisp, F#, Erlang, etc) is important. Applicants 
      should also have experience with UNIX and a strong understanding 
      of computers and technology. 
    * Good second-order knowledge. In trading, understanding the 
      boundary between what you do and don't know is as (or more) 
      important than how much you know. 
    * If you're interested in research, a strong mathematical background 
      is a must. This includes a good understanding of probability and 
      statistics, calculus, algorithms, etc. We draw on ideas from 
      everywhere we can, so we value interest and experience in a range 
      of scientific fields. 

The environment at Jane Street is open, informal, intellectual, and 
fun. You can wear a t-shirt and jeans to the office every day, the 
kitchen is stocked, and discussions are always lively. We encourage a 
focus on learning, both through formal seminars and classes, as well as 
through day-to-day conversations with colleagues. Other perks include 
competitive salaries, rapid advancement for people who do well, 
excellent benefits, free lunch, and a gym on-site (the last is NY only). 

If you are interested, send an application to Yaron Minsky 
(yminsky@janestcapital.com) including a resume, cover letter, and 
optionally some sample code you've written. 

If you'd like to know a little bit more about how we came to use OCaml 
as our primary development language, take a look at these slides from a 
talk we gave at CUFP 2006 and the article I wrote for the Monad Reader. 

   http://www.galois.com/cufp/slides/2006/YaronMinsky.pdf 
   http://www.haskell.org/sitewiki/images/0/03/TMR-Issue7.pdf

Smoke 2.01 for OCaml 3.09

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/d65f3d2996672557/403fbcceb25331f0#403fbcceb25331f0

Jon Harrop announced:

Matthieu Dubuget kindly pointed out that our previous upload of the free 
edition of Smoke for OCaml 3.09 still had the erroneous requirement upon our 
internal stdlib2 library. 

As around a third of the downloads are still for 3.09, we've removed this 
dependency so you can still use Smoke without upgrading to 3.10: 

  http://www.ffconsultancy.com/products/smoke_vector_graphics/free.html 

If you're interested in buying add-ons or a commercial license for Smoke, 
please e-mail me or register your interest on-line: 

  http://www.ffconsultancy.com/products/smoke_vector_graphics/register.html

camlp4: Parsing and transforming class definitions

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/cc23ede409e2066f/8f851b20eced21d8#8f851b20eced21d8

Joel Reymont asked and Martin Jambon answered:

> I would like to use camlp4 to parse OCaml class definitions and transform the 
> code if the class inherits from a given one. Are there any OCaml extensions 
> that manipulate classes? 

See Jacques Garrigue's pa_oo extension. I think Nicolas made a port to 
camlp4 3.10. 
http://www.math.nagoya-u.ac.jp/~garrigue/code/pa_oo.ml

Nicolas Pouillard later added:

No it's not part of the distribution (but I think it should).

camlp4: Understanding class_declaration

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/74eadb9d34f722c4/e7fd90720e221125#e7fd90720e221125

Joel Reymont asked and Nicolas Pouillard answered:

> Would someone kindly explain LEFTA, SELF, ANTIQUOT and QUOTATION below? 

The main page about camlp4 extensible grammars/parsers is: 
  http://brion.inria.fr/gallium/index.php/Extensible_Parser 
The syntax of grammars is given in: 
  http://brion.inria.fr/gallium/index.php/EXTEND 

* LEFTA is left associative (its the default). 
* SELF  refers  to  the  current  rule  (class_declaration here), in most common 
cases SELF does what you want. 

ANTIQUOT  and  QUOTATION  are  token  types  like STRING, INT... The backquote 
syntax  mark  the  begining  of an OCaml pattern. So ANTIQUOT is a constructor 
(the  token  type [1]). The lexical syntax of an antiquotation is "$name:...$" 
or  "$...$",  it  use  in  order to treat quotations [2] such as 
  <:class_expr< myclass = object method m = $e$ end >> 
where `e' should better be an expression (Ast.expr). 

[1]: http://brion.inria.fr/gallium/index.php/Generic_Token_Type 
[2]: http://brion.inria.fr/gallium/index.php/Quotation 

>     class_declaration: 
>        [ LEFTA 
>          [ c1 = SELF; "and"; c2 = SELF -> 
>              <:class_expr< $c1$ and $c2$ >> 
>          | `ANTIQUOT (""|"cdcl"|"anti"|"list" as n) s -> 
>              <:class_expr< $anti:mk_anti ~c:"class_expr" n s$ >> 
>          | `QUOTATION x -> Quotation.expand _loc x   
> Quotation.DynAst.class_expr_tag 
>          | ci = class_info_for_class_expr; ce = class_fun_binding -> 
>              <:class_expr< $ci$ = $ce$ >> 
>        ] ] 
>
> It seems they are variants but that's about as much as I understand.   
> What are the "cdcl", "anti" or "list", for example? Why are they   
> strings? 

They  are  antiquotations  names  so  you  can  write  (don't pay attention to 
"anti") <:class_expr< $cdcl:x$ and $list:xs$ >>. 

> And why is `QUOTATION x expanded into Quotation.expand _loc x   
> Quotation.DynAst.class_expr_tag? 

When  you  see  <:class_expr<...>>  you  give  "..." to the quotation expander 
manager  that  will call the registered expanding function (the class_expr_tag 
is too indicate the requested type).

Wink Technologies release OCaml libraries

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/7bd2ae0a9415340d/1db1cb3ce338a9e2#1db1cb3ce338a9e2

Gerd Stolpmann announced:

I'm proud to report that we've won another company supporting OCaml, 
nameley Wink Technologies, in Los Altos, California (http://wink.com). 
They developed a people search engine that lets you find people in the web 
(it is online, check it out), and a substantial part of it is written in 
OCaml. Although Wink is still tiny in comparison with its competitors this 
is nevertheless an important investment into into our beloved language. 

As a commitment to OCaml Wink releases two libraries as open source: 
"Files" is a batch-oriented persistent key/value container, and "Netdns" 
is a DNS stub resolver that is capable of doing asynchronous name 
resolutions. You find these libraries at http://oss.wink.com. Please note 
that the packing is not yet optimal, and more polishing will be done soon. 
But the best is: there is really exciting stuff in the queue of the to be 
released code, especially for distributed computing.

camlp4: Creating my own quasiquotations

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/aaeeeb19ede6e677/fcf006a1f8a25c2a#fcf006a1f8a25c2a

Joel Reymont asked, Pietro Abate suggested, and Nicolas Pouillard said:

> > The "One-day compiler" presentation [1] mentions creating your own   
> > quasiquoations, <:cstmt< ... >> in that particular case. The   
> > presentation does not explain how this is accomplished, though. Are   
> > there examples? 
> >     Thanks, Joel 
> > [1] http://www.venge.net/graydon/talks/mkc/html/index.html
>
> maybe you can find something here: 
> http://www.venge.net/graydon/cgi-bin/viewcvs.cgi/src/mkc/ 

Here is some pointers about that subjects: 

The camlp4 wiki: http://brion.inria.fr/gallium/index.php/Camlp4 

About quotations: http://brion.inria.fr/gallium/index.php/Quotation 

A small but complete example of a new quotation expander: 
  http://brion.inria.fr/gallium/index.php/Lambda_calculus_quotations

Camlp5 new release 5.01

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/3f49d366dd2960ff/3d8a7867acc7cb13#3d8a7867acc7cb13

Daniel de Rauglaudre announced:

New release of Camlp5: 5.01 

Fixed two major bugs: 
 * in grammars, there was parsing confusion when using entries with 
   qualified names with the same final name (e.g. A.foo, B.foo), 
   resulting wrong parse errors => fixed 
 * the syntax "a, b as c, d" (in pattern in normal syntax) did not 
   work any more => fixed 

New features: 
 * added library module Diff to compare two arrays, implemented with 
   the same algorithm than the Unix 'diff' command 
 * added flag 'E' in pretty print normal and revised syntax to allow 
   equilibrated display in match cases, if statement, and cases in parsers 
   and grammars (equilibrated = if one case needs a newline, all cases 
   are printed with newlines also) 

A new version of the pretty printer in Scheme syntax is in preparation. 
Some changes in the parser in Scheme syntax still exist in this version. 

Details in file CHANGES in the distribution and in the site. 

Download the sources and the documentation at: 
   http://pauillac.inria.fr/~ddr/camlp5/

camlp4: Where's Loc?

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/f873ec31a67e06bb/ac2f6c54d291d4c1#ac2f6c54d291d4c1

Nathaniel Gray asked and Nicolas Pouillard answered:

> I'm trying to figure out how to use the Camlp4MacroParser extension 
> but I can't seem to find the Loc module it requires: 

Log is exposed to the user in Camlp4.PreCast, so: 

open Camlp4.PreCast;; 

> macro.ml: 
> """ 
> open Camlp4.Sig 
> let here = __LOCATION__ 
> """ 
>
> Build command: 
> ocamlc -I +camlp4 -pp "camlp4o -parser Camlp4MacroParser" 
> camlp4lib.cma -o macro macro.ml 
>
> Result: 
> File "macro.ml", line 2, characters 11-23: 
> Unbound value Loc.of_tuple 
>
> According to the documentation the Loc module's supposed to be in 
> Camlp4.Sig.  Any clues? 

In sig there is the Loc module type.

Pretty-printing the OCaml AST from the toplevel

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/1633bf5b4de0ba71/cdf43fb2df35549f#cdf43fb2df35549f

Joel Reymont asked and Nicolas Pouillard answered:

> Are there any examples of pretty-printing the OCaml AST from the   
> toplevel? 

$ rlwrap ocaml camlp4of.cma 
open Camlp4.PreCast;; 
module PP = Camlp4.Printers.OCaml.Make(Syntax);; 
let pp = new PP.printer ();; 
let ghost = Loc.ghost;; 
module PP = Camlp4.Printers.OCaml.Make(Syntax);; 
Format.eprintf "%a@." pp#expr <:expr@ghost< 3 + 4 >>;; 

> I'm looking to use this during interactive debugging. 
>
> I see the following example in the camlp4 changes doc 
>
> camlp4 -parser OCaml -printer OCamlr foo.ml 
>
> but I'm still browsing through Camlp4.ml to figure out what that does   
> exactly. 

Camlp4.ml  is  a  generated file. It's perhaps not the best way to read camlp4 
sources.

Correct way of programming a CGI script

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/22eb15a131b77391/e24c44c737b60a37#e24c44c737b60a37

Tom asked:

Hi! I am in a process of making a website (which might receive substantial 
amounts of traffic), and am considering options for the backend. I discarded 
PHP and other similar server-side scripting languages, due to performance 
problems (I suspect that PHP and similar could not scale well, unless I 
implemented complex caching techniques). I plan to use OCaml to generate 
static .html documents from the content from the database. Since the content 
will probably change not as often as it will be accessed, I believe this is 
the better way (as opposed to accessing the database every time a user wants 
to load the page).

So, OCaml programs will only be run seldomly to access the database and 
generate HTML files, using the data fetched from the DB. However, I am still 
worried whether this would cause too much performance impact.

Dario Teixeira answered:

I suggest you take a look at Ocsigen (http://www.ocsigen.org/).  It's 
a fully-featured web server written in OCaml, that not only supports 
static pages and traditional CGI programming, but also has a module 
called Eliom that allows you to build dynamic websites using all the 
best features of the OCaml language. 

As for performance, the bottleneck will surely be the database backend. 
Even when generating dynamic pages with Eliom, Ocsigen can easily output 
close to a hundred pages per second on a decent machine.  (And of course 
it's even faster with static content!)

Gerd Stolpmann also answered:

Have a look at ocamlnet (http://ocamlnet.sf.net). It has plenty of ways of 
building web apps. For example, you can easily run your own HTTP server 
in a multi-processing or multi-threading setup. Or you can connect your 
web app with Apache by using fastcgi or a few other available protocols. 
All this is pretty much scalable. 

There is no support for generating HTML, however. 

An example for a stand-alone webserver (it is accompanied only by a 
config file): 

https://godirepo.camlcity.org/wwwsvn/trunk/code/examples/nethttpd/netplex.ml?rev=1122&root=lib-ocamlnet2&view=auto

Here is the same for the "connect to Apache" approach: 

https://godirepo.camlcity.org/wwwsvn/trunk/code/examples/cgi/netcgi2-plex/?root=lib-ocamlnet2

In either way, it is possible to keep the connection to the db in case 
you need it for generating the page.

Using folding to read the cwn in vim 6+

Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'&lt;1':1
zM

If you know of a better way, please let me know.

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.

Alan Schmitt