

Here is the latest Caml Weekly News, for the week of May 10 to 17, 2011.

  1. NYU's Center for Genomics and Systems Biology seeks OCaml programmer
  2. Difference between [ `A ] and [< `A ]
  3. ODisco, for large-scale data processing in OCaml
  4. Other Caml News

NYU's Center for Genomics and Systems Biology seeks OCaml programmer


Ashish Agarwal announced:
I am excited to tell you about a new opportunity to develop OCaml
software at The Center for Genomics and Systems Biology (CGSB) at New
York University (NYU), located in the heart of Manhattan. The
position's main function will be to develop software in the OCaml
language to manage, analyze, and display the vast amounts of data
generated by next-generation sequencing technologies. NYU's strong
commitment to this field is represented by its $100M investment in the
brand new CGSB building, which houses the latest sequencing platforms
and excellent high performance computing facilities.

You will support the computational needs of several experimental labs
by contributing to the following infrastructure:

o A database for tracking samples, very large quantities of raw data
and analysis results
o A website for users to submit new samples, monitor progress of their
workflow, and visualize data
o A system for distributing batch jobs to a cluster, accounting for
dependencies between jobs and cached results

The ideal candidate will be an experienced functional programmer with
knowledge of many OCaml libraries and tools, such as database
bindings, ocsigen, ocamlnet, batteries, janestreet-core,
etc. Experience in the following areas is a plus but not required:
bioinformatics, statistics, type theory, distributed computing, and
UNIX systems administration.

NYU researchers are using sequencing technologies to investigate basic
questions about the nature of life and to address fundamental problems
in human health. The very large datasets generated by these
technologies pose significant computational challenges for which the
robust principles of functional programming are ideally suited.

Please contact me to discuss this position further. Thank you.

Difference between [ `A ] and [< `A ]


Dario Teixeira asked and Jacques Garrigue replied:
> I've seen OCaml code "in the wild" where both of the following signatures
> are present: (the type parameter for 't' is a phantom type)
>  val foo: [< `A ] t -> unit
>  val bar: [ `A ] t -> unit
> But is there any practical difference between [ `A ] and [< `A ] given
> that there is only one element in the set?

In this particular case the two types are almost equivalent.
The only counterexample I could find is unifying with the following
private row type:

  type leA = private [< `A]

leA is unifiable with [< `A] but not with [`A].

The difference becomes more significant when there is an argument.
For instance, [< `A of int] and [< `A of bool] are unifiable, giving
[< `A of int & bool], but [`A of int] cannot be unified with [`A of bool].

Note that some old versions of OCaml did some "singleton promotion", i.e.
[< `A of int] was automatically converted to [`A of int].
This was removed as an unnecessary complication, and also because you
might actually want to distinguish the two for private row types.
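
The two points above can be sketched in a few lines. This is a minimal
illustration, not code from the thread: the module P and the functions
foo, bar, takes_int, takes_bool and both are hypothetical names chosen
for the example.

```ocaml
(* Hypothetical phantom-typed module, for illustration only. *)
module P : sig
  type 'a t
  val make : unit -> [ `A ] t
  val foo : [< `A ] t -> string  (* upper bound: at most `A *)
  val bar : [ `A ] t -> string   (* exact type *)
end = struct
  type 'a t = unit
  let make () = ()
  let foo () = "foo"
  let bar () = "bar"
end

(* Without an argument, both signatures accept an [ `A ] t value. *)
let both_accept = P.foo (P.make ()) ^ P.bar (P.make ())

(* With an argument, the upper-bounded forms can still be unified. *)
let takes_int (_ : [< `A of int ]) = "int"
let takes_bool (_ : [< `A of bool ]) = "bool"

(* Accepted by the type checker: the argument gets a conjunctive type,
   [< `A of int & bool ], as described in the reply above. *)
let both x = takes_int x ^ takes_bool x

(* The exact forms cannot be unified. With
     let exact_int (_ : [ `A of int ]) = "int"
     let exact_bool (_ : [ `A of bool ]) = "bool"
   the definition
     let both_exact x = exact_int x ^ exact_bool x
   is rejected, because int and bool would have to be equal. *)
```

Note that both type-checks but cannot be applied to any value of the
form `A v, since no v is an int and a bool at once.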

ODisco, for large-scale data processing in OCaml


Prashanth Mundkur announced:
The Disco team is pleased to announce the possibility of doing
large-scale data analysis (à la map-reduce) in OCaml.

Disco [1] is an open-source distributed computing framework inspired
by the map-reduce paradigm.  It includes a distributed replicating
tag-based filesystem that allows you to store your datasets in a
fault-tolerant manner.  Disco comes with additional tools: DiscoDB [2]
for implementing efficient mapping objects and Discodex [3] for
distributed indices for querying large datasets.

Disco has been in production use at Nokia for two years, and is used
to process terabytes of data daily [4].

The core job scheduling, cluster monitoring and filesystem logic of
Disco is written in Erlang, leveraging the strengths of Erlang in
concurrency and distribution.  The primary language for writing
compute jobs is currently Python; however, the latest Disco 0.4
release [5] has opened up the Disco worker interface, allowing jobs
to be written in any language.

ODisco is the first available non-Python implementation of this Disco
worker interface, and allows distributed processing of large-scale
datasets in OCaml.  The computation is not restricted to a
record-oriented key-value style interface; the OCaml task directly
gets access to the input data source and writes the output data in
whatever format it chooses.  The overall computation however currently
still follows the traditional map-reduce dataflow, with
map/shuffle/reduce stages.
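
For readers unfamiliar with that dataflow, here is a minimal in-memory
sketch of the three stages as a word count. It uses only the OCaml
standard library and does not use or reflect the ODisco API; all
function names (map_stage, shuffle, reduce_stage, word_count) are
hypothetical.

```ocaml
(* Map stage: emit one (word, 1) pair per space-separated word. *)
let map_stage (line : string) : (string * int) list =
  String.split_on_char ' ' line
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> (w, 1))

(* Shuffle stage: group all emitted values by key. *)
let shuffle (pairs : (string * int) list) : (string * int list) list =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun (k, v) ->
      let old = try Hashtbl.find tbl k with Not_found -> [] in
      Hashtbl.replace tbl k (v :: old))
    pairs;
  Hashtbl.fold (fun k vs acc -> (k, vs) :: acc) tbl []

(* Reduce stage: sum the counts collected for one key. *)
let reduce_stage ((k, vs) : string * int list) : string * int =
  (k, List.fold_left ( + ) 0 vs)

(* Whole pipeline, with results sorted by word. *)
let word_count (lines : string list) : (string * int) list =
  lines
  |> List.map map_stage
  |> List.concat
  |> shuffle
  |> List.map reduce_stage
  |> List.sort compare
```

In a real cluster each stage would run as separate distributed tasks;
here they are ordinary functions chained in a single process.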

ODisco is available at and also in
the 3.12 section of Godi as the godi-odisco package.

Please let us know if you have any issues with either ODisco or Disco
on the Disco mailing list.

Happy hacking!

[1] Disco Project,
[2] DiscoDB,
[3] Discodex,
[4] Disco at Nokia,
[5] Disco 0.4 release,

Other Caml News

From the ocamlcore planet blog:
Thanks to Alp Mestan, we now include in the Caml Weekly News the links to the
recent posts from the ocamlcore planet blog at

Dose3 in debian experimental !:

OCaml as a SQL*Plus replacement?:

Dynlink as dlopen..:

Announcing: OCI*ML:

Running a classical proof with choice in Agda:

How newcomers can easily contribute to the OCaml Batteries:

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.

Alan Schmitt