Caml Weekly News

Hello

Here is the latest Caml Weekly News, for the week of 24 to 31 May, 2005.

OCaml CSV (comma-separated values) mini-library 1.0.3
JIT ?
Efficient I/O with threads
CamlTemplate 1.0 released
BCP relating to -pack?
Ocaml Programmer Position
Embedding Javascript in OCaml
looking for devs
Job for functional programmer
infix functions
OS Prototype
Seemingly inconsistent labels for List module
SAFFIRE: type checking the OCaml/C FFI

OCaml CSV (comma-separated values) mini-library 1.0.3

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/9286991d81ae52bf3d33cf5e71230358.en.html

Richard Jones announced:

I'm pleased to announce version 1.0.3 of the mini-library for handling
CSV files in OCaml.  This library is released under LGPL with the
OCaml linking exception.

http://merjis.com/developers/csv

This version comes with a handy command line tool called 'csvtool' for
processing CSV files from shell scripts.

JIT ?

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/5db04cbc701ed6f332f58c997a69f1ac.en.html

Christopher Alexander Stein asked and Jon Harrop answered:

> Can the ocamlrun bytecode interpreter do just-in-time compilation
> or is ocamlopt the way to go for performance instead of ocamlc?

ocamlopt is the way to go for performance.

ocaml JIT compiles to bytecode which is then interpreted. ocamlc compiled to 
bytecode which is interpreted. ocamlopt compiles straight to native code and 
typically produces programs which are several times faster.

Basile Starynkevitch wrote a real JIT compiler for OCaml (compiling to native 
code on-the-fly) called ocamljit:

  http://cristal.inria.fr/~starynke/ocamljit.html

Efficient I/O with threads

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/7b8093e8e88d0016895e22b0392933a7.en.html

Yaron Minsky said:

We've been running into some interesting problems building highly
efficient I/O routines in threaded code in ocaml, and I'm curious if
anyone else has some thoughts on this.  The basic problem seems to be
that the locking and unlocking of the IO channels seems to take a
large fraction of the execution time.

A little bit of background first.  The data type we're outputting is
basically a simple s-expression, with the following type:

type sexp = Atom of string | List of sexp list

We write out an s-expression by writing a tag-byte to determine
whether the s-expression is an atom or a string.  If the s-expression
is an atom, we then write a 4-byte int, which is the length of the
string, and then the string.  If the s-expression is a list, we write
an atom which is the number of s-expression that are contained, and
then write those s-expressions.

It's very easy to write parsing and marshalling for this type of wire
protocol, but that code turns out to be quite inefficient, because you
end up making too many calls to the input and output functions, and
each one of those calls requires releasing and acquiring locks.  I
just can't think of a clean way of implementing a reader for this kind
of protocol.  (a writer could be done by writing stuff to a buffer
first, and then writing the whole buffer out at the socket at once.)

Any thoughts?

He then added:

An addendum.  One thing that was pointed out to me in some private
emails was that buffering could solve the problem on the reading side
as well.  That is true, as far as it  goes --- that's why I said that
I can't think of a _clean_ way of handling it.  One of the nice things
about ocaml IO channels is that they handle buffering, and it seems a
shame to have to reimplement buffering on top of them.

Put another way, the problem with input/output channels appears to be
that the buffering is done on the wrong side of the lock.  You
shouldn't have to do any locking to do IO when the request can be
satisfied from the buffer.  The fact that IO channels always require
you to acquire the lock means that the performance is crappy unless
you bundle up writes by yourself.

Fixing this is perhaps too deep of a change to drive into the OCaml
system at this point.  Is this a problem that is addressed by the I/O
channels provided by any other library such as extlib?

Nicolas Cannasse answered:

> Fixing this is perhaps too deep of a change to drive into the
> OCaml system at this point.  Is this a problem that is
> addressed by the I/O channels provided by any other library
> such as extlib?

I can maybe answer on that one.

Extlib IO channels provide "high-level" channels. A channel is just a record
of lambdas that are used to read and write to it. There are implementations
for reading and writing from caml low level channels, but also to input and
output directly from a string. You can also create your own channels by
providing the appropriate API functions ( 3 functions for input channels :
read / input / close  and 4 functions for output channels : write / output /
flush / close ).

This approach means that you can easily wrap one channel with another. For
example there is a Base64 module that takes a channel as parameter and
returns a channel that will either perform encoding or decoding in B64 and
read/write to the underlying channel. The same approach could be used to add
a buffer for either reading or writing.

ExtLib IO channels are focused more on usability than performances. Using
them require a very small overhead compared to using direct caml channels
but is more flexible (you can later retarget your output to a string, or
wrap it with a compression or encoding library) and if you're performing IO
on disk it should not be so much different in terms of performances.

Here's the module documentation :
http://ocaml-lib.sourceforge.net/doc/IO.html

Gerd Stolpmann also answered:

I just looked into the sources of the OCaml runtime. The additional work
to lock/unlock the I/O channels is very, very small, just a
pthread_mutex_lock and a pthread_mutex_unlock for every operation. What
counts more is the general overhead for the multi-threading machinery.
For every blocking system call a lot of additional overhead is
necessary.

As an alternative, you can try the object channels of Ocamlnet. With

let in_ch =
  Netchannels.lift_in (`Raw (new Netchannels.input_descr in_fd))

and

let out_ch =
  Netchannels.lift_out (`Raw (new Netchannels.output_descr out_fd))

you get object channels over the file descriptors in_fd, out_fd that
implement buffers by O'Caml code and work much like the built-in
channels. These channels aren't protected against concurrent usage, and
may be more light-weight because of this.

CamlTemplate 1.0 released

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/a40a280e9ce9eb13c18f25d8d15d4782.en.html

Benjamin Geer announced:

CamlTemplate 1.0 has been released.

CamlTemplate is library for generating text from templates in
Objective Caml. It can be used to generate web pages, scripts,
SQL queries, XML documents and other sorts of text.

Features:

   * A versatile, easy-to-learn template syntax that supports
     common scripting-language constructs, while encouraging a
     separation between presentation logic and application
     logic.

   * The supported Caml data structures accomodate lists, tables
     and trees of items in a straightforward manner.

   * Works well with mod_caml and mod_fastcgi.

   * Supports any ASCII-compatible encoding, including UTF-8.

   * Optional support for multithreading.

Changes since the last release:

   * Fixed bug: META always depended on 'threads' even if
     threads weren't enabled. Thread support is now compiled
     into a separate file which can be linked in if needed,
     instead of being enabled when CamlTemplate is compiled
     (thanks to Janne Hellsten).

   * Fixed incorrect interpretation of backslashes.

   * Fixed reading of template files on Cygwin (thanks to Janne
     Hellsten).

   * Fixed incorrect handling of syntax errors in macro call
     arguments.

   * Added a FastCGI example.

CamlTemplate is available via GODI, or from:

http://saucecode.org/camltemplate

BCP relating to -pack?

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/74f83cbf2ad949db719f4eac64493835.en.html

Alex Baretta asked:

The AS/Xcaml and FreerP projects are now getting big enough that
namespace conflicts begin to emerge within them. We are thinking about
changing our build system so as to encapsulate libraries as single
cmo/cmi packages to introduce a hierarchy in the namespace. The problem
I foresee is that the -pack directive to the compiler breaks the code,
because all modules referring to module X within xyz.cma would need to
open module Xyz. Patching the entire project is in my opinion contrary
to the "a posteriori" approach to namespace management taken by the Caml
team with the -pack directive. So here is my question.

Has anyone already faced this kind of problem on a fairly large project
( ~ 100 kloc)? What are the Best Current Practices relating to -pack?

Richard Jones answered:

A better way to create a hierarchical namespace seems to be to use
some character _other than_ '.' (dot) to separate the levels.

For instance:

  Pxp_document
  Net_httpclient   (nearly)

This allows a third party to come along and add packages to the same
"namespace", eg. Pxp_myextension.

Using dot / -pack doesn't allow extension and doesn't allow the package
to be spread over several cma files.

Rich.

(I'm not claiming that I've used this convention in my own packages,
but I ought to have done ...)

Alex Baretta then said:

Given the current implementation of namespaces in Ocaml, I agree. For a
new project, I'd adopt this convetion. Currently, I have to decide
whether we have to change all module names to match a sensibile
convention, and consequently all references to the modules within the
code, or if we ought to package everything up with -pack. The second
solution is only viable if it does not break the current code, which is
currently not written with "open Library_name" directives. If this is
not possible, we'll have to stick with cmas and change module names all
over the place, which is highly undesirable.

If the namespace management could be entirely delegated to the command
line parameters of ocamlc, I'd be a happier man.

Ocaml Programmer Position

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/fd69a81d02d961d4d09eca95a5d13a4f.en.html

Andres Varon said:

The American Museum of Natural History (http://www.amnh.org) seeks to fill
several positions at various levels for a two-year project for application and
systems development for the study of emerging infectious disease. One position
(the first in the list), is for an Ocaml programmer which (I hope), might be of
direct interest for some of you. The others are CS related, so I'm leaving
them as these could be of your interest too.

Programmer: Computer Science or engineering BS or MS degree, with at least 3 years of 
experience in scientific computation. Knowledge of at least one functional language 
(ML, Lisp, Haskell) is required, preferably Ocaml or ML and the C language. Experience
in parallel computing desirable. Experience in UNIX/LINUX environments necessary.

Algorithm Scientist: Ph.D. in computational science to perform research and
implementation of algorithms for full-genome phylogenetic and biogeographic analysis.
Experience in algorithm design , especially combinatorial optimization problems
crucial. Experience in string and parallel algorithms, and computational biology,
desired. Programming skills for prototyping necessary.

Systems Scientist: Ph.D. in computational science to perform R&D of a computational
system to integrate results of whole genome phylogenetic analysis with geographic and
phenotypic data. Experience in data modeling and development of middleware and user
interfaces for large-scale data management across diverse research sites is crucial.
Experience in geographic information systems important, programming skills in Java
preferred.

Systems Manager: Ph.D. with five years experience in scientific computing, hardware,
and software maintenance. Strategic planning, specification and purchasing of
hardware, grant writing, training of personnel are key skills.

For more information, please contact Ward Wheeler, wheeler@amnh.org

Embedding Javascript in OCaml

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/241450029b04c12c4890d6c6d92db10c.en.html

Alain Frisch announced:

I'd like to announce the first public release of SpiderCaml, a
library to embed Javascript interpreters in OCaml applications.

It relies on SpiderMonkey, the historic implementation of Javascript 
from Netscape, now part of the Mozilla project and compliant with the 
ECMA spec.

The library comes with a very simple Javascript shell.


Download:
  http://yquem.inria.fr/~frisch/SpiderCaml/download/SpiderCaml-0.1.tar.gz

API:
  http://yquem.inria.fr/~frisch/SpiderCaml/doc


The API is not considered stable yet. Comments to improve it are welcome.

looking for devs

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/8cf1db6105fd8dae18874f45311ed947.en.html

Jonathan Roewen said:

I have an operating system project called Desert Spring-Time, written
almost entirely in OCaml, and I'm looking for some devs that are into
low-level stuff, i.e. writing device drivers using OCaml.

The device driver of most interest at the moment is DECchip 21140 NIC
that VirtualPC uses. We have ne2k (isa) for qemu/bochs emulation, and
realtek 8139 (pci) for real hardware.

Any hardcore ocaml hackers looking for a challenge are most welcome.

I've found one spec sheet: google ec-qc0cb-te. And driver is known as
tulip in bsd/linux it seems.

Our first release should be coming along by end of May, so having a
driver by then would be awesome =)

Job for functional programmer

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/b111d712efd43c4ff0ff229a8cb48558.en.html

Simon Peyton-Jones said:

Here's a job advert for a functional programmer to work at Credit Suisse First
Boston. Nowadays, banks are doing lots of interesting things with computers,
and they are pretty knowledgeable about programming languages too. This
modelling and analytics group, who I know slightly, are specifically looking
for someone who knows functional programming. It looks like an interesting
job, so I'm taking the liberty of spamming the Haskell and Caml mailing lists.

Simon



Credit Suisse First Boston is looking to recruit a Computer Scientist for the
Global Modelling and Analytics Group in the Securities Division.

The plans: 
To use functional programming concepts to: 
evaluate financial models in a distributed environment. 
develop domain specific language tools to increase productivity in the creation
of financial models. 

What we are looking for: An outstanding individual from an academic or research
background for a position in New York or London. Prior financial experience is
not required. The key attributes that are sought: 

  An advanced degree with high honours in Computer Science or Mathematics.
  Parallel and distributed functional programming expertise.
  Experience writing compilers.
  Scientific programming experience, preferably C++.
  Interest in using academic ideas in a real world environment.
  Excellent communication skills in order to convey new ideas to our modelling
  team.

Alternative employment options: 
In order to allow us to attract the best candidates we would be willing to
consider flexible employment options; For example, we could offer a position to
a post doctoral researcher or tenured academic who is on sabbatical from their
academic post .

Contact: Neville Dwyer [neville.dwyer@csfb.com]


Background information on Credit Suisse First Boston & the Global Modelling and
Analytics Group: 

Global Modelling & Analytics Group: The group develops mathematical models in
the area of derivatives and relative value modelling for all business areas of
the Securities Division including Interest Rate Products, Foreign Exchange,
Equities, Credit, Local currency, Fund Linked Products, CDOs and Mortgages. As
the group is based on the trading floor, it is immersed in a highly interactive
environment and is ideally placed to effectively respond to the needs of the
trading floor and contribute to the genesis of new financial products.
Occasionally, brief trips to our offices worldwide will be necessary

Division: Credit Suisse First Boston's Securities Division incorporates
underwriting, research, sales and trading of a broad range of financial
instruments. These include government and corporate bonds, money markets,
foreign exchange, U.S. and international equity and equity related securities,
precious metals and real estate related assets. The division also provides a
full range of derivative products that address the financing, risk management
and investment needs of its customers. The Securities division services over
5,000 corporate, sovereign and institutional customers worldwide.

The firm: Credit Suisse First Boston (CSFB) is a leading global investment
bank serving institutional, corporate, government and individual clients.
CSFB's businesses include securities underwriting, sales and trading,
investment banking, private equity, financial advisory services and retail
online brokerage services. It operates in 68 locations in 33 countries across
5 continents, and has some 17,000 staff worldwide. The Firm is a business unit
of the Zurich based Credit Suisse Group, a leading global financial services
company. For more information on Credit Suisse First Boston, please visit our
corporate website at http://www.csfb.com

Our commitment to providing outstanding service to our clients, our focus on
teamwork, diversity and excellence means our recruitment of the best and
brightest people is essential to our success.

infix functions

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/b3e1a86c65afa1b9438f0c6828902fae.en.html

Jonathan Roewen asked and Jean-Christophe Filliatre answered:

> I see some pervasive functions are infix, and I'm wondering if there's
> any plan to support making any arbitrary infix functions?
> 
> For instance, the Int32 (etc) modules are horrible to use cause of the
> prefix functions. These are perfect candidates for being infix. And
> being an OS project, there are a lot of instances where we need the
> extra precision, and having to do things like add some_int32
> another_int32 complex. Especially when you have to throw in
> bitshifting, AND and OR, and other magic.

In some simple cases, it can help to insert

        let (+) = Int32.add
        let (-) = Int32.sub
        ...

at the  beginning of your files  (or better to  put these declarations
within  a small  module that  you open  only when  you need  the infix
notation).  You  can even  adopt  other  notations,  such as  +!,  -!,
etc.  Only the  first  character  is used  to  determine the  operator
precedence.

Beware of the lexical issue with multiplication :-)

Vincenzo Ciancia also answered:

> For instance, the Int32 (etc) modules are horrible to use cause of the
> prefix functions. These are perfect candidates for being infix.

Not arbitrary, but there are some "free" symbols that can be defined (I
don't know exactly how many and what, but I guess it's on the manual or in
the lexer :) ), and all the infix operators can be redefined. Example of
redefining "+":

########
# module InfixInt32 = struct
    let (+) = Int32.add
  end;;
module InfixInt32 : sig val ( + ) : int32 -> int32 -> int32 end
# 3+3;;
- : int = 6
# open InfixInt32;;
# 3+3;;
This expression has type int but is here used with type int32
#######

example of defining "+*"

#######
# let (+*) = Int32.add;;
val ( +* ) : int32 -> int32 -> int32 = <fun>
# 30l +* 20l;;
- : int32 = 50l
#######

I don't know if there is a way to force inlining of "+*" though, but I
suppose that you could finally resort to defining your operators in camlp4

http://pauillac.inria.fr/caml/camlp4/manual/manual006.html

see section 5.3.1

Richard Jones also answered:

You can create infix operators in the basic language.  You have to use
the right first character in the operator - the scanner appears to use
the first character to decide whether the operator is infix or prefix.
This is rather obliquely documented here:

http://caml.inria.fr/pub/docs/manual-ocaml/manual009.html

(Look for the section "Prefix and infix symbols").

So:

        $ ocaml    
                Objective Caml version 3.08.2

        # #load "nums.cma";;
        # let (+^) = Int32.add;;
        val ( +^ ) : int32 -> int32 -> int32 = <fun>
        # 2000000000l +^ 1l;;
        - : int32 = 2000000001l

It's also possible to create infix functions; however you have to use
the camlp4 preprocessor and your functions become reserved words in
the language.  Here is an example of an infix function which should
get you started:

        open Pcaml
        
        EXTEND
           expr: AFTER "apply"
           [ LEFTA
               [ e1 = expr; "map_with"; e2 = expr ->
                   <:expr< List.map $e2$ $e1$ >>
               ]
           ];
        END

So using that extension you could write code like:

        list map_with (fun elem -> ...)

Use the following Makefile rule to compile the extension:

operators.cmo: operators.ml4
        $(OCAMLC) -c -pp "camlp4o pa_extend.cmo q_MLast.cmo -impl" -I +camlp4 \
          -impl $<

and the following rule to compile code using this extension:

OCAMLPP := -pp "camlp4o ./operators.cmo"
OCAMLC := ocamlc.opt
OCAMLCFLAGS := -w s -g $(OCAMLPP)

.ml.cmo:
        $(OCAMLC) $(OCAMLCFLAGS) $(OCAMLCINCS) -c $<

Then padiolea suggested:

> So using that extension you could write code like:
>
>       list map_with (fun elem -> ...)

I think it is simpler for such cases to have a generic operator such as
let (+>) o f = f o

and then just do
 [1;2;3;4] +> map (fun x -> x+1)
which is reminescent of object notation.

OS Prototype

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/034a45e75fe608992a6dfbf1786f1d2d.en.html

Jonathan Roewen announced:

Desert Spring-Time: An Operating System Project

This is the official release of the Desert Spring-Time prototype
operating system. It is written almost entirely in OCaml.

The system is comprised of:
 -  IDE driver and partition reading code
 -  NE2000 ISA driver
 -  Realtek 8139 PCI driver
 -  VBE (via GRUB) support in 32bit mode
 -  PS/2 mouse & keyboard support
 -  Networking stack

It has two command-line tools for testing networking: nslookup and
ping. Each takes an address, and an optional interface number (an
integer starting at 0) as arguments. Interfaces are automatically set
up once a DHCP lease has successfully been obtainedâthere is currently
no support for manually configuring an interface.

The sources are available online at
http://glek.net/subversion/os/kernel, as well as a floppy image
http://dst.purevoid.org/.

To test in qemu, the following options are required to test networking
and graphics: -std-vga (for VBE), and -isa (for NE2000 card to be
present in ISA mode).

Seemingly inconsistent labels for List module

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/73ced3c4131eaf6203b88f228b53e3eb.en.html

Yaron Minsky asked and Jacques Garrigue answered:

> I've noticed what appear to be inconsistent labelling on some list
> functions, and I'm wondering if I'm properly understanding the reasons
> behind the way the labels work.
> 
> For example, in the various association list functions, in some cases
> the association list is passed with a ~map label, and sometimes with
> no label.  Another odd case is the mem and memq functions, both of
> which label the list being queried with the label ~set.  In this case,
> the labelling mostly seems kind of useless rather than inconsistent.

There are reasons for both :-)
The ~set label is there, so that you can easily define the membership
function.

   let in_a = List.mem ~set:a

Same thing for ~map in List.mem_assoc.
However, there is no label in List.remove_assoc, because there it
doesn't really make sense: it maps an association list to a new
association list.
There is no label either in List.assoc for a dirty reason:
as the result is a polymorphic variable, if there were a label, one
wouldn't be able to omit it in applications. List.assoc is used very
often.

> I'm asking all of this because I'm playing around with writing a
> labelled version of the extlib interface, and I'm wondering whether
> these are mistakes that should be fixed, or whether there are good
> reasons for them and they should be preserved.

So, there are good reasons, but you may make different choices. The
labelling of the standard library is intentionally light; in other
libraries you might want to put more. Or, conversely, if you choose to
have only a labelled version (avoids maintaining two versions), you
must be careful of using labels only where they will not get in the
way.

SAFFIRE: type checking the OCaml/C FFI

Archive: http://caml.inria.fr/pub/ml-archives/caml-list/2005/05/1ca9bd6f3b9a39adfd50b76211716499.en.html

Michael Furr announced:

Announcing,

     SAFFIRE: Static Analysis of Foreign Function InteRfacEs

Saffire is a static analysis program that detects bugs in programs that
use the OCaml/C foreign function interface. Saffire works by performing
type inference across both OCaml and C to make sure that values are used
consistently across the language boundary. For instance, if a OCaml passes
a record to a C function, that C function should not treat the data as an
integer. Saffire also tracks what C variables point into the OCaml heap
and ensure they are always registered with CAMLparam/local before any
allocation functions are called.

Saffire is currently only a proof of concept implementation and does not
handle every corner of the OCaml grammar.  For example, polymorphic
variants and objects are not supported.  For a detailed list of what is
and what is not currently supported, please see the website below.  For a
more complete discussion on how Saffire works, you may be interested in
reading our upcoming PLDI paper (also available from the site).

Saffire is implemented as a combination of camlp4 and a CIL module and is
freely available/redistributable.  The license is the same as CIL
(standard 3-clause BSD).

http://www.cs.umd.edu/~furr/saffire/

Using folding to read the cwn in vim 6+

Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'&lt;1':1
zM

If you know of a better way, please let me know.

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.

Alan Schmitt