Caml Weekly News

Previous week Up Next week
Hello,

Here is the latest Caml Weekly News, week 03 to 10 December, 2002.

01) BigArray.Array2 vs Array.make_matrix
02) Literate programming
03) Trees revised in Baire
04) python-style array access
05) ocaml2: ocaml with editable line history
06) ocamlgsl
07) Who is osp maintainer?
08) RPM & Debian packaging of Ocaml software
09) float pretty-printing precision, once more
10) labels overhead

======================================================================
01) BigArray.Array2 vs Array.make_matrix
----------------------------------------------------------------------
Kevin Murphy asked and Xavier Leroy answered:

> I was wondering what are the relative merits (in terms of speed,
> portability and ease of use) between using arrays of arrays, and the
> BigArray class.
> 
> While web browsing, I saw a couple of people claim that BigArray is
> significantly slower than the standard array class:
> 
> http://caml.inria.fr/archives/200105/msg00217.html
>  claims BigArray is 3 times slower
> 
> www.cs.cornell.edu/courses/cs612/2001sp/ projects/ocaml-arrays/OCaml.pd
>  claims BigArray is about 10 times slower

There are earlier discussions of this issue in the archives of the
caml-list.  See for instance my reply:

http://pauillac.inria.fr/bin/wilma_hiliter/caml-list/200207/msg00445.html

In general, bigarrays are slower than regular arrays, although the 
slowdown depends a lot on how the code is written, e.g. ensuring that
bigarrays are fully monomorphic improves performance quite a lot.

The main purpose of bigarrays is not to speed up pure Caml code, but
to facilitate interfacing with external numerical libraries.

In some circumstances, bigarrays have a few added bonuses as well:
compact representation of arrays of small floats, small integers, and
complex numbers; support for dimensions >= 2^22 on 32-bit machines;
and no-copying slicing and shrinking operations.  But all this comes
at some expense in speed for raw array accesses.

> Plus an interface to some standard linear algebra code would be nice...

You mean, like LACAML?
  http://www.ai.univie.ac.at/~markus/home/ocaml_sources.html

======================================================================
02) Literate programming
----------------------------------------------------------------------
Alessandro Baretta asked and Wolfram answered:

> 
> Is there any literate programming tool supporting Ocaml (well)?

FunnelWeb --- http://www.ross.net/funnelweb/ --- is language independent;
my main uses for OCaml programming,
besides literate documentation itself, are:

* Have ``val'' and ``let'' clauses together in the literate source
  by using alternating incrementally defined macros 
  going into the .mli and into the .ml file.

* Write exported type declarations only once, in macros that are invoked in
  both the .mli and the .ml files.

* I find that changes affecting many modules become much easier   
  through the fact that all 50 modules are in a single source file.
  (However, this setup might not easily work well for a multi-person project,
   and is not a necessity when using FunnelWeb.)

* Parameterized macros for families of (typically small) similar functions ---
  one argument goes into (at least) the function name,
  others may go into the parameter list,
  and definitely some into the implementation.

* To a much lesser extent, using parameterised macro setups for signatures
  shared between parametrically polymorphic and functorised data structures
  (e.g., for being able to switch the underlying implementation   
   from Map to HashTable and back).

======================================================================
03) Trees revised in Baire
----------------------------------------------------------------------
Diego Olivier Fernandez Pons announced:

Trees implementation in Baire has been revised. You will find now
complete implementation of sets and maps in several balancing schemes
(AVL, red and black, weight balanced). The code is in English,
commented in French, English and Spanish.

You will find all this in the download page of Baire (and there is now
a partial translation of Baire main page in english)

For those who also read french, there is a document explaining more in
detail the Baire theoretical foundations. You will find it in the
"documentation" part of the Baire site. Anyway, you will find some
explanations in English in the source code.

The Baire site has moved. The Caml Hump has already corrected his link
so you may safely follow it.

(Note: the Hump is at http://caml.inria.fr/humps/)

======================================================================
04) python-style array access
----------------------------------------------------------------------
Yaron Minsky described:

One of the things I miss from Python is the small syntactical touches that
simplify common operations.  one of my favorites is the way python does
array indexing.  You can use negative numbers to count from the end of the
array, and you can easily take subranges of arrays or strings.  It's
actually quite simple to get the same functionality in ocaml, and here's
some simple code for doing it.  It adds a bit of overhead (an extra
function call), but where that small overhead isn't an issue, I think this
is a nice solution.  If you include the following code in the beginning of
your module, you'll be able to do the following kind of tricks: 

# "foobar".[-1];;
- : char = 'r'
# "foobar".[-2];;
- : char = 'a'
# String.rsub "foobar" 0 (-1);;
- : string = "fooba"
# String.rsub "foobar" 1 0;;
- : string = "oobar"
# String.rsub "foobar" 1 3;;
- : string = "oo"
# String.rsub "foobar" 1 (-1);;
- : string = "ooba"

The same kind of things can be done with arrays.  rsub stans for "range
sub", by the way.  Note that a 0 in the second position of rsub means the
very last as opposed to the very first index position.

------ Snip Here -------

open StdLabels
open MoreLabels

module Array = 
struct
  include Array
  let normalize ar i = if i < 0 then length ar + i else i
  let get ar i = get ar (normalize ar i)
  let rsub ar start stop =
    let stop = if stop = 0 then length ar else stop in
    let pos = normalize ar start in
    let len = (normalize ar stop) - pos in
    sub ar ~pos ~len
end

module String =
struct
  include String
  let normalize str i = if i < 0 then length str + i else i
  let get str i = get str (normalize str i)
  let rsub str start stop =
    let stop = if stop = 0 then length str else stop in
    let pos = normalize str start in
    let len = (normalize str stop) - pos in
    sub str ~pos ~len
end

======================================================================
05) ocaml2: ocaml with editable line history
----------------------------------------------------------------------
SooHyoung Oh announced:

I'm very pleased for my first contribution on ocaml world.
<ocaml2> is the ocaml interpreter with editable line history which
implemented using libgetline library.
Please visit http://www.taglib.co.kr/ocaml/index.html and enjoy it.

===================== Readme ==============
What is ocaml2?
    If you want to correct mistyped lines in ocaml, ocaml2 is very useful.
    It supports editable line histories using libgetline.
    See getline manuaual for more information.

Version: ocaml2 version 0.1

Prerequsite: (I tested only on these releases.)
    ocaml: version 3.06
    libgetline: version 3.9 

Files:
    README - This file
    Makefile - Makefile
    getline.ml - external function declarations
    log.make - make log
    ocaml2.ml - main program
    prim_getline.c - C interface functions for libgetline

How to make:
    (0) untar and cd in the directory
        % tar zxvf ocaml2.tgz
        % cd ocaml2
    (1) edit Makefile (eg: % vi Makefile)
    I assume that "getline.h" is in /usr/local/include and
    libgetline.a in /usr/local/lib.
    (2) % make
    (3) (optional testing) % ./ocaml2
    (4) (optional installation)
        % su
        # make install

How to use:
    See getline manual. (eg. % man getline)

======================================================================
06) ocamlgsl
----------------------------------------------------------------------
Olivier Andrieu announced:

This is to announce the release of ocamlgsl version 0.2.1 , available
here : http://oandrieu.nerim.net/ocaml/gsl/ . GSL 1.2 and ocaml 3.06
are required. See the README if you're using your gcc version is < 3.

Since the previously announced release, there's support for single
precision floats (in bigarrays), complex numbers linear algebra and a
couple of new modules (notably ODE).

GSL is the GNU Scientific Library http://sources.redhat.com/gsl/ , "a
collection of routines for numerical computing".

======================================================================
07) Who is osp maintainer?
----------------------------------------------------------------------
SooHyoung Oh asked and Johann Spies answered:

> I think osp (Ocaml Server Page) is very good solution for writing
> web pages with ocaml + HTML. But the original osp seems for window
> users.
> 
> I changed some for unix world, and I'd like to send patches to the
> maintainer.
> 
> Who is the osp maintainer?

I think John Small <john@rogare.com> wrote it.

> ps:
> You can find the source in http://www.taglib.co.kr/ocaml/index.html with
> name "osp2".
> If you want how it works, you can taste it in
> http://www.taglib.co.kr/cgi-bin/osp/demo.osp

======================================================================
08) RPM & Debian packaging of Ocaml software
----------------------------------------------------------------------
Basile Starynkevitch asked:

I am developping an opensource piece of software in Ocaml (it is the
monitor of POESIA - see http://www.poesia-filter.org for details). my
code needs ocamlfind, netstring, camlp4, pxp

I am not extremely familiar (but did read a few documents on) Debian &
RPM packaging.

So I am seeking for examples of Debian & RPM packaging of Ocaml code
which depends on stuff like netstring, pxp, etc...

Any examples are welcome!

Stefano Zacchiroli answered:

Regarding debian packaging you can look at the following packages, which
are written in ocaml and distributed as debian packages:

- advi
- cameleon
- fort
- ledit
- unison

If you are also looking for debian packaging of ocaml libraries you can
find a lot of packages, look for packages which are named using the
pattern lib*-ocaml and/or lib*-ocaml-dev.

Anyway I think that debian specific questions are a bit OT on this list
so for more info on this please ask on 
debian-ocaml-maint@lists.debian.org mailing lists (archives available at
http://lists.debian.org/debian-ocaml-maint/).

Sven Luther added:

Not to forget the ocaml_packaging_policy document which you would find
in /usr/share/doc/ocaml/ocaml_packaging_policy.gz on a debian box with
ocaml installed.

It contains rules and recomendation on how to do a good debian package
of ocaml stuff.

Vitaly Lugovsky answered as well:

You can take a look at ocamlfind-based packaging of OCaml stuff that I  
have done for ALT Linux distribution (AFAIR, it's awailable from the 
rpmfind now).

======================================================================
09) float pretty-printing precision, once more
----------------------------------------------------------------------
Jean-Marc Eber asked:

caml 3.06+1:

# let f = 1. /. 86400.;;
val f : float = 1.15740740741e-05
# let s = string_of_float f;;
val s : string = "1.15740740741e-05"
# let f1 = float_of_string s;;
val f1 : float = 1.15740740741e-05
# f1 = f;;
- : bool = false
# f1 -. f;;
- : float = 2.59259844496e-17

This situation may be understandable, but is unfortunate.

Disclaimer: I'm not a specialist of the IEEE float format.

Do I have at hand, at least on an architecture supporting the IEEE format, a
function that pretty-prints any valid float value (by valid I mean that I
exclude the "special" values like NaN, infinity, etc.) so that
float_of_string applied to the resulting string returns my initial value,
or, at least, a value that, if substracted from my initial one, returns
zero ?

Background:

In fact, my question goes a little bit further, as it concerns indeed the
parsing of floats in the caml compiler (that uses internally float_of_string
if I'm correct).

Suppose you calculate somewhere (with an caml program, say) a float
constant (such a calculation may last for hours!), and you want after
obtaining the result to *generate* a caml source using this calculated
value. You will probably generate something like

let my_const = <a float text representation>

But my example shows that you are loosing precision and accuracy if you
just use string_of_float.

Of course the goal is to incorporate this value in a caml source, not
to read it in binary form from a file (that would be easy!).

Do anybody know a solution to my problem ?

Yaron Minsky answered:

I'm no expert on floating point representations either, but sprintf with
a sufficiently long length seems to work.

# let test x = float_of_string (sprintf "%.30e" x) = x;;
val test : float -> bool = <fun>
# test (sqrt 2.);;
- : bool = true
# test (sqrt 2. *. 1.343e26);;
- : bool = true

Xavier Leroy answered as well:

Well, you get a relative error of 2e-12.  Are you sure your   
computations require more precision than this?  In physics, the answer
would be almost universally "no".  In finance, you know better than I...

> Suppose you calculate somewhere (with an caml program, say) a float  
> constant (such a calculation may last for hours!), and you want after  
> obtaining the result to *generate* a caml source using this calculated  
> value. You will probably generate something like  
>   
> let my_const = <a float text representation>  
>   
> But my example shows that you are loosing precision and accuracy if you  
> just use string_of_float. 

There are two approaches to your problem:

1- The physicist's approach:
Your float constant is known to have no more than N significant digits
(e.g. because it's based on measurements that have a 10^-N error margin).
Then, use the printf %g format corresponding to N to generate your source. 

2- The programmer's approach:
Having no idea on the actual precision of their data, programmers want
bit-for-bit identity on float representation.  You can do that in
OCaml using Int64.bits_of_float and Int64.float_of_bits, which give
direct access to the IEEE bit-level representation of floats.
For instance, generate your Caml code with

printf "let my_const = Int64.float_of_bits(Int64.of_string "%Ld")n" 
       (Int64.bits_of_float my_const)

That will generate something like

let my_const = Int64.float_of_bits(Int64.of_string "4532949752942055721")

which is truly unreadable, but it guaranteed to give you the exact
same float when executed.   

(To help reading the source, consider putting the decimal
representation of the float in a generated comment.)

And Damien Doligez lectured:

[string_of_float loses precision on floating-point numbers]

In the current working version (3.06+18), the precision used by
string_of_float has been increased to 17 digits.  I *think* this
is enough to represent all the double-precision floating-point
numbers:

          Objective Caml version 3.06+18 (2002-11-07)

  # let f = 1. /. 86400.;;
  val f : float = 1.1574074074074073e-05
  # let s = string_of_float f;;
  val s : string = "1.1574074074074073e-05"
  # let f1 = float_of_string s;;
  val f1 : float = 1.1574074074074073e-05
  # f1 = f;;
  - : bool = true
  # f1 -. f;;
  - : float = 0.

However, it has the unfortunate side-effect of revealing this
awful truth about FP numbers: many "interesting" numbers are
impossible to represent in floating-point.  For example:

  # 0.1;;
  - : float = 0.10000000000000001

There is no floating-point number equal to 0.1 and the
best approximation you can get is 0.10000000000000001.
I expect this will quickly become a much-used FAQ entry...

I don't know if we will keep this change for the next release
of O'Caml.  But you can always use (Printf.sprintf "%.17g")
instead of string_of_float, and you'll get all the digits,
significant or not.

======================================================================
10) labels overhead
----------------------------------------------------------------------
Oleg asked and Jacques Garrigue answered:

> In his book Paradigms of AI Programming, P. Norvig mentions that Lisp
> functions with keyword arguments [1] suffer a large degree of overhead and
> that this may also be true for optional and rest arguments, although to a
> lesser degree [2], depending on the platform [3].
> 
> I'm wondering if the same is true for O'Caml. I'm guessing that it's not,
> since complete function applications using labels can be transformed into   
> "normal" function calls at compile time. Am I right?
> 
> Thanks,
> Oleg
> 
> [1] Called "arguments with labels" in O'Caml
> [2] In Lisp, optional arguments do not have labels associated with them
> [3] Chapter 10.3

You are right.
There is no overhead for non-optional labels in complete applications.
The overhead for partial applications is basically the same as doing
it by hand, so you cannot say that it is really related to labels.

There is an overhead for optional arguments: they are just encoded as
(Some arg) if present and (None) if absent, and the decoding is dynamic.
This shouldn't be a problem in practive: except when wrapping very
simple arithmetic operations, the overhead is neglectible with respect
to the cost of the function itself.

In both cases, typing allows us to be much more efficient than Common
Lisp.

======================================================================
Old cwn
----------------------------------------------------------------------

If you happen to miss a cwn, you can send me a message
(alan.schmitt@inria.fr) and I'll mail it to you, or go take a look at
the archive (http://pauillac.inria.fr/~aschmitt/cwn/). If you also wish
to receive it every week by mail, just tell me so.

======================================================================

Alan Schmitt