Caml Weekly News

Previous week Up Next week
Hello,

Here is the latest Caml Weekly News, week 27 May to 03 June, 2003.

1) ocamlnet status ? other project ?
2) Those recent reports regarding F#
3) Las Vegas geometry for intersection of line segments
4) Set/Map with intervals and/or order-related operations
5) Good examples for a Camlp4 beginner?
6) implementing bit vectors in OCaml

==============================================================================
1) ocamlnet status ? other project ?
------------------------------------------------------------------------------
Francois Rouaix asked and Fabrice Le Fessant answered:

>  Hi all,
>  I'm looking into possible code bases for a new  proxy project (in the 
>  style of V6, http://cristal.inria.fr/~rouaix/V6/, but with lots of 
>  other protocols such as IMAP4, filesystems, ...).
>  In the OCaml world, it seems that OCamlnet was the largest library of 
>  protocols (HTTP, FTP, ...), but it looks like it stopped in late 2001. 
>  Has it evolved since ?
>  There's also Python code (Twisted Project) but HTTP seems to be 
>  oversimplified (e.g. 1998 style HTTP).
>  And of course, there's Java code everywhere.
>  
>  Any other solution that I should be aware of ?

You can also look at:
* The CDK, and its net/ library. I think there is a simple HTTP client, HTTP
  server and an FTP client.
* MLdonkey, where the net/ library (mldonkey/src/utils/net/) contains the
  kernel of a mono-threaded generic network program (ie BasicSockets,
  TcpBufferedSockets, UdpSockets, TcpServerSockets), wired on select/poll
  system calls, with some kind of bandwidth control, and an HTTP client and
  server.

==============================================================================
2) Those recent reports regarding F#
------------------------------------------------------------------------------
Don Syme explained:

Dear Caml-list,

As one of my projects at Microsoft Research I have developed an
implementation of a subset of the OCaml language for the .NET platform.  
This implementation is called F#, and also comes with some relatively 
minor extensions to allow the programmer to access .NET libraries.

There have been some utterly speculative (and entirely off-the-mark!!)
internet press reports about this project in the last few days (e.g. see
internetnews.com).  As a result I thought it wise to add the following
clarification to the F# website and to post it to this list.

--------------
Clarification regarding recent press reports about F#:  Despite reports
suggesting otherwise, F# is a relatively small research project designed
to demonstrate that it is possible to easily implement ML-like languages
for use on the .NET Framework.  There are no current plans to
commercialize F#, and the source code for the F# compiler is due to be
published in June 2003. F# is public, on-going research, and Microsoft
Research regularly and openly collaborates with universities on
programming languages.  There has been a long tradition of implementing
ML-like languages within research laboratories as these have been widely
accepted as foundational languages for programming language research,
including the Caml project (encompassing both Caml-light and OCaml),
Moscow ML, Dependent ML and many other extensions to Standard ML. The
implementations have often proved useful in practice, and are good for
teaching the foundations of programming.
--------------

The best thing that I can see having come out of this is that ML-like
languages and OCaml in particular have been given an unexpected
publicity boost.  As you all know I think OCaml is a great programming
language and implementation, and if the fact that a small research group
at Microsoft Research takes this class of languages seriously somehow
helps their uptake then that's a very good thing in the long run.

He then added:

A clarification to my clarification: the internetnews.com article was
actually just reporting speculation by contributors to Slashdot.org and
elsewhere.  The internetnews.com article itself sensibly tended towards
the conclusion that F# was just a small research project.

==============================================================================
3) Las Vegas geometry for intersection of line segments
------------------------------------------------------------------------------
Eray Ozkural announced:

I implemented Kenneth Clarkson's randomized incremental construction for
computing the trapezoidal map of line segments in ocaml.

I'm distributing it under GPL (with a catch) for those who are interested in
computational geometry, it's got some primitives, etc. that might be useful
as well:

http://borg.cs.bilkent.edu.tr/~exa/code/lasvegas-geom-1.0.tar.bz2

It was a nice experience writing this kind of code on ocaml.

==============================================================================
4) Set/Map with intervals and/or order-related operations
------------------------------------------------------------------------------
Yamagata Yoriyuki asked and Diego Olivier Fernandez Pons answered:

> Perhaps I will develop my own since I need balancing schema, and a
> Map-like structure (not only a Set-like structure.) based on Diet.
> I only need queries for individual elements, not intervals. So I
> don't think a k-d tree is appropriate. Also I need the data
> structure purely functional, because the cost of copying is not
> negligible. So, even though several people points out the existence
> of C implementations, I will opt an ocaml one. The dom type of
> FaCiLe is essentially a list of intervals. Maybe I can use Baire
> implementations.

I am afraid I haven't yet understood your problem and I will need more
explanations to be able to help you.

There are several points in your message I would like to comment :
- Tree-like data structure balancing
- Diet (discrete interval encoding scheme)

1. Balancing tree-like data structure

You should separate the design of the tree-like data structure and its
balancing.

First write down your unbalanced data structure and get your
algorithms right, only then choose one and implement one of the
multiple balancing schemes available

The trick is very simple (Stephen Adams) : encapsulate the balancing
information in << smart constructors >>.

At the beginning this constructors will be trivial

let makeTree = fun l v r -> N (l, v, r)

and your functions will look like this

let rec insert x = function
  | E -> single x
  | N (l, v, r) ->
     match compare x v with
        | n when n < 0 -> makeTree (insert x l) v r
        | n when n > 0 -> makeTree l v (insert x r)
        | _ -> failwith "already in the data structure"

Once everything goes right, you will add a balancing constructor and
replace your [makeTree] by the balancing constructor [makeBTree].

Sometimes you will find usefull not to recompute all the balancing
information at every node, then you just need to add a cache. This is
of course optional.

...
| E -> injection x
| N (l, v, r, _) -> (* the last int is a cache for the constructor *)
...

Your balanced function will look like this

let rec insert x = function
  | E -> single x
  | N (l, v, r, _) ->
     match compare x v with
        | n when n < 0 -> makeBTree (insert x l) v r
        | n when n > 0 -> makeBTree l v (insert x r)
        | _ -> failwith "already in the data structure"

Conclusion : You do not need to bother with balancing schemes from the
beginning. Once your unbalanced data structure works fine, you can add
them very easely.

Baire contains a few examples :
- unbalancedSet.ml contains unbalanced trees and AVL-trees without
cache (does not change the data type) (0)
- avlSet.ml contains avl sets with cache (1)
- weighBalancedSet.ml contains WB sets with cache (2)
...

(1) and (2) are just a copy-paste of (0) with the described
modifications and (1) and (2) share all but 40 lines of code (namely
the 4 balancing constructors).


2. Discrete interval encoding tree

The idea of diet is to replace successive integers by an interval

{3} {5} {6} {7} {9} -> {3} [5-7] {9}

Then, a Diet is just a tree of intervals and singletons with some
join/separate instructions. Since I do not understand what you mean by
'map-like operations' I don't know if this is appropriate.

{3, 4} {5, 2} {6, 2} {7, 3} {9, 4} -> {3, 4} [5-6, 2] {7, 3} {9, 4}

This will work (I mean with a significant compression rate) only if
you can ensure that consecutive bindings will have very often the same
image.

Or are you binding sigletons and intervals to values ?

{3, 4} {[5-7], 2} {[6-10], 5} {9, 4}

find 5 myDataStructure -> [(5, 2); (5, 5)]

Are the intervals supposed to be disjoint ?

{3, 4} {[5-7], 2} {[8-10], 3}

find 5 myDataStructure -> (5, 2)

Or are you binding keys to multiple values

{3, {4}} {5, {2, 5}} {6, {2, 5}} {7, {2, 5}} {8, 5} {9, {4, 5}} {10,
{5}}

find 5 myDataStructure -> [(5, 2); (5, 5)]

Multidimensional data structure (k-d trees, range trees, cartesian
trees) may be what you are looking for. It depends highly on what you
mean by "set/map with intervals" and "order related operations".

Could you give a few examples of what you expect of such a data
structure ?

==============================================================================
5) Good examples for a Camlp4 beginner?
------------------------------------------------------------------------------
Matt Gushee asked:

Over the past couple of weeks I have been learning about the various
OCaml parsing and lexing tools, with an emphasis on Camlp4. It's
fascinating, and I've learned a lot, but I am still having trouble
grasping how the different components fit together. I think what I need
now is to look at some examples of working, real-world code that use
Camlp4 ... something non-trivial, but not enormously complex. Can anyone
suggest a good place to start?

Thanks in advance for any suggestions.

Alexander Voinov answered:

See a simple extension, which adds some sugar on top of the List module
functionality. My purpose in this case was to give close analogs to 
Python's loops over sequences (e.g. lists), to facilitate transition. An
example is attached as well.  

listsugar.ml:

open Pcaml;;

EXTEND
  GLOBAL: expr;

  expr:
    [ [ "map"; list = expr; "with"; OPT "|"; clauses = LIST1 clause SEP "|" ->
          <:expr< List.map (fun [ $list:clauses$ ]) $list$ >> ]

    | [ "iterate"; list = expr; "with"; OPT "|"; clauses = LIST1 clause SEP "|" ->
          <:expr< List.iter (fun [ $list:clauses$ ]) $list$ >> ]

    | [ "foldr"; list = expr; "from"; initval = expr; "with"; OPT "|"; 
        clauses = LIST1 clause SEP "|" ->
          <:expr< List.fold_right (fun a b ->
                                     match (a, b) with [ $list:clauses$ ])
                                   $list$ $initval$ >> ]

    | [ "foldl"; list = expr; "from"; initval = expr; "with"; OPT "|"; 
        clauses = LIST1 clause SEP "|" ->
          <:expr< List.fold_left (fun a b ->
                                     match (a, b) with [ $list:clauses$ ])
                                   $initval$ $list$ >> ]
    ];

  clause:
    [[ p = patt; w = OPT when_expr; "->"; e = expr -> (p, w, e)]];

  when_expr:
    [[ "when"; e = expr -> e ]];
END;;

testsugar.ml:

open Printf

let _ =
  let values = [1; 2; 3; 4; 5; 6; 7] in

  let vals2 =
    map values with
      | 1 | 3 | 7 -> 10
      | v when v mod 2 == 0 -> v + 1
      | v -> v - 1
  in

  let valfmtd = String.concat ", " (map vals2 with v -> sprintf "<%d>" v) in

  let suml, sumsquares, nelem =
    foldl values from 0.0, 0.0, 0 with
        (suml0, sumsq0, n), value ->
          let fv = float value in
          (suml0 +. fv, sumsq0 +. fv *. fv, n + 1)
  in
  let mean = suml /. float nelem in
  let sdev = sqrt ((sumsquares -. float nelem *. mean *. mean)
                   /. (float nelem -. 1.0)) in

  let sumr =
    foldr values from 0.0 with
        value, sum0 -> (
          printf "found value = %d, sum0 = %5.2fn" value sum0;
          sum0 +. float value
        )
  in

  printf "%sn" valfmtd;
  printf "mean: %g, sdev: %gn" mean sdev;
  printf "sumr: %gn" sumr;

  iterate values with
    | 1 -> printf "onen"
    | v when v mod 2 == 0 -> printf "%d is evenn" v
    | v -> printf "%d is oddn" v

Basile Starynkevitch proposed:

Yes, you might look into the tracing facilities of Poesia monitor;

get the following files (under CVS) from
http://www2.poesia-filter.org:8000/cgi-bin/cvsweb.cgi/PoesiaSoft/PoesiaMonIcap/

README.trace
pa_trace.ml

trace.ml
trace.mli

In a few words, the pa_trace.ml preprocessing file transform
expressions like
   trace FOO "x=%d y=%d" x y tracend
into
  begin
    if Trace.flags.foo then
        Trace.tracer Trace.FOO  [filename]  [lineno]  [colno]
          (Printf.sprintf "x=%d y=%d" x y
  end

where [filename] is actually the string of the source filename, eg
"foobar.ml", [lineno] is the line number, eg 43, etc.

Even if I coded some camlp4 stuff, I definitely do not claim to be a
camlp4 expert. In particular, I never coded any pretty printer.

Of course you could have a look into the Coq proof system, which
(IIRC) initated the development of camlp4 and which is probably the
major camlp4 user.

==============================================================================
6) implementing bit vectors in OCaml
------------------------------------------------------------------------------
Norman Ramsey asked:

We have a program that is spending a lot of time in set operations,
and we're thinking of trying an imperative implementation based on bit vectors.
I would hope that the basis of such an implementation would be an array
of native integers, but on scrutinizing the manual (espeically the chapter
on interfacing to C), I have concluded that such a thing is not possible.
Our choices appear to be

  * An array of native integers, which will be implemented as an array
    of pointers to native integers, because integers are boxed.

  * An array of tagged integers, which will be less efficient as
    dividing by 31 is more expensive than shifting.

How would the gurus recommend that we proceed?  Is there a better,
still efficient data structure for a set of small integers?

Could the compiler gods be persuaded to provide unboxed
representations for arrays of untagged integers, as is already done
for `float array'?

Claude Marche proposed:

Did you have a look at the caml hump ? The Bitv module may suit your  
needs.

http://www.lri.fr/~filliatr/software.en.html

Xavier Leroy answered:

As others mentioned, you can use strings, which are really arrays of
bytes.  For instance:

type t = string

let create nbits = String.make ((nbits + 7) / 8) '\000'

let is_set s n =
  (Char.code s.[n lsr 3]) land (1 lsl (n land 7)) <> 0

let set s n =
  let i = n lsr 3 in
  s.[i] <- Char.unsafe_chr (Char.code s.[i] lor (1 lsl (n land 7)))

let clear s n =
  let i = n lsr 3 in
  s.[i] <- Char.unsafe_chr (Char.code s.[i] land
                                         (0xFF lxor (1 lsl (n land 7))))

Operations on whole sets such as union and intersection will not be as
fast as they could be with an array of 32- or 64-bit integers
(you're processing the set 8 bits at a time instead of 32 or 64 bits
at a time), though.

Another option is to use 1-dimensional bigarrays of kind int32 or
nativeint.  In bytecode, bigarray accesses are much slower than string
accesses.  However, with sufficient type constraints, ocamlopt can
generate good inline code for bigarray accesses.

For very sparse sets, binary trees (as in the Set module) or Patricia
trees (as in J.-C. Filliâtre's library) can be more efficient, though.

> Could the compiler gods be persuaded to provide unboxed
> representations for arrays of untagged integers, as is already done
> for `float array'?

The special case for float arrays is already a bit of a hack and it
isn't clear this was a really good idea -- although it sure helps with
benchmarks :-)

The problem with this special case is that every polymorphic array
operation (when the array type is not known statically) is turned into
a run-time test

        if this_is_a_float_array
        then box_float(load_float_from_array)
        else load_word_from_array

If additional special cases were added, these generic array operations
would become even more costly and generate even bigger code...

==============================================================================
Old cwn
------------------------------------------------------------------------------

If you happen to miss a cwn, you can send me a message
(alan.schmitt@inria.fr) and I'll mail it to you, or go take a look at
the archive (http://pauillac.inria.fr/~aschmitt/cwn/). If you also wish
to receive it every week by mail, just tell me so.

==============================================================================

Alan Schmitt