Caml Weekly News

Hello

Here is the latest Caml Weekly News, for the week of 19 to 26 July, 2005.

parsing OCaml code
(Mostly) Functional Design?
Pattern Matching Papers
pftdbns 0.2.8
How to do this properly with OCaml?
ocamldap 2.1.4
Ocamlnet 1.1

parsing OCaml code

Archive: http://thread.gmane.org/gmane.comp.lang.caml.general/29668

Nate Gaylinn asked and Sylvain Le Gall answered:

> I'm trying to figure out a nice way to parse OCaml code in a C++ program. 
> All I need is a very basic parse tree for use in syntax highlighting and 
> indenting OCaml code.
> 
> I would love to make use of some of the programs that already come with 
> OCaml. For instance, ocamlc and camlp4 can both output OCaml parse trees 
> already! The only problem is the formats that these programs output are 
> (to my knowledge) completely undocumented.
> 
> If I don't use these programs, I could always try using Lex and Yacc to do 
> the job, but I'm completely unfamiliar with these utilities and it would 
> take me quite some time to write input for parsing OCaml.
> 
> Does anyone know where I could find reference to the various OCaml parse 
> tree formats? Does anyone know where I could find Yacc grammar 
> descriptions of OCaml, or would I have to write them myself? Does anyone 
> have any other suggestions of how to tackle the parsing problem?

You should have a look to ocaml-ast-analyze
(http://www.carva.org/sylvain.le-gall/ocaml-ast-analyze.html). It is a
module build around camlp4 which can overridde the output of camlp4 (ie
it can replace pr_dump.cmo for example).

I use it to analyze ocaml code in order to find functions (which are
"s_", "f_"... ) followed by a string (s_ 'Coucou') and output only the
string ("msgstrid 'Coucou').

I think it is very useful to do analysis on Ocaml AST.

There is no documentation, but i am planning to write some.

Eric Cooper also answered:

OCaml uses ocamlyacc and ocamllex for its front-end: see
.../parsing/parser.mly and lexer.mll.

ocamlyacc is very similar to yacc/bison; reuse of the grammar should
be straightforward.

ocamllex is similar to lex/flex, but it supports mutually recursive
rules instead of lex's explicit "start conditions", so you might have
a bit more work if you want to translate it.

(Mostly) Functional Design?

Archive: http://thread.gmane.org/gmane.comp.lang.caml.general/29595

In the middle of this huge thread, Doug Kirk said and James Woodyatt added:

> First, since this thread was started by somebody asking for ideas on 
> learning FP, I can site a couple of printed resources that have helped 
> me, but none are written using Ocaml:
>
> "Haskell School of Expression", Hudak, ISBN 0-52164-4089
> "Algorithms: A Functional Programming Approach", Rabhi, Lapalme, ISBN 
> 0-20159-6040
> "Structure & Interpretation of Computer Programs", Sussman, Abelson, 
> Sussman, ISBN 0-26269-2201

I would add that a good tutorial on monads is a resource that every 
functional programmer should read for comprehension.  I haven't found 
one that I can recommend without hesitation, but here's a candidate:

        http://www.nomaware.com/monads/html/

Pattern Matching Papers

Archive: http://thread.gmane.org/gmane.comp.lang.caml.general/29583

Continuing the thread from last week, Norman Ramsey said:

In addition to other papers mentioned on this list, there are two
unpublished papers that may have some value:

   Baudinet, Marianne and David MacQueen. 1985 (December). Tree pattern
   matching for ML (extended abstract). Unpublished manuscript, AT&T Bell
   Laboratories.

   Scott, Kevin and Norman Ramsey. 2000 (May). When do match-compilation
   heuristics matter? Technical Report CS-2000-13, Department of Computer
   Science, University of Virginia.

My paper with Kevin Scott has some tutorial stuff in it.  If you like,
I can try to dig out source code, but it's in SML and may not be so useful.

pftdbns 0.2.8

Archive: http://thread.gmane.org/gmane.comp.lang.caml.general/29724

Oliver Bandel announced:

I have to announce version 0.2.8 of pftdbns,
a tool which is useful in sorting/listing/moving files.

It's name "pftdbns" is a short hand for "put files to directories (sorted) by name structure".

It takes filenames, maps each char of the filename into a char, representing
the charclass of it (a..z and A..Z -> "l" (letter), 0...9 -> "i" (integer" and so on)).

This yields to an easy way of sorting files by names, based upon file-naming
with certain filenaming-conventions.

So, for example  "hello.txt" and "ballo.txt" are part of the same name structure,
as well as "1001.txt" and "8251.txt" but also "8251.jpg" are of the same name
structure. For example "foobar.tex" and "foobar.txt" are equally structured too.

The default behaviour is to move files into directories. The names of the directories
are choosen from the string, which represents the name structure by default.

Since version 0.2.2 the default-behaviour of moving files can be changed
to only list the lilenames/namestructrues (and not moving them).

Changes in 0.2.8:
=================

Bug-Fix:
--------

  -inv option corrected!
         It didn't worked properly in version 0.2.6,
         because the filter function was stupid!

Source restructuring:
---------------------

Some changes in datastructures to have them more clear
and have a more efficient program (multiple/unnecessary
calls of the function "scan_string" removed).
This was done in the parts of the program, where the
file/template lists are build.

I hope you enjoy this program, and I think if you have to handle a lot
of files, this will be very helpful.

You can find the tool here:
        ftp://www.belug.org/new/user/ob/Programs/Tools/pftdbns/ 

There also is a README in this directory, so that you can read more details.

A description can also be found here:
   http://www.belug.org/~ob/ftp-area.html 

pftdbns uses the GPL-license.

How to do this properly with OCaml?

Archive: http://thread.gmane.org/gmane.comp.lang.caml.general/29708

Thomas Fischbacher started a huge (sometimes flaming) thread asking:

I repeatedly stumble across problems where I would like to use a 
programming style which seems quite out of line with how one is supposed 
to use OCaml (to say it otherwise, it looks outright wrong and 
horrendously ugly), but which nevertheless might be just appropriate for 
the situation in question. So, I would like to hear the opinion of the 
OCaml community about this.

Imagine constructing a binary heap of tuples. It is somewhat neat to be 
able to just use an array to store the entries of the heap which is 
pre-allocated to some fixed size and replaced by a larger array whenever 
more space is needed. The only thing known about the heap's entries at 
compile time is that they are bound to be tuples, hence boxed objects,
but nothing else is known about their size or even type.

As one does not have a prototype of such a tuple at the time the array is 
created, it seems to me as if the best thing one could do in OCaml would 
be:

(1) Make an array of <tuple> option and initially fill it with None.

(2) Make an optional array of tuples which is None until the first entry 
is made.

One drawback of approach (2) is that when we "remove" entries from the 
heap, the underlying array will keep unnecessary references to values 
which by chance might be quite big, and which we might want to be 
reclaimed by GC, so that's not too beautiful for a general-purpose 
application.

The major drawback of (1), however, is that there is a further level of 
indirection for array entries, which means some unnecessary consing, as 
well as more work for the machine to get at a given entry.

If I just define a function

let whatever () = Obj.magic (1,2);

then 

let a = Array.make 10 (whatever());;

would give me more or less just what I want. An array of boxed things that 
I then would like to use as in:

# let () = a.(2) <- (1,2,3,4) in a.(2);;
- : int * int * int * int = (1, 2, 3, 4)
# let () = a.(3) <- (5,8,2,1) in a.(2);;
- : int * int * int * int = (1, 2, 3, 4)
# a.(3);;
- : int * int * int * int = (5, 8, 2, 1)

Mind you - in this case, I will only assign int*int*int*int tuples to that 
array, or the result of the evaluation of whatever() when I want to kill 
an entry. Plus, I guarantee never to look at any entry that is set to 
whatever().

In some other situation, I might be inclined to use the same code, but 
with string*bool instead of int*int*int*int. But again, I will promise 
never to put anything else but string*bool into the underlying array, and 
never look at entries which are not set properly. (Which, of course, 
includes never printing such an array at the toplevel unless it is fully 
occupied.)

Please don't tell me that "well, OCaml is not Lisp, after all". 
This I already know. But is there no proper way to get around that 
(technically speaking) unnecessary extra level of indirection that is 
forced upon me due to static type checking?

The very same problem appears in a different guise when one tries to write 
a function that flattens 'a array array -> 'a array. The best I could 
come up with here is:

let join_arrays aa =
  let len_total = Array.fold_left (fun sf a -> sf+Array.length a) 0 aa 
  and nr_arrays = Array.length aa
  in
  if len_total=0
  then
    if nr_arrays = 0 then [||] else aa.(0)
      (* ^ Take a close look! *)
  else
    (* Here, we face the problem that we have to get some value
       of the proper type to init the joined array with.
     *)
    let first_entry =
      (let rec seek pos =
        let array_here = aa.(pos) in
        if Array.length array_here = 0
        then seek (pos+1)
        else array_here.(0)
            (* This is guaranteed to terminate, as len_total>0! *)
      in seek 0)
    in
    let result = Array.make len_total first_entry in
    let transfer_to_result arr offset =
      let nr = Array.length arr in
      for i = 0 to nr-1 do result.(i+offset) <- arr.(i) done
    in
    let rec populate nr_array_now offset =
      if nr_array_now = nr_arrays
      then result
      else 
        let array_now = aa.(nr_array_now) in
        let () = transfer_to_result array_now offset in
        populate (1+nr_array_now) (offset+Array.length array_now)
    in
    populate 0 0
;;

Of course, it is pretty awkward having to scan for the first element in 
the first nonempty array in an array of arrays, especially as that element 
really does not play a that special role.

Could anyone please tell me how to do all this in a more appropriate way?

Deep into the thread (please read the read online), Michael Alexander Hamburg said and Xavier Leroy replied:

> I was constructing a binary heap of tuples the other day.  After pondering
> these options, I just used Obj.magic 0 as the null value in the array.
> The heap was in a module, so nothing else could see the array, and I could
> prove that the code never accessed the null elements, so the use of
> Obj.magic seemed justified.

In other terms:

" I was walking in the city the other day.  I saw a syringe lying on
  the sidewalk.  I stuck the needle in my forearm.  That was a classy
  neighborhood, so the use of the syringe seemed justified. "

Sorry for being sarcastic, but I strongly feel that any suggestion
to use Obj functions should be avoided on this list.  The OCaml
compiler performs some type-dependent optimizations that can result in
incorrect code (w.r.t. GC invariants) if wrong types are given using
Obj.magic.

For instance, the following implementation of "magic" arrays will
eventually cause the GC to crash:

type 'a t = int array
let get (a: 'a t) i = (Obj.magic a.(i) : 'a)
let set (a: 'a t) i (x: 'a) = a.(i) <- (Obj.magic x : int)

while the same code with "string" instead of "int" will not.  You
don't understand why?  Then, don't use Obj.magic.

A few years ago, I spent one full day debugging a mysterious crash
in code provided by a user, then realized that the problem was exactly
the use of Obj.magic outlined above.  I then swore to throw away all
bug reports whose repro case uses Obj.  So, you can break the type
system with Obj, but you get to keep the pieces afterwards.

Coming back to the initial question, I would first warn against
premature optimization: quite possibly the overhead of the "option"
solution is negligible.  If not, just ask the user to pass an initial
value of the heap element type to the "create heap" function.

ocamldap 2.1.4

Archive: http://thread.gmane.org/gmane.comp.lang.caml.general/29781

Eric Stokes announced:

Announcing the release of ocamldap 2.1.4, which includes bugfixes,  
and performance enhancements.

     The BER decoder sees probably its last significant performance  
increase in this release, however thats ok, because the increase is a  
quite large 2.5x speedup. To put this in perspective the bytecode  
decoder is now 10% faster than the first release of the native code  
decoder was, and is 14 times faster than perl's Net::LDAP.

Ocamlnet 1.1

Archive: http://thread.gmane.org/gmane.comp.lang.caml.general/29782

Gerd Stolpmann announced:

The Ocamlnet project announces version 1.1 of this library collection.
This release focuses on important additions:

- Nethttpd is a library with HTTP 1.1 daemon functionality. The core
  of this daemon is quite complete; almost all HTTP 1.1 core features
  are supported (more or less only multipart/byteranges are omitted). 
  The daemon can serve static and dynamic pages. There are two
  operating modes: In reactive mode, the server works sequentially,
  and in event-based mode, the server can do multiplexing. What is
  still missing are ready-to-use MPM containers (but it is already very 
  easy to create a multi-threaded server with very good performance).

  Nethttpd was sponsored by Alex Baretta's company Baretta s.r.l.
  and implemented by me. Thank you, Alex. Nethttpd is released under
  the GPL (unlike the rest of Ocamlnet).

- Smtp is a simple SMTP client library, i.e. it can transfer mail
  messages to a mail server. 

  Smtp is a contribution by Pierre Habouzit.

Furthermore, there are a few bugfixes:

- Netmime.write_mime_message: A bug in CR/LF handling has been resolved.

- The FastCGI library can now handle POST requests larger than 4K

- Newer versions of the PCRE library are now supported.

The addition of Nethttpd also made some refactoring necessary:

- There is a new Nethttp module with common functions for all components
  dealing with HTTP. Nethttp includes an almost complete set of parsers
  and printers for the individual HTTP headers.

- The cgi_environment class type has been extended. Especially,
  one can now set an error logging function.

----------------------------------------------------------------------------
What is Ocamlnet?
----------------------------------------------------------------------------

A collection of modules for the Objective Caml language which focus on 
application-level Internet protocols and conventions.

The current distribution contains:

- a mature implementation of the CGI protocol

- an implementation of the JSERV protocol (AJP-1.2), can be used with
  mod_jserv (Apache JServ) and mod_jk (Jakarta connector) to connect
  application servers written in O'Caml with web servers

- an implementation of the FastCGI protocol, for the same purpose

- an HTTP daemon, for substituting the web server completely

- a POP3 client

- an SMTP client

- a library of string processing functions related to Internet 
  protocols (formerly known as "netstring" and distributed separately):
  MIME encoding/decoding, Date/time parsing, Character encoding
  conversion, HTML parsing and printing, URL parsing and printing,
  OO-representation of channels, and a lot more.

Ocamlnet is developed as a SourceForge project:

  http://sourceforge.net/projects/ocamlnet

Developers and code contributions are welcome.

Ocamlnet is licensed under the zlib/libpng license, except the HTTP
daemon, which is released under the terms of the GPL.

Using folding to read the cwn in vim 6+

Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'&lt;1':1
zM

If you know of a better way, please let me know.

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.

Alan Schmitt