Caml Weekly News

Hello

Here is the latest Caml Weekly News, for the week of July 17 to 24, 2007.

Vec: a functional implementation of extensible arrays
Fast XML parser
fiblib 0.1, a fibonacci heap library
Custom camlp4 lexers

Vec: a functional implementation of extensible arrays

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/a965ef148e7a7201/5c5a35b15dffdf7b#5c5a35b15dffdf7b

Luca de Alfaro announced:

I would like to share with you an Ocaml implementation of extensible 
arrays.  The implementation is functional, based on balanced trees (and on 
the code for Set and Map); I called the module Vec (for vector - I like 
short names).  You can find it at http://www.dealfaro.com/home/vec.html 
Module Vec provides, in log-time:

   -  Access and modification to arbitrary elements (Vec.put n el v puts 
   element el in position n of vector v, for instance). 
   - Concatenation 
   - Insertion and removal of elements from arbitrary positions 
   (auto-enlarging and auto-shrinking the vector). 

as well as: 

   - All kind of iterators and some visitor functions. 
   - Efficient translation to/from lists and arrays. 

An advantage of Vec over List, for very large data structures, is that 
iterating over a Vec of size n requires always stack depth bounded by log n: 
with lists, non-tail-recursive functions can cause stack overflows. 

I have been using this data structure for some months, and it has been very 
handy in a large number of occasions.  I hope it can be as useful to you. 

I would appreciate all advice and feedback.  Also, is there a repository 
where I should upload it?  Do you think it is worth it?

Hugo Ferreira said:

For those of you interested in functional array consider Sylvain Conchon 
and Jean-Christophe Filliâtre paper in [1]. The Union-Find (UF) uses 
persistent arrays as its base data structure. I have made tests with the 
UF using the code provided, an implementation of k-BUF data structure 
(delayed backtracking) and altered version of the persistent array (fat 
nodes + delayed backtracking). The tests I did show that this version of 
persistent arrays is very efficient (especially for single threaded 
backtracking). 

Maybe Luca could pit his implementation against that of the article and 
report on how they fare?

[1] http://www.lri.fr/~filliatr/ftp/publis/puf-wml07.ps

Luca de Alfaro answered:

Thanks for the pointer to the excellent paper.  First, let me say that my 
Vec data structure was born to fill a need I had while working on a project: 
while it has been useful to me, I certainly do not claim it is the best that 
can be done, so I am very grateful for these suggestions! 

My Vec data structure is different from persistent arrays.  It is likely to 
be less efficient for get/set use. 
However, it offers at logarithmic cost insertion/removal operations that are 
not present in the standard persistent arrays. 
Consider a Vec a of size 10. 

   - Vec.insert 3 d a inserts value d in position 3 of a, shifting 
   elements 3..9 of a to positions 4..10. 
   - Vec.remove 3 a removes the element in position 3 of a, shifting 
   elements 4..9 to positions 3..8.  Vec.pop is similar and returns the 
   removed element as well. 
   - Vec.concat works in log-time. 

These operations are necessary if you want to use a Vec as a FIFO, for 
example (you append elements at the end, and you get the first element via 
Vec.pop 0 a).  In many algorithms, it is often handy to be able to 
remove/insert elements in the middle of a list. 

In summary, I don't think the Vec data structure is a replacement for arrays 
or persistent arrays in numerically-intensive work.  But if you want a 
flexible data structure for the 90% of the code that is not peformance 
critical, they can be useful. 
Now the question is: can one get better get/set efficiency while retaining 
the ability to insert/remove elements?  (I am sure that there is something 
better to be done...).

Fast XML parser

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/5136b4032ed0f6c2/1c07ca1f78d20b45#1c07ca1f78d20b45

Luca de Alfaro asked and Richard Jones answered:

> I am interested in parsing Wiki markup language that has a few tags, like 
> <pre>...</pre>, <math>...,</math>. 
> These tags are sparse, meaning that the ratio of number of tags / number of 
> bytes is low. 
> I would like, given a string (or a stream) with such tags, to parse it as 
> fast as possible.  Efficiency is a primary consideration, and so is 
> simplicity of the implementation. 
> Do you have any advice about the library I should be using? 

There's some code in COCANWIKI which does exactly this: 

http://sandbox.merjis.com/release 

Look at the file scripts/lib/wikilib.ml. 

It's not a particularly clever implementation, but it has a great deal 
of testing in the real world. 

As well as <xml>-like syntax it also does a lot of standard wiki 
syntax like '* ' for bullet points, paragraphs, indents for 
preformatted sections and so on.  And it outputs pure unadulterated 
XHTML.

fiblib 0.1, a fibonacci heap library

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/eeb75e7d4f7eeb52/8532f5ca530e88d2#8532f5ca530e88d2

Denis Bueno announced:

I'm pleased to announce the first release of fiblib, an efficient, 
imperative Fibonacci heap library, based on the pseudocode of Cormen, 
Leiserson, Rivest, and Stein: 

http://code.google.com/p/fiblib/ 

This implementation provides a base of operations you'd expect from a 
heap (with more to come): 

  - insert 
  - extract min 
  - delete 
  - decrease key 

Other operations, such as union, are slated for another release. 

Though a relatively obscure data structure, they are the best known 
solution for certain algorithms requiring a heap. In particular, a 
fibonacci heap is a win if the number of extract-min and delete 
operations is small compared to the number of other operations. 

Fiblib is released under a BSD-style license, so, if you need a heap, 
try it out, and let me know how it goes!

Custom camlp4 lexers

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/5221659647adbeee/649206d2f9826ef1#649206d2f9826ef1

Jon Harrop asked and Daniel de Rauglaudre answered:

> IIRC, there was mention before that camlp4 even allows you to specify lexers 
> inline. How is this done and are there any examples? 

Perhaps you think of the specific syntax of parsers of character streams ? 

I implemented that in camlp5 (previously named camlp4s). See doc at: 
   http://pauillac.inria.fr/~ddr/camlp5/doc/html/lexers.html 

If interested, camlp5 is downloadable at: 
   http://pauillac.inria.fr/~ddr/camlp5/

Using folding to read the cwn in vim 6+

Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'&lt;1':1
zM

If you know of a better way, please let me know.

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.

Alan Schmitt