Hello
Here is the latest Caml Weekly News, for the week of July 17 to 24, 2007.
I would like to share with you an Ocaml implementation of extensible arrays. The implementation is functional, based on balanced trees (and on the code for Set and Map); I called the module Vec (for vector - I like short names). You can find it at http://www.dealfaro.com/home/vec.html Module Vec provides, in log-time: - Access and modification to arbitrary elements (Vec.put n el v puts element el in position n of vector v, for instance). - Concatenation - Insertion and removal of elements from arbitrary positions (auto-enlarging and auto-shrinking the vector). as well as: - All kind of iterators and some visitor functions. - Efficient translation to/from lists and arrays. An advantage of Vec over List, for very large data structures, is that iterating over a Vec of size n requires always stack depth bounded by log n: with lists, non-tail-recursive functions can cause stack overflows. I have been using this data structure for some months, and it has been very handy in a large number of occasions. I hope it can be as useful to you. I would appreciate all advice and feedback. Also, is there a repository where I should upload it? Do you think it is worth it?Hugo Ferreira said:
For those of you interested in functional array consider Sylvain Conchon and Jean-Christophe Filliâtre paper in [1]. The Union-Find (UF) uses persistent arrays as its base data structure. I have made tests with the UF using the code provided, an implementation of k-BUF data structure (delayed backtracking) and altered version of the persistent array (fat nodes + delayed backtracking). The tests I did show that this version of persistent arrays is very efficient (especially for single threaded backtracking). Maybe Luca could pit his implementation against that of the article and report on how they fare? [1] http://www.lri.fr/~filliatr/ftp/publis/puf-wml07.psLuca de Alfaro answered:
Thanks for the pointer to the excellent paper. First, let me say that my Vec data structure was born to fill a need I had while working on a project: while it has been useful to me, I certainly do not claim it is the best that can be done, so I am very grateful for these suggestions! My Vec data structure is different from persistent arrays. It is likely to be less efficient for get/set use. However, it offers at logarithmic cost insertion/removal operations that are not present in the standard persistent arrays. Consider a Vec a of size 10. - Vec.insert 3 d a inserts value d in position 3 of a, shifting elements 3..9 of a to positions 4..10. - Vec.remove 3 a removes the element in position 3 of a, shifting elements 4..9 to positions 3..8. Vec.pop is similar and returns the removed element as well. - Vec.concat works in log-time. These operations are necessary if you want to use a Vec as a FIFO, for example (you append elements at the end, and you get the first element via Vec.pop 0 a). In many algorithms, it is often handy to be able to remove/insert elements in the middle of a list. In summary, I don't think the Vec data structure is a replacement for arrays or persistent arrays in numerically-intensive work. But if you want a flexible data structure for the 90% of the code that is not peformance critical, they can be useful. Now the question is: can one get better get/set efficiency while retaining the ability to insert/remove elements? (I am sure that there is something better to be done...).
> I am interested in parsing Wiki markup language that has a few tags, like > <pre>...</pre>, <math>...,</math>. > These tags are sparse, meaning that the ratio of number of tags / number of > bytes is low. > I would like, given a string (or a stream) with such tags, to parse it as > fast as possible. Efficiency is a primary consideration, and so is > simplicity of the implementation. > Do you have any advice about the library I should be using? There's some code in COCANWIKI which does exactly this: http://sandbox.merjis.com/release Look at the file scripts/lib/wikilib.ml. It's not a particularly clever implementation, but it has a great deal of testing in the real world. As well as <xml>-like syntax it also does a lot of standard wiki syntax like '* ' for bullet points, paragraphs, indents for preformatted sections and so on. And it outputs pure unadulterated XHTML.
I'm pleased to announce the first release of fiblib, an efficient, imperative Fibonacci heap library, based on the pseudocode of Cormen, Leiserson, Rivest, and Stein: http://code.google.com/p/fiblib/ This implementation provides a base of operations you'd expect from a heap (with more to come): - insert - extract min - delete - decrease key Other operations, such as union, are slated for another release. Though a relatively obscure data structure, they are the best known solution for certain algorithms requiring a heap. In particular, a fibonacci heap is a win if the number of extract-min and delete operations is small compared to the number of other operations. Fiblib is released under a BSD-style license, so, if you need a heap, try it out, and let me know how it goes!
> IIRC, there was mention before that camlp4 even allows you to specify lexers > inline. How is this done and are there any examples? Perhaps you think of the specific syntax of parsers of character streams ? I implemented that in camlp5 (previously named camlp4s). See doc at: http://pauillac.inria.fr/~ddr/camlp5/doc/html/lexers.html If interested, camlp5 is downloadable at: http://pauillac.inria.fr/~ddr/camlp5/
Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).
:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'<1':1
zM
If you know of a better way, please let me know.
If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.
If you also wish to receive it every week by mail, you may subscribe online.