Previous week Up Next week

Hello

Here is the latest Caml Weekly News, for the week of July 24 to 31, 2007.

The CWN will go on hiatus for two weeks as I'll be in vacations with very limited internet access.

  1. Knuth-Morris-Pratt
  2. OCamlPCSC
  3. Ropes and rope-like functional extensible vectors with O(1) prepend/append.
  4. XmlRpc-Light 0.4 - Now a server too
  5. 3-yr post for language person

Knuth-Morris-Pratt

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/59b214d45c40e2ce/718c284cc231c5e4#718c284cc231c5e4

Jean-Christophe Filliatre announced:
As a byproduct of the  last ICFP programming contest, I'm releasing an 
implementation of Knuth-Morris-Pratt string searching algorithm that I 
wrote years ago. You can find it here, in my ocaml bazar: 

  http://www.lri.fr/~filliatr/software.en.html 

Just in case it may be useful, as it finally happened to be for myself.
			

OCamlPCSC

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/dcb74e1ce128ea06/0e021a82b0be1235#0e021a82b0be1235

Manuel Preliteiro announced:
I am pleased to announce OCamlPCSC, an OCaml binding to PC/SC library. 
OCamlPCSC is a wrapper to libraries that follow the PC/SC standard for 
Smartcards. In Linux 
it uses PCSC-Lite from M.U.S.C.L.E. (http://www.linuxnet.com/) and in 
windows it uses 
winscard.dll. 

OCamlPCSC is available at http://www.di.ubi.pt/~desousa/ocamlpcsc. 

As any other OCaml library it's free for use and open-source (LGPL license).
			

Ropes and rope-like functional extensible vectors with O(1) prepend/append.

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/4e70beff0f714229/7c60860fa63809c9#7c60860fa63809c9

Mauricio Fernandez announced:
In the aftermath of the ICFP contest, during which I used Luca de Alfaro's 
Vec, I felt like implementing ropes, based on Boehm's paper and the well-known 
code included in his conservative garbage collector. 

I later realized that the basic implementation strategies ("dense" leaves, 
bounded tree height and amortized constant time concatenation of small 
strings) could be generalized to build general extensible vectors similar to 
Vec. 

Such vectors (tentatively named "Vect" until I find a better name) have some 
interesting properties: 
* lower space overhead than other structures based on balanced trees such as 
  Vec.  The overhead can be adjusted, allowing to make "get" faster at the 
  expense of "set" and viceversa. 
* appending or prepending a small vector to an arbitrarily large one in 
  amortized constant time 
* concat, subarray, insert, remove operations in amortized logarithmic time 
* access and modification (get, set) in logarithmic time 

The first one is probably the most compelling. Currently, Vec imposes a 6-word 
overhead per element. Even after the obvious modification consisting in adding 
a new constructor for leaves, the overhead would still be 350%... Vect uses 
compact leaves with a configurable number of elements (32-64 seem good 
choices, leading to worst-case overheads of 100% and 50% respectively), which 
also helps with "get" due to the improved spatial locality. 

You can find the code for both Rope and Vect at 
http://eigenclass.org/repos/oropes/head/ 
It is still young and experimental, but it's maybe worth a look. Any feedback 
is very welcome. 

The documentation can be found under 
http://eigenclass.org/repos/oropes/head/doc/ 

I've spent some time benchmarking it against Vec; you can also find the 
code I used and the resulting graphs at the above address. 

To summarise how it compares to Vec: 
* Vec can be used when persistence is required, but Vect would probably be a 
  poor choice in this case (until that is fixed using lazy rebuilding, which 
  doesn't seem too hard), unless rebalancing explicitly before "taking the 
  snapshot" is an option 
* Vect can append/prepend single elements or small vectors very quickly, in 
  amortized constant time. See 
  http://eigenclass.org/repos/oropes/head/append.png 
* as expected, Vec.set is faster than Vect's in general 
  http://eigenclass.org/repos/oropes/head/set.png 
  However, if the vector is balanced explicitly shortly before an update 
  burst, Vect is somewhat surprisingly faster 
  http://eigenclass.org/repos/oropes/head/set-balanced.png 
  This might be attributed to Vect's smaller memory profile and the fact that 
  it allows better use of the L2 cache, but there seems to be another factor 
  that I have yet to discover. 
* Vect.get is considerably faster than Vec.get 
  http://eigenclass.org/repos/oropes/head/get.png 

The above URL is a darcs repository, so if anybody shoots me a patch I'll be 
happy to apply it :)
			
Jon Harrop asked and Mauricio Fernandez answered:
> Looks awesome! 

> It seems to use mutation internally. Is it not thread safe? 

The structure is persistent and all operations are non-destructive. 
Mutation is used only in the rebalancing operation and in set, but they affect 
an ephemeral forest of ropes/vects and a new string/array respectively, so the 
original structure is never modified and in principle all operations should be 
thread-safe. 

Ropes/vects being functional doesn't mean that they will perform well in a 
persistent setting however, see the clarification at 
http://eigenclass.org/repos/oropes/head/doc/Vect.html 

> I have some suggestions: 
>
> I'd like metadata in every node, so I can provide a constructor that 
> combines 
> the metadata of child nodes and a cull function to accelerate searches. 


If I understand it correctly, that scheme could in the limit turn some O(n) 
searches into O(log n)?), right? 

Unlike Vec, Vect uses "compact" leaves (Leaf of 'a array) of bounded size 
(leaf_size, typically 16-64), which might not fit very well. 

Vect would need to know how to combine the metadata, wouldn't it? I was 
thinking about something like 
  Leaf of ('meta -> 'meta -> 'meta) * 'meta * 'a array 
but I've realized that this wouldn't suffice. So, given that Vect.t is 
currently 

    type 'a t = 
        Empty 
      | Concat of 'a t * int * 'a t * int * int 
      | Leaf of 'a array 

something like this maybe? 

  type ('a, 'meta) t = 
        Empty of ('meta -> 'meta -> 'meta) 
      | Concat of ('meta -> 'meta -> 'meta) * 'meta * 
                  ('a, 'meta) t * int * 
                  ('a, 'meta) t * int * 
                  int 
      | Leaf of ('meta -> 'meta -> meta) * 'meta array * 'a array 
                (* maybe also  * 'meta  to cache the last computation? *)   

or even without the ('meta -> 'meta -> 'meta) part, forcing the user to pass 
the function on each modification?  Just thinking out loud. 

At any rate, it'd be better to provide it as a separate structure, any 
suggestions for the name?. 

> The usual HOFs, like map. 

I just pushed a patch with filter and map. The former is trivially implemented 
with fold + append (thanks to the O(1) append). I was going to code map the 
same way but I ended up making one that returns an isomorphic vect and is 
faster (since there's no need to rebalance). 

So Vect currently has iter, iteri, rangeiter, fold, map and filter. I'm 
considering renaming fold to fold_left and providing fold_right too.
			

XmlRpc-Light 0.4 - Now a server too

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/07d658e57ff7d1c1/ec53e2f3fc7cfa3f#ec53e2f3fc7cfa3f

Dave Benjamin announced:
I have released XmlRpc-Light 0.4, an XML-RPC library for OCaml based on 
Xml-Light and Ocamlnet 2. Source, downloads, and documentation are 
available here: 

     http://code.google.com/p/xmlrpc-light/ 

This version adds support for writing servers. Two methods are currently 
provided: CGI (based on Netcgi2) and Netplex. The obligatory hello-world 
example looks like this: 

     let server = new XmlRpcServer.cgi () in 
     server#register "demo.sayHello" 
       (fun _ -> `String "Hello!"); 
     server#run () 

To build a Netplex server, just change "cgi" to "netplex". An example of 
a Netplex server including the required configuration file is in the 
examples/adder directory. 

All servers support the "system.getCapabilities" and 
"system.listMethods" introspection functions, as well as the 
"system.multicall" protocol. These can be disabled if desired by calling 
the "unregister" method. 

Other changes and improvements: 

   - The default date-time functions use the format 
"20070729T10:42:00-07:00". This seems to be the most common 
interpretation of ISO 8601 used in XML-RPC servers. You can override 
this behavior by calling the "set_datetime_encode" or 
"set_datetime_decode" methods on the client or server. 

   - Date-time parsing errors are now wrapped in XmlRpc.Error so that 
they will be relayed to clients as faults. 

   - Error handling adheres much closer to the XML-RPC specification and 
its list of suggested fault codes and strings. 

   - The client now sends a "text/xml" Content-Type header in requests. 

Thanks to Gerd Stolpmann for the help with Ocamlnet!
			
Richard Jones suggested and Dave Benjamin answered:
> You might want to take a look at Julien Signoles' Calendar library for 
> date/time types and handling: 
>
> http://www.lri.fr/~signoles/prog/calendar/ 

I have this library installed, and indeed considered using it when I 
began writing the date-time support. I would likely have used it, if 
only it had the ability to parse strings. 

I really wish Winer had considered alternatives to ISO 8601--say, UTC 
epoch seconds--in the design of XML-RPC, because it's barely a standard 
at all! There are so many variations and options that writing a parser 
for it borders on natural language processing. Even the W3C suggestion, 
which restricts ISO 8601 to a very small subset, doesn't help here since 
it still conflicts with the common usage in XML-RPC, with hyphens 
omitted between the date values. I decided to err on the side of 
oversimplification, and support only the most common format, leaving in 
hooks for users to customize the behavior as required. 

There is still benefit, of course, in using a standard date-time type. I 
only wonder if it is worth adding another library dependency; I am 
trying hard to keep the list small (currently only Xml-Light and 
Ocamlnet, which in turn requires PCRE). I think it would be great if a 
date-time type were made part of the official OCaml distribution. 

My only qualm with the Calendar library is that I feel a bit 
uncomfortable with a top-level module called "Printer" that is for the 
specific purpose of date formatting. I would assume that a module by 
that name were for communicating with "lpt", if anything. But hey, 
what's in a name, anyway... =) 

Thanks for the advice. I will consider it.
			

3-yr post for language person

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/1792bc907e171834/91afa28b79a0b2e4#91afa28b79a0b2e4

Simon Peyton-Jones announced:
Tim Griffin is advertising a 3-year research associate position at the 
Cambridge Computer Lab, working on a project that seeks to design and 
implement a meta-language for the specification and implementation of correct 
Internet routing protocols. 

He says "A PL person would be perfect". 

Details here: 
http://www.admin.cam.ac.uk/offices/personnel/jobs/vacancies.cgi?job=2114
			

Using folding to read the cwn in vim 6+

Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'<1':1
zM

If you know of a better way, please let me know.


Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.


Alan Schmitt