
Hello

Here is the latest Caml Weekly News, for the week of January 23 to 30, 2007.

  1. Job opportunity in Paris, around the CDuce project
  2. Interfacing with C question...
  3. The OCaml Summer Project
  4. cmigrep
  5. TestSimple - A simple unit testing framework
  6. lablpcre-1.0 - a PCRE binding for Objective Caml

Job opportunity in Paris, around the CDuce project

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/7f18a1af60b872fc/84723bd58ca98d22#84723bd58ca98d22

Alain Frisch announced:
The announcement below might be of interest for people looking for jobs 
involving OCaml. 

  -- Alain Frisch 

------------------------------------------------------------- 
Position available at Paris 7 

The laboratory PPS of University Paris 7 is looking for candidates for a 
one year position, available immediately, around the CDuce project. 
CDuce is a programming language for XML (see http://www.cduce.org), 
close in spirit to OCaml, and whose compiler is implemented in OCaml. 

According to the profile of the recruited person the position will focus 
more on the development environment (e.g.: libraries for web development 
or web services, Eclipse plugins, Windows port) or on the research 
aspects (e.g.: concurrency, typing, distribution, verification) around 
CDuce. 

Candidates should be fluent in OCaml or in another functional language. 
Experience in Web development, environments for software development 
and/or XML would be useful as well. 

The annual gross salary will be around 28.000 Euros. If interested 
please send a mail to staff@cduce.org as soon as possible. 

Giuseppe Castagna
			

Interfacing with C question...

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/f15790749056f96d/ca001054f0537ea5#ca001054f0537ea5

David Allsopp asked and Remi Vanicat answered:
> Sorry if this is an RADBOTFM case. Rule 2 in Chapter 18 of the manual states 
> that all local variables of type value must be declared using CAMLlocal 
> macros. However, later on when demonstrating caml_callback we get the 
> statements: 
> value* format_result_closure = caml_named_value("format_result"); 
> return strdup(String_val(caml_callback(*format_result_closure, Val_int(n)))); 

> (I've "simplified" the opening lines for clarity here - naturally it should 
> be static and once only!). 

> Two questions arise: 

> 1. Presumably it's OK to cache values returned by caml_named_value without 
> declaring them in a CAMLlocal "call" or by using register_global_root? 


No, any C pointer to a Caml value must be known to the Caml GC, 
because the GC might move the value. But you may leave 
such a pointer undeclared if you are sure that the GC won't be triggered 
between the assignment of the value to the C variable and the use 
of that variable. 
In the given example, this is the case: nothing is done between the 
assignment and the use. 

By the way, when in doubt, or when you are a beginner who does not know well 
how the GC works, you should always use the CAMLlocal call. 

> 2. The result of caml_callback is passed straight to String_val. Therefore, 
> if I expand the line to: 
> value result = caml_callback(*format_result_closure, Val_int(n)); 
> return strdup(String_val(result)); 

> then does that work ok without using CAMLlocal1(result); 


Yes, for the same reason: you do nothing between the assignment of 
the function's result and the use of the variable.
			
Hendrik Tews also answered:
> Sorry if this is an RADBOTFM case. Rule 2 in Chapter 18 of the manual states 
> that all local variables of type value must be declared using CAMLlocal 
> macros. However, later on when demonstrating caml_callback we get the 
> statements: 
> value* format_result_closure = caml_named_value("format_result"); 

Note the type! format_result_closure is not of type value, Rule 2 
does not apply! 

> 1. Presumably it's OK to cache values returned by caml_named_value without 
> declaring them in a CAMLlocal "call" or by using register_global_root? 

Yes it is OK. And you cannot use CAMLlocal or 
register_global_root, because they only deal with values and not 
pointers to values. The manual guarantees that the value pointed 
to by the result of caml_named_value doesn't move. Probably 
caml_named_value allocates a value outside the heap, registers it 
as a global root and gives you back its address. 

> value result = caml_callback(*format_result_closure, Val_int(n)); 
> return strdup(String_val(result)); 
>
> then does that work ok without using CAMLlocal1(result); 

Yes. You should think of values as pointers that point to data 
that is moved around by the garbage collector. If there is any 
chance that the garbage collector is called, then you must make 
sure that it updates your pointer when it moves the data. Hence 
you have to register the value. 

If the garbage collector is not called under any circumstances 
then the data will not move, the pointer doesn't need to get 
updated and you don't have to register the value. 

Furthermore, if you are sure that Is_long(your_variable) is true 
under any circumstances, you don't have to register 
your_variable, because it is not a pointer. 

Of course all that is strongly discouraged.
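
For context, the C fragments discussed in this thread assume that some OCaml code has registered a closure under the name "format_result", which caml_named_value then looks up. A minimal sketch of that OCaml side, using Callback.register from the standard library, might look as follows; the body of format_result is an assumption made for illustration and is not taken from the thread.

```ocaml
(* Hypothetical OCaml side of the C fragments quoted in this thread.
   The formatting done here is an assumption chosen for illustration. *)
let format_result (n : int) : string =
  Printf.sprintf "Result is: %d" n

(* Register the closure under the name that the C code passes to
   caml_named_value. Callback.register has type string -> 'a -> unit. *)
let () = Callback.register "format_result" format_result
```

The registration must run before the C code performs the lookup, which is why it is usually placed at module initialisation time, as here.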
			

The OCaml Summer Project

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/0fb2a53da1dd7b74/81c4ff8a8d8552a9#81c4ff8a8d8552a9

Yaron M. Minsky announced:
I am pleased to announce the OCaml Summer Project. The project is aimed 
at encouraging growth in the OCaml community by funding students over 
the summer to work on open-source projects in OCaml. At the end of the 
summer, we will fly all of the students who have completed their 
projects successfully out for a meeting in New York, where people will 
present their projects and get a chance to shmooze with other members of 
the OCaml community. 
The project is being funded and run by Jane Street Capital. As people on 
the list likely know at this point, we make extensive use of OCaml here 
at Jane Street, and are excited about the idea of encouraging and 
growing the OCaml community. 

If you'd like to learn more about the project, you can look at our 
website here: 

    http://osp2007.janestcapital.com 

We'd love to have professors tell their students about the project, 
since we hope it will do some real good in terms of increasing interest 
in functional programming. 

Please direct any questions or suggestions you have to 
osp@janestcapital.com.
			
Gabriel Kerneis asked and Yaron M. Minsky answered:
> And I'm an interested student looking for ideas (the binary trees 
> project on the OCSP page seems fine but I'd be glad to have more 
> ideas) ;-) 
> Other questions : 
> - is it possible to propose several projects (per student) and let 
> the OCSP team decide which one is the better ? 

Just one proposal per student, I'm afraid.  We'd rather you chose a project 
you liked and believed in, and come up with the best proposal you can for 
it. 

> (if i were to be selected and finish the job) what kind of 
> visa/passport/etc. do one need to come to the USA (I'm living in 
> France) - but i guess i can find this on the Internet - and to what 
> extent will you pay the flight/food/housing/etc. fees ? 

We will pay for the travel and your stay.  I don't expect we'll pay for all 
of your meals, but we'll have a few dinners for the group.  As for the visa 
issue, my expectation is that students will figure this out on their own.  I 
suspect that for the most part a tourist visa should do, but I 
don't really know the details, and it will no doubt vary from country to 
country.
			
Jon Harrop said:
Sounds like an excellent idea and the projects all look fascinating. However, 
I do have some comments on the "Binary tree library" project: 
OCaml currently has two separate implementations of AVL trees in Map and Set 
functors. Set already has fast union and split operations. 

Having two separate implementations is wasteful but more efficient. The 
underlying tree code could be factored out into another functor but this is 
costly in terms of performance. Also, the OCaml stdlib has used an odd choice 
of optimisations: inlining height calculation (which is quite a small benefit 
in the context of functors and polymorphism) but not amortising single-child 
trees into a separate constructor (which can remove up to 50% of the GC's 
effort). So the code can be made shorter and faster. 

I've already implemented my own AVL set using the node-specialisation trick. 
Performance is ~30% faster, IIRC. I've also wanted to write a functional 
array based on AVL trees (O(log n) lookup but fast sub, append, insert, 
delete etc.) and a camlp4 extension to support pattern matching over this 
type. Lists and arrays are rather privileged containers in OCaml, having 
pattern matching and literals, but trees are better in many respects and 
would make an excellent general-purpose container. 

Finally, having to use functors does obfuscate OCaml code that deals with Sets 
and Maps in many cases, particularly because there are no built-in Int and 
Float modules so you must write your own. I often find that this superfluous 
code is as long as all of the code using the Sets/Maps. Although it would 
be "dangerous", Sets and Maps implemented without functors are much easier to 
use. After all, Hashtbl is typically used in that way. 

It is also worth noting that several people (Diego, Jean-Christophe) have 
written other tree libraries using various data structures (RB, trie, splay 
etc.). As far as I can tell, AVL trees are a good all-rounder. 

Best of luck with the projects!
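
The functor boilerplate mentioned in this message can be sketched as follows. The module names IntOrd and IntSet are illustrative choices, not names from the message, and the Hashtbl lines show the functor-free style that it is compared with.

```ocaml
(* Set.Make needs a module providing a type t and a compare function.
   IntOrd is a hypothetical hand-written module of the kind the message
   says must be supplied for integer keys. *)
module IntOrd = struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end

module IntSet = Set.Make (IntOrd)

(* Build a small set; elements come back in increasing order. *)
let s = IntSet.add 3 (IntSet.add 1 (IntSet.add 2 IntSet.empty))

(* By contrast, Hashtbl can be used polymorphically with no functor
   application at all. *)
let h : (int, string) Hashtbl.t = Hashtbl.create 16
let () = Hashtbl.add h 1 "one"

let () =
  List.iter (fun x -> Printf.printf "%d " x) (IntSet.elements s);
  print_newline ()
```

The trade-off is the one the message calls "dangerous": the polymorphic style relies on structural comparison and hashing, whereas the functor fixes the ordering once in the type of the set.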
			
Xavier Leroy said:
I just wanted to say a big "thank you" to you and the Jane Street 
Capital people for donating the time and money to organize such an 
event.  It will be very interesting to see what comes out of it. 
Here is my attempt to put the visa discussion to rest.  I believe Yaron and 
Markus are right: a tourist visa (or visa waiver program) is 
probably enough; whether you need an actual visa or not is a 
complicated function of your country of citizenship and of the date 
your passport was issued, but for citizens of "old Europe", this 
function returns "no visa needed" with high probability. 

For more details, see the Web sites of the ministry of foreign affairs 
or of the US consulate in your country (URLs for France included below 
for your convenience). 

Let me add that if you never visited Manhattan before, it's well worth 
the trip.  One more reason to participate in this project!

http://www.diplomatie.gouv.fr/fr/pays-zones-geo_833/etats-unis_471/conseils-aux-voyageurs_13261/entree-sejour_29143.html

http://www.amb-usa.fr/consul/niv/needvisa/default.htm
			

cmigrep

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/fe5f01e08e6e80d6/ba2e24f88c7e5ce4#ba2e24f88c7e5ce4

Eric Stokes announced:
I am happy to announce the immediate availability of cmigrep, a small   
utility to mine cmi files for interesting bits of data. cmigrep is   
available in godi, or at http://homepage.mac.com/letaris 
A short description of features, 

cmigrep: <args> <module> 

cmigrep has two modes, the first and most common is that of searching 
for various types of objects inside a module. Objects that you can 
search for include 

switch         purpose 
-t             (regexp) print types with matching names 
-r             (regexp) print record field labels with matching names 
-c             (regexp) print constructors with matching names 
-e             (regexp) print exceptions with matching constructors 
-v             (regexp) print values with matching names 
-o             (regexp) print all classes with matching names 
-a             (regexp) print all names which match the given expression 

These are all very useful for finding specific things inside a given 
module. Here are a few examples, 

find some constructors in the unix module 

itsg106:~ eric$ cmigrep -c SO_ Unix 
SO_DEBUG (* socket_bool_option *) 
SO_BROADCAST (* socket_bool_option *) 
SO_REUSEADDR (* socket_bool_option *) 
SO_KEEPALIVE (* socket_bool_option *) 
SO_DONTROUTE (* socket_bool_option *) 
SO_OOBINLINE (* socket_bool_option *) 
SO_ACCEPTCONN (* socket_bool_option *) 
SO_SNDBUF (* socket_int_option *) 
SO_RCVBUF (* socket_int_option *) 
SO_ERROR (* socket_int_option *) 
SO_TYPE (* socket_int_option *) 
SO_RCVLOWAT (* socket_int_option *) 
SO_SNDLOWAT (* socket_int_option *) 
SO_LINGER (* socket_optint_option *) 
SO_RCVTIMEO (* socket_float_option *) 
SO_SNDTIMEO (* socket_float_option *) 

full types get printed in the case that the constructors have 
arguments. Notice that adding to the include path is modeled after the 
compiler. Findlib is also supported. 

itsg106:~ eric$ cmigrep -c "^Tsig_.*" -I /opt/godi/lib/ocaml/compiler-lib Types 
Tsig_value of Ident.t * value_description (* signature_item *) 
Tsig_type of Ident.t * type_declaration * rec_status (* signature_item *) 
Tsig_exception of Ident.t * exception_declaration (* signature_item *) 
Tsig_module of Ident.t * module_type * rec_status (* signature_item *) 
Tsig_modtype of Ident.t * modtype_declaration (* signature_item *) 
Tsig_class of Ident.t * class_declaration * rec_status (* signature_item *) 
Tsig_cltype of Ident.t * cltype_declaration * rec_status (* signature_item *) 

record field labels 

itsg106:~ eric$ cmigrep -r "^st_" Unix 
st_dev: int (* stats *) 
st_ino: int (* stats *) 
st_kind: file_kind (* stats *) 
st_perm: file_perm (* stats *) 
st_nlink: int (* stats *) 
st_uid: int (* stats *) 
st_gid: int (* stats *) 
st_rdev: int (* stats *) 
st_size: int (* stats *) 
st_atime: float (* stats *) 
st_mtime: float (* stats *) 
st_ctime: float (* stats *) 

findlib support, matching value names 

itsg106:~ eric$ cmigrep -package pcre -v for Pcre 
val foreach_line : ?ic:in_channel -> (string -> unit) -> unit 
val foreach_file : string list -> (string -> in_channel -> unit) -> unit 

nested modules 

itsg106:~ eric$ cmigrep -v ".*" Unix.LargeFile 
val lseek : file_descr -> int64 -> seek_command -> int64 
val truncate : string -> int64 -> unit 
val ftruncate : file_descr -> int64 -> unit 
val stat : string -> stats 
val lstat : string -> stats 
val fstat : file_descr -> stats 

types 

itsg106:~ eric$ cmigrep -t ".*" Unix.LargeFile 
type stats = { 
   st_dev : int; 
   st_ino : int; 
   st_kind : file_kind; 
   st_perm : file_perm; 
   st_nlink : int; 
   st_uid : int; 
   st_gid : int; 
   st_rdev : int; 
   st_size : int64; 
   st_atime : float; 
   st_mtime : float; 
   st_ctime : float; 

} 

everything! 
itsg106:~ eric$ cmigrep -a ".*" Unix.LargeFile 
val lseek : file_descr -> int64 -> seek_command -> int64 
val truncate : string -> int64 -> unit 
val ftruncate : file_descr -> int64 -> unit 
type stats = { 
   st_dev : int; 
   st_ino : int; 
   st_kind : file_kind; 
   st_perm : file_perm; 
   st_nlink : int; 
   st_uid : int; 
   st_gid : int; 
   st_rdev : int; 
   st_size : int64; 
   st_atime : float; 
   st_mtime : float; 
   st_ctime : float; 

} 

val stat : string -> stats 
val lstat : string -> stats 
val fstat : file_descr -> stats 

exception declarations 

itsg106:~/cmigrep eric$ ./cmigrep -e ".*" Unix 
exception Unix_error of error * string * string 

The second mode of cmigrep searches for modules in its path. 
It lets you do a regular expression match on the full module path, 
including submodules. For example, 

itsg106:~/cmigrep eric$ cmigrep -m Net -package netstring 
Netaccel 
Netaccel_link 
Netaddress 
Netaux 
Netaux.ArrayAux 
Netaux.KMP 
Netbuffer 
Netchannels 
Netconversion 
Netdate 
Netdb 
Netencoding 
Netencoding.Html 
Netencoding.Url 
Netencoding.Q 
Netencoding.QuotedPrintable 
Netencoding.Base64 
Nethtml 
Nethtml_scanner 
Nethttp 
Nethttp.Header 
Netmappings 
Netmappings_iso 
Netmappings_jp 
Netmappings_min 
Netmappings_other 
Netmime 
Netsendmail 
Netstream 
Netstring_mt 
Netstring_pcre 
Netstring_str 
Netstring_top 
Netulex 
Netulex.Ulexing 
Netulex.ULB 
Neturl 

here are all modules starting with Net that contain a sub module 
itsg106:~/cmigrep eric$ cmigrep -m "^Net.*\." -package netstring 
Netaux.ArrayAux 
Netaux.KMP 
Netencoding.Html 
Netencoding.Url 
Netencoding.Q 
Netencoding.QuotedPrintable 
Netencoding.Base64 
Nethttp.Header 
Netulex.Ulexing 
Netulex.ULB 

modules exactly one level deep 
itsg106:~/cmigrep eric$ cmigrep -m "^[^.]*\.[^.]*$" 
Bigarray.Array3 
Bigarray.Array2 
Bigarray.Array1 
Bigarray.Genarray 
Hashtbl.Make 
Map.Make 
MoreLabels.Set 
MoreLabels.Map 
MoreLabels.Hashtbl 
Pervasives.LargeFile 
Random.State 
Scanf.Scanning 
Set.Make 
StdLabels.String 
StdLabels.List 
StdLabels.Array 
Unix.LargeFile 
UnixLabels.LargeFile 
Weak.Make
			

TestSimple - A simple unit testing framework

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/88b25e0ff2dffcd1/f20edbf929122803#f20edbf929122803

Stevan Little announced:
I would like to announce the first release of our new unit testing   
framework for OCaml. It can be downloaded from our website here: 

http://www.iinteractive.com/ocaml/ 

It is based heavily on the Perl unit testing framework of the same   
name, and produces TAP output (http://en.wikipedia.org/wiki/Test_Anything_Protocol)
which can be read and analyzed by a wide   
range of existing Perl tools. The goal of this framework is to make   
writing unit tests as simple and as easy as possible (hence the   
name). Here is a basic example taken from the TestSimple test suite   
itself. 

     #use "topfind";; 
     #require "testSimple";; 

     open TestSimple;; 

     plan 9;; 

     diag "... testing O'Caml TestSimple v0.01 ";; 
     ok true "... ok passed";; 
     is 2 2 "... is <int> <int> passed";; 
     is 2. 2. "... is <float> <float> passed";; 
     is "foo" "foo" "... is <string> <string> passed";; 
     is [] [] "... is <'a list> <'a list> passed";; 
     is [1;2;3] [1;2;3] "... is <int list> <int list> passed";; 
     is ["foo";"bar"] ["foo";"bar"] "... is <string list> <string list> passed";; 
     is (1,"foo") (1,"foo") "... is <int * string> <int * string> passed";; 
     is TAPDocument.Ok TAPDocument.Ok "... is <type> <type> passed";; 

As this is our first released OCaml library, any and all feedback is   
very much appreciated.
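
To give an idea of what TAP output looks like, here is a minimal, self-contained sketch of a TAP emitter. It is not TestSimple's actual implementation: the names plan, ok, is and diag mirror the example above, but their bodies, and the helper names plan_line and test_line, are assumptions made for illustration.

```ocaml
(* A minimal TAP emitter: a sketch under stated assumptions,
   not the code of the TestSimple library. *)

(* Number of tests run so far. *)
let counter = ref 0

(* TAP plan line, e.g. "1..3". *)
let plan_line (n : int) : string = Printf.sprintf "1..%d" n

(* TAP test line: "ok N - desc" or "not ok N - desc". *)
let test_line (n : int) (passed : bool) (desc : string) : string =
  Printf.sprintf "%s %d - %s" (if passed then "ok" else "not ok") n desc

let plan n = print_endline (plan_line n)

let ok cond desc =
  incr counter;
  print_endline (test_line !counter cond desc)

(* Structural equality, so ints, strings, lists and tuples all work. *)
let is got expected desc = ok (got = expected) desc

(* Lines starting with "#" are TAP diagnostics. *)
let diag msg = print_endline ("# " ^ msg)

let () =
  plan 3;
  diag "sample run";
  ok true "ok passed";
  is 2 2 "is <int> <int> passed";
  is "foo" "bar" "this comparison fails on purpose"
```

A TAP consumer reads the plan line to learn how many tests to expect, then counts the "ok" and "not ok" lines against it.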
			

lablpcre-1.0 - a PCRE binding for Objective Caml

Archive: http://groups.google.com/group/fa.caml/browse_frm/thread/5ec631dd1d2cc4f8/d4b5d0fb964a363b#d4b5d0fb964a363b

Robert Roessler announced:
The "1.0" release of the LablPCRE OCaml binding for PCRE is now 
available, fully supporting Linux and Windows builds with PCRE versions 6.1 
- 7.0 (current). 
LablPCRE provides simple and easy to use access to regular expression 
matching, offering a rich module-based interface based on PCRE's POSIX 
functions wrapper. 

This release has been built and tested using OCaml 3.09.3 on Fedora 
Core 6 and Windows XP, supports findlib and "hands-off" building and 
installing (no "configure" script or manual file editing required), 
and has pre-built binaries for [native] Windows XP.  The full package 
is licensed under the "new" BSD license, and may be downloaded here: 

http://www.rftp.com/Downloads.shtml
			

Using folding to read the cwn in vim 6+

Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'<1':1
zM

If you know of a better way, please let me know.


Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.


Alan Schmitt