Here is the latest Caml Weekly News, for the week of 15 to 22 February, 2005.
Archive: http://caml.inria.fr/archives/200502/msg00416.htmlGilles Dubochet asked and Jacques Garrigue answered:
> The following O'Caml expression: > let incr g x = match x with > | `Int i -> `Int (i+1) > | x -> (g x);; > > Receives type: > (([> `Int of int ] as 'a) -> ([> `Int of int ] as 'b)) -> 'a -> 'b > > Why? I am quite astonished. I would have expected a type more like: > ([> ] -> [> ]) -> [> `Int of int ] -> [> `Int of int ] > > or in Wand or RÃ©my-like row variable notation with which I am a little > more comfortable (I am not exactly sure the preceding type is correct > in the 'at leat - at most' notation of O'Caml): > ([ 'a ] -> [ 'b ]) -> [ `Int of int | 'a ] -> [ `Int of int | 'b ] The reason is simple enough: variants in ocaml are not based on Remy's rows, but on a type system with kinded variables, more in Ohori's style. This is described in detail in "Simple type inference for structural polymorphism", which you can find at http://wwwfun.kurims.kyoto-u.ac.jp/~garrigue/papers/ The main reason for this choice is avoiding making rows explicit, i.e. having a multi-sorted type algebra. Note also that ocaml object types, while originally based on Remy's system, are hiding their row-variables, and can be described in the same formalism. > Could anyone be kind enough to give me some clues about where to look > at to find an explanation or even better, explain me what is going on? > I am particularly puzzled by the fact that the g function's *argument* > type is 'at least `Int of int'. This rejects the following code which > seems intuitively correct: > incr (function `Float f -> `Float (f+.1.));; Pragmatic reasons for this choice are given in Section 3 of "Typing deep pattern-matching in presence of polymorphic variants" which you can find at the same place.Gilles Dubochet then asked and Jacques Garrigue answered:
> Just to make it crystal clear for me: You say that the "main reason for > this choice is avoiding making rows explicit", does this mean that the > O'Caml team feels that row-based type information is too complicated > for an average user since you either steer away of hide it in an object > model? Basically yes. Hence the clever tricks to have types containing hidden row variables, like #t, [< t], [> t]. Having explicit row variables was considered at some point, but it was not done because types would become much more verbose. Consider for instance that with variants, presence/absence information is needed for each tag. On the other hand, advanced features like explicit polymorphism require some understanding of the presence of hidden variables, and how they interact, so at that level of proficiency it's not clear whether hiding them really helps. And I'm now playing with abstract rows, which makes the decision even less clear...
Archive: http://caml.inria.fr/archives/200502/msg00435.htmlJanne Hellsten asked:
I need to sequentialize two processes accessing the same GDBM database (OCaml's dbm library). Since GDBM allows for only one writer at a time, I need a way to block until the previous writer has finished. I think there is a way to do this with the original GDBM library since it claims to handle the file lockings properly -- I could block based on the returned error codes. However, this functionality does not appear to be exposed through the Dbm module. Of course I can implement this blocking myself with an interprocess mutex. But how can I implement such a mutex in OCaml? I couldn't find anything from the standard library that would resemble my problem. Is it easily doable or do I need to go and start hacking C code? I've never done it on Unix but I know it's easily done on Windows. Thanks in advance, Janne P.S. I'm using GDBM merely as a cache between process invocations, so using a more heavyweight DB is out of the question.Alex Baretta answered:
Unix.lockf is the way to go.Janne Hellsten then replied:
OK, this is what I came up with (uses ExtLib's Std.finally): --- let create_lock name = let chnl = open_out name in output_string chnl "Lockfile - don't delete"; close_out chnl let with_write_block name f = if not (Sys.file_exists name) then create_lock name; let fd = Unix.openfile name [Unix.O_RDWR] 0o660 in Std.finally (fun () -> Unix.close fd) (fun _ -> Unix.lockf fd Unix.F_LOCK 0; f ()) () let test = with_write_block "lock_name" (fun () -> Printf.printf "this code blocks if other process already acquired lock!") --- I hope I'm using lockf correctly.
Archive: http://caml.inria.fr/archives/200502/msg00442.htmlMary Fernandez announced:
We are proud to announce version 0.5.0 of Galax, an open source implementation of XQuery 1.0. This version implements the October 2004 XQuery working drafts and includes a completely redesigned compiler with an algebraic optimizer, a more efficient XML parser, and numerous bug fixes. Galax 0.5.0 can be downloaded from the Galax Web site at: http://www.galaxquery.org/ In addition to the Galax source, pre-compiled binaries are available for Linux, MacOS, Solaris, and Windows with MinGW. Galax provides APIs for O'Caml, C, and Java and is implemented in O'Caml.
Archive: http://caml.inria.fr/archives/200502/msg00439.htmlMike Hamburg asked and Gerd Stolpmann answered:
> Is there any way to call Marshall in a type-safe way? I need to use > marshaling for a networking program, and I'd rather not leave Marshal > as an arbitrary code execution vulnerability (which it is as far as I > can tell: switching on a Marshaled value should produce a computed > jump, which can be set by an attacker to point to an arbitrary place). > Am I stuck writing my own marshal function? Marshal is not type-safe, no chance. I see three options for you: - If it is a closed protocol, you can sign the marshaled values - You can use other serializers. A quite simple and fast serializer is the XDR encoder in my SunRPC implementation (see http://ocaml-programming.de/programming/rpc.html). Other options I know are BER (see ocamldap), XML-RPC, SOAP, and Ensemble. - Write the serializer yourself. Maybe this is an option for you if you need maximum performance.Oliver Bandel then asked and Gerd Stolpmann answered:
> Well.. is there already an XDR-binding for OCaml? Yes, as already pointed out, it is part of my SunRPC implementation: http://ocaml-programming.de/programming/rpc.html . It is not a binding, however, but a pure O'Caml implementation. It is quite easy and obvious how to use the XDR part alone without the rest of RPC. For example, to define a record with an integer and a string of maximum 20 characters: open Xdr open Rtypes let my_type_term = X_struct [ "my_int", X_int; "my_string", (X_string (uint4_of_int 20)) ] let my_type = validate_xdt_type_term my_type_term Now, to encode a value: let my_val = XV_struct [ "my_int", (XV_int (int4_of_int 42)); "my_string", (XV_string "Sample") ] let my_val_as_wire_string = pack_xdr_value_as_string my_val my_type  my_val_as_wire_string can now be sent over the network. For decoding, use: let my_val_again = unpack_xdr_value my_val_as_wire_string my_type  If the string is illegal (e.g. my_string is longer than 20 characters), exceptions will be thrown. One can also use ocamlrpcgen to generate parts of the above code, including automatic conversion between XDR and the corresponding O'Caml types (e.g. an XDR struct is converted to an O'Caml record type). For complex protocols, the overhead of learning ocamlrpcgen is worth the effort. One should also consider using RPC directly rather than to invent a new networking layer.
Archive: http://caml.inria.fr/archives/200502/msg00475.htmlJon Harrop announced:
We've just published our first book on OCaml! http://www.ffconsultancy.com/products/ocaml_for_scientists Let me know what you think. :-)
Archive: http://caml.inria.fr/archives/200502/msg00485.htmlContinuing last week's thread, Jon Harrop said:
> Part of the 27 ML projects are in fact Caml projects. Indeed, the following twelve seem to be written in OCaml: Unison, FFTW, MATHPLOT, WDialog, PXP, GeneWeb, SwiftSurf, FaCiLe, pxpvalidate, Camlserv, Lazy-L and JSON. I just had a look at sourceforge, which has far more projects (14,325 C, 14,813 C++ and 14,074 Java) by comparison, and the accredited language seems to be wrong much more often. Assuming that all of the 132 SML projects are actually in OCaml, this gives 155 projects OCaml, i.e. about two orders of magnitude less common that the most popular languages. Also, I think that web searches for "resume" and "CV" are likely to be inaccurate. Searching for "written in *" seems more reasonable. This gives: 792,000 Java 636,000 C 424,000 Perl 312,000 C++ 221,000 Python 92,900 C# 30,100 Ruby 20,300 Lisp 11,700 Scheme 5,240 ocaml 974 caml Again, OCaml is about two orders of magnitude below the most popular languages.Erik de Castro Lopo added:
For FFTW, only a small part is written in O'Caml, a program which generates optimized C codelets for the FFT butterflys. The vast majority of FFTW is written in C.
Archive: http://caml.inria.fr/archives/200502/msg00510.htmlArthur Chargueraud announced:
I'd like to announce the release of an Ocaml programming course. It is designed for beginners, and does not require any knowledge in programming at all. Anyone aged over 12 should be able to complete the course on his own, and thus learn the fundamentals of programming. For the time being, it is only available in french, but it will be translated into english later on. Table of contents : Chapitre 0 Introduction Chapitre 1 let, if, for Chapitre 2 float, int, ref Chapitre 3 function Chapitre 4 array, char, string Chapitre 5 bool, random Chapitre 6 while, rec Chapitre 7 expression About 130 exercices are part of the tutorial. As you may notice, the course currently only covers about the first half of general programming concepts. I may cope with the second half in the future. The tutorial is available as text version, as PDF or PS files (about 170 pages), but also through an online interactive version : one can test and submit source code, that will be compiled and evaluated online. Access the tutorial : http://www.france-ioi.org/cours_caml/index.php Note : access is free to all, but if you want to submit code you will need to go through a quick registration. The material is hosted by France-IOI, an organisation focused on the development of algorithmics and programming for algorithmics in french speaking countries. If you have specific questions, conctact me on : firstname.lastname@example.org
Archive: http://caml.inria.fr/archives/200502/msg00491.htmlErik de Castro Lopo said:
I am about to port some code from C to O'caml. This code uses the C99 function : long int lrint (double d) ; which performs rounding on the double and then converts that to a long int. In O'caml the only option seems to be: let round_to_int f = int_of_float (f +. 0.5) ;; The problem is that this code on i386 produces really slow code: 804b385: dd 44 98 fc fldl 0xfffffffc(%eax,%ebx,4) 804b389: de c1 faddp %st,%st(1) 804b38b: 83 ec 08 sub $0x8,%esp 804b38e: d9 7c 24 04 fnstcw 0x4(%esp) 804b392: 66 8b 44 24 04 mov 0x4(%esp),%ax 804b397: b4 0c mov $0xc,%ah 804b399: 66 89 44 24 00 mov %ax,0x0(%esp) 804b39e: d9 6c 24 00 fldcw 0x0(%esp) 804b3a2: db 1c 24 fistpl (%esp) 804b3a5: 8b 04 24 mov (%esp),%eax 804b3a8: d9 6c 24 04 fldcw 0x4(%esp) 804b3ac: 83 c4 08 add $0x8,%esp The killer here is the two fldcw (floating point load control word) instructions, around the fistpl (which actually does the float to int conversion). Loading the FP control work causes a flush of the FPU pipeline. In code with a lot of floating point code interspersed with a round to int, there can be a significant slow down due to the fldcw instructions. The lrint function in C, replaces all the above with one fistpl and a single mov instruction and leaves the floating point control word intact. In C code that moved from: (int) floor (f + 0.5) to lrintf (f) I have seen an up to 4 fold increase in speed. I've looked at the code for the O'Caml compiler and I think I know how to implement this, at least for x86 and PowerPC, the two architectures I have access to. If I was to supply a patch would it be accepted? I know other suggestions like this one : http://sardes.inrialpes.fr/~aschmitt/cwn/2003.11.18.html#1 were not viewed favourably, but the addition of a single function with an explicit behaviour is a far neater solution.Robert Roessler answered and Erik de Castro Lopo replied:
> I will preface this by a Slashdot-like "IANANA" (I Am Not A Numerical > Analyst). Nor am I. I don't play one in a TV show either. > The above approach is more or less what you expect if you (as a > compiler code generator) a) want to do rounding following C/C++ > standards ("Truncate (toward 0)"), and b) make no assumption regarding > the state of the IEEE hardware rounding setting... Yes. > You, on the other hand, are willing to make an assumption regarding > the hardware rounding mode - [presumably] that it is set to the > power-on default of "Round to nearest, or to even if equidistant", > which may not be unreasonable - it just needs to be explicit that this > *is* the assumption, and that you have a way of verifying (or at least > reason to believe) that other software components in your app's > environment are not invalidating this assumption. From the lrint manpage on Linux: These functions round their argument to the nearest integer value, using the current rounding direction. If x is infinite or NaN, or if the rounded value is outside the range of the return type, the numeric result is unspecified. A domain error may occur if the magnitude of x is too large. I would suggest something similar for the proposed O'Caml function. > The fact that the default hardware rounding mode does NOT match "(int) > floor (f + 0.5)" should also be mentioned... the "+ 0.5" attempts to > do what the hardware would call "Round up (toward +infinity)" Careful, that could very easily be confused with the C functions ceil(). > This could take the form of a compiler switch exactly like "/QIfist", A compiler switch is significantly more complicated than my proposal for a new built-in function with a well defined behaviour. The compiler switch causes all int_to_float to behave like a round instead of a truncate. My proposal allows a single file to have both behaviours. > Of course, if something like this were to added to ocamlopt (for > target architectures using IEEE floating point), code (an additional > bytecode op?) emulating the same behavior could be added to the > runtime to maintain consistency across the interpreted and native > operating environments - or not. I believe that the bytecode interpreter simply uses the definitions supplied by ocamlopt. Once ocamlopt has this functionality there is not much else required to allow the interpretter to use the same functionality.Robert Roessler replied:
> I would suggest something similar for the proposed O'Caml function. Exactly like the "FIST[P]" instruction without explicit control word saving, setting and restoring - which is believable, since according to your first post, your reference implementation is done this way. > >The fact that the default hardware rounding mode does NOT match "(int) > >floor (f + 0.5)" should also be mentioned... the "+ 0.5" attempts to > >do what the hardware would call "Round up (toward +infinity)" > > > Careful, that could very easily be confused with the C functions ceil(). Umm, 2 of the IEEE hardware rounding modes *do* seem to match floor() and ceil()... it is probably not a coincidence. :) > >This could take the form of a compiler switch exactly like "/QIfist", > > > A compiler switch is significantly more complicated than my proposal > for a new built-in function with a well defined behaviour. > > The compiler switch causes all int_to_float to behave like a round > instead of a truncate. My proposal allows a single file to have > both behaviours. Actually, emulating the VC7 "/QIfist" does NOT cause "all int_to_float to behave like a round instead of a truncate" - it does exactly what we are already talking about: do the int_of_float with a "bare" FIST[P], operating in whatever rounding mode the hardware is already in (presumably one we want and expect it to be in). > >Of course, if something like this were to added to ocamlopt (for > >target architectures using IEEE floating point), code (an additional > >bytecode op?) emulating the same behavior could be added to the > >runtime to maintain consistency across the interpreted and native > >operating environments - or not. > > > I believe that the bytecode interpreter simply uses the definitions > supplied by ocamlopt. Once ocamlopt has this functionality there > is not much else required to allow the interpretter to use the same > functionality. The bytecode interpreter has nothing to do with ocamlopt... in fact, the primitive for doing int_of_float is precisely return Val_long((long) Double_val(f)); which will perform a "truncate" operation for the conversion, since that is what the C standard requires. :)Erik de Castro Lopo replied:
> Actually, emulating the VC7 "/QIfist" does NOT cause "all int_to_float > to behave like a round instead of a truncate" - it does exactly what > we are already talking about: do the int_of_float with a "bare" > FIST[P], operating in whatever rounding mode the hardware is already > in (presumably one we want and expect it to be in). Implementing your proposal, means that int_of_float will change behaviour depending on a compiler flag. I think thats a bad idea. My proposal leaves int_of_float just as it is and defines a new function: val round_to_int : float -> int which expand to a FISTPL instruction and one or two glue instructions.Erik de Castro Lopo added to his original post:
OK, I have a patch that seems to do the right thing on x86 and PowerPC: http://www.mega-nerd.com/tmp/round_to_int.diff I'm not 100% happy with the mods to byterun/floats.c. In order to use the C99 lrintf() function in caml_round_to_int() I had to define _ISOC9X_SOURCE etc before including <math.h>. The real solution (when using GCC) is to add -std=c99 to the command line. Using the following test program: http://www.mega-nerd.com/tmp/round_to_int.ml with the hacked ocamlopt compiler I'm getting about a 3 times speed improvement using round_to_int on a Pentium III. The speedup on P4 is supposed to be even more because of the P4's deeper pipeline. On PowerPC, there is no speedup because PowerPC has two separate instructions for float to int, one which rounds and one which truncates. I'd be interested in any comments from the official ocaml maintainers. What are the chances of this going into the official distribution? I also have access to a machine with a MIPS CPU. I don't know MIPS assembler but I'll see what I can do.Xavier Leroy answered:
> [ "round float to int" primitive ] > with the hacked ocamlopt compiler I'm getting about a 3 times speed > improvement using round_to_int on a Pentium III. The speedup on P4 > is supposed to be even more because of the P4's deeper pipeline. On the other hand, according to the P4 optimization manuals, the P4 is supposed to special-case this particular use of fnstcw / fldcw, so perhaps the situation is no worse than on the P3. At any rate: > I'd be interested in any comments from the official ocaml maintainers. > What are the chances of this going into the official distribution? Essentially zero :-( Basically, this is a case where additional stuff is introduced in the machine-independent parts of ocamlopt and in every code generator just to work around the brain-dead x87 floating-point instruction set. Every other processor (as well as the SSE2 instr.set on the P4) has a fast "truncate float to int" instruction, from which an efficient "round to nearest int" is easy to obtain (truncate (x +. 0.5)). I spent a lot of time in the past trying to extract decent float performance out of the x87 instruction set, under the strict constraint that these performance hacks should be confined to the x86-specific parts of ocamlopt. Nowadays, I no longer care about performance for x87: users who want good float performance should simply use the x86-64 architecture (with SSE2 floats), it's cheaply available from both AMD and Intel and works so much better for floats... At the extreme, I'd rather do a x86+SSE2 port (for P4 and P3M-class processors) than spend more time on x86+x87.Erik de Castro Lopo replied:
> On the other hand, according to the P4 optimization manuals, the P4 is > supposed to special-case this particular use of fnstcw / fldcw, so > perhaps the situation is no worse than on the P3. OK, I've just tested this. On P4 the performance hit of fnstcw / fldcw is not as bad as it is with P3, but its still significant: Using this test program (compiled with my hacked version of ocamlopt): http://www.mega-nerd.com/tmp/round_to_int.ml On a 450MHz P3: Time int_of_float : 5.970000 Time round_to_int : 2.360000 On a 2.8GHz P4: Time int_of_float : 0.420000 Time round_to_int : 0.260000 > Essentially zero :-( Basically, this is a case where additional stuff is > introduced in the machine-independent parts of ocamlopt and in every > code generator just to work around the brain-dead x87 floating-point > instruction set. Obviously it is your decision, but I think round_to_int is a common enough operation to warrant its own function. The ISO C Standards committee thought so. > Every other processor (as well as the SSE2 instr.set Quite honestly I think the value of SSE and SSE2 are over sold. There are certain algorthims which simply can't be made to run as fast on SSE/SSE2 as they run on the x87 FPU. For instance, my audio sample rate converter: http://www.mega-nerd.com/SRC/ If I compile this on a P3 with gcc-3.4 using -mfpmath=sse -msse, the highest quality (and hence slowest) converter runs 50% slower than the x87 FPU version. I have also tried re-writing the algorithm in hand optimised SSE code. The best I could get (I'm not an assembler expert) was still 10% slower than the x87 FPU. I have just now repeated my experiment by compiling SRC on a P4 with -msse2 (-mfpmath=sse2 doesn't work), the converter runs 75% slower than the x87 FPU version. > I spent a lot of time in the past trying to extract decent float > performance out of the x87 instruction set, <snip> > Nowadays, I no longer care about > performance for x87: users who want good float performance should > simply use the x86-64 architecture (with SSE2 floats), I'd love to get my hands on one of these, but I really do doubt that its performance will be much better than that of the P4. The main problem is that generating good SSE/SSE2 code from a high level language is an order of magnitude more difficult than generating code for the x87 FPU.
Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).
If you know of a better way, please let me know.
If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.
If you also wish to receive it every week by mail, you may subscribe online.