Previous week Up Next week

Hello

Here is the latest Caml Weekly News, for the week of 11 to 18 January, 2005.

  1. generic functions
  2. camlusb - an ocaml binding for libusb
  3. glibc and ocaml/libasmrun.a
  4. Mutex and posix
  5. ocamldap-2.0.3
  6. Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
  7. OCaml can be used to write kernel modules into KOS
  8. ulex+ocamlyacc
  9. crypto library for OCaml?
  10. minimum system requirements

generic functions

Archive: http://caml.inria.fr/archives/200501/msg00046.html

Continuing the thread from last week, John Skaller said and Daniel Yokomizo answered:
> It is also known that overloading CAN work with
> inference with some contraints -- there is a paper somewhere
> on Citeseer on this topic (sorry don't have the URL at the
> moment, if someone finds it please let me know..)

System CT extends ML-style type inference for supporting overloading and
polymorphic recursion. The simplicity of core-ML type inference is
maintained, with no exception or special constructs included for coping with
overloading.

http://www.dcc.ufmg.br/~camarao/CT/

You can get this paper from citeseer too, so perhaps you had it in your
mind.
    

camlusb - an ocaml binding for libusb

Archive: http://caml.inria.fr/archives/200501/msg00089.html

Gina Belmonte announced:
It is my pleasure to announce camlusb, which is now in beta testing and is 
available at

http://camlusb.sourceforge.net

This project allows one to write user space linux usb drivers and applications 
using the functional programming language Ocaml. It is used in GNUDap

http://gnudap.sourceforge.net

an interface to some mp3 player and mass storage devices with a proprietary 
usb protocol.

At the moment, camlusb supports all functions of libusb 0.1.7. If I will get 
enough requests I will try to upgrade to the latest version.
    

glibc and ocaml/libasmrun.a

Archive: http://caml.inria.fr/archives/200501/msg00092.html

Radu Grigore said:
I have trouble distributing a self contained executable.

Two users reported "collect_events: /lib/i686/libc.so.6: version
`GLIBC_2.3' not found". I do not have any control over their system,
so I tried to link statically against glibc by using:

  ocamlopt -ccopt -static collect_events.ml

I get:

  /usr/local/lib/ocaml/libasmrun.a(unix.o)(.text+0x241): In function
`caml_dlopen':
  warning: Using 'dlopen' in statically linked applications requires
at runtime the shared libraries from the glibc version used for
linking

At first I thought that my program uses a library function whose
implementation explicitly tries to load dynamically. But the same
warning is obtained if collect_events.ml is an empty file. It looks
like the interpreter itself wants to link dynamically against glibc.
Is this a correct diagnostic? Is there a way around it?

Other info: 
  ocaml version 3.08.1
  my system: Fedora Core 1
  user system: (probably[0]) RedHat 7
    
Olivier Andrieu answered and Xavier Leroy explained:
> This means the runtime (libasmrun.a) contains calls to dlopen(). Now
> I'm not sure why there are dlopen's in the native (ocamlopt)
> runtime. I thought only the bytecode one is using dlopen (to
> dynamically link stub libraries). If that is correct (ie the native
> runtime doesn't actually use dlopen) then I guess you can ignore the
> warning. 

This is entirely correct.  The reason why the native-code runtime
system contains the wrappers around dlopen() is that there exists
exactly one ocamlopt-compiled program that needs them: it's ocamlc.opt,
the natively-compiled bytecode compiler, which needs to dlopen()
shared libraries to check for the existence of symbols referenced from
external declarations in the Caml source.  

So, unless the Caml program happens to embark most of ocamlc
(unlikely!), you can safely ignore the warning at static linking time.
    

Mutex and posix

Archive: http://caml.inria.fr/archives/200501/msg00098.html

Luca Pascali asked and Xavier Leroy answered:
> Just a little question, my curiosity about the thread module.
> 
> I found in Posix (this is from 'info libc' page on section Mutexes) 
> these three functions
> 
> Function: int pthread_mutex_lock (pthread_mutex_t *mutex))
> Function: int pthread_mutex_trylock (pthread_mutex_t *MUTEX)
> Function: int pthread_mutex_timedlock (pthread_mutex_t *MUTEX, const 
> struct timespec *ABSTIME)
> 
> 1) for waiting indefinetly for a mutex,
> 2) failing immediatly if a mutex is locked,
> 3) wait for a specified amount of time and failing if mutex is still 
> locked when time is expired
> 
> Module Mutex, provides an interface only to the first two functions: 
> lock and try_lock.
> 
> My question is:
> is there any reason for this situation?

The latter is a recent addition to the POSIX threads API -- it's not
in the original POSIX threads spec (POSIX 1003.1c-1995).  I wouldn't
rely on this function being available in all POSIX threads
implementations.
    

ocamldap-2.0.3

Archive: http://caml.inria.fr/archives/200501/msg00124.html

Eric Stokes announced:
This is announcing the immediate release of ocamldap v2.0.3. Because of 
my busy schedule I did not announce the release of 2.0, however this 
release seems to be "stable", and I have more time, so I will announce 
it now.

ocamldap 2.0 features/changes
The Good
-- Another C binding bites the dust! Ocaml implementation of the wire 
protocol, implements the lber subset of ber (useful in itself for 
anyone working on ASN.1 stuff), no a C binding! I am very excited about 
this, it was fun code to write, and I am pleased with the performance 
given the time we spent. Some of you may recall that we promised to do 
this about 6 months ago, and lo, we have delivered.
-- The most exciting consequence of the native implementation of the 
protocol is that we now are not only a client library. We also provide 
a library for constructing LDAP SERVERS! We are very excited about 
this, and we already have a daemon based on this in the late stages of 
development. 
-- Optional SSL and TLS support via Samuel Mimram's ocaml-ssl binding 
(requires openssl)

The Bad
-- The native implementation is not as fast as the C implementation. It 
IS within an order of magnitude on all tested platforms, and we are 
confident that we can do better. Coming close to the C implementation 
may be possible.

The Ugly
-- Unfortunately, we were forced for the sake of the future to make 
some API changes. So Ocamldap 2.x is NOT backward compatible with 1.x. 
The changes are not extensive however. The exception LDAP_Failure has 
been augmented to include all the error information sent back by the 
server. This is especially important for processing referrals, for 
which there was no support in 1.x. The module names have been 
reorganized and standardized, hopefully they will now be less 
confusing. The init function now takes a list of servers instead of 
just one, and each server is an ldap url instead of a host/port. The 
use of this is that init will try each server in the list, and if 
host-names with more than one IP are included it will try each IP 
(round robin DNS). This integrates with the transparent reconnection 
capabilities to allow the user to configure a pool of ldap servers to 
talk to, and if any ONE is working then the library will continue to 
work.

License
Since all the code has been rewritten it is now released under the LGPL 
instead of the BSD license

Obtaining Ocamldap
best obtained via godi
also available from http://ocamldap.sourceforge.net/
    

Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance

Archive: http://caml.inria.fr/archives/200501/msg00110.html

Will M. Farr said:
I've been looking at using ocaml to implement a gravitational n-body  
code, and therefore have quite a bit of interest in its floating-point  
performance.  Also, I'm learning the language by playing around with  
simple programs.  Here's an implementation (really 4) along with timing  
information of the "harmonic" benchmark (essentially summing the  
harmonic series), which can be found here:

http://shootout.alioth.debian.org/sandbox/benchmark.php?test=harmonic&lang=all&sort=cpu

After testing different ways of implementing the ocaml harmonic  
benchmark, I have settled on the following program.  For sizes of 1 000  
000 000 terms, it takes about 25% longer than the corresponding  
algorithm in c (note that I have replaced an int->float conversion for  
each term with a single floating point operation: ifloat := ifloat +.  
1.0).  Since int->float conversions are slow on my machine (PowerBook  
G4), this is a big win (about a factor of 2 in time for the C program).  
  Alas, when the number of terms approaches 16 digits, this method will  
lose accuracy, since <~16-digit number> +. 1.0 = <16-digit number +  
difference in last bit of mantissa>.  However, for sizes like the  
shootout seems to be using, this algorithm works fine (and the usual  
int type won't hold anything close to 16 digits anyway!).  I'm cc-ing  
this to the caml list because there may be people there interested in  
the floating point performance of Caml

Here's the code for the fastest implementation:

let sum_harmonic4 n =
   let sum = ref 1.0 in
   let ifloat = ref 2.0 in
     for i = 2 to n do
       sum := !sum +. 1.0/.(!ifloat);
       ifloat := !ifloat +. 1.0
     done;
     !sum;;

let _ =
   let n = int_of_string (Sys.argv.(1)) in
     Printf.printf "%g\n" (sum_harmonic4 n);;

And here's all the implementations I tried (for those interested in  
such things with ocaml):

let sum_harmonic n =
   let rec loop i sum =
     if i <= n then
       loop (i + 1) (sum +. 1.0/.(float_of_int i))
     else
       sum in
     loop 2 1.0;;

let sum_harmonic2 n =
   let sum = ref 1.0 in
   for i = 2 to n do
     sum := !sum +. 1.0/.(float_of_int i)
   done;
     !sum;;

let sum_harmonic3 n =
   let rec loop i ifloat sum =
     if i <= n then
       loop (i + 1) (ifloat +. 1.0) (sum +. 1.0/.ifloat)
     else
       sum in
     loop 2 2.0 1.0;;

let sum_harmonic4 n =
   let sum = ref 1.0 in
   let ifloat = ref 2.0 in
     for i = 2 to n do
       sum := !sum +. 1.0/.(!ifloat);
       ifloat := !ifloat +. 1.0
     done;
     !sum;;

let _ =
   let n = int_of_string (Sys.argv.(1)) in
     Printf.printf "%g\n" (sum_harmonic4 n);;

The timings for my machine (PowerBook G4, 800 Mhz) are as follows:

time ./harmonic 1000000000:
harmonic: 	user    2m1.590s
			sys     0m0.790s

harmonic2: 	user    2m0.340s
			sys     0m0.440s

harmonic3: 	user    1m44.350s
			sys     0m0.740s

harmonic4: 	user    1m12.680s
			sys     0m0.430s

Each invocation was compiled with "ocamlopt -unsafe -noassert -o  
harmonic harmonic.ml".  It looks like using references and loops is *by  
far* the fastest (and also that my PowerBook is pretty slow to convert  
int->float, but I don't think this is related to ocaml, since the C  
version does the same thing).

Hope you all find this interesting.
    
Many people replied. Here is what Xavier Leroy said:
> Here's an implementation (really 4) along with timing  
> information of the "harmonic" benchmark (essentially summing the  
> harmonic series) [...]
> Here's the code for the fastest implementation:

The following slight modification of your code generates asm code that
is closest to what a C compiler would produce:

let sum_harmonic5 n =
  let sum = ref 1.0 in
  let ifloat = ref 2.0 in
    for i = 2 to n do
      sum := !sum +. 1.0/.(!ifloat);
      ifloat := !ifloat +. 1.0
    done;
    !sum +. 0.0;;

The + 0.0 at the end is ugly but convinces ocamlopt that !sum is best
kept unboxed during the loop.

> (note that I have replaced an int->float conversion for  
> each term with a single floating point operation: ifloat := ifloat +.  
> 1.0).  Since int->float conversions are slow on my machine (PowerBook  
> G4)

Right, the PowerPC does not have an int -> float instruction and that
conversion must be performed with a rather expensive sequence of
instructions (for the gory details, see e.g.
 http://the.wall.riscom.net/books/proc/ppc/cwg/code3.html#303610).

64-bit PPCs have a dedicated instruction to do this conversion,
showing that the IBM/Motorola people learn from their past mistakes...

For Intel processors, it's the reverse conversion (float -> int) that
is slow.  Clearly, the SPEC benchmark doesn't contain much conversions
between floats and ints, otherwise hardware designers would pay more
attention :-)

> this is a big win (about a factor of 2 in time for the C program).  

As others have mentioned, this strongly depends on the processor
instruction set and even on the processor model.  My own benchmarks
(with your Caml code) give the following results:

PPC G4 (Cube)   1 < 2 < 3 < 4 < 5   speed ratio = 1.5
Xeon 2.8        3 < 4 < 1 = 2 < 5   speed ratio = 1.02
Pentium 4 2.0   3 < 1 < 2 < 4 < 5   speed ratio = 1.2
Athlon XP 1.4   4 < 5 < 3 < 1 < 2   speed ratio = 2.2

where 1, 2, 3, 4, 5 refer to the 5 different functions,
1 < 2 means "1 is slower than 2",
and "speed ratio" is the speed difference between fastest and slowest.

The Xeon case is what I was expecting: the running time is dominated by
the time it takes to do the float divisions, everything else is done in
parallel or takes negligible time, so it doesn't matter much how you
write the code.

The Athlon figures are *very* surprising.  It could be the case that
this benchmark falls into a quirk of that (otherwise excellent :-)
processor.  

Actually, this often happens with micro-benchmarks: they are so small
and their mix of operations is so unbalanced that they can easily run
into weird processor behaviors.  So, don't draw conclusions hastily.

John Prevost asks:

> Is the PowerPC ocamlopt back-end less optimized than the x86?

No, not really.  The x86 back-end works harder to work around oddities
in the x86 instruction set (e.g. the lack of floating-point
registers), but that is hardly an optimization, just compensating for
brain damage in the instruction set.  Conversely, the PPC back-end
performs basic-block instruction scheduling while the x86 back-end doesn't.
Instruction scheduling helped with early PPC chips (601, 603) but is
largely irrelevant with modern out-of-order PPC implementations.

> Are you sure that there isn't just a built-in 
> instruction on the x86 that adds an int to a float?

I think there exists one such instruction, but ocamlopt doesn't use
it, and the Intel optimization manuals recommend to do int->float
conversion followed by float addition instead.
    

OCaml can be used to write kernel modules into KOS

Archive: http://caml.inria.fr/archives/200501/msg00146.html

David Mentre said:
For the fun, I have ported the asmrun/ directory of OCaml to a small
operating system, KOS (http://kos.enix.org/). This allows to run native
mode OCaml programs (i386 arch) as KOS kernel modules.

The binding is rather limited, it is more a quick & dirty hack:

 - no I/O;

 - no threads;

 - no access to KOS ressources;

 - no floats;

 - no int64;

 - due to lack of realloc, I have hacked the GC code and I'm pretty sure
   I have overlooked details;

 - other missing things I have forgotten...

Most of those missing functionalities could be implemented, either by
stealing code from glibc/gcc or by writing emulation code.  However I do
not plan to do that job. OCaml badly supports concurrent accesses due to
locking in GC (if I have understood correctly) so it can be of dubtious
use in an operating system. Note however that, as written, this code
could be used to write several /independent/ modules written in ocaml.

I have put modified source code at:
 http://www.linux-france.org/~dmentre/kos/module_ocaml_asmrun.tar.gz
(asmrun license: LGPL)

This tarball includes a diff that documents changes I've made on
original ocaml source code. It could be of some use if somebody wants to
do the same hack in another operating system.
    

ulex+ocamlyacc

Archive: http://caml.inria.fr/archives/200501/msg00134.html

Anastasia Gornostaeva asked and Alain Frisch answered:
> How I can combinate ocamlyacc and ulex (http://www.cduce.org/download/)
> instead of *.mll?

As long as you don't use the symbol_start-like functions from Parsing, you
can do something like:

  Parser.main (fun _ -> Lexer.token lexbuf) (Lexing.from_string "dummy")

(The ocamlyacc generated parser don't really use themselves the lexbuf 
argument...)
    

crypto library for OCaml?

Archive: http://caml.inria.fr/archives/200501/msg00160.html

Vasili Galchin asked and William D. Neumann answered:
> Is there a cryptography library for OCaml?

There are a couple in the humps 
http://caml.inria.fr/humps/caml_Cryptography.html , including Xavier's 
cryptokit and an open-ssl binding.  I've also got my own update of 
cryptokit with some additions (SHA 256/384/512, hash chains, additional 
prime/random number generation routines, etc), and modifications to work 
with GMP (via David Monniaux's mlgmp) that I'd be happy to send your way.
    

minimum system requirements

Archive: http://caml.inria.fr/archives/200501/msg00155.html

Nick asked:
Does anyone know what the absolute minimum system requirements are to
run OCaml (in terms of processor speed, RAM and disk space?

Is anyone aware of efforts to run OCaml on embedded linux platforms?
It seems that if this was possible, there could be some very nice
applications out there...
    
Alex Baretta answered:
We do embedded-Ocaml on fairly powerful embedded PCs, such as ULV 
Centrinos with 512 MB of RAM and 512 MB of flash disk.
    
Chris King also answered:
I've written, compiled, and run O'Caml programs comfortably on a 486DX
(33MHz maybe?) with 8MB of RAM.  The compiler/interpreter takes up a
non-trivial amount of hard disk space (around 30MB), but since the
natively compiled executables are statically linked with the O'Caml
libraries, they can be distributed without any more dependencies than
that of a C binary.  At runtime, O'Caml programs generally use around
half a megabyte of RAM, which is not much more than an equivalent C
program.
    
Eric C. Cooper also answered:
I've run the OCaml interpreter, and native-compiled executables, on
the 400 MHz ARM processor in a Sharp Zaurus, with about 32 MB of RAM.
    

Using folding to read the cwn in vim 6+

Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'&lt;1':1
zM

If you know of a better way, please let me know.


Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.


Alan Schmitt