Caml Weekly News

Hello,

Here is the latest Caml Weekly News, week 14 to 21 January, 2003.

1) Introduzione alla programmazione funzionale
2) ChartPak - a data visualization library for the web
3) Coyote Gulch test in Caml

======================================================================
1) Introduzione alla programmazione funzionale
----------------------------------------------------------------------
Carla Limongelli said:

In the previous message we have forgotten some unessential information ;-)
the title of the book is "Introduzione alla programmazione funzionale"
and it is published by Esculapio.
The table of contents can be found at
http://www.dia.uniroma3.it/~lambda/libro/

======================================================================
2) ChartPak - a data visualization library for the web
----------------------------------------------------------------------
Matt Gushee announced:

I am pleased to announce the release of ChartPak 1.0a1.

From the README:

  The primary goal of this project is to provide an easy-to-use library
  for dynamically generating business-oriented data visualizations for
  the web. It will include a wide variety of common chart types (pie
  charts, bar charts, etc.), and may eventually provide support for more
  specialized types of graphics.

  Ultimately it should be possible for users with only modest technical
  skills to create a set of data displays with ChartPak. It will be some
  time before that goal is reached, though. The initial focus is on
  developing a substantial library of chart types and supporting a
  variety of data sources.

  The current release includes a nice demo application, but is unlikely
  to be of any real use unless you happen to run PostgreSQL and to need
  only simple pie charts.

I would also add that this is my first project in OCaml, so I am sure
there is much room for improvement in my code. Your suggestions are most
welcome.

For more information and downloads, please visit:

  http://www.havenrock.com/software/chartpak/

======================================================================
3) Coyote Gulch test in Caml
----------------------------------------------------------------------
(the start of the following thread is at
http://caml.inria.fr/archives/200301/msg00009.html and was discussed in
last week's cwn)
Xavier Leroy discussed:

> On Saturday 04 January 2003 01:31 pm, Xavier Leroy wrote:
> > Apparently, the ocamlopt-generated code
> > offers less instruction-level parallelism than the g++-generated code
> > for the float computations.  Still, I haven't really understood where
> > the factor of 2 comes from.  

Oleg asks:

> It's been a couple of weeks. I'm wondering if you got any new insights into 
> this?

Yes: I'm just back from a trip to the US and had plenty of time to
kill during the transatlantic flights :-)

Apparently, one cause of inefficiency is excessive storing of
float results in memory temporaries.  The x86 is a weird beast: while
loading floats from memory is quite fast (almost as fast as using a
float already on the register stack), storing (the fstp instruction)   
seems to be quite expensive.

Fortunately, a small modification to the ocamlopt x86 code generator 
can remove many of these stores to temporaries in the case of
the Almabench test.   With this modification, the OCaml code runs at
2/3 the speed of the code generated by g++ -O3, which is still not
great but more in line with previous numerical benchmarks.

I also played with a "-ffast-math" flag for ocamlopt, whereby some math
functions (sin, cos, sqrt, log) are directly expanded into x86
instructions.  With this, we get 85% of the performance of g++ -O3,
which isn't bad, and 2/3 of the performance of g++ -O3 -ffast-math.

At any rate, the changes above to the OCaml code generator need to be
tested more before possible inclusion in the next release.  Never
trust code that you wrote in an airplane, especially while fighting
for the armrest with an elderly central European lady who doesn't
understand any of the languages that you speak :-)

> Just as a wild guess: the code contains calls to "sin" and "cos" on the same 
> value. Perhaps GCC manages to optimize those into one call to "sincos"

No, gcc doesn't do that.  But perhaps the Intel compiler does.

David Chase warns:

> Just a silly question, but if you want sin and cos to go faster,
> how much accuracy are you willing to trade away for improved
> performance?  Just for example, by using the Pentium instructions,
> you reduce the number of (accurate) significant bits in the result
> from 53 (IEEE double) to 13 (for some inputs between zero and 2*PI).
> (If you are using 64-bit mantissas, the worst case is only 4 bits of
> accuracy.)

I didn't know that.  At any rate, the sin() and cos() functions from
the Linux libm probably suffer from this loss of precision too,
because they are of the following form:

cos:    fcos instruction
        if operand was in the [-2^64,2^64] range, return
        reduce operand modulo 2pi
        fcos instruction
        return

Hence, using fcos rather than calling cos() should give the same (not
very exact) result as long as the operand is in the [-2^64,2^64] range,
and return a nonsensical result otherwise.
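
(Illustrative sketch, not from the thread: the control flow of the
pseudo-code above, written in OCaml.  OCaml cannot issue the fcos
instruction directly, so the library function cos stands in for it
here; the names cos_sketch and fcos_limit are hypothetical, and the
2^64 bound is simply copied from the pseudo-code.)

```ocaml
(* [cos] stands in for the x87 fcos instruction; this only mirrors the
   shape of the libm routine described above, not its actual code. *)
let two_pi = 8.0 *. atan 1.0

(* 2^64, the bound quoted in the pseudo-code. *)
let fcos_limit = ldexp 1.0 64

let cos_sketch x =
  if abs_float x <= fcos_limit then
    cos x                          (* fast path: operand accepted as is *)
  else
    cos (mod_float x two_pi)       (* reduce modulo 2pi, then retry *)

let () =
  Printf.printf "cos_sketch 1.0  = %.17g\n" (cos_sketch 1.0);
  Printf.printf "cos_sketch 1e30 = %.17g\n" (cos_sketch 1e30)
```

Note that the reduction step uses a double-precision approximation of
2pi, so for very large operands the reduced value is itself inexact.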

Nickolay Semyonov-Kolchin asks:

> But then this brings up the issue of conformity vs. performance.  For
> example- the x86 has its 80-bit FP registers in 8087-legacy mode, but
> 64-bit registers if you're using SSE2.  And PowerPC and PA-RISC both have
> extended precision fused multiply-adds (that keep higher precision, i.e.
> don't round, between the multiply and the adds).

ocamlopt uses 80-bit floats for intermediate results on the x86, and
the multiply-add instruction on the PowerPC.  It is true that this can
cause the final results to differ from those of the bytecode compiler,
which uses strict 64-bit float arithmetic, but I believe this is
acceptable, both for the additional speed and because the result is
"more exact" from a numerical analysis standpoint.
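
(Illustrative sketch, not from the thread: how keeping extra precision
between a multiply and an add can change a result.  It assumes a recent
OCaml, 4.08 or later as far as I know, where the standard library
provides Float.fma; the values and names below are chosen only for
illustration.)

```ocaml
(* x = 1 + 2^-30, so the exact square is 1 + 2^-29 + 2^-60. *)
let x = 1.0 +. ldexp 1.0 (-30)

(* Product rounded to a 64-bit double: the 2^-60 term does not fit in
   a 53-bit mantissa and is dropped. *)
let rounded_product = x *. x

(* Fused multiply-add: x * x - rounded_product is computed with a single
   rounding at the end, so the dropped term reappears. *)
let fused_residual = Float.fma x x (-. rounded_product)

let () =
  Printf.printf "rounded product = %h\n" rounded_product;
  Printf.printf "fused residual  = %h\n" fused_residual
```

A non-zero residual is exactly the information that strict 64-bit
arithmetic discards after the multiply, which is why fused or
extended-precision code can disagree with the bytecode results.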

> For that matter, could a 
> "conforming" implementation of Ocaml use the 32-bit single precision SSE-1 
> registers?

Using single-precision FP is questionable because of the significant
loss in precision.  However, SSE-2 supports double precision
arithmetic on SSE registers, and that could be an adequate target for
ocamlopt-generated code.  I plan to experiment with this soon.
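
(Illustrative sketch, not from the thread: what rounding an OCaml float
to single precision costs.  The helper name to_single is hypothetical;
it relies on Int32.bits_of_float and Int32.float_of_bits, which convert
through the IEEE single-precision format.)

```ocaml
(* Round a double to the nearest single-precision value and widen it
   back to a double, so the loss can be observed directly. *)
let to_single (x : float) : float =
  Int32.float_of_bits (Int32.bits_of_float x)

let () =
  List.iter
    (fun x -> Printf.printf "%.17g -> %.17g\n" x (to_single x))
    [ 0.1; 1.0 /. 3.0; 16777217.0 (* 2^24 + 1 *) ]
```

Single precision keeps a 24-bit mantissa against 53 bits for doubles,
so integers above 2^24 and most decimal fractions change when passed
through to_single.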

======================================================================
Old cwn
----------------------------------------------------------------------

If you happen to miss a cwn, you can send me a message
(alan.schmitt@inria.fr) and I'll mail it to you, or go take a look at
the archive (http://pauillac.inria.fr/~aschmitt/cwn/). If you also wish
to receive it every week by mail, just tell me so.

======================================================================

Alan Schmitt