OCaml Weekly News

Hello

Here is the latest OCaml Weekly News, for the week of February 02 to 09, 2016.

Package renamings for sexplib, bin_prot and a few other camlp4 syntax extensions
Save callbacks from OCaml to C
Core Suite 113.24.00
Looking for a windows ocaml UTF-16 encoded filename aware library
Notty 0.1.0
Source code for the OCaml-MOOC exercise environment
Other OCaml News

Package renamings for sexplib, bin_prot and a few other camlp4 syntax extensions

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2016-02/msg00007.html

Jeremie Dimino continued this old thread:

Just to follow up, the addition of `.syntax` packages was released this week:
https://github.com/ocaml/opam-repository/pull/5523

ygrek then asked and Jeremie Dimino replied:

> I wonder if keeping empty sexplib.syntax subpackage with only
> requires="pa_sexp_conv" line
> will prevent build breakage (one will only need to fix opam dependencies)?

That would work for now. However, sooner or later some change in
sexplib will break the code generated by pa_sexp_conv and we'll have the same
problem again.
The current forced upgrade is simple (one sed command [1]) and it's a hint
that it's a good time to switch to ppx. We published a tool to help with this
BTW [2]

[1] https://github.com/ocaml/opam-repository/blob/master/CHANGES.md
[2] https://blogs.janestreet.com/converting-a-code-base-from-camlp4-to-ppx
    https://github.com/janestreet/camlp4-to-ppx

Save callbacks from OCaml to C

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2016-02/msg00008.html

Christoph Höger asked:

Seems this problem bugs me forever...

I have a scenario, where some C-stubs call back into OCaml, so the stack
(growing downwards) looks like this:

caml_foo ...
/* OCaml control ends here */
...
c_stub ...
c_library_routine (user_data; &callback_wrapper, ...)
...
c_callback_wrapper(user_data; ...)
/* OCaml controlled again */
caml_callbackN(user_data->closure, ....)


The pattern should be somewhat familiar to anyone who has ever combined
C and OCaml in a non-trivial way: I store the actual ocaml callbacks in
the heap-allocated user_data structure, which is guaranteed to be passed
to the callbacks by the c-library.

The callback wrapper then wraps all arguments and uses caml_callback on
these closures.

The callback wrapper is a pure C-routine, i.e. no CamlParamN or
CamlLocalN macros are used here. But still, the garbage collector runs
amok. After a while I see the user_data pointer set to 0x0! How can that
be, how can it overwrite a stack value of a calling C-routine?

Here is the real callback:

void g(void* user_data, double t, double const *x, double *g) {
  const struct interface_data* data = ((struct interface_data*)user_data);
  static long count = 0;
  count++;

  /* Wrap the values in fresh big arrays */
  value ml_t = caml_copy_double(t);
  value ml_x = caml_ba_alloc_dims(CAML_BA_FLOAT64 | CAML_BA_C_LAYOUT, 1,
                                  (double*)x, data->qs->n);
  value ml_g = caml_ba_alloc_dims(CAML_BA_FLOAT64 | CAML_BA_C_LAYOUT, 1,
                                  g, data->qs->mc);

  /* call the OCaml callback */
  caml_callback3(data->g, ml_t, ml_x, ml_g);
}

Is there anything obvious, I am doing wrong?

Jeremie Dimino replied:

You need to register [ml_t], [ml_x] and [ml_g] as GC roots. Otherwise if the
GC runs in caml_ba_alloc for instance, [ml_t] might end up containing garbage
even before reaching [caml_callback3]. You can use the normal macros for that:

void g(...) {
CAMLparam0();
CAMLlocal3(ml_t, ml_x, ml_g);
...
CAMLreturn0;
}

Note that &(user_data->g) must be a GC root as well. Are you registering
&(user_data->g) with the GC in any way?

Malcolm Matalka then asked, David Sheets replied and Jeremy Yallop added:

>> Malcolm Matalka:
>>
>> If one is using ctypes, is all of this taken care of?  I have a library
>> that registers a bunch of Ocaml functions in C code, which the C code
>> calls.  I haven't experienced anything bad happening yet, but that
>> doesn't mean much...
>
> David Sheets:
>
> If you use ctypes and pass OCaml closures to C, you *must* retain a
> reference to the closure to avoid it being GCed. If you do not, you
> may experience the exception CallToExpiredClosure sporadically.

Jeremy Yallop:

Besides David's caveat, the answer is yes: ctypes will take care of
registering arguments as GC roots as necessary.

Malcolm Matalka then asked and Jeremy Yallop replied:

> Can you clarify this a bit?  I'm not that familiar with how the C FFI
> works.  If I pass in a closure to a C function and it is registered as a
> GC root, doesn't that mean it won't be GCd if my Ocaml program forgets
> about it or?

That's how roots behave, yes: while a value is registered as a root,
the value won't be collected.   There are (roughly speaking) two types
of root in OCaml: local roots, which persist for the duration of a
function call, and global roots, which persist until explicitly
released.  A C function binding written by hand must ensure that OCaml
values passed to it as arguments are registered as local roots, so
that if a collection occurs while the function is running the values
won't be prematurely collected.

A C binding written using ctypes can generally ignore the matter of
roots.  That's partly because ctypes takes care of root registration,
but also because most types passed between OCaml and C in a ctypes
binding are C values, not OCaml values.  For example, if you want to
pass a structure with several fields between OCaml and C there are two
approaches.  One approach is to represent the structure as an OCaml
record, which involves accessing the fields of the value in your C
binding using various macros, taking care to register values as roots
to protect them from the GC.  The other approach is to represent the
structure as a C struct, which involves accessing its fields in OCaml
using the functions ctypes provides.  (If you enjoy programming in an
untyped dialect of C with ubiquitous concurrency, you'll probably
favour the first approach.  If you prefer programming in OCaml then
the second approach might have some appeal.)

Using the C value representation for values that cross the C-OCaml
boundary generally works well, but when things become higher-order,
the situation changes a bit.  When a C library expects to be given a
first-order value such as a struct we have to give it a struct with
the appropriate layout, since C functions can directly access the
representation of values.  However, when the library expects a
function pointer we have a bit more freedom, since the representation
of functions isn't accessible -- in fact, the only thing that can be
done with a function pointer, besides passing it from place to place,
is calling it.  This freedom means that we can pass an OCaml function,
suitably packaged up, where a C function pointer is expected.

Passing OCaml functions to C as function pointers raises some
interesting issues relating to object lifetime and the garbage
collector.  The main difficulty arises from the fact that once you
pass a function pointer to a C library there's no way of knowing how
long the library holds on to it: for example, the library might
discard the function pointer when the call into the library returns,
or it might store the function pointer in a global variable to be
retrieved and called later.  In order to prevent the associated
function from being collected prematurely, some kind of action is
needed on the OCaml side, whether registering a global root, or
ensuring that the function is reachable from the OCaml program.

> Also, David and I were talking about how to solve this on IRC.  In my
> specific case, callbacks are one-shot, which means I know they need to
> be remembered until they are called then they can (possibly) be freed.
> Is there a nice solution here?  I'd prefer not to store them in some
> other data structure and remove them later just to keep a reference
> alive, if possible.

Storing some kind of references to the functions in a place that the
collector can see is essential to prevent the functions from being
collected prematurely.  The situation is the same whether you use
ctypes or write bindings by hand.

Storing the functions in a table, and removing them automatically
after they're called is one approach.  An alternative is to use the
new Ctypes.Roots module, which will be available in the next release:

   https://github.com/ocamllabs/ocaml-ctypes/blob/182a9e64src/ctypes/ctypes.mli#L419-L435

> That is overhead I'd prefer to avoid, if possible.
> I plan on having possibly hundreds of thousands of these callbacks alive
> at any point in time.

In that case it sounds like there'll be an overhead of up to a few megabytes.

Malcolm Matalka then asked and Jeremy Yallop replied:

> Thank you for the thorough response.  It seems like Ctypes.Roots might
> solve my problem, although the URL gives me a 404.

Oops.  Here's a working URL:

https://github.com/ocamllabs/ocaml-ctypes/blob/182a9e64/src/ctypes/ctypes.mli#L419-L434

> Do you have an estimation of when this will be released (or anything someone like
> myself can do to help?)

There are a few issues left to address before the next release:

   https://github.com/ocamllabs/ocaml-ctypes/milestones/ctypes%200.5

Thanks for the offer of help!  Feedback on the Ctypes.Roots design
would be appreciated.  More generally, some of the outstanding issues
might be fixable by a motivated beginner, e.g.

   https://github.com/ocamllabs/ocaml-ctypes/issues/316
   https://github.com/ocamllabs/ocaml-ctypes/issues/267
   https://github.com/ocamllabs/ocaml-ctypes/issues/106

>> In that case it sounds like there'll be an overhead of up to a few megabytes.
>
> Any suggestions for a datatype to use here?  I do have an object that is
> long lived that represents the event loop I'm integrating against, so I
> can store anything I want in there.  Last night I was really concerned
> about storing this extra information in the loop, just seemed like a
> waste, but in the morning light I'm less worried about it.  I could just
> use a Hashtbl I guess with some reference to the closure.  My current
> idea is to make some integer value and wrap the closure up in something
> like:
> 
> (fun () -> Hashtbl.remove t id; closure ())
>
> What kind of sucks about that is the wrapper needs to be unique to each
> type of closure that gets called, there doesn't seem like a really
> generic way to do this wrapping.  Am I on the wrong track?

If you're only storing the closures in the table, and don't need to
retrieve and call them, then you can use a type which hides the type
of the closure in some way.  One approach is to use Obj.t, and convert
each closure using Obj.repr as you store it.  If using Obj makes you
uneasy (as it generally ought to, although it's currently safe in this
case) then an alternative is to use an existential type, like this:

   type t = T : _ -> t

which allows you to wrap any type of value, regardless of its type:

  # [T (+); T not; T 3; T ""];;
  - : t list = [T <poly>; T <poly>; T <poly>; T <poly>]
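To show how this existential wrapper could be applied to the one-shot
question above, here is a minimal sketch (not code from the thread). It
assumes single-argument callbacks; the names keep_alive, next_id and
one_shot are hypothetical. The wrapper that would be handed to C is itself
stored in the table, so it stays reachable until it has been called once:

```ocaml
(* Existential wrapper: hides the type of the stored value. *)
type t = T : 'a -> t

(* Hypothetical table whose only job is to keep closures reachable. *)
let keep_alive : (int, t) Hashtbl.t = Hashtbl.create 16
let next_id = ref 0

(* Wrap [f] so that the wrapper is kept reachable from [keep_alive]
   until its first call, at which point it removes its own entry. *)
let one_shot (f : 'a -> 'b) : 'a -> 'b =
  let id = !next_id in
  incr next_id;
  let wrapper x =
    Hashtbl.remove keep_alive id;
    f x
  in
  Hashtbl.replace keep_alive id (T wrapper);
  wrapper

let () =
  let succ' = one_shot (fun x -> x + 1) in
  let len = one_shot String.length in
  (* Both wrappers are stored, although their types differ. *)
  Printf.printf "stored: %d\n" (Hashtbl.length keep_alive);
  Printf.printf "%d %d\n" (succ' 41) (len "abc");
  Printf.printf "stored: %d\n" (Hashtbl.length keep_alive)
```

Because the table stores values of type t, the same one_shot function works
for closures of any argument and result type; functions taking several
arguments would need their own variant of the wrapper.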

Christoph Höger replied to Jeremie Dimino’s first reply, and Jeremie Dimino then said:

>> void g(...) {
>> CAMLparam0();
>> CAMLlocal3(ml_t, ml_x, ml_g);
>> ...
>> CAMLreturn0;
>> }
> 
> I tried this before, but it seemed like the GC would still write into
> the arguments here. Is the semantics of CAMLparam0 that I might have
> additional, unmanaged arguments?

CAMLparam0 is for when you have no [value] arguments but have some local
[value] variables. CAMLlocalN must be preceded by a CAMLparamY, that's why you
need the CAMLparam0.

>> Note that &(user_data->g) must be a GC root as well. Are you
>> registering &(user_data->g) with the GC in any way?
>
> Good question. It _is_ an argument to a function on the other side of
> the stack, so in principle it is alive, but could the GC move it?

Yes, it should be alive. However I imagine that in the main C stub, you have
something like this:

value blah(value g, ...)
{
...
user_data->g = g;
...
}

So if the GC ends up moving [g], it needs to know that it must update
[user_data->g] to point to the new location.

There are several solutions:

- if [user_data] only exists for the duration of the [blah] function call, I
would suggest changing [user_data->g] to be a [value*] instead:

value blah(value g, ...)
{
CAMLparam1(g);
...
user_data->g = &g;
...
}

- if it lives for longer, you need to register [user_data->g] as a global
root, as described in the manual (Rule 4 of
http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html).

Christoph Höger finally said:

> CAMLparam0 is for when you have no [value] arguments but have some local
> [value] variables. CAMLlocalN must be preceded by a CAMLparamY, that's
> why you need the CAMLparam0.

Thanks. I wasn't sure about that, since I tried before and my stack
values were overwritten (I assumed it was the GC at that time).

> So if the GC ends up moving [g], it needs to know that it must update
> [user_data->g] to point to the new location.

Even more subtle. user_data was a pointer to the body of a custom
allocated block. My huge mistake was to use the custom_val directly
(i.e. I allocated n bytes and passed around the pointer to these n
bytes). As it turns out, the GC may move a custom block by memcopying
its body. So in that body I needed pointers to allocated data not the
allocated data itself.

Core Suite 113.24.00

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2016-02/msg00014.html

Jeremie Dimino announced:

I am pleased to announce the 113.24.00 release of the Core suite.

It has been four and a half months since the last release, so there are
lots of new things in this release. Thankfully we now have a better
release system and can go back to more frequent releases.

All packages are now in opam and the API documentation is here:

https://ocaml.janestreet.com/ocaml-core/113.24/doc/

[Note: if you run into trouble while running "opam upgrade", try "opam
upgrade core" instead. A PR has been submitted to fix this]

Notable changes:

* This is the first release completely camlp4 free!

In particular, some problems with the initial release of our ppx
rewriters have been fixed, which makes them more usable.

As a side effect of moving from camlp4 to ppx_driver, compilation
times have improved. From a fresh `opam init` in both cases:

$ time opam install -y core.113.00.00
[...]
real 6m17.875s
user 5m26.374s
sys 1m10.039s

$ time opam install -y core.113.24.00
[...]
real 3m16.900s
user 3m9.860s
sys 0m53.335s

* We started to handle API upgrades in a more disciplined way by using
`[@deprecated]` attributes. This should help with incompatible
changes in our libraries

* Addition of a few ppx rewriters:

- ppx_expect, a Cram-like framework for OCaml
- ppx_let, for monadic/applicative let bindings
- ppx_sexp_message, similar to ppx_sexp_value, but more "message" oriented

We are still making adjustments in ppx_let and ppx_expect so their
usage might slightly change in the next release.

More information can be found on the github project pages:

- https://github.com/janestreet/ppx_expect
- https://github.com/janestreet/ppx_let
- https://github.com/janestreet/ppx_sexp_message
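
For readers unfamiliar with the `[@deprecated]` attribute mentioned in the
notable changes, here is a minimal sketch of how it can mark an old name in
an interface. The Temperature module and its functions are hypothetical and
are not taken from the Core suite:

```ocaml
module Temperature : sig
  val to_fahrenheit : float -> float

  (* Old name kept for compatibility. Callers get a compile-time
     deprecation alert that points them to the replacement. *)
  val c_to_f : float -> float
  [@@ocaml.deprecated "Use Temperature.to_fahrenheit instead."]
end = struct
  let to_fahrenheit c = (c *. 9. /. 5.) +. 32.

  (* The old name is a plain alias of the new one. *)
  let c_to_f = to_fahrenheit
end

let () =
  (* Using the new name compiles without any alert. *)
  Printf.printf "%.1f\n" (Temperature.to_fahrenheit 100.)
```

Code that still calls Temperature.c_to_f keeps compiling but is flagged by
the compiler, unless the build turns that alert into an error.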

The full changelog for this release can be found here:

https://ocaml.janestreet.com/ocaml-core/113.24/CHANGES.html

This is the full list of packages added/modified:

- async
- async_extended
- async_extra
- async_find
- async_inotify
- async_kernel
- async_parallel
- async_rpc_kernel
- async_shell
- async_smtp
- async_ssl
- async_unix
- bignum
- bin_prot
- core
- core_bench
- core_extended
- core_kernel
- core_profiler
- email_message
- fieldslib
- incremental
- jenga
- ocaml_plugin
- patdiff
- patience_diff
- ppx_assert
- ppx_bench
- ppx_bin_prot
- ppx_compare
- ppx_conv_func
- ppx_core
- ppx_csv_conv
- ppx_custom_printf
- ppx_driver
- ppx_enumerate
- ppx_expect
- ppx_fail
- ppx_fields_conv
- ppx_here
- ppx_inline_test
- ppx_jane
- ppx_let
- ppx_optcomp
- ppx_pipebang
- ppx_sexp_conv
- ppx_sexp_message
- ppx_sexp_value
- ppx_type_conv
- ppx_typerep_conv
- ppx_variants_conv
- ppx_xml_conv
- re2
- rpc_parallel
- sexplib
- textutils
- typerep
- typerep_extended
- variantslib

Looking for a windows ocaml UTF-16 encoded filename aware library

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2016-02/msg00025.html

Matthieu Dubuget asked:

I'm currently analysing an NTFS file-tree with a Windows OCaml native application.

This application is using:
- Unix.{opendir,readdir,closedir}
- and Unix.LargeFile.lstat

The unix library of OCaml distribution is using ANSI variants of system
functions. This is working fine until files or directories whose UTF-16
encoded name cannot be converted into the code page in use are reached.

I'm about to write a small library to solve this problem: it would mimic the
corresponding code from OCaml unix library, but using WIDE variants of
Microsoft system functions in the C stub instead of ANSI variants.

Before going on: do you know of any library that already does this and that I could use?

Thanks for any link.

Alain Frisch replied:

The real solution is to fix OCaml so that it can interact properly with
arbitrary filenames under Windows. See:

https://github.com/ocaml/ocaml/pull/153
http://caml.inria.fr/mantis/view.php?id=3771

The basic idea is that filenames are represented by OCaml strings representing
a UTF-8 encoding of the actual filename.  To reduce code breakage, a fallback
interprets strings that are invalid utf-8 sequences using the current code
page.  But this is still a rather intrusive change, since filenames received
from readdir are always utf-8 encoded, which can break existing code.  (One
could imagine providing two variants of readdir to smooth the migration path.)
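
The fallback rule described above can be pictured with a minimal sketch. It
is not code from the PR; the type filename_encoding and the function
classify_filename are hypothetical, and String.is_valid_utf_8 requires
OCaml 4.14 or later:

```ocaml
(* How a filename passed in as an OCaml string would be interpreted. *)
type filename_encoding =
  | Utf8       (* the bytes form valid UTF-8: decode them as UTF-8 *)
  | Code_page  (* invalid UTF-8: fall back to the current code page *)

let classify_filename (name : string) : filename_encoding =
  if String.is_valid_utf_8 name then Utf8 else Code_page

let () =
  (* "é" encoded as UTF-8 takes two bytes, 0xC3 0xA9. *)
  assert (classify_filename "caf\xc3\xa9.txt" = Utf8);
  (* A lone 0xE9 byte (Latin-1 "é") is not valid UTF-8. *)
  assert (classify_filename "caf\xe9.txt" = Code_page);
  (* Pure ASCII names are valid UTF-8. *)
  assert (classify_filename "readme.txt" = Utf8)
```

A code-page string that happens to also be valid UTF-8 would be taken as
UTF-8 under this rule, so the fallback is a heuristic rather than a
guarantee.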

Any help reviewing and testing the PR above would be very much appreciated!

Bob Atkey also replied:

I wrote a little C binding to do pretty much what you are asking:

  https://github.com/ContemplateLtd/filesystem-wrapper

My motivation was to be able to support long filenames (> 240 chars) on
Windows, but this entails using the wide versions of the filesystem functions.

I based it on the patch that Alain posted a link to, but only supported the
operations that we needed (openfile, opendir, readdir, closedir and
is_directory). I also had to use an abstract type for pathnames to be able to
handle the bizarre way that Windows does long file names (you have to prefix
the absolute name with '\\?\', as far as I can tell).

There is a little Makefile that assumes you are cross-compiling from Linux
with the Debian-packaged cross compiler.

I am completely inexpert in Windows programming, so there are almost certainly
bugs in it. It has been reasonably well tested with long filenames (we were
doing static analysis of Java .class files, some of which are auto generated
from XML Schemas and can have very long names), but I haven't tested it much
on non-ASCII names. It converts back and forth between UTF-16 for Windows to
UTF-8 for OCaml.
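
As an illustration of the UTF-8 to UTF-16 direction of such a conversion,
here is a minimal sketch in pure OCaml. It is not taken from the library
above, which does the conversion in its C binding; the function name
utf8_to_utf16le is hypothetical, and String.get_utf_8_uchar requires OCaml
4.14 or later:

```ocaml
(* Convert a UTF-8 string to UTF-16 little-endian, the encoding used by
   the wide Windows functions. Returns None on invalid UTF-8 input. *)
let utf8_to_utf16le (s : string) : string option =
  let buf = Buffer.create (2 * String.length s) in
  let rec loop i =
    if i >= String.length s then Some (Buffer.contents buf)
    else
      let d = String.get_utf_8_uchar s i in
      if not (Uchar.utf_decode_is_valid d) then None
      else begin
        (* Emits one 16-bit unit, or a surrogate pair above U+FFFF. *)
        Buffer.add_utf_16le_uchar buf (Uchar.utf_decode_uchar d);
        loop (i + Uchar.utf_decode_length d)
      end
  in
  loop 0

let () =
  assert (utf8_to_utf16le "A" = Some "A\x00");
  (* U+00E9 is 0xC3 0xA9 in UTF-8 and 0xE9 0x00 in UTF-16LE. *)
  assert (utf8_to_utf16le "\xc3\xa9" = Some "\xe9\x00");
  assert (utf8_to_utf16le "\xff" = None)
```

A real binding would also need to append the terminating 16-bit zero that
the Windows functions expect.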

As Alain says, the full solution would be to fix OCaml itself.

Notty 0.1.0

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2016-02/msg00029.html

David Kaloper announced:

I'd like to announce the first release of Notty, a terminal library. The
blurb:

``Notty is a declarative terminal library for OCaml structured around
a notion of composable images. It tries to abstract away the basic
terminal programming model, and provide one that is simpler and more
expressive.

The core layout engine and IO codecs are pure platform-independent
OCaml. Distribution includes modules with input and output facilities
for Unix, and Lwt on Unix.

As an attempt to redefine terminal programming, Notty has to be
"opinionated". It assumes Unicode throughout, does not have universal
support for various terminals out there, and has a peculiar
programming and rendering model.''


Feedback is welcome. Samples of unsupported input escapes are welcome
too, and so are complaints about the rendering.

Homepage: https://github.com/pqwy/notty
Docs: http://pqwy.github.io/notty

Many thanks to Daniel Bünzli who helped discard an endless stream of
bad ideas. Sadly, he's not omnipotent.

Daniel's work is also what Notty's solid Unicode support builds upon.

Source code for the OCaml-MOOC exercise environment

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2016-02/msg00031.html

Benjamin Canou announced:

We just released the source code of the exercise environment that
OCamlPro developed for the first session of the OCaml MOOC.

It includes the whole platform (code editor, toplevel,
automatic grader), and a few demo exercises.

You can download the source (mostly LGPLv2, some parts under GPLv2):

  https://try.ocamlpro.com/fun-demo/tryocaml-fun.tgz

We also put a standalone demo in which you can try a few exercises:

  https://try.ocamlpro.com/fun-demo/tryocaml_index.html

More details are available here:

  https://try.ocamlpro.com/fun-demo/

Other OCaml News

From the ocamlcore planet blog:

Thanks to Alp Mestan, we now include in the OCaml Weekly News the links to the
recent posts from the ocamlcore planet blog at http://planet.ocaml.org/.

David Mentré: The failures of Debian (and its derivatives)
  http://blog.bentobako.org/index.php?post/2013/01/28/The-failures-of-Debian-%28and-its-derivatives%29

Dario Teixeira: Library authors: Don't forget the examples!
  http://nleyten.com/post/2016/02/04/Library-authors-Don-t-forget-the-examples

Functional Jobs: OCaml server-side developer at Ahrefs Research (Full-time)
  https://functionaljobs.com/jobs/8885-ocaml-server-side-developer-at-ahrefs-research

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.

Alan Schmitt