Hello
Here is the latest OCaml Weekly News, for the week of February 02 to 09, 2016.
Archive: https://sympa.inria.fr/sympa/arc/caml-list/2016-02/msg00007.html
Jeremie Dimino continued this old thread:Just to follow up, the addition of `.syntax` packages was released this week: https://github.com/ocaml/opam-repository/pull/5523ygrek then asked and Jeremie Dimino replied:
> I wonder if keeping empty sexplib.syntax subpackage with only > requires="pa_sexp_conv" line > will prevent build breakage (one will only need to fix opam dependencies)? That would work for now. However, sooner or latter some change in sexplib will break the code generated by pa_sexp_conv and we'll have the same problem again. The current forced upgrade is simple (one sed command [1]) and it's a hint that it's a good time to switch to ppx. We published a tool to help with this BTW [2] [1] https://github.com/ocaml/opam-repository/blob/master/CHANGES.md [2] https://blogs.janestreet.com/converting-a-code-base-from-camlp4-to-ppx https://github.com/janestreet/camlp4-to-ppx
Archive: https://sympa.inria.fr/sympa/arc/caml-list/2016-02/msg00008.html
Christoph Höger asked:Seems this problem bugs me forever... I have a scenario, where some C-stubs call back into OCaml, so the stack (growing downwards) looks like this: caml_foo ... /* OCaml control ends here */ ... c_stub ... c_library_routine (user_data; &callback_wrapper, ...) ... c_callback_wrapper(user_data; ...) /* OCaml controlled again */ caml_callbackN(user_data->closure, ....) The pattern should be somewhat familiar to anyone who has ever combined C and OCaml in a non-trivial way: I store the actual ocaml callbacks in the heap-allocated user_data structure, which is guaranteed to be passed to the callbacks by the c-library. The callback wrapper then wraps all arguments and uses caml_callback on these closures. The callback wrapper is a pure C-routine, i.e. no CamlParamN or CamlLocalN macros are used here. But still, the garbage collector runs amok. After a while I see the user_data pointer set to 0x0! How can that be, how can it overwrite a stack value of a calling C-routine? Here is the real callback: void g(void* user_data, double t, double const *x, double *g) { const struct interface_data* data = ((struct interface_data*)user_data); static long count = 0; count++; /* Wrap the values in fresh big arrays */ value ml_t = caml_copy_double(t); value ml_x = caml_ba_alloc_dims(CAML_BA_FLOAT64 | CAML_BA_C_LAYOUT, 1, (double*)x, data->qs->n); value ml_g = caml_ba_alloc_dims(CAML_BA_FLOAT64 | CAML_BA_C_LAYOUT, 1, g, data->qs->mc); /* call the OCaml callback */ caml_callback3(data->g, ml_t, ml_x, ml_g); } Is there anything obvious, I am doing wrong?Jeremie Dimino replied:
You need to register [ml_t], [ml_x] and [ml_g] as GC roots. Otherwise if the GC runs in caml_ba_alloc for instance, [ml_t] might ends up containing garbage even before reaching [caml_callback3]. You can use the normal macros for that: void g(...) { CAMLparam0(); CAMLlocal3(ml_t, ml_x, ml_g); ... CAMLreturn0; } Note that &(user_data->g) must be a GC root as well. Are you registering & (user_data->g) with the GC in any way?Malcolm Matalka then asked, David Sheets replied and Jeremy Yallop added:
>> Malcolm Matalka: >> >> If one is using ctypes, is all of this taken care of? I have a library >> that registers a bunch of Ocaml functions in C code, which the C code >> calls. I haven't experienced anything bad happening yet, but that >> doesn't mean much... > > David Sheets: > > If you use ctypes and pass OCaml closures to C, you *must* retain a > reference to the closure to avoid it being GCed. If you do not, you > may experience the exception CallToExpiredClosure sporadically. Jeremy Yallop: Besides David's caveat, the answer is yes: ctypes will take care of registering arguments as GC roots as necessary.Malcolm Matalka then asked and Jeremy Yallop replied:
> Can you clarify this a bit? I'm not that familiar with how the C FFI > works. If I pass in a closure to a C function and it is registered as a > GC root, doesn't that mean it won't be GCd if my Ocaml program forgets > about it or? That's how roots behave, yes: while a value is registered as a root, the value won't be collected. There are (roughly speaking) two types of root in OCaml: local roots, which persist for the duration of a function call, and global roots, which persist until explicitly released. A C function binding written by hand must ensure that OCaml values passed to it as arguments are registered as local roots, so that if a collection occurs while the function is running the values won't be prematurely collected. A C binding written using ctypes can generally ignore the matter of roots. That's partly because ctypes takes care of root registration, but also because most types passed between OCaml and C in a ctypes binding are C values, not OCaml values. For example, if you want to pass a structure with several fields between OCaml and C there are two approaches. One approach is to represent the structure as an OCaml record, which involves accessing the fields of the value in your C binding using various macros, taking care to register values as roots to protect them from the GC. The other approach is to represent the structure as a C struct, which involves accessing its fields in OCaml using the functions ctypes provides. (If you enjoy programming in an untyped dialect of C with ubiquitous concurrency, you'll probably favour the first approach. If you prefer programming in OCaml then the second approach might have some appeal.) Using the C value representation for values that cross the C-OCaml boundary generally works well, but when things become higher-order, the situation changes a bit. When a C library expects to be given a first-order value such as a struct we have to give it a struct with the appropriate layout, since C functions can directly access the representation of values. However, when the library expects a function pointer we have a bit more freedom, since the representation of functions isn't accessible -- in fact, the only thing that can be done with a function pointer, besides passing it from place to place, is calling it. This freedom means that we can pass an OCaml function, suitably packaged up, where a C function pointer is expected. Passing OCaml functions to C as function pointers raises some interesting issues relating to object lifetime and the garbage collector. The main difficulty arises from the fact that once you pass a function pointer to a C library there's no way of knowing how long the library holds on to it: for example, the library might discard the function pointer when the call into the library returns, or it might store the function pointer in a global variable to be retrieved and called later. In order to prevent the associated function from being collected prematurely, some kind of action is needed on the OCaml side, whether registering a global root, or ensuring that the function is reachable from the OCaml program. > Also, David and I were talking about how to solve this on IRC. In my > specific case, callbacks are one-shot, which means I know they need to > be remembered until they are called then they can (possibly) be freed. > Is there a nice solution here? I'd prefer not to store them in some > other data structure and remove them later just to keep a reference > alive, if possible. Storing some kind of references to the functions in a place that the collector can see is essential to prevent the functions from being collected prematurely. The situation is the same whether you use ctypes or write bindings by hand. Storing the functions in a table, and removing them automatically after they're called is one approach. An alternative is to use the new Ctypes.Roots module, which will be available in the next release: https://github.com/ocamllabs/ocaml-ctypes/blob/182a9e64src/ctypes/ctypes.mli#L419-L435 > That is overhead I'd prefer to avoid, if possible. > I plan on having possibly hundreds of thousands of these callbacks alive > at any point in time. In that case it sounds like there'll be an overhead of up to a few megabytes.Malcolm Matalka then asked and Jeremy Yallop replied:
> Thank you for the thorough response. It seems like Ctypes.Roots might > solve my problem, although the URL gives me a 404. Oops. Here's a working URL: https://github.com/ocamllabs/ocaml-ctypes/blob/182a9e64/src/ctypes/ctypes.mli#L419-L434 > Do you have an estimation of when this will be released (or anything someone like > myself can do to help?) There are a few issues left to address before the next release: https://github.com/ocamllabs/ocaml-ctypes/milestones/ctypes%200.5 Thanks for the offer of help! Feedback on the Ctypes.Roots design would be appreciated. More generally, some of the outstanding issues might be fixable by a motivated beginner, e.g. https://github.com/ocamllabs/ocaml-ctypes/issues/316 https://github.com/ocamllabs/ocaml-ctypes/issues/267 https://github.com/ocamllabs/ocaml-ctypes/issues/106 >> In that case it sounds like there'll be an overhead of up to a few megabytes. > > Any suggestions for a datatype to use here? I do have an object that is > long lived that represents the event loop I'm integrating against, so I > can store anything I want in there. Last night I was really concerned > about storing this extra information in the loop, just seemed like a > waste, but in the morning light I'm less worried about it. I could just > use a Hashtbl I guess with some reference to the closure. My current > idea is to make some integer value and wrap the closure up in something > like: > > (fun () -> Hashtbl.remove t id; closure ()) > > What kind of sucks about that is the wrapper needs to be unique to each > type of closure that gets called, there doesn't seem like a really > generic way to do this wrapping. Am I on the wrong track? If you're only storing the closures in the table, and don't need to retrieve and call them, then you can use a type which hides the type of the closure in some way. One approach is to use Obj.t, and convert each closure using Obj.repr as you store it. If using Obj makes you uneasy (as it generally ought to, although it's currently safe in this case) then an alternative is to use an existential type, like this: type t = T : _ -> t which allows you to wrap any type of value, regardless of its type: # [T (+); T not; T 3; T ""];; - : t list = [T <poly>; T <poly>; T <poly>; T <poly>]Christoph Höger replied to Jeremie Dimino’s first reply, and Jeremie Dimino then said:
>> void g(...) { >> CAMLparam0(); >> CAMLlocal3(ml_t, ml_x, ml_g); >> ... >> CAMLreturn0; >> } > > I tried this before, but it seemed like the GC would still write into > the arguments here. Is the semantics of CAMLparam0 that I might have > additional, unmanaged arguments? CAMLparam0 is for when you have no [value] arguments but have some local [value] variables. CAMLlocalN must be preceded by a CAMLparamY, that's why you need the CAMLparam0. >> Note that &(user_data->g) must be a GC root as well. Are you >> registering &(user_data->g) with the GC in any way? > > Good question. It _is_ an argument to a function on the other side of > the stack, so in principle it is alive, but could the GC move it? Yes, it should be alive. However I imagine that in the main C stub, you have something like this: value blah(value g, ...) { ... user_data->g = g; ... } So if the GC ends up moving [g], it needs to know that it must update [user_data->g] to point to the new location. There are several solutions: - if [user_data] only exists for the duration of the [blah] function call, I would suggest to change [user_data->g] to be a [value*] instead: value blah(value g, ...) { CAMLparam1(g); ... user_data->g = &g; ... } - if it lives for longer, you need to register [user_data->g] as a global variable, as described in the manual (Rule 4 of http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html).Christoph Höger finally said:
> CAMLparam0 is for when you have no [value] arguments but have some local > [value] variables. CAMLlocalN must be preceded by a CAMLparamY, that's > why you need the CAMLparam0. Thanks. I wasn't sure about that, since I tried before and my stack values were overwritten (I assumed it was the GC at that time). > So if the GC ends up moving [g], it needs to know that it must update > [user_data->g] to point to the new location. Even more subtle. user_data was a pointer to the body of a custom allocated block. My huge mistake was to use the custom_val directly (i.e. I allocated n bytes and passed around the pointer to these n bytes). As it turns out, the GC may move a custom block by memcopying its body. So in that body I needed pointers to allocated data not the allocated data itself.
Archive: https://sympa.inria.fr/sympa/arc/caml-list/2016-02/msg00014.html
Jeremie Dimino announced:I am pleased to announce the 113.24.00 release of the Core suite. It has been 4 month and a half since the last release, so there are lots of new things in this release. Thankfully we now have a better release system and can go back to more frequent releases. All packages are now in opam and the API documentation is here: https://ocaml.janestreet.com/ocaml-core/113.24/doc/ [Note: if you run into trouble while running "opam upgrade", try "opam upgrade core" instead. A PR has been submitted to fix this] Notable changes: * This is the first release completely camlp4 free! Especially some problems with the initial release of our ppx rewriters have been fixed which make them more usable. As a side effect of moving from camlp4 to ppx_driver, compilation times have improved. From a fresh `opam init` in both case: $ time opam install -y core.113.00.00 [...] real 6m17.875s user 5m26.374s sys 1m10.039s $ time opam install -y core.113.24.00 [...] real 3m16.900s user 3m9.860s sys 0m53.335s * We started to handle API upgrades in a more disciplined way by using `[@deprecated]` attributes. This should help with incompatible changes in our libraries * Addition of a few ppx rewriters: - ppx_expect, a Cram like framework for OCaml - ppx_let, for monadic/applicative let bindings - ppx_sexp_message, similar to ppx_sexp_value, but more "message" oriented We are still making adjustments in ppx_let and ppx_expect so their usage might slightly change in the next release. More information can be found on the github project pages: - https://github.com/janestreet/ppx_expect - https://github.com/janestreet/ppx_let - https://github.com/janestreet/ppx_sexp_message The full changelog for this release can be found here: https://ocaml.janestreet.com/ocaml-core/113.24/CHANGES.html This is the full list of packages added/modified: - async - async_extended - async_extra - async_find - async_inotify - async_kernel - async_parallel - async_rpc_kernel - async_shell - async_smtp - async_ssl - async_unix - bignum - bin_prot - core - core_bench - core_extended - core_kernel - core_profiler - email_message - fieldslib - incremental - jenga - ocaml_plugin - patdiff - patience_diff - ppx_assert - ppx_bench - ppx_bin_prot - ppx_compare - ppx_conv_func - ppx_core - ppx_csv_conv - ppx_custom_printf - ppx_driver - ppx_enumerate - ppx_expect - ppx_fail - ppx_fields_conv - ppx_here - ppx_inline_test - ppx_jane - ppx_let - ppx_optcomp - ppx_pipebang - ppx_sexp_conv - ppx_sexp_message - ppx_sexp_value - ppx_type_conv - ppx_typerep_conv - ppx_variants_conv - ppx_xml_conv - re2 - rpc_parallel - sexplib - textutils - typerep - typerep_extended - variantslib
Archive: https://sympa.inria.fr/sympa/arc/caml-list/2016-02/msg00025.html
Matthieu Dubuget asked:I'm currently analysing a NTFS file-tree with a windows OCaml native application. This application is using: - Unix.{opendir,readdir,closedir} - and Unix.LargeFile.lstat The unix library of OCaml distribution is using ANSI variants of system functions. This is working fine until files or directories whose UTF-16 encoded name cannot be converted into the code page in use are reached. I'm about to write a small library to solve this problem: it would mimic the corresponding code from OCaml unix library, but using WIDE variants of microsoft system functions in the C stub instead of ANSI variants. Before going on: do you know of any library that already do this I could use? Thanks for any link.Alain Frisch replied:
The real solution is to fix OCaml so that it can interact properly with arbitrary filenames under Windows. See: https://github.com/ocaml/ocaml/pull/153 http://caml.inria.fr/mantis/view.php?id=3771 The basic idea is that filenames are represented by OCaml strings representing an utf-8 encoding of the actual filename. To reduce code breakage, a fallback interprets strings that are invalid utf-8 sequences using the current code page. But this is still a rather intrusive change, since filenames received from readdir are always utf-8 encoded, which can break existing code. (One could imagine providing two variants of readdir to smooth the migration path.) Any help reviewing and testing the PR above would be very much appreciated!Bob Atkey also replied:
I wrote a little C binding to do pretty much what you are asking: https://github.com/ContemplateLtd/filesystem-wrapper My motivation was to be able to support long filenames (> 240 chars) on Windows, but this entails using the wide versions of the filesystem functions. I based it on the patch that Alain posted a link to, but only supported the operations that we needed (openfile, opendir, readdir, closedir and is_directory). I also had to use an abstract type for pathnames to be able to handle the bizarre way that windows does long file names (you have to prefix the absolute name with '\\?\', as far as I can tell). There is a little Makefile that assumes you are cross-compiling from Linux with the Debian-packaged cross compiler. I am completely inexpert in Windows programming, so there are almost certainly bugs in it. It has been reasonably well tested with long filenames (we were doing static analysis of Java .class files, some of which are auto generated from XML Schemas and can have very long names), but I haven't tested it much on non-ASCII names. It converts back and forth between UTF-16 for Windows to UTF-8 for OCaml. As Alain says, the full solution would be to fix OCaml itself.
Archive: https://sympa.inria.fr/sympa/arc/caml-list/2016-02/msg00029.html
David Kaloper announced:I'd like to announce the first release of Notty, a terminal library. The blurb: ``Notty is a declarative terminal library for OCaml structured around a notion of composable images. It tries to abstract away the basic terminal programming model, and provide one that is simpler and more expressive. The core layout engine and IO codecs are pure platform-independent OCaml. Distribution includes modules with input and output facilities for Unix, and Lwt on Unix. As an attempt to redefine terminal programming, Notty has to be "opinionated". It assumes Unicode throughout, does not have universal support for various terminals out there, and has a peculiar programming and rendering model.'' Feedback is welcome. Samples of unsupported input escapes are welcome too, and so are complaints about the rendering. Homepage: https://github.com/pqwy/notty Docs: http://pqwy.github.io/notty Many thanks to Daniel Bünzli who helped discard an endless stream of bad ideas. Sadly, he's not omnipotent. Daniel's work is also what Notty's solid Unicode support builds upon.
Archive: https://sympa.inria.fr/sympa/arc/caml-list/2016-02/msg00031.html
Benjamin Canou announced:We just released the source code of the exercise environment that OCamlPro developed for the first session of the OCaml MOOC. It includes the whole platform (code editor, toplevel, automatic grader), and a few demo exercises. You can download the source (mostly LGPLv2, some parts under GPLv2): https://try.ocamlpro.com/fun-demo/tryocaml-fun.tgz We also put a standalone demo in which you can try a few exercises: https://try.ocamlpro.com/fun-demo/tryocaml_index.html More details are available here: https://try.ocamlpro.com/fun-demo/
Thanks to Alp Mestan, we now include in the OCaml Weekly News the links to the recent posts from the ocamlcore planet blog at http://planet.ocaml.org/. David Mentré: The failures of Debian (and its derivatives) http://blog.bentobako.org/index.php?post/2013/01/28/The-failures-of-Debian-%28and-its-derivatives%29 Dario Teixeira: Library authors: Don't forget the examples! http://nleyten.com/post/2016/02/04/Library-authors-Don-t-forget-the-examples Functional Jobs: OCaml server-side developer at Ahrefs Research (Full-time) https://functionaljobs.com/jobs/8885-ocaml-server-side-developer-at-ahrefs-research
If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.
If you also wish to receive it every week by mail, you may subscribe online.