OCaml Weekly News

Hello

Here is the latest OCaml Weekly News, for the week of April 16 to 23, 2019.

Wrapping C++ std::shared_ptr and similar smart pointers

Manuel Hornung asked

I'm trying to create Reason/OCaml bindings for the Skia 2D graphics library. The library makes heavy use of smart pointers similar to `std::shared_ptr`s, but they called them `sk_sp`.

Now my first idea for wrapping these was using a regular pointer that points to a shared pointer. That would trigger the release of the memory behind the shared pointer as soon as the local variable containing the shared pointer goes out of scope though.

I found a solution that looked promising to me in https://github.com/ygrek/scraps/blob/master/cxx_wrapped.h but now I heard that reducing the refcount in the finalizer is also not a good idea. Unfortunately I don't know why that is not a good idea and I also don't have a better one.

Can anyone help me understand this better and point me towards a better approach?

Guillaume Munch-Maccagnoni replied

(Sorry for the delay as I have been busy.)

It all comes down to the fact that tracing and reference counting have different advantages and drawbacks, and the main difference for this question is that RC reclaims promptly, whereas tracing does not reclaim predictably; in addition OCaml is currently poor in terms of predictable resource management.

Smart pointers can be used to manage resources other than memory. (I mean smart pointers that implement deterministic reclamation of resources such as unique or reference-counted pointers; in principle smart pointers are not restricted in what they implement: delayed evaluation, roots for tracing GCs… such exotic pointers are out of the scope of my answer.)

First, you need to determine whether the pointer manages non-memory resources (the destruction closes a file, releases a lock, rolls back some state…). If so, using finalizers is a no-go, because you cannot predict when and in which order finalizers run, and in practice it can be way too late. When that is the case, skip 1). For instance I see that your library has some functions that return RAII guards; quite obviously these cannot be handled with finalizers.

1) Custom blocks with finalizer

If the smart pointer only manages memory, then it is possible to represent it with a custom block with a finalizer attached to it. The GC needs to know the size of what it manages, otherwise it will not work hard enough to reclaim memory and you can end up with a memory leak. This has occasionally been called “the familiar "allocation of custom objects mess up the speed of the major GC" problem”.

The situation is supposed to improve in OCaml 4.08, which introduces a new function caml_alloc_custom_mem that lets you specify the size of the memory managed by the custom block, which the GC's heuristics will take into account. (caml_alloc_custom also has parameters to tweak the GC speed but presumably this was not good enough as witnessed by the multiple bug reports referenced in that PR.)

So you can use as a source of inspiration @ygrek's wrapped pointer you have linked to above, but you must adapt it to tell the OCaml GC the size of the data your custom block contains.

Pros:

  • Expressive: the foreign data is abstracted as an OCaml value that can be passed around, inserted into data structures, etc.

Cons:

  • No-go for non-memory resources.
  • You need to know the size of what you are managing—there is no universal smart pointer wrapper!
  • Not so good for performance/scale or interoperability. Mixing tracing and RC combines the drawbacks of both; in particular you inherit the possibly unbounded latency due to the upfront deallocation cost of RC (depending on your use-case), and you even risk creating cycles that are never collected if you combine this method with a mechanism for storing OCaml values on the foreign side.

These are some guaranteed theoretical drawbacks, but I imagine that there can be more practical implementation-specific issues (as witnessed by caml_alloc_custom vs caml_alloc_custom_mem). I do not have hands-on experience with custom blocks, and while researching for this answer, I found this usage not very well documented, so I hope that experts can fill in the gaps and/or correct the above if needed.

2) Deterministic resource management

To avoid the impedance mismatch between smart pointers and the GC, you can rely on deterministic resource management. In OCaml, the idiomatic expression of it is to use “with_” wrappers based on unwind-protect [see the example of files]. OCaml 4.08 introduces Fun.protect, an implementation of unwind-protect suitable for OCaml.

Pros:

  • Predictable: can be used for non-memory resources.

Cons:

  • Lacks expressiveness: resources live for the exact duration of their defining scope, and are reclaimed in LIFO order.
  • Allows “use after free”: the resource can be referenced outside of its scope, if not careful.
  • Currently incompatible with asynchronous exceptions: OCaml does not currently allow an implementation of unwind-protect that protects from asynchronous exceptions being raised inside the finally clause.
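As an illustration of approach 2), here is a minimal sketch of a “with_” wrapper built on Fun.protect (so it assumes OCaml 4.08 or later). The handle type and the acquire, release and with_handle functions are hypothetical stand-ins for a real foreign resource; they are not taken from Skia or from any binding mentioned above. The second half shows the “use after free” hazard listed in the cons: nothing stops the handle from escaping its scope.

```ocaml
(* Hypothetical resource: a handle that records whether it was released. *)
type handle = { name : string; mutable released : bool }

let acquire name = { name; released = false }
let release h = h.released <- true

(* Acquire the resource, run [f] on it, and always release it afterwards,
   whether [f] returns normally or raises. *)
let with_handle name f =
  let h = acquire name in
  Fun.protect ~finally:(fun () -> release h) (fun () -> f h)

let () =
  (* Normal use: the result of [f] is returned, the handle is released. *)
  let escaped = ref None in
  let len =
    with_handle "demo" (fun h -> escaped := Some h; String.length h.name)
  in
  assert (len = 4);
  (* The handle escaped its scope and is already released at this point. *)
  (match !escaped with
   | Some h -> assert h.released
   | None -> assert false);
  (* The release also happens when the body raises. *)
  let seen = ref None in
  (try with_handle "boom" (fun h -> seen := Some h; failwith "oops")
   with Failure _ -> ());
  match !seen with
  | Some h -> assert h.released
  | None -> assert false
```

Nested calls to such a wrapper release their resources in LIFO order, which is the expressiveness limitation mentioned above.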

3) Manual resource management

If neither 1) nor 2) fits the bill, you have to resort to manual resource management, in which the user has to call some free function explicitly (and gets an exception if they use the resource after freeing it). It is “hard” to program correctly with manual resource management, more so in the presence of exceptions. For this reason, people mix it with 1) and/or 2); for instance they use unwind-protect in a non-systematic manner, or they attach finalizers to act as a fallback, or both. While with 1) and 2) you are still within the realm of structured programming, with manual resource management you enter the realm of debugging-oriented programming (think programming in a weird dialect of old C++).

Pros:

  • Last resort solution

Cons:

  • Non-idiomatic code
  • Hard to program
  • Hard to reason about the code
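A minimal sketch of what approach 3) can look like on the OCaml side, assuming a made-up buffer resource: the user calls free explicitly, and any later access raises an exception. The names buffer, create, free, length and Use_after_free are all hypothetical and only serve the illustration.

```ocaml
exception Use_after_free of string

(* [data] becomes [None] once the buffer has been freed. *)
type buffer = { label : string; mutable data : Bytes.t option }

let create label size = { label; data = Some (Bytes.make size '\000') }

(* Explicit release: the user is responsible for calling this when done.
   Freeing twice is harmless in this sketch. *)
let free b = b.data <- None

(* Every accessor has to check that the resource is still alive. *)
let length b =
  match b.data with
  | Some d -> Bytes.length d
  | None -> raise (Use_after_free b.label)

let () =
  let b = create "scratch" 16 in
  assert (length b = 16);
  free b;
  match length b with
  | _ -> assert false
  | exception Use_after_free l -> assert (l = "scratch")
```

If an exception is raised between create and free, the buffer is never freed, which is why this style tends to be combined with unwind-protect or a finalizer fallback as described above.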

Discussions with Serious Industrial OCaml Users a while ago (starting around POPL 2017 in Paris) brought OCaml's current issues with resource management to light. These discussions prompted a proposal for a resource management model for OCaml, inspired by RAII and move semantics from modern C++/Rust. In a nutshell, it aims to lift the expressiveness limitations of 2). Interoperability is probably its most important application.

OCaml 4.08.0+beta3

Damien Doligez announced

Dear OCaml users,

The release of OCaml 4.08.0 is approaching. We have created a third beta version to help you adapt your software to the new features ahead of the release.

The source code is available at these addresses:

https://github.com/ocaml/ocaml/archive/4.08.0+beta3.tar.gz
https://caml.inria.fr/pub/distrib/ocaml-4.08/ocaml-4.08.0+beta3.tar.gz

The compiler is (or will soon be) also available in OPAM with one of the following commands.

opam switch create ocaml-variants.4.08.0+beta3 --repositories=default,beta=git+https://github.com/ocaml/ocaml-beta-repository.git

or

opam switch create ocaml-variants.4.08.0+beta3+<VARIANT> --repositories=default,beta=git+https://github.com/ocaml/ocaml-beta-repository.git

where you replace <VARIANT> with one of these:
afl
default_unsafe_string
flambda
fp
fp+flambda

We want to know about all bugs. Please report them here: https://github.com/ocaml/ocaml/issues

Happy hacking,

– Damien Doligez for the OCaml team.

The changes from beta2 are the following:

  • GPR#1942, GPR#2244: simplification of the static check for recursive definitions (Alban Reynaud and Gabriel Scherer, review by Jeremy Yallop, Armaël Guéneau and Damien Doligez)
  • GPR#1354, GPR#2177: Add fma support to Float module. (Laurent Thévenoux, review by Alain Frisch, Jacques-Henri Jourdan, Xavier Leroy)
  • GPR#2202: Correct Hashtbl.MakeSeeded.{add_seq,replace_seq,of_seq} to use functor hash function instead of default hash function. Hashtbl.Make.of_seq shouldn't create randomized hash tables. (David Allsopp, review by Alain Frisch)
  • * PR#4208, PR#4229, PR#4839, PR#6462, PR#6957, PR#6950, GPR#1063, GPR#2176, GPR#2297: Make (nat)dynlink sound. (Mark Shinwell, Leo White, Nicolás Ojeda Bär, Pierre Chambart)
  • GPR#2317: type_let: be more careful generalizing parts of the pattern (Thomas Refis and Leo White, review by Jacques Garrigue)
  • MPR#6242, GPR#2143, MPR#8558, GPR#8559: optimize some local functions (Alain Frisch, review by Gabriel Scherer)
  • #7829, #8585: Fix pointer comparisons in freelist.c (for 32-bit platforms) (David Allsopp and Damien Doligez)
  • #8567, #8569: on ARM64, use 32-bit loads to access caml_backtrace_active (Xavier Leroy, review by Mark Shinwell and Greta Yorsh)
  • #8568: Fix a memory leak in mmapped bigarrays (Damien Doligez, review by Xavier Leroy and Jérémie Dimino)
  • MPR#7548: printf example in the tutorial part of the manual (Kostikova Oxana, review by Gabriel Scherer, Florian Angeletti, Marcello Seri and Armaël Guéneau)
  • MPR#7547, GPR#2273: Tutorial on Lazy expressions and patterns in OCaml Manual (Ulugbek Abdullaev, review by Florian Angeletti and Gabriel Scherer)
  • GPR#8508: refresh \moduleref macro (Florian Angeletti, review by Gabriel Scherer)
  • MPR#7919, GPR#2311: Fix assembler detection in configure (Sébastien Hinderer, review by David Allsopp)
  • GPR#2295: Restore support for bytecode target XLC/AIX/Power (Konstantin Romanov, review by Sébastien Hinderer and David Allsopp)
  • GPR#8528: get rid of the direct call to the C preprocessor in the testsuite (Sébastien Hinderer, review by David Allsopp)
  • Issue #7938, GPR #8532: Fix alignment detection for ints on 32-bits platforms (Sébastien Hinderer, review by Xavier Leroy)
  • * GPR#8533: Remove some unused configure tests (Stephen Dolan, review by David Allsopp and Sébastien Hinderer)
  • GPR#2207,#8604: Add opam files to allow pinning (Leo White, Greta Yorsh, review by Gabriel Radanne)
  • MPR#7835, GPR#1980, GPR#8548, GPR#8586: separate scope from stamp in idents and explicitly rescope idents when substituting signatures. (Thomas Refis, review by Jacques Garrigue and Leo White)
  • #8550, #8552: Soundness issue with class generalization (Jacques Garrigue, review by Leo White and Thomas Refis, report by Jeremy Yallop)

Menhir and preserving comments from source

Chet Murthy asked

I've used ocamlyacc over the years a lot, and menhir in a couple of projects (including a big one I'm working on right now). I've also used camlp4/camlp5's stream-parsers in a ton of projects. And of course, with ocamllex and sedlexing. I find that with stream-parsers, it's easy to arrange for preserving lexical positions in tokens, and then carrying that across to the parse-tree. To wit,

...
type basic_token = ...... ;;
type token = basic_token * lexical_position_info_t ;;
...

and then in your stream parser, you pattern-match on the first component, e.g.

...
parser [< .... ; '(Tstring s, _) ; ... >] -> yadda yadda
...

But with menhir (and ocamlyacc) it seems like you need to embed the lexical position info in the token, e.g.

...
type basic_token =
| Tstring of lexical_position_info_t * string
| Tsemi of lexical_position_info_t
etc
...

Is there some trick I'm missing, for how to use ocamlyacc/menhir in a manner that allows preserving this positional information during the parse?

gasche replied

To have location/position information in the AST: the standard approach I'm familiar with is not to embed position information in the tokens, but to query it from the lexer or parser at the place where you build your AST values in the parser actions. When using ocamlyacc, I use the Lexing module for this (Lexing.lexeme_{start,end}_p), when using Menhir I use its special symbols ${start,end}pos, ${start,end}pos(n), $loc, $loc(n).

To preserve comments, an approach we use in the OCaml compiler (where comments that are docstrings are kept in the AST) is to have a global table of comments, that is filled by the Lexer, and accessed from parsing actions (there is a function that says basically "collect all the comments from the last time you were called to <this position>").
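The global comment table described above can be sketched in a few lines. This is a simplified illustration and not the OCaml compiler's actual code: positions are plain integer offsets rather than Lexing.position values, and the names pending, record_comment and collect_before are made up for the example.

```ocaml
(* Comments recorded by the lexer, most recent first: (text, start offset). *)
let pending : (string * int) list ref = ref []

(* Called by the lexer each time it skips over a comment. *)
let record_comment text offset = pending := (text, offset) :: !pending

(* Called from a parser action: returns, in source order, the comments that
   start before [offset] and were not returned by an earlier call. *)
let collect_before offset =
  let before, after = List.partition (fun (_, o) -> o < offset) !pending in
  pending := after;
  List.rev_map fst before
```

A parser action for a declaration would then call collect_before with the declaration's start offset and attach the returned comments to the AST node it builds.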

ppx_protocol_conv 5.0.0

Anders Fugmann announced

It is my pleasure to announce the release of Ppx_protocol_conv version 5.0.0.

Ppx_protocol_conv is a syntax extension to generate functions to serialize and de-serialize OCaml types. The ppx itself does not contain any protocol specific code, but relies on user defined 'drivers' to define serialization and de-serialization of basic types and structures.

The library comes with multiple pre-defined drivers:

  • ppx_protocol_conv_json (Yojson.Safe.json)
  • ppx_protocol_conv_jsonm (Ezjson.value)
  • ppx_protocol_conv_msgpack (Msgpck.t)
  • ppx_protocol_conv_xml-light (Xml.xml)
  • ppx_protocol_conv_yaml (Yaml.value)

The library is based on ppxlib and is compatible with base v0.12. Release 5.0.0 is available through opam.

The project homepage is: https://github.com/andersfugmann/ppx_protocol_conv

The project's wiki pages contain some more information on how to use the library and existing drivers and on how to write your own drivers.

Noteworthy Change

This release includes a major rewrite of the core of the library to allow more control by user supplied drivers over the serialization and de-serialization of types. These changes break backward compatibility.

The json driver (Ppx_protocol_conv_json) has been updated to be compatible with the serialization format of ppx_deriving_yojson, supporting the [@key], [@name] and [@default] attributes, and can be used as a replacement for ppx_deriving_yojson with few modifications.

Deserialization functions now return a result type. Old support for exception type errors is available in functions with the _exn suffix. For a complete list of changes, see the Changelog.

As always, comments, suggestions and PRs are more than welcome.

Orsetto: structured data interchange languages (preview release)

james woodyatt announced

I have now released ~preview4 which resolves Issue #8 OCaml 4.07: the new Stdlib.Seq.t is functionally equivalent to Cf_seq.t. For OCaml 4.06, this introduces an external dependency on the seq compatibility package. I've also checked that documentary comments are available with odig, so this might be the last preview release before 1.0. (It depends on whether I decide to remove the support for the ppx_let syntax extension.)

james woodyatt then added

> It depends on whether I decide to remove the support for the ppx_let syntax extension.

I've thought about this, and I will not be removing support for the ppx_let syntax extension. I plan to deprecate it when OCaml 4.08 is released, but it will be retained while I continue supporting OCaml 4.06 and 4.07.

Searching for functions

Jordan Mackie asked

OCaml newbie here - coming from Haskell land out of curiosity.

I'm curious how you guys find your way around stdlib/packages etc?

Example: I'm writing a script and I want to lookup an environment variable. I know there's probably some function along the lines of get_env somewhere, so I'd like to know where it is and what type it has. In Haskell I'd do a hoogle search along the lines of https://www.stackage.org/lts-13.18/hoogle?q=getenv - what would be my process in OCaml?

I tried googling "get env var in Ocaml" - first hit is a link to stdlib, but I'm using base. It did at least give me the hint that Sys is a relevant namespace, so I go and look at the docs for Base.Sys (many clicks later - https://ocaml.janestreet.com/ocaml-core/latest/doc/base/Base/Sys/index.html) but getenv isn't listed. But it is apparently there…

There must be a better way?

Yawar Amin replied

The Hoogle equivalent for OCaml is called odig: https://erratique.ch/software/odig . You can install it locally and have it generate documentation for all installed packages. However, generated documentation is not globally searchable (see last point). Besides that, there are a few other strategies:

  • Familiarize yourself with the standard library that ships with every OCaml distribution: https://caml.inria.fr/pub/docs/manual-ocaml/libref/ . This is the equivalent of Haskell's base package. The Prelude equivalent module is called Pervasives. You will find the Sys module here, and getenv in there.
  • Keep http://opam.ocaml.org/packages/ handy for when you're given a package name to look up. Package documentation is mostly not uploaded to a central location like Haddock. (But people have been talking about setting that up at docs.ocaml.org.) You'll probably need to open up and search through .mli files once in a while.
  • The old-style ocamldoc documentation pages (like the standard library I linked above) have very handy pages indexing types, values, and modules. However, the newer odoc documentation pages which are becoming the de facto standard do not, as of yet. There are a couple of issues tracking this.
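For the concrete question in the thread, here is a small sketch using only the standard library's Sys module (not Base, which the original poster was using). Sys.getenv raises Not_found when the variable is unset, while Sys.getenv_opt (available since OCaml 4.05) returns an option. The helper lookup_or is a made-up name for this example.

```ocaml
(* Look up an environment variable, falling back to [default] if unset. *)
let lookup_or ~default name =
  match Sys.getenv_opt name with
  | Some v -> v
  | None -> default

let () =
  (* Prints the value of HOME, or the fallback text if HOME is not set. *)
  print_endline (lookup_or ~default:"(HOME is not set)" "HOME")
```

In the toplevel, typing the name of a function such as Sys.getenv_opt followed by ;; prints its type, which can help once you know which module to look in.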

Other OCaml News

From the ocamlcore planet blog

Here are links from many OCaml blogs aggregated at OCaml Planet.

Old CWN

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.