OCaml Weekly News
Hello
Here is the latest OCaml Weekly News, for the week of June 02 to 09, 2020.
Multicore Update: April 2020, with a preprint paper
Continuing this thread, Daniel Bünzli asked and KC Sivaramakrishnan replied
> One thing that I didn’t get from the paper is how exactly `ConcMinor` breaks the current FFI and the impact it would have on the existing eco-system, on a scale from “it affects all projects” to “only people doing that fancy thing” :–) ?

All the projects that use the C API. The details are here: https://github.com/ocaml-multicore/ocaml-multicore/wiki/C-API-changes
> At the end of the paper it seems you make the point that `ParMinor` is the solution to go with for the time being. Does this mean you are going to leave behind the work done on `ConcMinor`, or do you intend to continue to maintain it?

We don't intend to maintain it. It is quite a bit of work to maintain and port the changes across two different GCs.
The `ParMinor` GC is now at the 4.11 branch point (the default multicore compiler is 4.10 + `ParMinor` now). `ConcMinor` is at 4.06.1.
Given that `ConcMinor` breaks the C API, the ecosystem would have to be fixed for `ConcMinor` to be useful. The code changes are indeed intricate; the differences are not just in the minor GC, but also in the compiler's internal use of the C API. It will be quite a bit of work to keep both GCs in the same source distribution.
Guillaume Munch-Maccagnoni then said
> Given that `ConcMinor` breaks the C API, the ecosystem would have to be fixed for `ConcMinor` to be useful.

I do not think this is necessarily true.
Here is why I think so, but be warned that this is preliminary as I do not have time to explore this idea further on my own at the moment.
State in Rust
Breaking the C API is a consequence of deciding that all single-threaded shared mutable state must be assumed to also be shared between threads. So a new read barrier is used to promote values when read from another thread. But for data types that were correct up to now, users must also be careful to avoid races from now on… for instance by avoiding sharing values of such types between domains.
One lesson of Rust is that there are different kinds of mutable state, for different usages, with different means to achieve thread-safety.
The closest there is to current OCaml's `mutable` is the notion of single-threaded multiple-writers mutable state (`Cell`). It is made thread-safe in Rust by statically preventing values containing `Cell` from crossing thread boundaries (by virtue of not having the `Send` trait). The same restriction is used to make some data structures more efficient by avoiding the cost of synchronisation (cf. the reference-counting pointer `Rc` vs. the atomic reference-counting pointer `Arc`).
This is not enough by itself, and Rust offers other kinds of state for communicating and sharing values between threads.
`UnsafeCell` is like OCaml multicore's `mutable` (though yours is safe thanks to the work on the memory model): it has almost no restriction and can be sent across domains, but the user is likewise told to “avoid data races”. It is rarely used alone, but together with type abstraction it can be used to program safe concurrent data structures.
Lastly, the default notion of state in Rust is linear state, which can be sent freely across threads. Thread-safety is ensured by restricting aliasing using the ownership and borrowing discipline.
A backwards-compatible concurrent collector?
If I had to imagine a backwards-compatible OCaml with static control of interference à la Rust based on `ConcMinor`, it would distinguish the three kinds of state (concretely with other keywords in addition to `mutable`). `mutable` would keep its current meaning of single-domain, multiple-writers state and not require a read barrier, and in particular preserve the API. (I count systhreads as single-threaded for this purpose, since here it means "sharing the same minor heap".)
Programs could progressively transition to other kinds of state when parallelising the program. Concretely, a data structure like `Stack.t`, instead of becoming racy, would keep its current meaning, but users could replace it with a linear stack or a concurrent stack, two data structures distinct from the first one, when parallelizing their programs.
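To make the idea of a concurrent stack that is a separate type from `Stack.t` more concrete, here is a minimal sketch of a lock-free (Treiber-style) stack. It assumes the `Atomic` module that entered the standard library in OCaml 4.12, which is later than this thread; the module name `Concurrent_stack` is hypothetical, and this is an illustration rather than code from the multicore project or from the post above.

```ocaml
(* Hypothetical module: a lock-free stack kept distinct from Stack.t, so that
   existing single-domain code using Stack.t is left untouched.
   Assumes the stdlib Atomic module (OCaml >= 4.12). *)
module Concurrent_stack : sig
  type 'a t
  val create : unit -> 'a t
  val push : 'a t -> 'a -> unit
  val pop : 'a t -> 'a option
end = struct
  (* The whole stack is an immutable list held in one atomic cell. *)
  type 'a t = 'a list Atomic.t

  let create () = Atomic.make []

  let rec push s x =
    let old = Atomic.get s in
    (* Retry if the cell was changed between the read and the swap. *)
    if not (Atomic.compare_and_set s old (x :: old)) then push s x

  let rec pop s =
    match Atomic.get s with
    | [] -> None
    | x :: rest as old ->
      if Atomic.compare_and_set s old rest then Some x else pop s
end

let () =
  let s = Concurrent_stack.create () in
  Concurrent_stack.push s 1;
  Concurrent_stack.push s 2;
  assert (Concurrent_stack.pop s = Some 2)
```

Keeping the type abstract is what lets the racy-looking internals stay hidden, in the same spirit as the `UnsafeCell` plus type abstraction combination described for Rust.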
So how could this fit with the current plans? It is not entirely clear to me. If people start to rely on parallelism in an unstructured way (e.g. no clear distinction between different kinds of data types arising from different ways of ensuring thread-safety) then one will also lose the ability to retrofit `ConcMinor` in a backwards-compatible manner (by losing the information that the current `mutable` API is single-threaded). The API breakage of `ConcMinor` which might only be virtual right now (if I trust this preliminary, not fully-explored idea) will become real. (Further difficulties arise with the emulation of the `Thread` library with domains, but this could be changed later.)
But if users are provided in advance with a general direction for a model of control of interference this might happen differently. And eventually having such a model is desirable in any case, as it helps parallelizing programs (for instance the Firefox people reported that they had attempted and failed twice to parallelise the CSS engine in C++ before succeeding with Rust). Furthermore, in an imaginary retrofitting of `ConcMinor`, one could imagine enforcing something like the `Send` trait at the level of the read barrier until there is a better way (there would be two kinds of barriers, one of which would raise an exception if a state happened to be incorrectly shared across domains, and not be required in the FFI).
I find `ConcMinor` interesting from a systems programming perspective compared to the stop-the-world collector because it could (I hope) offer possibilities such as having a low-latency domain communicating with a higher-latency domain. Moreover the performance cost of the read barrier might be lower in this scheme if it could be removed for all but the concurrent data structures.
BAP 2.1.0 Release
Ivan Gotovchits announced
The Carnegie Mellon University Binary Analysis Platform (CMU BAP) is a suite of utilities and libraries that enables analysis of programs that are represented as machine code (aka binaries). CMU BAP is written in OCaml and uses a plugin-based architecture to enable extensibility. We also have a domain-specific language, called Primus Lisp, that we use to write analyses, specify verification conditions, interact with the built-in SMT solver, and model the semantics of machine instructions and functions.
The 2.1.0 release is very rich in new features, but the most prominent addition is the new symbolic executor mode for the Primus framework. We also significantly updated the Primus framework and integrated it with our new Knowledge Base, which was introduced in the BAP 2.0 release; we made our interpreter much faster; we added the systems and components facilities, inspired by Common Lisp; and we implemented a gradual type checker for Primus Lisp with type inference. We also added the ability to represent machine instructions as intrinsic functions, so it is now possible to express their semantics in Primus Lisp, since we added IEEE754 primitives to the Lisp interpreter.
As usual, we upgraded BAP to the newer versions of the Core library and OCaml (we now support OCaml versions from 4.07 to 4.09). We also significantly improved our build times and added an optional omake backend, which we are using in-house.
From the user perspective, one of the key features of BAP as an analysis platform is that you can run BAP on binaries that you can't run otherwise, either because they need special hardware or software, or need to interact with the outside world. In the past couple of months, we have run BAP on various firmware and found numerous zero-day vulnerabilities. In particular, we were able to find critical vulnerabilities in the VxWorks operating system that runs on, potentially, billions of devices including mission-critical and military appliances.
As always, questions, suggestions, and opinions are very welcome!
Migrating an Async project to Lwt, a short primer
Michael Bacarella announced
Consider this a post where I think aloud about my experience migrating an Async project to Lwt. I've spent about a weekend doing such a thing, and if, in the process of talking about it here I can save a few people an hour or two (or perhaps inspire confidence to take such a project on in the first place) then it will have been worthwhile.
This wouldn't be a complete post if I didn't also mention @dkim's translation of Real World OCaml's Async examples to Lwt.
This was born out of a previous effort where I tried to mix Lwt and Async in the same project. This didn't go so well, so I tried converting the whole thing to Lwt, and it turns out adapting to Lwt if you're an Async person is actually much easier than I thought it would be.
Basics
Both libraries involve promises/futures. Async calls its promises `Deferred.t`, whereas in Lwt they're called `Lwt.t`.
In Async you start your program by saying `never_returns (Scheduler.go ())` or `Command.async_spec` after you set up your initial `Deferred.t`.
In Lwt you say `Lwt_main.run` on a top-level `Lwt.t` argument. Note you can re-run `Lwt_main.run` in a single program as many times as you want, but perhaps you shouldn't run multiple `Lwt_main.run` in parallel.
There's an easy correspondence between basic operators.
Async | Lwt |
---|---|
Deferred.bind | Lwt.bind |
Deferred.return | Lwt.return |
>>= | >>= |
Deferred.map | Lwt.map |
>>| | >|= |
Deferred.don't_wait_for | Lwt.async |
In_thread.run | Lwt_preemptive.detach |
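As a small illustration of the correspondence, here is a sketch of a few of these operators on the Lwt side, with the Async equivalent noted in comments. The function `compute` is hypothetical and exists only for this example; it assumes the `lwt` and `lwt.unix` libraries are available.

```ocaml
open Lwt.Infix

(* Hypothetical computation, for illustration only. *)
let compute () : int Lwt.t =
  Lwt.return 20                        (* Async: Deferred.return 20 *)
  >>= (fun x -> Lwt.return (x + 1))    (* Async: >>= *)
  >|= (fun x -> x * 2)                 (* Async: >>| *)

let () =
  (* Fire-and-forget, in the spirit of Async's don't_wait_for. Note that
     Lwt.async takes a function returning a promise, not a promise. *)
  Lwt.async (fun () -> print_endline "background task"; Lwt.return_unit);
  let result = Lwt_main.run (compute ()) in
  Printf.printf "result = %d\n" result   (* prints "result = 42" *)
```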
Starvation worries
The most important difference between Async and Lwt is that in Lwt the callbacks of fulfilled promises are run immediately, whereas Async kinda punts them to the end of a work queue and runs their thunks later.
A return loop like this starves the rest of Lwt:
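```ocaml
open Lwt.Infix

let main () =
  let rec loop () =
    Lwt.return () >>= fun () -> loop ()
  in
  (* Lwt.async expects a function returning a promise: [loop], not [loop ()]. *)
  Lwt.async loop;
  Lwt_io.printlf "this line never prints!"
;;

(* Lwt_main.run expects a promise: [main ()], not [main]. *)
let () = Lwt_main.run (main ()) ;;
```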
whereas the corresponding Async loop does not starve:
```ocaml
open! Async

let main () =
  let rec loop () =
    Deferred.return () >>= fun () -> loop ()
  in
  don't_wait_for (loop ());
  printf "this line does print!\n";
  return ()
;;

let () =
  let cmd = Command.async_spec ~summary:"" Command.Spec.empty main in
  Command.run cmd
;;
```
Fortunately there's a workaround. You can get something closer to the Async-style behavior in Lwt by using `Lwt.yield ()` instead of `Lwt.return ()`.
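A minimal sketch of the workaround follows. It uses `Lwt.pause ()`, on the assumption that it plays the same yielding role as the call named above (recent Lwt releases recommend `Lwt.pause` for this). The loop still runs forever in the background, but each iteration hands control back to the scheduler, so other promises make progress.

```ocaml
open Lwt.Infix

let main () =
  (* Each iteration yields to the scheduler instead of continuing at once. *)
  let rec loop () = Lwt.pause () >>= fun () -> loop () in
  Lwt.async loop;
  (* The timer below is serviced even though the loop never terminates. *)
  Lwt_unix.sleep 0.01 >|= fun () -> "this line does print"

let () = print_endline (Lwt_main.run (main ()))
```

The loop now spins through the scheduler rather than blocking it, so it still burns CPU; in a real program it would normally wait on I/O or a timer instead.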
Spawning threads
From time to time you may need to run something in a system thread. In Async you say `In_thread.run`, whereas in Lwt you say `Lwt_preemptive.detach`. For simple things they're pretty much interchangeable, but one stumbling point for me was that in Async you can create a named thread and always use that for the `In_thread.run`, with multiple simultaneous dispatches to that thread becoming sequenced.
This is really useful for interacting with libraries that aren't so thread friendly.
Lwt's detach doesn't provide an easy way to do this out of the box, but I think you can still deal with thread unfriendly libraries by using the `Lwt_preemptive.run_in_main` call.
Basically, never exit the detach thread you started to interact with your library, and instead have it block on a promise that gets filled through `run_in_main`. In this way you can sequence your detached Lwt thread similarly to Async.
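For the simpler case where the library only needs its calls not to overlap, a different and cruder technique than the long-lived thread described above is to guard `Lwt_preemptive.detach` with an `Lwt_mutex`. This is a sketch under that assumption: the names `library_lock` and `sequenced` are hypothetical, and unlike Async's named threads, successive calls may land on different threads of the pool.

```ocaml
(* Hypothetical lock guarding all calls into a thread-unfriendly library. *)
let library_lock = Lwt_mutex.create ()

(* [sequenced f x] runs the blocking call [f x] in a system thread, but never
   more than one such call at a time. It does NOT pin calls to one thread. *)
let sequenced f x =
  Lwt_mutex.with_lock library_lock (fun () -> Lwt_preemptive.detach f x)

let () =
  let job n = n * n in   (* stands in for a blocking library call *)
  let results = Lwt_main.run (Lwt_list.map_p (sequenced job) [ 1; 2; 3 ]) in
  List.iter (Printf.printf "%d ") results;
  print_newline ()
```

If the library requires that every call come from the same system thread, this is not enough and the `run_in_main` approach from the text is the one to pursue.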
Happy to explain further if this is unclear.
Other libraries
`Async.Unix` has a somewhat built-up conception of the UNIX API, whereas `Lwt_unix` is more a direct mapping of OCaml's `Unix` module to promises.
Async `Clock.every` and `Clock.after` don't have exact analogs, but you can make new versions pretty simply.
Example of a shallow imitation of Async `Clock.every`:
```ocaml
(* Assumes [open Lwt.Infix] for [>>=] and Core's [Time.Span] for [span]. *)
let every span f =
  Lwt.async (fun () ->
    let span = Time.Span.to_sec span in
    let rec loop () =
      f ();
      Lwt_unix.sleep span >>= fun () -> loop ()
    in
    loop ())
;;
```
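In the same spirit, here is a sketch of a `Clock.after` stand-in, plus a bounded variant of `every` that is easier to try out because it terminates. Both helpers are hypothetical names for this example; they take plain float seconds rather than Core's `Time.Span`, to avoid depending on Core.

```ocaml
open Lwt.Infix

(* Hypothetical stand-in for Async's Clock.after, taking float seconds. *)
let after seconds = Lwt_unix.sleep seconds

(* Hypothetical bounded variant of [every]: calls [f] exactly [count] times,
   waiting [seconds] after each call, then resolves. *)
let every_n ~count seconds f =
  let rec loop n =
    if n = 0 then Lwt.return_unit
    else begin
      f ();
      after seconds >>= fun () -> loop (n - 1)
    end
  in
  loop count

let () =
  let ticks = ref 0 in
  Lwt_main.run (every_n ~count:3 0.01 (fun () -> incr ticks));
  Printf.printf "ticks = %d\n" !ticks   (* prints "ticks = 3" *)
```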
Open questions
I haven't sorted out a good Lwt substitute that's as comfortable as Async Pipe yet. Though some combination of `Lwt_stream`, `Lwt_sequence` and `lwt-pipe` might fit the bill. If you just happen to know already feel free to cluephone.
Closing remarks
This is basically everything? I'm almost suspicious that I'm not having more problems, but will happily accept grace when it arises.
Raphaël Proust then said
> I haven’t sorted out a good Lwt substitute that’s as comfortable as Async Pipe yet. Though some combination of `Lwt_stream`, `Lwt_sequence` and `lwt-pipe` might fit the bill. If you just happen to know already feel free to cluephone.

The Tezos project has a pipe-like module: https://gitlab.com/tezos/tezos/-/blob/master/src/lib_stdlib/lwt_pipe.mli
It hasn't been released as a standalone library (yet) but it is released as part of the `tezos-stdlib` package.
I haven't used Async's pipe, so I don't know how close of a match it is.
jose 0.4.0
Ulrik Strid announced
A new release of JOSE has been published to opam.
The following changes have been made:
- RFC7638: Implement thumbprints @undu
- Make kid optional in the header and jwk to align better with the spec, this is a breaking change
I have started dogfooding the library for an OpenID Connect client, which hopefully will help with the design going forward.
OCaml 4.11.0, second alpha release
octachron announced
A new alpha version of OCaml 4.11.0 has been published. Compared to the first alpha version, this version contains the following new bug fixes:
- additional fixes 6673, 1132, +9617: Relax the handling of explicit polymorphic types (Leo White, review by Jacques Garrigue and Gabriel Scherer)
- additional fixes 7364, 2188, +9592, +9609: improvement of the unboxability check for types with a single constructor. Mutually-recursive type declarations can now contain unboxed types. This is based on the paper https://arxiv.org/abs/1811.02300
- 7817, 9546: Unsound inclusion check for polymorphic variant (Jacques Garrigue, report by Mikhail Mandrykin, review by Gabriel Scherer)
- 9549, 9557: Make -flarge-toc the default for PowerPC and introduce -fsmall-toc to enable the previous behaviour. (David Allsopp, report by Nathaniel Wesley Filardo, review by Xavier Leroy)
- 9320, 9550: under Windows, make sure that the Unix.exec* functions properly quote their argument lists. (Xavier Leroy, report by André Maroneze, review by Nicolás Ojeda Bär and David Allsopp)
- 9490, 9505: ensure proper rounding of file times returned by Unix.stat, Unix.lstat, Unix.fstat. (Xavier Leroy and Guillaume Melquiond, report by David Brown, review by Gabriel Scherer and David Allsopp)
- 8676, 9594: turn debugger off in programs launched by the program being debugged (Xavier Leroy, report by Michael Soegtrop, review by Gabriel Scherer)
- 9552: restore ocamloptp build and installation (Florian Angeletti, review by David Allsopp and Xavier Leroy)
- 7708, 9580: Ensure Stdlib documentation index refers to Stdlib. (Stephen Dolan, review by Florian Angeletti, report by Hannes Mehnert)
- 9189, 9281: fix a conflict with Gentoo build system by removing a one-letter Makefile variable. (Florian Angeletti, report by Ralph Seichter, review by David Allsopp and Damien Doligez)
The compiler can be installed as an OPAM switch with one of the following commands
opam switch create ocaml-variants.4.11.0+alpha2 --repositories=default,beta=git+https://github.com/ocaml/ocaml-beta-repository.git
or
opam switch create ocaml-variants.4.11.0+alpha2+<VARIANT> --repositories=default,beta=git+https://github.com/ocaml/ocaml-beta-repository.git
where <VARIANT> is replaced with one of these: afl, flambda, fp, fp+flambda
The source code for the alpha is also available at these addresses:
- https://github.com/ocaml/ocaml/archive/4.11.0+alpha2.tar.gz
- https://caml.inria.fr/pub/distrib/ocaml-4.11/ocaml-4.11.0+alpha2.tar.gz
If you find any bugs, please report them here: https://github.com/ocaml/ocaml/issues
OCaml Workshop 2020: Call for Volunteers
Ivan Gotovchits announced
The OCaml Workshop will be held in a virtual format this year, which poses new challenges and requires people with special talents and training. The Organizing Committee is seeking members who will volunteer to fill one (or more) of the following roles:
- AV Editor
- Session Host
- Transcribers/Interpreters
- Content Manager
- Accessibility Chair
The roles are described in detail below. We are asking prospective Organizing Committee members to contact the Organizing Committee chair ([ivg@ieee.org](mailto:ivg@ieee.org)), indicating which role(s) they are ready to take.
AV Editor
AV (Audio/Video) editors are responsible for previewing the presentations and providing help and feedback to the authors. Ideally we aim for one editor per talk.
- Duties
  - Preview the pre-recorded videos and, if necessary, post-process them or ask the author to record again.
  - Advise authors and help them choose software and hardware, teach them how to set up the camera and lighting, make sure that the audio is of good quality and, in general, channel our quality guidelines.
  - Ensure that all videos are of the same quality, the audio levels are the same, and that everything is loud and clear.
Session Hosts
Session hosts will assist session chairs in streaming the pre-recorded videos as well as helping and moderating the Q&A sessions and the panel session. They will also be responsible for security and be ready to react to potential threats and wrongdoers. Since we will broadcast sessions in several time zones we need several hosts for each session.
- Duties
  - Moderating the text chats
  - Controlling microphones in the video-conferencing
  - Watching for the time
  - Performing sound checks
  - Welcoming and otherwise guiding participants
Transcribers / Interpreters
We would like to have at least English transcriptions for each talk and translations to other languages are very welcome. Transcriptions enable accessibility as well as potentially increase the audience and publicity as they could be indexed by the search engines.
- Duties
  - Create transcriptions for videos, potentially in other languages.
Content Manager
The content manager will be responsible for maintaining the web presence of the conference on https://ocaml.org/. We plan to have all videos available, as well as maintain a page for each submitted work.
Accessibility Chair
We are striving to make the conference accessible to everyone and we are looking for volunteers who have experience in online accessibility.
- Duties
  - Helping with the selection of accessible platforms and tools.
  - Working with attendees to ensure the necessary access services are included.
  - Establishing best practices for preparing and running accessible sessions.
Introduction to Lwt
Raphaël Proust announced
I've published https://raphael-proust.github.io/code/lwt-part-1.html, a 2-part introduction to Lwt.
The main aim of the introduction is to give a good mental model of what promises are, how they behave and how to use them. It assumes basic familiarity with OCaml.
Don't hesitate to ask questions or share feedback.
Other OCaml News
From the ocamlcore planet blog
Here are links from many OCaml blogs aggregated at OCaml Planet.
Old CWN
If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.
If you also wish to receive it every week by mail, you may subscribe online.