OCaml Weekly News

Previous Week Up Next Week

Hello

Here is the latest OCaml Weekly News, for the week of October 06 to 13, 2020.

Table of Contents

vec 0.1.0

Alex Ionescu announced

I'm happy to announce the first release of vec, a library for safe dynamic arrays with Rust-like mutability permissions.

You can find the package on opam here, and the source repository is here.

Looking for feedback and suggestions!

Josh Berdine asked and Alex Ionescu replied

I'm curious how you would compare this with Containers.Vector?

Oh, I didn't know that existed. Well, with vec you can control the vector's growth rate (though I imagine this is a very niche feature).

Also, from what I can see CCVector's can only be read-write or read-only, vec also supports write-only vectors. (Which is useful if e.g. you want to pass a buffer to some function to fill it, but don't want the function to read its current data)

Calascibetta Romain later said

Pvec from @dbuenzli (unreleased) wants to provide a vector without the required value to initiate it (and without Obj.magic). I don't know the status of it but it's quite interesting: https://github.com/dbuenzli/pvec

Memtrace, a new memory profiler for OCaml

Yaron Minsky announced

I thought people might be interested in this post, by Luke Maurer.

https://blog.janestreet.com/finding-memory-leaks-with-memtrace/

It announce memtrace-viewer, which is a fancy new memory profiler for OCaml, based on the new support for statistical memory profiling, based on work by Jacques Henri-Jourdan and Stephen Dolan.

I mostly want to say: this is really good stuff. We've gotten a ton of benefit from it at Jane Street. People have found the tool to be easy to use, and they've been able to uncover and fix a wide variety of performance problems with it. So, try it out!

And it would be interesting for people to reply to this thread with any experience reports. I'd love to get a better sense of the utility people are deriving from our tools outside of Jane Street.

David Allsopp then added

This is great, thanks! I had a go with it using the native Windows ports running memtrace-viewer from WSL. The result's a nice workflow (and a few bug fixes here and there…), which I've written up here.

BER MetaOCaml N111, for OCaml 4.11.1

Oleg announced

BER MetaOCaml N111 is a strict superset of OCaml 4.11.1 for ``writing programs that generate programs''. BER MetaOCaml adds to OCaml the type of code values (denoting ``program code'', or future-stage computations), and two basic constructs to build them: quoting and splicing. The generated code can be printed, stored in a file – or compiled and linked-back to the running program, thus implementing run-time code optimization. A well-typed BER MetaOCaml program generates only well-scoped and well-typed programs: The generated code shall compile without type errors. Staging-annotation–free BER MetaOCaml is identical to OCaml; BER MetaOCaml can link to any OCaml-compiled library (and vice versa); findlib and other tools can be used with BER MetaOCaml as they are, in their binary form.

BER MetaOCaml N111 is a superset of the recently released OCaml 4.11.1. The significant addition is let rec insertion: the facility to generate groups of mutually recursive definitions whose size is not statically known. It is now possible to control the scope of let-insertion, which is often necessary when the let-bound expression is effectful. The well-scopedness guarantee is still maintained. Offshoring got more polish.

As the simplest illustration of let rec insertion, the specialization of the Ackermann function

let rec ack m n =
  if m = 0 then n+1 else
  if n = 0 then ack (m-1) 1 else
  ack (m-1) (ack m (n-1))

to m=2 produces the following code

val sac2 : (int -> int) code = .<
  let rec h_6 n_7 = n_7 + 1
  and h_4 n_5 = if n_5 = 0 then h_6 1 else h_6 (h_4 (n_5 - 1))
  and h_2 n_3 = if n_3 = 0 then h_4 1 else h_4 (h_2 (n_3 - 1)) in h_2>.

For more explanations, please see
http://okmij.org/ftp/ML/MetaOCaml.html
which now has dedicated sections on let rec insertions and offshoring. See also ChangeLog and NOTES.txt in the BER MetaOCaml distribution.

BER MetaOCaml N111 should be available though OPAM, hopefully soon. In the meanwhile, it is available as a set of patches to the OCaml 4.11.1 distribution.
http://okmij.org/ftp/ML/ber-metaocaml.tar.gz
See the INSTALL document in that archive. You need the source distribution of OCaml 4.11.1.

Clap 0.1.0 (Command-Line Argument Parsing)

Continuing this thread, rbardou said

I did not know about minicli, thanks for the link!

Clap and minicli indeed share the same approach. Here are some differences I observed.

  • In minicli, the list of arguments is immutable; arguments are not consumed. It is "more pure".
  • As a consequence, minicli does not seem to support unnamed arguments (also known as positional arguments).
  • As another consequence, in minicli each function takes the list of arguments, which makes it a bit more flexible at the cost of being slightly more verbose.
  • In minicli there is a hash table storing the list of seen arguments. It looks like this means that the immutable list of arguments must thus always be the same if you want to call finalize, which relies on this invariant.
  • minicli does not generate a –help (but it would be possible rather easily).
  • minicli considers that negative numbers are not option names, and is thus capable of handling them more naturally (in Clap you would need to escape the dash).
  • I think that minicli does not consider "-abc" to be equivalent to "-a -b -c", whereas Clap differentiates short names (starting with a single dash and which can be factorize like that) and long names (starting with two dashes).
  • minicli's source code is significantly shorter and thus simpler and easier to maintain.

UnixJunkie then added

On another topic, maybe you will be interested by parany, if you don't already know about it. :wink: https://github.com/UnixJunkie/parany

Multicore OCaml: September 2020

Anil Madhavapeddy announced

Welcome to the September 2020 Multicore OCaml report! This update along with the previous monthly updates have been compiled by @shakthimaan, @kayceesrk and @avsm.

Big news this month is that the systhreads compatibility support PR has been merged, which means that Dune (and other users of the Thread module) can compile out of the box. You can now compile the multicore OCaml fork conveniently using the new opam compiler plugin (see announcement):

opam update
opam compiler create "ocaml-multicore/ocaml-multicore:no-effect-syntax"
eval $(opam env)

This selects the branch of multicore OCaml that omits the experimental effect syntax, and thus works with the existing ppx ecosystem. It's quite fun opam installing ecosystem packages and seeing them operate out of the box at long last. There are still a few rough edges to the thread compatibility support (mainly at the C compatibility layer, such as registering external C threads with the GC), but these will be worked out in the coming weeks. We'd like to hear of any build failures you encounter in the opam universe with this: please report them on https://github.com/ocaml-multicore/ocaml-multicore/issues

A number of performance improvements to the multicore OCaml GC and the Sandmark benchmarking project have also been completed through September:

  • we have now included the Kronecker implementation from the Graph500 benchmarks to Sandmark - an n-queen benchmark addition is in progress
  • benchmark runs now provide a count of the OCaml symbols as a code size metric
  • work on building Tezos with multicore OCaml, and integration with the Sandmark benchmarking test suite has also begun.

We have also begun an effort to port Lwt to take advantage of parallelism via Lwt_preemptive. Code samples and test runs have been performed, and Sudha has written an introductory blog post about her early results. Note that this work doesn't change the core behaviour of Lwt (a cooperative futures framework with no context switching between bind calls), but allows parallelism via explicit calls to background preemptive threads.

On the upstreaming efforts to OCaml, the 4.12 release will freeze earlier than usual in October, and so we finished submitting the last of the garbage collector colour changes and are aiming for the work on reliable safe points to go into OCaml 4.13. There have been a lot of runtime changes packed into 4.12 already, and so we will issue a call for testing when the release candidate of 4.12 is cut.

Onto the details of the PRs. As with the previous updates, the Multicore OCaml updates are listed first, which are then followed by the enhancements to the Sandmark benchmarking project. The upstream OCaml ongoing and completed updates are finally mentioned for your reference.

Multicore OCaml

  • Ongoing
    • ocaml-multicore/domainslib#17 Implement channels using Mutex and Condition Variables

      The lib/chan.ml sources have been updated to implement channels using Mutex and Condition Variables, and a LU_decomposition_multicore.exe test has been added for the same.

    • ocaml-multicore/ocaml-multicore#381 Reimplementating systhreads with pthreads

      This PR is actively being reviewed for the use of pthreads in Multicore OCaml. It introduces the Domain Execution Contexts (DEC) which allows multiple threads to run atop a domain.

    • ocaml-multicore/ocaml-multicore#394 Changes to polling placement

      The polls placement is done at the start of the functions and on the back-edge of loops, instead of using Feely's algorithm. This is a work-in-progress.

    • ocaml-multicore/ocaml-multicore#401 Do not handle interrupts recursively

      A domain local variable is introduced to prevent handling of interrupts recursively.

    • ocaml-multicore/ocaml-multicore#402 Split handle_gc_interrupt into handling remote and polling sections

      A caml_poll_gc_work is introduced that has information of GC work done previously in caml_handle_gc_interrupt. This facilitates stw_handler to make calls to poll and not handle service interrupts, as it may lead to unwanted recursion.

    • ocaml-multicore/ocaml-multicore#403 Segmentation fault when building Tezos on Multicore 4.10.0 with no-effects-syntax

      This is an on-going investigation on why the package tezos-embedded-protocol-packer in Tezos is causing a segmentation fault when building with Multicore OCaml.

  • Completed
    • Domainslib
      • ocaml-multicore/domainslib#19 Finer grain signalling with mutex condvar for Channels

        The use of fine grain locking for Mutex and condition variables helps in improving the performance for larger cores, as against a single mutex for all the signalling.

    • Multicore OPAM
      • ocaml-multicore/multicore-opam#31 Patch dune.2.7.1 for Multicore OCaml

        The opam file for dune.2.7.1 has been added along with a patch to bootstrap.ml to get it working for Multicore OCaml, thanks to Chaitanya Koparkar.

      • ocaml-multicore/multicore-opam#32 Add ocamlfind-secondary dependency to dune

        The installation of dune requires ocamlfind-secondary as a dependency for dune.2.7.1, and has been added to the OPAM file.

    • Multicore OCaml
      • ocaml-multicore/ocaml-multicore#395 Move to SPIN_WAIT for all spins and usleep in SPIN_WAIT

        The PR provides the SPIN_WAIT macro for all the busy spin wait loops, and uses caml_plat_spin_wait when busy waiting. This ensures that the same spin strategy is used in different places in the code.

      • ocaml-multicore/ocaml-multicore#397 Relaxation of backup thread signalling

        The signalling to the backup thread from the mutator thread when leaving a blocking section is modified. It reduces the potential Operating System scheduling when re-entering OCaml.

      • ocaml-multicore/ocaml-multicore#400 Demux eventlog for backup thread

        The events in the backup thread were emitting the same process ID as the main thread, and this PR separates them.

        09456772484b8be0899c9812f634816da4db5e7d_2_1380x492.png

        In the above illustration, the backup threads are active when the main thread is waiting on a condition variable.

Benchmarking

  • Ongoing
    • ocaml-bench/sandmark#159 Implement a better way to describe tasklet cpulist

      We need a cleaner way to obtain the taskset list of cores for a benchmark run when we are provided with a number of domains. We should be able to specify hyper-threaded cores, NUMA zones to use, and the specific cores to use for the parallel benchmarks.

    • ocaml-bench/sandmark#173 Addition of nqueens benchmark to multicore-numerical

      A draft version of the classical n queens benchmark has been added for review in Sandmark. This includes both the single and multicore implementation.

  • Completed
    • ocaml-bench/ocaml_bench_scripts#11 Add support for configure option and OCAMLRUNPARAM

      The ocaml_bench_scripts has been updated to support passing configure options and OCAMLRUNPARAM when building and running the benchmarks in Sandmark.

    • ocaml-bench/sandmark#122 Measurements of code size

      The output .bench JSON file produced from the benchmarks now includes a code size metric for the number of CAML symbols. A sample benchmark output is shown below:

      {"name":"knucleotide.", ... ,"codesize":276859.0, ...}
      

      The code size count for few of the benchmarks is given below:

      Benchmark Count
      alt-ergo 2_822_040
      coqc 5_869_305
      cpdf 1_131_376
      nbody.exe 276_710
      stress.exe 84_061
      fft.exe 38_914
    • ocaml-bench/sandmark#170 Graph500 SEQ

      The Graph500 benchmark with a Kronecker graph generator has now been added to Sandmark. The generator builds three kernels for graph construction, Breadth First Search, and Single Source Shortest Paths.

    • ocaml-bench/sandmark#172 Remove Base, Stdio orun dependency for trunk

      The orun sources in Sandmark have been updated to remove the dependency on both Base and Stdio. They have been replaced with functions from Stdlib, List, String and Str.

    • ocaml-bench/sandmark#174 Cleanup our use of sudo for chrt

      The use of sudo has been removed from the Makefile for running parallel benchmarks, to avoid creating output files and directories that require root permissions for access. The use of RUN_BENCH_TARGET=run_orunchrt will execute the benchmarks using chrt -r 1. The user can give permissions to the chrt binary using:

      sudo setcap cap_sys_nice=ep /usr/bin/chrt
      

OCaml

  • Ongoing
    • ocaml/ocaml#9876 Do not cache young_limit in a processor register

      The PR removes the caching of young_limit in a register for ARM64, PowerPC and RISC-V ports, as it is problematic during polling for signals and inter-domain communication in Multicore OCaml.

  • Completed
    • ocaml/ocaml#9756 Garbage collectors colour change

      The gray colour scheme in the Garbage Collector has been removed to facilitate merging with the Multicore OCaml collector. The existing benchmarks in Sandmark suite that did overflow the mark stack are show in the below illustration, and there is little negative impact on the change. 10b2ae2b0f7cffe0148ee97b828ded5d4ed36a21_2_1380x990.png

    As always, we would like to thank all the OCaml developers and users in the community for their continued support and contribution to the project. Be well!

Acronyms

  • ARM: Advanced RISC Machine
  • BFS: Breadth First Search
  • DEC: Domain Execution Context
  • GC: Garbage Collector
  • JSON: JavaScript Object Notation
  • NUMA: Non-Uniform Memory Access
  • OPAM: OCaml Package Manager
  • OS: Operating System
  • PR: Pull Request
  • RISC-V: Reduced Instruction Set Computing - V
  • SSSP: Single Source Shortest Path

sid asked and Anil Madhavapeddy replied

Curious to know: How will merging multicore as a whole work? Until now PRs that are relevant for multicore are being merged gradually into the main OCaml repo. These PRs tend to be smallish mostly (with some exceptions).

But the multicore project has thousands of lines of new code. How will that code be broken up into separate and digestible chunks to get merged? If each of those PRs go through the traditional review process there could be further changes required in multicore itself. I understand everything is finely balanced so this could end up causing issues. Alternatively will will the multicore repo become the main repo (I guess that would be unlikely).

That's a good question @sid. Once all the various architectural dependencies are in place, the GC itself is just a few standalone C files. The current plan is just to do focussed review from the core development team to that particular PR, with plenty of time in the development cycle to facilitate changes. The intention is to branch to OCaml 5.0 when the domains-only support lands, OCaml 4.x maintained as a longer term support branch while the 5.x series settles down. This is a major enough change that we are expecting some deviance from the release cadence of the past few years.

The multicore repo will definitely not become the main OCaml repo. We are reimplementing clean PRs for upstream OCaml, as the multicore repo history is long, storied and not especially useful.

Generally curious how the “end game” will play out…

It's also just the beginning of the game :-) I'm very excited about some of the ongoing developments for post 5.0, such as fibres and effects. We'll have more details (and paper drafts) on that as the research dust settles over the next few quarters.

Spin 0.7.0

Thibaut Mattio announced

I'm happy to announce a new version of Spin (0.7.0).

This release comes with a new official template spa, to generate Single Page Applications with Js_of_ocaml.

It also removes the dependency on Reason, so Spin is now compatibility with OCaml 4.11.

Here's the complete list of changes: https://github.com/tmattio/spin/releases/tag/0.7.0

Not part of this release, but I thought I'd mention it, I created two non-official templates:

If you have any feedback or suggestions, don't hesitate to open an issue.

Bootstrapping our way to Hashconsing and quotations with PPX Rewriters

Chet Murthy announced

This post is about PPX rewriters, using multiple of them in sequence, using one rewriter in implementing others, and getting to something …. somewhat surprisingly complex, in simple steps. All of this has been done using PPX rewriters based on camlp5 (and pa_ppx), but should in principle be doable on ppxlib (the standard support infrastructure for PPX rewriters).

I should note that the portions regarding hash-consing are all a pretty faithful re-implementation and mechanization of the paper of Filliatre and Conchon: Type-Safe Modular Hash-Consing. All errors are mine, of course.

TL;DR This post describes how, starting with an AST type and a parser for it, we can more-or-less automatically generate

  • hash-consed versions of the AST,
  • functions back-and-forth,
  • and surface-syntax "quotation" expanders for both types

so that code doesn't need to manipulate the AST directly, but can instead use the surface syntax (hence being more-or-less indifferent to whether it's applied to the original or hashconsed version of the AST).

All of the code discussed here is available on github at: https://github.com/camlp5 , in projects camlp5/pa_ppx, camlp5/pa_ppx_{migrate,hashcons,q_ast,params}. The latter ones are not (yet) released on OPAM, but will be soon. I apologize in advance for the nonexistent-to-poor documentation: I'm working on it! Working code for everything described below can be found at camlp5/pa_ppx_q_ast/tests, in the directories sexp_example and eg_sexp_example.

Motivation (a Concrete Example)

The ability to transparently introduce hash-consing into a complex collection of AST types should need no argument. The ability to use "quotations" over such an AST type might need some motivation. So consider a type of s-expressions, viz

type sexp =
    Atom of string
  | Cons of sexp * sexp
  | Nil

with the obvious parsing that we're all used-to from LISP/Scheme. The hashconsed version of this type is

type sexp_node =
    Atom of string
  | Cons of sexp * sexp
  | Nil
and sexp = sexp_node hash_consed

with (from the opam package hashcons)

type +'a hash_consed = private {
  hkey : int;
  tag : int;
  node : 'a }

NOTE: there is a nuance here that I'll address at the end of ths post in the section "Appendix B: Types with and without vala".

Let's suppose we want to write the function atoms : sexp -> string list that returns the list of string (those wrapped by Atom) at the leaves of the s-expression. The code is easy enough (just rotate left-child cons-nodes to the right, until we get an atom (or Nil) and then move on to the cdr. This is a good example to consider, because it requires multi-level pattern-matching and multi-level constructor-expressions. So the introduction of meaningless bureaucracy will be palpable.

let rec atoms =
  function
    Nil -> []
  | Atom a -> [a]
  | Cons(Cons(caar, cdar), cdr) ->
      atoms (Cons(caar, Cons (cdar, cdr)))
  | Cons(Nil, cdr) -> atoms cdr
  | Cons(Atom a, cdr) -> a :: atoms cdr

and the hashconsed version is

let rec atoms =
  function
    {node = Nil} -> []
  | {node = Atom a} -> [a]
  | {node = Cons({node = Nil}, cdr)} -> atoms cdr
  | {node = Cons({node = Atom a}, cdr)} -> a :: atoms cdr
  | {node = Cons ({node = Cons (caar, cdar)}, cdr)} ->
      atoms (make_sexp (Cons (caar, (make_sexp (Cons (cdar, cdr))))))

As you can see, there are extra patterns { node = ...} and a new constructor make_sexp (to perform the actual hashtable lookup & consing). And these extra bits appear at multiple levels in both patterns and expressions.

Wouldn't it be nice, if we could write one version of this code, viz.

let rec atoms = function
    <:sexp< () >> -> []
  | <:sexp< $atom:a$ >> -> [a]
  | <:sexp< ( () . $exp:cdr$ ) >> -> atoms cdr
  | <:sexp< ( $atom:a$ . $exp:cdr$ ) >> -> a::(atoms cdr)
  | <:sexp< ( ( $exp:caar$ . $exp:cdar$ ) . $exp:cdr$ ) >> ->
    atoms <:sexp< ( $exp:caar$ . ( $exp:cdar$ . $exp:cdr$ ) ) >>

and merely by changing "<:sexp<" to "<:hcsexp<" get a version of the function that works on hashconsed s-expressions? The text within <:sexp< .... >> is called a "quotation" in Camlp5, and is similar to the same concept in ppx_metaquot and the much older LISP idea of "quasi-quotation". The contained text is parsed with a parser for s-expressions, slightly modified to have indications for where the text $...$ may appear – these are called "anti-quotations', and can contain OCaml source code (e.g. variables) albeit not quotations (so no arbitrary nesting). The "quotation expander" parses this text to AST and applies a converter to produce an OCaml expression or pattern AST that does what the quotation intends. The lovely thing is, by changing out the quotation-expander (replace "sexp" with "hcsexp") we can change the code that is generated, and if the quotation-expander is generated from the type definition, it's not actually any work for the programmer to achieve this.

NOTE: Unlike with ppx_metaquot, the antiquotations can be placed in nearly-arbitrary positions in the parse-tree (hence, in the AST): there is no requirement that they correspond to variable-names or identifiers in the OCaml AST: indeed, our running example will not be the OCaml AST, even though all of this machinery has been applied-to the OCaml AST successfully.

In this post I'll walk you thru how to achieve this goal: building up the machinery, step-by-step, to allow one to write basically arbitrary expressions in your AST's surface syntax, and automatically get either "normal' (no hash-consing) or "hashconsed" patterns & expressions.

A High-Level Plan of Attack

A while back in the "Future of PPX" post ( https://discuss.ocaml.org/t/the-future-of-ppx/3766 ) there was some discussion of hash-consing for ASTs, and the complexities of achieving it. I wrote a reply post "Hashconsing an AST via PPX" ( https://discuss.ocaml.org/t/hashconsing-an-ast-via-ppx/5558 ) where I showed how one could use a PPX rewriter to automate the task of "re-sharing" an AST when writing a top-down/bottom-up rewriter for an AST. (That is, you're walking an AST, making small modifications, but most of it stays the same; so in principle, at a node Add(Mul(e1,e2), e3) when rewriting e3 actually changes it visibly, but Mul(e1,e2) doesn't, we might want the output value's first subtree to be pointer-equal to the input's first subtree.) I called this "rehashcons"ing (admittedly not a great name). The idea being, rehashcons is not trying to hash-cons, but only to restore whatever sharing was there originally, to whatever extent that's possible.

But this is neither sufficient in all cases, nor real hash-consing. The problem with hash-consing is::

A. If you start with an AST type that doesn't have the various bits needed for hash-consing (at a minimum, a type 'a node = { it : 'a ; hashcode : int } and its use at each spot where we want to hash-cons) then you have to first produce a new AST type with those bits inserted. Let's call these the "normal" and "hashconsed" ASTs.

B. You have to write tons of boilerplate code: first, to map to-and-from these two type-families, and second to implement the hash-consing (special constructors, (ephemeral) hashtables here-and-there, etc). Then perhaps you'd like to memoize functions over these hash-consed ASTs, and that's more boilerplate.

C. And then, when it comes to writing expressions and patterns over this new AST type, you have to remember where the "special bits" go – where to insert the special constructors, and where to add {it=....} to patterns.

It's all a bit of a tedious bother, when what we want is to just manipulate the hashconsed data-type as if it were the "normal" type, and have the messy bits filled-in for us.

This post is about how to achieve that.

A plan for how to achieve reasonably transparent hash-consing

Let's first map out the plan of attack:

  1. Start with an AST type (or types) for our language, and a parser (in this case, written using Camlp5's grammar machinery).
  2. Add to the AST type some indications for where antiquotations may go, and modify the parser to parse these antiquotations. We'll call this the "normal" AST type. Note that this is the version with antiquotation markers.
  3. pa_ppx_hashcons Generate a hashconsed version of the AST type from the "normal" AST type.
  4. pa_ppx_migrate Generate functions back-and-forth between "normal" and "hashconsed" versions of the AST type. So we're hashconsing the version with antiquotation markers. We could hashcons the version without antiquotation markers, and everything here would still work out … but it would be more complicated to explain.
  5. pa_ppx_q_ast From each of the "normal" and "hashconsed" AST types, generate functions that can take values of the type and generate OCaml code for patterns and expressions that correspond to those values.
  6. pa_ppx.deriving_plugins.params In implementing the above, we're describing complex tasks, so it's possible that when not automatically inferrable from types, the "hints" we might need to give will be complex. It would be nice if there were a way to automatically generate code to parse such specifications. If we were writing some other application, we might want to use ppx_deriving.yojson (or our equivalent, pa_ppx.deriving_plugins.yojson) and write our specification in JSON. But since we're writing a PPX rewriter, the specification will come in the payload of a PPX attribute/extension. Since our "hints" might need to contain types and expressions, we'd probably like the payload to be real expressions and types. So what we need is a type-driven mapping from expression-ASTs, to expressions.

    Then, when implementing a PPX rewriter, we can write down the type of its "params", and from that generate the function that will convert expression-ASTs to that type.

The implementation of all of the above, is what I will describe in the rest of this note. I've applied it to

  • (simple) s-expressions
  • (simple) deBruijn lambda-terms
  • (simple) named-variable lambda-terms
  • (complex and comprehensive) the entire OCaml AST in Camlp5.

A Worked Example: s-expressions.

In this section, I'll work thru how to apply the ideas above, step-by-step, to s-expressions. Everything described here is working code, documented and tested in camlp5/pa_ppx_q_ast/tests/{sexp_example,eg_sexp_example}.

  • 0. Write the AST type (without antiquotations)

    Copying from above

    type sexp =
        Atom of string
      | Cons of sexp * sexp
      | Nil
    
  • 1. Add antiquotation markers and add a parser
    type sexp =
        Atom of (string vala)
      | Cons of (sexp vala) * (sexp vala)
      | Nil
    

    The type 'a vala is a Camlp5 type-constructor. It contains either a value of type 'a, or an antiquotation. A short argument for why antiquotations markers are necessary, can be found in "Appendix C: Is vala Necessary?"

    Here's the parser:

    sexp: [
      [
        a = V atom "atom" -> sexp_atom a
      | "(" ; l1 = LIST1 v_sexp ; opt_e2 = OPT [ "." ; e2 = v_sexp -> e2 ] ; ")" ->
        match opt_e2 with [
          None -> List.fold_right (fun vse1 se2 -> Sexp.Cons vse1 <:vala< se2 >>) l1 sexp_nil
        | Some ve2 ->
           let (last, l1) = sep_last l1 in
           List.fold_right (fun vse1 se2 -> Sexp.Cons vse1 <:vala< se2 >>) l1
             (Sexp.Cons last ve2)
        ]
      | "(" ; ")" ->
          sexp_nil
      ]
    ]
    ;
    
    v_sexp: [[ v = V sexp "exp" -> v ]];
    
    atom: [[ i = LIDENT -> i | i = UIDENT -> i | i = INT -> i ]] ;
    
    sexp_eoi: [ [ x = sexp; EOI -> x ] ];
    

    This is an LL(1) grammar, interpreted by Camlp5, and the marker for antiquotations is "V". Full details of the grammar language can be found in the Camlp5 documentation. You can see in it, that we've used V sexp "exp" (renamed for convenience to "v_sexp") everywhere internally, and that Atom is parsed by V atom "atom" (again, giving an antiquotation position.).

  • 2. Generate a Hashconsed version of the AST type

    This is the input to the camlp5/pa_ppx_hashcons PPX rewriter. This rewriter implements the method of Jean-Christophe Filliatre and Sylvain Conchon, from their paper Type-Safe Modular Hash-Consing. In short, we specify a few module-names, equality and hash functions for external type-constructors (which necessarily cannot participate in hash-consing) and the type-signatures of memo-izers we wish generated. The rewriter generates efficient hash-constructors and hash/equality-functions: consult the paper for details.

    Here's the actual code:

    [%%import: Sexp.sexp]
    [@@deriving hashcons { hashconsed_module_name = HC
                         ; normal_module_name = OK
                         ; external_types = {
                             Ploc.vala = {
                               preeq = (fun f x y -> match (x,y) with
                                   (Ploc.VaAnt s1, Ploc.VaAnt s2) -> s1=s2
                                 | (Ploc.VaVal v1, Ploc.VaVal v2) -> f v1 v2
                                 )
                             ; prehash = (fun f x -> match x with
                                 Ploc.VaAnt s -> Hashtbl.hash s
                               | Ploc.VaVal v -> f v
                               )
                             }
                           }
                         ; pertype_customization = {
                             sexp = {
                               hashcons_constructor = sexp
                             }
                           }
                         }]
    

    The resulting OCaml module will contain two new modules: OK (which contains a copy of the original AST) and HC (which contains the hashconsed AST, as well as functions for hash-consing, memoizing, etc). The hashconsed AST type is as described in the previous section.

  • Generate functions back-and-forth between "normal" and "hashconsed" versions of the AST type.

    To generate functions back-and-forth between the two versions of the AST type, we use the pa_ppx_migrate PPX rewriter. Here is the input for generating the function from the "normal" (OK) to the "hashconsed" (HC) AST. The reverse direction isn't much different. Notice that we don't actually write any migration code, except for external types (vala). In much-more-complicated examples, the succinctness of this method over the actual code can be quite significant. It has been applied to the 10 versions of the OCaml AST (to generate something quasi-equivalent to ocaml-migrate-parsetree, and the succinctness gains there are significant.

    [%%import: Sexp_hashcons.OK.sexp]
    [@@deriving migrate
        { dispatch_type = dispatch_table_t
        ; dispatch_table_constructor = make_dt
        ; dispatchers = {
            migrate_vala = {
              srctype = [%typ: 'a Ploc.vala]
            ; dsttype = [%typ: 'b Ploc.vala]
            ; subs = [ ([%typ: 'a], [%typ: 'b]) ]
            ; code = _migrate_vala
            }
          ; migrate_sexp_node = {
              srctype = [%typ: sexp_node]
            ; dsttype = [%typ: Sexp_hashcons.HC.sexp_node]
            }
          ; migrate_sexp = {
              srctype = [%typ: sexp]
            ; dsttype = [%typ: Sexp_hashcons.HC.sexp]
            ; code = (fun __dt__ x ->
                Sexp_hashcons.HC.sexp (__dt__.migrate_sexp_node __dt__ x)
              )
            }
          }
        }
    ]
    
  • 4. Generate functions to map (parsed) values to OCaml AST expressions/patterns

    We use the pa_ppx_q_ast PPX rewriter, and invoke it twice: once with the "normal" AST type (Sexp.sexp) and once with the hashconsed type (Sexp_hashcons.HC.sexp). The two quotation-expanders are named, respectively, "sexp" and "hcsexp" (the names are chosen only for this presentation and are not significant).

    module Regular = struct
    type sexp = [%import: Sexp.sexp]
    [@@deriving q_ast { data_source_module = Sexp }]
    
    Quotation.add "sexp"
      (apply_entry Pa_sexp.sexp_eoi E.sexp P.sexp)
    end
    
    module Hashcons = struct
    
    [%%import: Sexp_hashcons.HC.sexp]
    [@@deriving q_ast {
        data_source_module = Sexp_hashcons.HC
      ; quotation_source_module = Sexp_migrate.FromHC
      ; hashconsed = true
      }]
    
    Quotation.add "hcsexp"
      (apply_entry Pa_sexp.sexp_hashcons_eoi E.sexp P.sexp)
    end
    
  • Putting it all together

    So: we start with an AST type and a parser, to which antiquotation markers have been added. We generate a hashconsed version of the AST type, and functions back-and-forth to the "normal" version of the type. Since we have a parser for the "normal" version, we now have a parser for the "hashconsed" version.

    The fancy bit is that Camlp5 has built-in machinery to map values of types in the OCaml AST of Camlp5 (which is a different recursive type, but more-or-less equivalent to the official OCaml AST) to expressions and patterns that correspond to those values. So the actual AST value corresponding to x + 1 can be mapped to either an expression (that when evaluated, produces that value) or a pattern (that matches that value). But since our AST type contains antiquotation markers, the AST value corresponding to $x$ + 1 can also be mapped to an expression/pattern, only this time, with OCaml variable x as the first argument to (+).

    The pa_ppx_q_ast PPX rewriter generalizes this and makes it possible to apply to any AST type (and also to apply to the OCaml AST type in Camlp5, so nothing has been lost).

  • Using the quotations

    And finally, we can use those quotations:

    let rec atoms = function
        <:sexp< () >> -> []
      | <:sexp< $atom:a$ >> -> [a]
      | <:sexp< ( () . $exp:cdr$ ) >> -> atoms cdr
      | <:sexp< ( $atom:a$ . $exp:cdr$ ) >> -> a::(atoms cdr)
      | <:sexp< ( ( $exp:caar$ . $exp:cdar$ ) . $exp:cdr$ ) >> ->
        atoms <:sexp< ( $exp:caar$ . ( $exp:cdar$ . $exp:cdr$ ) ) >>
    
    let rec atoms = function
        <:hcsexp< () >> -> []
      | <:hcsexp< $atom:a$ >> -> [a]
      | <:hcsexp< ( () . $exp:cdr$ ) >> -> atoms cdr
      | <:hcsexp< ( $atom:a$ . $exp:cdr$ ) >> -> a::(atoms cdr)
      | <:hcsexp< ( ( $exp:caar$ . $exp:cdar$ ) . $exp:cdr$ ) >> ->
        atoms <:hcsexp< ( $exp:caar$ . ( $exp:cdar$ . $exp:cdr$ ) ) >>
    

Discussion

I've also applied this same methodology to the entire OCaml AST in Camlp5 (for which there is a parser, as part of Camlp5) and verified that the quotations thus generated pass the same tests as the hand-implemented quotations provided as part of Camlp5.

The quotations of Camlp5 are substantial, and cover almost all of the OCaml language. I believe that this means it is possible to both provide full hash-consing support for a very complex AST type, and full quotation support both for the AST type, and for its automatically-generated hashconsed variant.

Appendix A: Parameter-parsing for PPX Rewriters

I've shown three different PPX rewriters (migrate, hashcons, and q_ast) and in some of their invocations, there are nontrivial OCaml expressions to supply as options. Writing the code that converts these options (OCaml expressions) into values of meaningful types (for the rewriter code) is unutterably boring, time-consuming, and error-prone: it is effectively a demarshalling problem. So I wrote a PPX rewriter, that automates this task, and in fact that is what is used to generate the demarshallers used in these PPX rewriters. Here is an example: the params PPX rewriter as used by pa_ppx_migrate to generate its options demarshaller:

type tyarg_t =
  { srctype : ctyp;
    dsttype : ctyp;
    raw_dstmodule : longid option
      [@name dstmodule];
    dstmodule : longid option
      [@computed longid_of_dstmodule dsttype raw_dstmodule];
    inherit_code : expr option;
    code : expr option;
    custom_branches_code : expr option;
    custom_branches : (lident, case_branch) alist
      [@computed extract_case_branches custom_branches_code];
    custom_fields_code : (lident, expr) alist [@default []];
    skip_fields : lident list [@default []];
    subs : (ctyp * ctyp) list [@default []];
    type_vars : string list
      [@computed compute_type_vars srctype dsttype subs];
    subs_types : ctyp list
      [@computed compute_subs_types loc subs]
}
[@@deriving params]

type default_dispatcher_t =
  { srcmod : longid;
    dstmod : longid;
    types : lident list;
    inherit_code : (lident, expr) alist[@default []]
  }
[@@deriving params]

type t =
  { optional : bool
  ; plugin_name : string
  ; inherit_type : ctyp option;
    dispatch_type_name : lident[@name dispatch_type];
    dispatch_table_constructor : lident;
    declared_dispatchers : (lident, Dispatch1.tyarg_t) alist
      [@default []] [@name dispatchers];
    default_dispatchers : default_dispatcher_t list[@default []];
    dispatchers : (lident, Dispatch1.tyarg_t) alist
      [@computed compute_dispatchers loc type_decls declared_dispatchers default_dispatchers];
    type_decls : (string * MLast.type_decl) list
      [@computed type_decls];
    pretty_rewrites : (string * Prettify.t) list
      [@computed Prettify.mk_from_type_decls type_decls] }
[@@deriving params {formal_args = {t = [type_decls]}}]

This generates a function params : (string * MLast.type_decl) list -> MLast.expr -> t that performs the entire demarshalling task. At a couple of points, we supply functions to handle custom demarshalling operations, but the vast majority of the code (and work) is handled automatically. This is liberating: it means that there is no cost to being precise in describing the data one needs as input, and no need to "encode" arguments into easy-to-parse form. A good comparison is with the "@with" syntax of the ppx_import PPX rewriter, where it's clear that they're shoe-horning types into expression syntax, for want of a nicer syntax that is still easily to manipulate.

Appendix B: Types with without vala

In the "Motivation" section, I introduced a type of sexp, and then the hashconsed version of the type. Neither type had vala markings in it. The mechanism described in this post can equally well work to produce quotation-expanders that work over types without vala. The only difficulty, is that the parser still needs to be defined over a version of the type with vala, because it will be used to parse quotations (which must necessarily contain antiquotations). But everything else works, and in camlp5/pa_ppx_q_ast/tests/{sexp_example,eg_sexp_example} you will find the definition and use of "sexpnovala", a quotation expander for the first sexp type we defined in this post. I didn't bother building the quotation expander for "hcsexpnovala", but it's a straightforward exercise.

In short, the choice of whether your AST type needs to have vala markings in it, is independent of hash-consing. You only need to supply a version of your AST type with vala, along with a parser, for the quotation machinery.

Appendix C: Is vala Necessary?

Consider that sexp type

type sexp =
    Atom of string
  | Cons of sexp * sexp
  | Nil

If we want to provide a ppx_metaquot-like facility for this type, perhaps we can overload the meaning of the strings in atoms. So the s-expression

( _A . foo )

that expands to

Cons(Atom "_A", Atom "foo")

could mean an s-expression with a single meta-variable, _A. But this meta-variable is necessarily a variable that can only denote an s-expression, and not a string. More generally, in an AST type if metavariables are merely overloaded variables, anyplace that a variable cannot appear, cannot be subject to meta-variable-based pattern-matching/substitution. So it's not possible to match on the list of types or expressions in a tuple type/expression, nor the list of branches in a match-with. And on and on. This is why the version of the sexp type with antiquotation markers

type sexp =
    Atom of (string vala)
  | Cons of (sexp vala) * (sexp vala)
  | Nil

includes Atom of (string vala). This allows us to write a pattern with a metavariable that matches an s-expression, e.g.

Cons (VaVal x, VaVal (Atom (VaVal "foo")))

(in quotation syntax, <:sexp< ( $exp:x$ . bar ) >>) or a metavariable that matches the string in an Atom, e.g.

Cons
  (VaVal (Atom (VaVal x)),
   VaVal (Atom (VaVal "foo")))

And this is a general issue in all (meta-)quotation support, not merely for the OCaml AST. If we want high-quality quotation support for our data-types, we need to build high-quality suppot for antiquotations.

Official shutdown date of forge.ocamlcore.org is 2020-10-18

Sylvain Le Gall announced

The website forge.ocamlcore.org will be shut down on 2020-10-18 12:00 UTC+2. I took the last official snapshot of the website 1 month ago. I have prepared dumps for all of the projects and can make them available to the admin of the projects. The dumps contain all the changes since the last snapshot (2020-09-15). Any changes in between the last snapshot and the shutdown, will be discarded.

This is the final step of the official announcement deprecation from 2017[1]. I have ensured that the activity of the website has dropped to almost zero since then. Most of the projects are now hosted elsewhere. The search engine data indicate that the traffic has dropped significantly enough since I have asked to de-index the whole website.

The design of the static website is inspired by the one for http://luaforge.net

[1]: https://forge.ocamlcore.org/forum/forum.php?forum_id=958

What are the actions members can do now?

WARNING: Some web browsers like Chrome will claim that the temporary site name is misleading and too close to forge.ocamlcore.org and it is a potential phishing attempt, but this should be safe.

Here are the actions you can take now:

  • Project admins can get back the dump of project data (see the FAQ)
  • Project admins can send PR to indicate the new project homepage (see the FAQ)
  • Members can send PR to indicate their new homepage (see the the FAQ)

The whole process is still a bit undefined, because I need proof that the people sending PR were admins of the project. I still need to figure out a good way to get these proofs, so I will experiment a bit at the beginning (see the FAQ for the current state).

You can already check what the website will look like on https://forge-ocamlcore-org.netlify.app/

What will happen next?

2020-10-11 -> 2020-10-18:

  • Members can still browse the website and recover what will not be part of the project dump.

Any changes to the website made during this time, will be discarded.

2020-10-18:

  • I will archive a whole copy of the website VM (encrypted)
  • https://forge.ocamlcore.org will become a static website only containing “anonymous” data (see prototype)
  • All file downloads will use https://downloads.ocamlcore.org (this is already the case)
  • All mailing lists will be shut down
  • Source control will not be accessible anymore (this is already the case)
  • All domains *.forge.ocamlcore.org will point to the matching project page or to the FAQ.

2021-04-18:

  • Last time for admins to get back the dump of project data (see the FAQ)
  • Last time for members and admins to update their page on the static website
  • I will destroy the copy of the website VM

Past this point, I will still have the dumps of the project data and will still accept PR for the Github repository, but this will be a “best effort” SLO.

Old CWN

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.