OCaml Weekly News


Hello

Here is the latest OCaml Weekly News, for the week of May 02 to 09, 2023.

Table of Contents

Overview of libraries for showing OCaml values
kcas and kcas_data 0.3.0: Software Transactional Memory
OCaml.org Newsletter: March 2023
Creating a tutorial on sequences
You started to learn OCaml less than 12 months ago? Please help us with our user survey on the OCaml.org Learning Area
Explorations on Package Management in Dune
Functional web applications running in the browser
Ahrefs is now built with Melange
Other OCaml News
Old CWN

Overview of libraries for showing OCaml values

Anton Bachin said

In light of the recent thread on, among other things, showing OCaml values, and because of Dream’s long-standing need for this to exist in OCaml, I’ve done, as suggested, a comparison of the available libraries. They seem to fall into three categories.

  1. Libraries that walk the runtime representation of values and dump it.

    These provide 'a -> string functions that can immediately print any value and are the easiest to use. They are what I need in Dream – something that a normal person can use for debugging without any boilerplate. The accuracy of these libraries is limited because the type information preserved at runtime is very bare.

    These are:

    1. reason-native’s Console.log.
    2. Dum
    3. Inspect

    There is also Memgraph, but this appears to only output DOT graphs, so I didn’t look into it in detail.

    Since I am interested in these for my purposes, I wrote a tester that compares their outputs and uploaded it to a gist. See the outputs for Console.log, Dum, Inspect. They are all variants of each other. Each has its own quirks and bugs, but they all look roughly like this:

        > Console.log "abc"
        "abc"
        > Console.log 42
        42
        > Console.log {a = "abc"; b = 42}
        {"abc"; 42}
    

    Interestingly (but predictably), for extensible variants like exn, these dumpers are able to print the string representation of variant constructors even with OCaml’s current runtime.
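
    To make the mechanism concrete, here is a minimal sketch of such a dumper built directly on the standard library's Obj module (not code from any of the libraries above); it only distinguishes immediates, strings, and ordinary blocks, whereas the real libraries handle many more tags:

        let rec dump (r : Obj.t) : string =
          if Obj.is_int r then
            string_of_int (Obj.obj r : int)
          else if Obj.tag r = Obj.string_tag then
            Printf.sprintf "%S" (Obj.obj r : string)
          else
            (* An ordinary block: recurse over its fields. Field names
               are not preserved at runtime, hence output like {"abc"; 42}. *)
            List.init (Obj.size r) (fun i -> dump (Obj.field r i))
            |> String.concat "; "
            |> Printf.sprintf "{%s}"

        let show (x : 'a) : string = dump (Obj.repr x)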

  2. Libraries, such as ppx_deriving, where a PPX generates (or the user manually writes) information about types – that is, helper values that describe types – and the user is then asked to pass that information to functions that walk values and dump them.

    These are unsuitable for my goals. All of these require the user to pass in the type information to the printing function at each call site, because in the absence of modular implicits or type classes, the compiler cannot automatically associate the type information with the values. They provide, roughly, ('a -> string) -> 'a -> string functions. The user has to provide the 'a -> string for each 'a each time they would like to print an 'a.

    If 'a is a “container” type, the required function is a higher-order function that needs additional function(s) for the element types. This is not ergonomic, as each call site where one would like to show a value needs boilerplate. Even if the 'a -> string is precomposed, it requires the user to remember what it is called, pick the right one, and is not resistant to various refactorings such as wrapping in option. But such libraries are accurate, because the type information provided in the boilerplate can be precise.

    These are:

    1. ppx_deriving’s [@@deriving show].
    2. refl, a ppx_deriving-like.
    3. lrt, another ppx_deriving-like.
    4. tpf, again a ppx_deriving-like.
    5. typerep, probably a ppx_deriving-like with ppx_typerep_conv.
    6. repr, which appears to have the user build the type representation manually from combinators and then also pass it wherever it is needed.
    7. data-encoding, also fully manual.
    8. cmon, fully manual.
    9. dyn in Dune. Appears to also be fully manual.

    I didn’t try these out in detail because they are all unsuitable as Console.log-alikes for inspecting OCaml values without excessive boilerplate. As can be seen from their documentation, they all require boilerplate in the form of functions/witnesses/calling the right function, depending on the type, at the place where you’d like to show a value. The last four also require the user to manually build up the type representation using combinators.

        M.show_myfpclass FP_normal       (* ppx_deriving: know the function. *)
    
        Refl.show [%refl: (string * int) list] [] ["a", 1; "b", 2];;
                                         (* refl: describe the type. *)
        Print.show ~t:nat_t (S (S (S Z)))
                                         (* lrt: provide the type info. *)
        (* The docs for tpf are too obscure, but it's the same kind of library. *)
    
        Fmt.str "%a\n" (pp t) { foo = None; bar = [ "foo" ] }
                                         (* repr: build & provide the type info. *)
    

    etc. These kinds of approaches are probably also present in other libraries that do e.g. JSON encoding.
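
    As a complete illustration of this style (a sketch assuming the ppx_deriving.show preprocessor is enabled; the point type is made up for the example), the printers are generated from the type declaration, but the right one still has to be picked, or derived, at each call site:

        type point = { x : int; y : int } [@@deriving show]

        let () =
          (* A printer is generated for the declared type... *)
          print_endline (show_point { x = 1; y = 2 });
          (* ...but for a container the printer must be derived (or
             composed) at the call site. *)
          print_endline ([%show: point list] [ { x = 1; y = 2 }; { x = 3; y = 4 } ])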

  3. Libraries that use a PPX at the call site to provide what looks like an 'a -> string function as in (1), but try to infer the type of the value being shown and derive its printer as in (2).

    These don’t seem to handle separate compilation well, as could be expected, and generally appear fragile.

    These are:

    1. Genprint
    2. OCaml@p

In my opinion, for the needs I see, the best approach would be runtime printing as in (1) with runtime type information that is accessible through pointers or indices stored in OCaml blocks. I wonder if this is what the LexiFi fork does. @nojb?

For those interested, we did the main part of the comparison on stream.

Nicolas Ojeda Bar replied

I wonder if this is what the LexiFi fork does. @nojb?

Not exactly.

We don’t attach type information to values directly, as we don’t want to modify the runtime model of OCaml (also, this would only work with heap-allocated values, which would all become larger).

What we do instead is that when a function has a labeled, non-optional argument of type 'a ttype (here 'a ttype is the “type of types”, with constructors corresponding to each kind of type in OCaml) and the argument is not passed explicitly, then the compiler synthesizes it at each call site.

Concretely, if we define a function of the form

let show ~(t: 'a ttype) (x: 'a) : string =
  match t with
  | Int -> string_of_int x
  | String -> x
  | ...

And if we call it as show 42, the compiler inserts ~t:Int as the first argument. For efficiency, type witnesses (the values of type 'a ttype) are actually computed at compile time whenever possible.
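
To illustrate the idea in stock OCaml (without the compiler patch), here is a minimal sketch in which 'a ttype is an ordinary GADT and the witness is passed explicitly; the constructor names are illustrative, not the actual LexiFi definitions:

type _ ttype =
  | Int : int ttype
  | String : string ttype
  | Pair : 'a ttype * 'b ttype -> ('a * 'b) ttype

let rec show : type a. t:a ttype -> a -> string =
 fun ~t x ->
  match t with
  | Int -> string_of_int x
  | String -> x
  | Pair (ta, tb) ->
    let a, b = x in
    Printf.sprintf "(%s, %s)" (show ~t:ta a) (show ~t:tb b)

(* Without the compiler patch, the witness has to be written out: *)
let () = print_endline (show ~t:(Pair (String, Int)) ("answer", 42))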

Makes sense?

Later on, Nicolas Ojeda Bar said

I see there is a lot of enthusiasm for adding some form of type reflection to OCaml; that’s great! It is true that at LexiFi we have a tried-and-tested system in use for a long time. Let me try to give some perspective about it and answer some of the questions that came up:

  • The LexiFi patch actually consists of two parts: 1) the representation of types as an OCaml datatype; 2) a patch to the typechecker/middle end to have the compiler automatically generate type witnesses (as sketched above).
  • It is important to note that 1) is to an extent independent of 2); it comes down to giving a suitable definition of the “type of types”. I understand from past discussion that proposals in this direction would be welcomed by the OCaml dev team. Accordingly, one should concentrate for the most part on 1) to make progress.
  • For historical reasons the LexiFi version of 1) (ie the type representation, see here and here) has a number of quirks. Furthermore, it makes design choices that may not be the best ones in general. For example, it only represents closed types: no type constructors or type variables can be represented, and so in particular neither can exotic types such as GADTs, first-class modules, extensible types, polymorphic variants, etc.
  • The main challenge in devising a suitable representation of types is deciding how to handle abstract types (see the paper and the slides I linked to in the other thread). At LexiFi abstract types are represented via “global names” (ie we identify an abstract type M.t by its name "M.t"). This works reasonably well in practice, but is not a good solution in general (the notion of “name” for an abstract type is not well-defined). I suspect the answer may be something of a research problem…
  • LexiFi did discuss upstreaming a version of its fork a long time ago (~2011), but I suspect it wasn’t done mainly because of the theoretical shortcomings of the current implementation (eg handling of abstract types).
  • Accordingly, the LexiFi fork is not open-source: we don’t have the manpower to support it as an open-source project, we don’t want to release a version of this technology that has known limitations which make it easy to shoot yourself in the foot if you don’t know what you are doing, and finally there are some commercial considerations to take into account (but my impression is that if this technology were polished enough that it could be accepted upstream, LexiFi would be happy to do so).
  • Personally, from a distance, https://github.com/thierry-martinez/refl looks rather interesting, but its type representation is quite complex and it is not clear whether it can be made suitable for “practical” use.

I hope this answers some of the questions!

kcas and kcas_data 0.3.0: Software Transactional Memory

Vesa Karvonen announced

I’m happy to announce that, as of version 0.3.0, kcas can be considered to be a software transactional memory (STM) implementation based on lock-free multi-word compare-and-set (MCAS).

The main feature added in 0.3.0 is the ability to block — in a scheduler-friendly manner — while awaiting changes to shared memory locations.

Let’s explore this by writing a short example with the help of MDX. (Yes, I’m actually testing this announcement.)

First we’ll require and open the kcas_data library:

# #require "kcas_data";;
# open Kcas_data;;

kcas_data gives us a number of domain-safe and composable data structures, as well as communication and synchronization primitives, such as a Queue and a Promise, for concurrent programming.

Let’s then create a message queue:

# let greeter_queue = Queue.create ();;
val greeter_queue : '_weak1 Kcas_data.Queue.t = <abstr>

And spawn a “greeter” domain that responds to messages with a greeting:

# let greeter_domain = Domain.spawn @@ fun () ->
    let rec loop () =
      match Queue.take_blocking greeter_queue with
      | `Close -> ()
      | `Greet (target, resolver) ->
        Promise.resolve resolver (Printf.sprintf "Hello, %s!" target);
        loop ()
    in
    loop ();;
val greeter_domain : unit Domain.t = <abstr>

Let’s also create a helper function, greet, to interact with the greeter:

# let greet target =
    let promise, resolver = Promise.create () in
    Queue.add (`Greet (target, resolver)) greeter_queue;
    Promise.await promise;;
val greet : string -> string = <fun>

Now we can call the greeter, which is running in another domain, as if it were a regular function. So, here is to you:

# greet "fellow concurrent programmer"
- : string = "Hello, fellow concurrent programmer!"

And everyone else:

# greet "the rest of the world"
- : string = "Hello, the rest of the world!"

Let’s not forget to clean up:

# Queue.add `Close greeter_queue;;
- : unit = ()
# Domain.join greeter_domain;;
- : unit = ()

The blocking mechanism in kcas works not only with plain domains and systhreads. It can also work across schedulers such as Eio and Domainslib, both of which recently merged support for it and should soon have the necessary support out of the box.

Finally, one might ask: what is the cost of all this?

It turns out that after some careful optimizations, kcas performs pretty much as well as it used to. As a random data point, at the time of writing this, the Queue provided by the latest version of kcas_data can actually be faster than the Michael-Scott queue implementation from the latest version of lockfree:

Kcas_data.Queue : mean = 0.005985, sd = 0.000001 tp=50121937.812194
Lockfree.MSQueue: mean = 0.013976, sd = 0.000001 tp=21465000.358236

Don’t be fooled, however. It is clear that the composability of kcas adds overhead — probably somewhere between 1x and 4x in time and space — compared to non-composable lock-free data structures using plain Atomics, and it has already been demonstrated that a much faster version of the Michael-Scott queue could be implemented.

Nevertheless, the take-home message is that STM could very well be fast enough for your application. The extra nanoseconds are probably not going to be the main bottleneck in most concurrent programs.

Sid Kshatriya asked and Vesa Karvonen replied

Thanks for the readable example! The blocking transactions portion, especially https://github.com/ocaml-multicore/kcas/#blocking-transactions, was also very interesting to read!

Can you briefly explain how this is integrated with Eio? I know you have added a link, but the discussion there is quite detailed…

It is pretty simple.

DLA (domain-local-await) basically stores, in the DLS, a domain- (or systhread-) specific function that implements the blocking mechanism. Inside a scheduler like Eio (and Domainslib, and pretty much any imaginable scheduler) there is a loop (running in each domain managed by the scheduler) that takes ready fibers from a queue or some other collection and runs them on the domain. Just before entering that loop, Eio installs an Eio-specific implementation of the blocking mechanism for DLA. That Eio-specific implementation of blocking uses the algebraic effects (and cancellation protocol) that Eio normally uses for blocking. The support for Domainslib works the same way — before Domainslib enters the loop running ready Domainslib tasks, a Domainslib-specific blocking implementation is installed for DLA.

A single program can have multiple different domains running different schedulers. A library like kcas, that just wants to be able to block, can then obtain the blocking implementation from DLA without directly depending on the scheduler.

Most people should not need to know anything about DLA. It should be considered an internal implementation detail and, in the future, we might use some other standard blocking mechanism. An advantage of DLA is that it can be made to work today without changes to the runtime or Stdlib. DLA is also relatively non-intrusive. It doesn’t require making extensive changes to a scheduler and installing the support is essentially free — it just takes a few words of memory per domain. DLA should also be future proof such that once a standard blocking approach emerges, it should be possible to change the default DLA implementation to use the standard blocking mechanism.
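
For intuition, here is a conceptual sketch (deliberately not the actual domain-local-await API, whose types differ) of the idea of keeping a per-domain blocking implementation in domain-local storage:

type blocking = { await : (unit -> bool) -> unit }

(* Fallback used by plain domains; a real default would block on a
   Mutex/Condition pair rather than spin. *)
let default_blocking () =
  { await = (fun ready -> while not (ready ()) do Domain.cpu_relax () done) }

let key : blocking Domain.DLS.key = Domain.DLS.new_key default_blocking

(* A scheduler such as Eio or Domainslib would install its own
   effects-based implementation before entering its run loop. *)
let install impl = Domain.DLS.set key impl

(* A library such as kcas only needs to ask the current domain how to
   block, without depending on any particular scheduler. *)
let block_until ready = (Domain.DLS.get key).await ready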

OCaml.org Newsletter: March 2023

Thibaut Mattio announced

Welcome to the inaugural edition of the OCaml.org newsletter!

Following the example of the now-retired Multicore monthlies and the Compiler newsletter, we’ll be running a monthly newsletter on the progress we’re making on the development of OCaml.org.

This newsletter has been compiled by @sabine and @tmattio and offers a recap of the work we’ve been doing on OCaml.org in March.

We highlight the work we’ve been doing in three distinct areas:

  • Package Documentation: Following user feedback, in the past months, we’ve been focusing on improving the package documentation area. It started earlier this year with the team running a survey and user interviews, and we’re nearing the end of the improvements.
  • Learn Area: As a next milestone after improving the package documentation, we started work on the learn area with the aim to improve the learning experience of new OCaml users and offer new documentation resources to both beginners and experienced developers.
  • General Maintenance: We also worked on general maintenance and improvements, and we’ll highlight some of them.

Many thanks to all of the community members who contributed by participating in surveys, giving feedback on Discuss, and opening issues and Pull Requests! Your contributions and feedback enable us to make progress on making OCaml.org the best resource to learn OCaml and discover OCaml packages!

Package Documentation

When we started to work on the package documentation navigation, we reached out to the community on the OCaml Discuss forums with a survey on the Package and Learn areas on OCaml.org. The goal behind this was to enable our new team member, a UX/UI designer, to quickly get up to speed and make impactful contributions to OCaml.org. Thanks to the active participation of the community, this turned out to be a highly effective method to identify the most impactful issues to work on.

This month, we completed personas representing different types of users, including a mid-level developer, a student, a team lead, a senior developer, and an academic instructor/researcher.

We designed the UI and user flows for two possible design options of the package section (which includes the package overview page, package documentation, the package search results, as well as an upcoming page that lists all versions of a package). You can access the designs on Figma.

We’ve been making good progress on a (low-fidelity) implementation of the designs we have in Figma, and we still have a few UI elements to rework to align the site to the designs.

Relevant PRs/Issues:

  1. We now display README/CHANGELOG/LICENSE on the package overview layout, instead of within the documentation layout. This better reflects their status as “files that accompany the package”.
  2. The source code download button and source hash display was reworked to have a better UX.
  3. The package overview page was rearranged. This improves the styling and placement of dependencies, tags, description, publication date.
  4. The authors/maintainers display was improved to (1) render an automatically-generated avatar if we don’t have one for the given user, and (2) hide excessive amounts of authors/maintainers behind a “show more” button.
  5. To make it easier to scan for relevant dependencies, we separate the dependencies into “development dependencies” and regular dependencies.
  6. After moving the package overview sidebar to the left, the package overview page and the package documentation page were unified to use the same layout.
  7. We now render a table of contents on the package overview pages.

Learn Area

We started the discovery phase, in which we are taking inventory of the current content and structure of the OCaml.org Learn area. We reviewed the user interview videos from the Q1 survey on the Learn and Package areas to extract user needs and pain points. We also started preparing a survey that specifically targets new OCaml users (both programming beginners and experienced developers). By the time we publish this newsletter, we’ve already completed the survey, and we’ll be sharing the results in the next issue.

Improving the Learn Area will be our biggest focus in the coming months, so expect more updates on this in the following newsletters.

General Maintenance

Creating a tutorial on sequences

Cuihtlauac Alvarado announced

Following up on ocaml.org tutorial updates, I’ve created one on sequences:

Again, feedback is highly welcome
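
Not an excerpt from the tutorial, but a small taste of what Stdlib sequences look like: lazily evaluated, potentially infinite streams of values (Seq.ints and Seq.take require OCaml 4.14 or later):

let first_five_even_squares =
  (* Seq.ints 0 is the infinite sequence 0, 1, 2, ...; nothing is
     computed until the sequence is consumed. *)
  Seq.ints 0
  |> Seq.map (fun n -> n * n)
  |> Seq.filter (fun n -> n mod 2 = 0)
  |> Seq.take 5
  |> List.of_seq
(* first_five_even_squares = [0; 4; 16; 36; 64] *)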

You started to learn OCaml less than 12 months ago? Please help us with our user survey on the OCaml.org Learning Area

Sabine Schmaltz announced

Here’s the promised update on what kind of feedback we got out of the survey.

We distributed the survey (1) here on the OCaml Discuss, and (2) on the OCaml Discord server. The survey was also shared via (3) LinkedIn, (4) Twitter, and possibly more channels. We asked newcomers to OCaml to participate.

57 people responded to the survey, and we had to close it early so that we would have the capacity to properly analyze and categorize all the feedback.

Invites have been sent out by Claire to the participants who volunteered to be interviewed.

We found last time that the interviews helped us understand the status quo and the potential improvements, from the community’s varied perspectives, much better than we did from the survey alone. I feel this has been a critical factor in enabling us to change things for the better.

So thank you for taking the time to help us! :camel:

Editor’s note: this post is very long, please follow the link above to read it.

Explorations on Package Management in Dune

Thibaut Mattio announced

The OCaml Platform team is excited to announce that we have started explorations to add support for package management in Dune.

This joint work, led by the Dune, opam, and opam-monorepo teams, started a few months ago with discussions on how to address one of the most important pain points reported by the community. This is the continuation of the focus we had in 2022 on prototyping new developer workflows, and we’ll explore the new workflows this integration enables throughout 2023. In particular, we’ve been developing similar workflows with opam-monorepo, and we’re building on our experience of the past 3 years to explore how to implement these workflows in Dune.

As we want to involve the community as much as possible and start gathering feedback early, we wrote an RFC on GitHub that lays down a high-level overview of the different features.

The project is in the prototyping stage, so the goal of the RFC is to invite feedback and discussion from the community, rather than serve as a definitive spec of package management in Dune. We will be opening separate RFCs for the different parts as we continue our explorations.

If you want to follow our work, you can look at issues and Pull Requests tagged as `package-management` in Dune. You can also have a look at the dev meeting minutes for Dune and opam.

It’s the beginning of quite a big project, with many people involved, so I want to take a moment to acknowledge the contributions of everyone. The initiative is spearheaded by Tarides, and the Dune, opam, and opam-monorepo development teams are leading the project. We are also grateful for the generous funding provided by Jane Street, whose support is instrumental in continuously improving the OCaml Platform.

Happy coding!

Functional web applications running in the browser

Continuing this thread, Helmut announced

Sounds compelling :slight_smile:

If you have the time, it’d be great if you could implement the 7 GUIs tasks to serve as example code for your lib.

https://eugenkiss.github.io/7guis/

I find this benchmark quite good: small enough in scope not to be a time sink, but complex enough to make one understand the limitations or advantages of a language or library.

I have implemented some of the examples.

The last two examples of the benchmark are still missing. You can find the sources at: https://github.com/hbr/fmlib/tree/browser/src/examples/browser

Ahrefs is now built with Melange

Javier Chávarri announced

Since last September, the Melange, Dune, and Ahrefs development teams have been working to enhance the integration between Dune and Melange. As a company that uses a lot of OCaml in the backend, Ahrefs saw an opportunity to bring its frontend stack closer to OCaml by using Melange while still integrating with the JavaScript ecosystem of UI libraries. Thus, the company decided to invest and actively participate to make this integration happen.

I am happy to announce we achieved a significant milestone in this integration process: we transitioned all Ahrefs frontend projects to use Melange. We have explained this transition in detail in a blog post:

https://tech.ahrefs.com/ahrefs-is-now-built-with-melange-b14f5ec56df4

Regarding the current state of Melange, it’s worth noting that our focus thus far has been on designing and implementing the Dune-Melange integration and applying it within Ahrefs. The goal has been to demonstrate that the toolchain can scale and be used in mid-large codebases, and the result has been successful so far. The process has been beneficial not only for Melange but also for Dune itself, as we were able to identify and address some performance issues, including a significant performance fix that made some build commands nearly 10 times faster in our case.

While we’ve made significant progress with the Dune-Melange integration, we recognize that there is still work to be done to improve the documentation and developer experience. Currently, Melange lacks a dedicated documentation site, and the latest functionality isn’t yet available in published versions of Dune and Melange on the opam repository.

We’re actively working to address this, but in the meantime, we invite those who are adventurous to explore the melange-opam-template and review the newly added melange.emit stanza documentation found in the latest version of Dune’s documentation. If you have any questions, encounter any issues, or otherwise want to participate in any way, we invite you to join the #melange channel in the Reason Discord.

Thank you for taking the time to read about our progress with the Dune-Melange integration. We hope you share our excitement about this project!

Other OCaml News

From the ocaml.org blog

Old CWN

If you happen to miss a CWN, you can send me a message and I’ll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.