OCaml Weekly News

Previous Week Up Next Week

Hello

Here is the latest OCaml Weekly News, for the week of November 15 to 22, 2022.

Table of Contents

cohttp 6.0.0~alpha0 released

Marcello Seri announced

I’d like to announce the release of the first alpha version of the new cohttp. If you are not familiar with it, cohttp is an OCaml library for creating HTTP clients and servers. It has a portable HTTP parser, and implementations using various asynchronous programming libraries.

This experimental release introduces a number of changes in cohttp’s internals and in the libraries provided. In particular:

  • the new http library provides essential type definitions used in cohttp and also includes a new, extremely fast, http parser. The library is designed to have practically no dependencies and make it easy for other packages to easily interoperate with cohttp.
  • cohttp-server-lwt-unix is a new lwt server backend that does not rely on conduit and uses direct lwt-io access. This should resolve most of the latency issues encountered with the old cohttp-lwt-unix library.
  • cohttp-curl is a new libcurl-based (via ocurl) library, also providing an lwt and an async backend.
  • cohttp-eio uses eio to leverage new features from multicore ocaml, it also include new very fast parser to fully benefit from eio at all levels.

All of this comes with a large number of fixed bugs, improved testing, new long-awaited features (like the http cache for cohttp-lwt-unix) and lots of internal refactoring and cleanup.

We would like to thank @rgrinberg @mefyl @anuragsoni @BikalGurung and all the people that contributed with issues, discussions, PRs and design ideas leading to this release.

LINKS

CHANGES

BREAKING CHANGES

Constant string pattern matching

Ian asked

Let’s say I write something like this:

match lexeme with
| "foo00" -> Foo00
| "foo01" -> Foo01
  ...
| "foo50" -> Foo50
| _ -> FooError

(My real keywords are not uniform like this.) Does this compile to the obvious straight linear search, or is there any cleverness (like a constant hidden hashtable perhaps)? Or should I have a hashtable myself?

gasche replied

The compiler explorer shows assembly output, but often other intermdiate representations are more readable (to me at least). The example of @copy is

let test x =
    match x with
    | "barfoofoo00" -> 1
    | "barfoofoo01" -> 2
    | "barfoofoo02" -> 3
    | "barfoofoo03" -> 4
    | "barfoofoo04" -> 5
    | "barfoofoo05" -> 6
    | "barfoofoo06" -> 7
    | "barfoofoo07" -> 8
    | "barfoofoo08" -> 9
    | "barfoofoo09" -> 10
    | "barfoofoo10" -> 11
    | "barfoofoo11" -> 12
    | "barfoofoo12" -> 13
    | "barfoofoo13" -> 14
    | "barfoofoo14" -> 15
    | "barfoofoo15" -> 16
    | "barfoofoo16" -> 17
    | "barfoofoo17" -> 18
    | "barfoofoo18" -> 19
    | "barfoofoo19" -> 20
    | _ -> -42

and the -dcmm output is as follows: (remember that integers are tagged in this representation, so for example 39, 2*19+1, is the encoding of 19 in the source.)

(function camlTest__test_267 (x/269: val)
 (catch
   (let size/274 (>>u (load_mut int (+a x/269 -8)) 10)
     (if (!= size/274 2) (exit 1)
       (let cell/272 (load_mut int (+a x/269 0))
         (if (== cell/272 8027225910085312866)
           (let cell/273 (load_mut int (+a x/269 8))
             (if (< cell/273 288230376155197551)
               (if (< cell/273 288230376155001199)
                 (if (< cell/273 288230376154935407)
                   (if (== cell/273 288230376154869871) 3
                     (if (== cell/273 288230376154870127) 23 (exit 1)))
                   (if (== cell/273 288230376154935407) 5
                     (if (== cell/273 288230376154935663) 25
                       (if (== cell/273 288230376155000943) 7 (exit 1)))))
                 (if (< cell/273 288230376155066735)
                   (if (== cell/273 288230376155001199) 27
                     (if (== cell/273 288230376155066479) 9 (exit 1)))
                   (if (== cell/273 288230376155066735) 29
                     (if (== cell/273 288230376155132015) 11
                       (if (== cell/273 288230376155132271) 31 (exit 1))))))
               (if (< cell/273 288230376155328879)
                 (if (< cell/273 288230376155263087)
                   (if (== cell/273 288230376155197551) 13
                     (if (== cell/273 288230376155197807) 33 (exit 1)))
                   (if (== cell/273 288230376155263087) 15
                     (if (== cell/273 288230376155263343) 35
                       (if (== cell/273 288230376155328623) 17 (exit 1)))))
                 (if (< cell/273 288230376155394415)
                   (if (== cell/273 288230376155328879) 37
                     (if (== cell/273 288230376155394159) 19 (exit 1)))
                   (if (== cell/273 288230376155394415) 39
                     (if (== cell/273 288230376155459695) 21
                       (if (== cell/273 288230376155459951) 41 (exit 1))))))))
           (exit 1)))))
 with(1) -83))

alan then added

@gasche has a comprehensive comment on how OCaml compiles pattern matching on r/ProgrammingLanguages.

gasche then said

Indeed! Thanks for the mention. Reposting below.

The strategy used by the OCaml compiler is not naive, but not hashing either. It will compile a switch on strings (with a default case) into a tree of switches on characters (in fact words, see below), by reading characters at specific positions in the string. It was implemented by Benoît Vaugon and Luc Maranget.

The position to read is selected as the least discriminating position: we count the number of different values (among the string patterns) for each position, and choose the (leftmost) position with the smallest number of possible values. The idea is that these positions must be tested anyway to eliminate the default case, so you may as well check them first.

For example if you are matching on “aa” vs “ab” (or something else, the default case), and you know that your input has size 2, you may generate either:

    switch s[0]:
      case 'a':
        switch s[1]:
          case 'a': goto <case "aa">
          case 'b': goto <case "ab">
          default: goto <default case>
      default: goto <default case>

or

    switch s[1]:
      case 'a':
        switch s[0]:
          case 'a': goto <case "aa">
          default: goto <default case>
      case 'b':
        switch s[0]:
          case 'a': goto <case "ab">
          default: goto <default case>
      default: goto <default case>

and the first approach is more interesting, it performs the same tests for each non-default value with shorter code.

This is counter-intuitive: check the least discriminating position first. It is more intuitive at first to check the most discriminating position.

Regarding size: you start by switching on the size of the string, then you are left with groups of the same size, where the same positions are valid.

Regarding characters: instead of looking up strings characters by characters, you can read whole machine word at once, except at the very end of the string. In the case of {"one", "two", "three"}, with the 0-ended representation of C, a single 32bit read is enough to distinguish the three values, so you need a single position switch (after switching on the size first).

Dune’s Style Guide

Rudi Grinberg announced

I’ve recently added contribution guidelines to dune’s code base to help onboard potential contributors to dune. It’s been suggested that it would be useful to share them with the wider community so this is the purpose of this post.

Many guidelines are taken straight from Jane Street’s internal rules which shouldn’t be a surprise as the project was started by a JST employee. The intent of sharing these is not to evangelize our coding style to other projects, but perhaps inspire some of you to pick and choose the rules you like and maybe share some of your own.

Jane Street, compiler development, and open-source

Yaron Minsky said

@reisenberg wrote a summary of our recent efforts on the compiler, and @sid asked a question about how this impacts Jane Street’s publicly released software:

How would the newer versions of the libraries that progressively use more and more of custom language be made available to the OCaml community?

This is a good question, given that it may well take years to upstream some of our compiler changes, and some of them might not be accepted by upstream at all.

In some sense, we’re in this world already. We’ve already made changes to Base, our standard library, to use the local mode in order to better support stack allocation. And this has already made it in to our public release, as you can see if you look for the [@local] annotations in the code below.

https://github.com/janestreet/base/blob/master/src/list.mli#L208

Our strategy here was to use the new feature in a way that doesn’t break the syntax, so it could be smoothly included in our public code. This works for modes because modes can be erased without changing the semantics of the code.

This approach isn’t always possible. Consider the include functor syntax that was mentioned in @reisenberg’s post:

module List = struct
  type 'a t =
  | Nil
  | Cons of 'a * 'a t

  let mapi t ~f = ...

  include functor Make_map
end

This can’t just be desugared away, as it happens, and so we simply block ourselves from using this feature within publicly released code.

There’s an obvious tension here, since blocking ourselves from using new features makes it more painful to open source things. Part of our intent here is to prioritize the upstreaming of things that cause these problems. include functor is an example of a feature that’s pretty small, and we think is generally quite useful, so we intend to propose it upstream soon.

I suspect unboxed types will be more like include functor than like modes, in that it’s going to be hard to use it in a way that’s compatible with the public release. We haven’t yet really worked through what the tradeoffs will be there.

In any case, our public release code is important to us, and we’re going to continue to release new versions. We value the fact that it’s helpful to the community, but it’s also more directly valuable to us. In particular, it makes it easier for us to reuse work done by other people, since people spend the time and effort to build libraries that interoperate well with our code (notably, with Async). And also, our public release helps people take into account our usage patterns when working on the compiler or other important community tools like Merlin or OCamlformat.

Dune 3.6.0

Etienne Millon announced

Dear dune users, It is my pleasure to announce that dune 3.6.0 is now available on opam :tada:. Here’s the changelog - I reused the same classification as in the previous announce for dune 3.5.0. Thanks again to all the contributors including bug reporters.

dune executable

This lists features of the “dune” executable itself. Upgrading dune will bring in these changes. We consider these changes safe, but it is difficult to define what a breaking change is for a command-line tool (for example, some error messages change). It is important to note that just upgrading the dune executable is not supposed to change how dune interprets existing projects. If just upgrading dune breaks compilation, it is a bug in dune, please report it!

  • Added
    • Introduce a $ dune ocaml top-module subcommand to load modules directly without sealing them behind the signature. (#5940, @rgrinberg)
    • Revive $ dune external-lib-deps under $ dune describe external-lib-deps. (#6045, @moyodiallo)
    • Extend the promotion CLI to a dune promotion group: dune promote is moved to dune promotion apply (the former still works) and the new dune promotion diff command can be used to just display the promotion without applying it. (#6160, fixes #5368, @emillon)
    • Build progress status now shows number of failed jobs (#6242, @Alizter)
    • Allow promoting into source directories specified by subdir (#6404, fixes #3502, @rgrinberg)
    • Support CLICOLOR and CLICOLOR_FORCE to enable/disable/force ANSI colors. (#6340, fixes #6323, @MisterDA).
    • Create a fake socket file _build/.rpc/dune on windows to allow rpc clients to connect using the build directory. (#6329, @rgrinberg)
  • Fixed
    • Forbid multiple instances of dune running concurrently in the same workspace. (#6360, fixes #236, @rgrinberg)
    • Make dune describe workspace return the correct root path (#6380, fixes #6379, @esope)
    • Fix running inline tests in bytecode mode (#5622, fixes #5515, @dariusf)

(lang dune 3.6)

This lists changes if you opt into the new (lang dune 3.6) version in your dune-project file. For this too, these are changes that we consider safe, but they can require changes to your dune files. For example, sandboxing is enabled in more places, which means that you might have to be more precise in expressing your dependencies. Please reach out on the issue tracker if you have trouble fixing your dune file or if something does not seem to be possible anymore.

  • Added
    • Add (glob_files <glob>) and (glob_files_rec <glob>) terms to the files field of the install stanza (#6250, closes #6018, @gridbugs)
    • Allow Byte_complete binaries to be installable (#4837, @AltGr, @rgrinberg)
    • Allow :standard in the (modules) field of the coq.pp stanza (#6229, fixes #2414, @Alizter)
  • Fixed
    • Allow absolute build directories to find public executables. For example, those specified with (deps %{bin:...}) (#6326, @anmonteiro)
    • [ctypes] do not mangle user written names in the ctypes stanza (#6374, fixes #5561, @rgrinberg)
    • Forbid private libraries with (package ..) set from depending on private libraries that don’t belong to a package (#6385, fixes #6153, @rgrinberg)
    • [ctypes] always re-run pkg-config because we aren’t tracking its external dependencies (#6052, @rgrinberg)
    • [ctypes] remove dependency on configurator in the generated rules (#6052, @rgrinberg)
    • Fix passing of flags to dune coq top (#6369, fixes #6366, @Alizter)
    • Prevent crash if absolute paths are used in the install stanza and in recursive globs. These cases now result in a user error. (#6331, @gridbugs)

My learnings about monads and state monads

David Wong announced

I wrote about monads recently here: https://cryptologie.net/article/578/simple-introduction-to-monads-in-ocaml/

and state monads here: https://cryptologie.net/article/581/state-monads-in-ocaml/

thought this might be useful to someone trying to learn these as well

hyphenrf then said

I see you do toplevel-open in your code, which is generally only good for namespaced modules not identifiers. A more idiomatic way to access definitions is to refer to them directly by their fully-qualified names, or do a local-open.

open Mod

let f x = a ...
let g x = b ...
let h x = c ...

becomes

let f x = Mod.a ...
let g x =
  let open Mod in
  b ...
let h x = Mod.(c ...)

so for example:

let res = Monad.(
  let* a = Some 5 in
  let* b = Some 6 in
  return (a + b)
)

Confero 0.1.1

“Petter A. Urkedal announced

Confero implements the Unicode Collation Algorithm (UCA), currently built for Unicode 15.0.0. It also provides the Default Unicode Collation Element Table (DUCET), which implements a language-agnostic collation order.

For most use-cases, it should suffice to link with confero and confero.ducet and use the single entry point Confero.collate. For a drop-in replacement for String.compare, pass ~total:true, otherwise it will disagree with (=) due to normalization. If you don’t link with confero.ducet, the default collation will be based on Unicode codepoints. The API allows you to take more control of which collation mapping is used, and to evaluate separate stages of the UCA, if needed.

I haven’t looked into localizing collation, but it should be possible to create a custom mapping which calls the DUCET mapping as a fall-back. Note, however, that the collation elements are not stable across Unicode versions. CLDR should of interest to those who want to look into this.

The API documentation is not online yet, but I’ll post a link when it gets indexed on ocaml.org.

B·o·B, an universal and secure peer-to-peer file-transfer in OCaml

Calascibetta Romain announced

I am very pleased to announce the experimental distribution of Bob available on this website.

Bob is an OCaml-based file-sharing application. For the Robur team, this is our first application using the Esperanto project (announced some time ago) allowing us to distribute a single binary that works on almost all platforms (including Windows).

This software also uses, as far as its relay is concerned, a unikernel developed with MirageOS and deployed by yours truly.

Finally, the distribution of the binary or the relay is offered by the reproducible infrastructure developed by the Robur team and available here: https://builds.robur.io/

The website (also a unikernel) contains all the information about Bob, its use, its configuration, etc. A series of Questions/Answers is also available there.

We would like to say that the project is experimental. Even though we are ready to deliver the software in due form, we are aware of some bugs and we will continue to improve the software. However, the practical case, its use and user feedback are needed at this stage - and that is why we are announcing its availability.

A series of articles to understand the whole development and deployment process is available:

If you are interested in the project and want to help us maintain all that it entails or if you consider what we do to be super cool, you can donate here.

The Robur team

Old CWN

If you happen to miss a CWN, you can send me a message and I’ll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.