OCaml Weekly News

Previous Week Up Next Week

Hello

Here is the latest OCaml Weekly News, for the week of May 04 to 11, 2021.

Table of Contents

Software engineer position at LexiFi (Paris)

Alain Frisch announced

LexiFi is hiring! We are looking for a fully-time software engineer to join our core development team. The vast majority of our stack is implemented in OCaml, and we have plenty of exciting projects on a wide range of topics.

More info on https://www.lexifi.com/careers/software_engineer/

Open source editor for iOS, iPadOS and macOS

Continuing this thread, Nathan Fallet announced

Just updated the editor, I redesigned the macOS version, and it just looks better and more native

6b03c462755fb37a2d5018013c3d1c8bd45f53bf_2_1380x766.jpeg

What are your first impressions on it?

Backend developer position at Issuu (Copenhagen)

Dario Teixeira announced

We are looking for a Backend Developer with experience in machine learning – and preferably also OCaml! – to join our Research & Development team. You will help build machine learning research prototypes and be responsible for integrating them into new and existing products.

At Issuu, we use OCaml extensively in our production systems. If you love OCaml and functional programming in general, Issuu is a great place to put your passion into real-world products!

Please find more information about this position at the following link: https://jobs.lever.co/issuu/f502cb20-b216-4c67-8357-d748e1b35178

Anentropic asked and Dario Teixeira replied

I would love to hear more about your OCaml backend stack

Well, we love to talk about our OCaml stack! :slightly_smiling_face:

We rely on the Jane Street ecosystem a lot, using Core as a Stdlib replacement and Async for monadic concurrency.

AMQP forms the backbone of our messaging system, and therefore we use amqp-client extensively.

We use both MySQL and Postgresql databases in production. For the former we use ppx_mysql, and for the latter, PGOCaml. (Thanks to Docker, we can give PGOCaml compile-time access to the DB without having to depend on the actual production DB.)

We currently use Protobuf for serialisation, but spend a great amount of time complaining about it. We rely on ocaml-protoc-plugin to generate the OCaml code from Protobuf definitions.

Anyway, that's just the basics of our stack. Do let me know if there's something else you'd like to know in more detail!

roddy asked and Dario Teixeira replied

Do you use Protobuf for interop with non-OCaml systems? If not, I'm curious about whether you've considered bin_prot as an alternative; it seems like an obvious choice if you're using Core/Async.

Yes, we use Protobuf mainly because we have a heterogeneous stack, where besides OCaml we also have services running Python, Kotlin, or Elixir.

Tim McGilchrist asked and Dario Teixeira

I'm curious about how you structure the business code (for want of a better word), in between the technical layers of talking to AMQP or an SQL store. Are there larger scale patterns like CQRS or DDD that you use to organise code?

How do you package up code for deployment? Docker / AWS something. We're slowly migrating to a micro-service architecture (the pros and cons of which are outside the scope of this thread; that's a can of worms I'd rather not open…) whose cast of characters includes "entities" (responsible for storing/retrieving data from DBs), generic backend services that encapsulate business logic, frontend services, and backend-for-frontend services.

We're using Docker for deployment on AWS (mostly), and slowly migrating from Docker Swarm to Kubernetes.

25 years of OCaml

Xavier Leroy announced

25 years ago, on May 9th 1996, release 1.00 of the Objective Caml language and system was announced: https://sympa.inria.fr/sympa/arc/caml-list/1996-05/msg00003.html

It was already the consolidation of many years of work, integrating Jérôme Vouillon and Didier Rémy's work on objects and classes within Caml Special Light, itself a combination of my work on modules and native-code compilation with earlier code taken from Caml Light, especially Damien Doligez's GC.

Little did I know that O(bjective) Caml would still be there 25 years later!

A lot happened during this time, including several major evolutions of the language, and, much more importantly, the emergence of a community of users and an ecosystem of tools and libraries. But maybe this was just the beginning for something even bigger? We'll see…

Happy birthday, OCaml!

David Allsopp replied

Most pleasingly, with a very small number of patches, the Windows port still works in Visual Studio 2019:

C:\Birthday>ocaml.exe
        Objective Caml version 1.00

#print_endline "Happy 25th Birthday, OCaml!";;
Happy 25th Birthday, OCaml!
- : unit = ()
##quit;;

C:\Birthday>type hooray.ml
let rec hip_hip n =
  if n > 0 then
    let () = print_endline "hip hip! hooray!" in
    hip_hip (pred n)

let () = hip_hip 25
C:\Birthday>ocamlopt -o hooray.exe hooray.ml

C:\Birthday>hooray
hip hip! hooray!
...

On the OCaml Maling List, Roberto Di Cosmo also replied

Long live OCaml!

Thanks Xavier, and to all the brilliant minds that contributed to the evolution and adoption of this beautiful language, and system, in this past quarter of a century.

If I may add a personal note, one truly remarkable fact is that some rather complex code written in 1998 using OCaml 1.07 [1] could be compiled and run last year using OCaml 4.x without modifications: the only visible changes were the new warnings spotting potential issues in the code, thanks to the many improvements to the compiler over time.

For the curious, all the details are here: https://www.dicosmo.org/Articles/2020-ReScienceC.pdf

Cheers

Roberto

[1] that was the first version including support for marshalling closures, added in a fantastic one week-spring in Pisa exactly for this code :-)

OCaml compiler development newsletter, issue 1: before May 2021

gasche announced

I'm happy to introduce the first issue of the "OCaml compiler development newsletter". I asked frequent contributors to the OCaml compiler codebase to write a small burb on what they have been doing recently, in the interest of sharing more information on what people are interested in, looking at and working on.

This is by no means exhaustive: many people didn't end up having the time to write something, and it's fine. But hopefully this can give a small window on development activity related to the OCaml compiler, structured differently from the endless stream of Pull Requests on the compiler codebase.

(This initiative is inspired by the excellent Multicore newsletter. Please don't expect that it will be as polished or consistent :yo-yo: .)

Note:

  • Feel free of course to comment or ask questions, but I don't know if the people who wrote a small blurb will be looking at the thread, so no promises.
  • If you have been working on the OCaml compiler and want to say something, please feel free to post! If you would like me to get in touch next time I prepare a newsletter issue (some random point in the future), please let me know by email at (gabriel.scherer at gmail).

@dra27 (David Allsopp)

Compiler relocation patches now exist. There's still a few left to write, and they need splitting into reviewable PRs, but the core features are working. A compiler installation can be copied to a new location and still work, meaning that local switches in opam may in theory be renamed and, more importantly, we can cache previously-built compilers in an opam root to allow a new switch's compiler to be a copy. This probably won't be reviewed in time for 4.13, although it's intended that once merged opam-repository will carry back-ports to earlier compilers.

A whole slew of scripting pain has lead to some possible patches to reduce the use of scripts in the compiler build to somewhat closer to none.

FlexDLL bootstrap has been completely overhauled, reducing build time considerably. This will be in 4.13 (#10135)

@nojb (Nicolás Ojeda Bär)

I am working on #10159, which enables debug information in -output-complete-exe binaries. It uses incbin under Unix-like system and some other method under Windows.

@gasche (Gabriel Scherer)

I worked on bringing more PRs to a decision (merge or close). The number of open PRs has gone from 220-ish to 180, which feels nice.

I have also contributed to @Ekdohibs' project camlboot, which is a "bootstrap-free" implementation of OCaml able to compile the OCaml compiler itself. It currently targets OCaml 4.07 for various reasons. We were able to do a full build of the OCaml compiler, and check that the result produces bootstrap binaries that coincide with upstream bootstraps. This gives extremely strong confidence that the OCaml bootstrap is free from "trusting trust" attacks. For more details, see our draft paper.

with @Octachron (Florian Angeletti)

I worked with Florian Angeletti on deprecating certain command-line warning-specifier sequences, to avoid usability issues with (new in 4.12) warning names. Before -w -partial-match disables warning 4, but -w -partial is interpreted as the sequence w -p -w a -w r -w t -w i -w a -w l, most of which are ignored but -w a silences all warnings. Now multi-letter sequences of "unsigned" specifiers (-p is signed, a is unsigned) are deprecated. (We first deprecated all unsigned specifiers, but Leo White tested the result and remarked that -w A is common, so now we only warn on multi-letter sequences of unsigned specifiers.

I am working with @Octachron (Florian Angeletti) on grouping signature items when traversing module signatures. Some items are "ghost items" that are morally attached in a "main item"; the code mostly ignores this and this creates various bugs in corner cases. This is work that Florian started in September 2019 with #8929, to fix a bug in the reprinting of signatures. I only started reviewing in May-September 2020 and we decided to do sizeable changes, he split it in several smaller changes in January 2021 and we merged it in April 2021. Now we are looking are fixing other bugs with his code (#9774, #10385). Just this week Florian landed a nice PR fixing several distinct issues related to signature item grouping: #10401.

@xavierleroy (Xavier Leroy)

I fixed #10339, a mysterious crash on the new Macs with "Apple silicon". This was due to a ARM (32 and 64 bits)-specific optimization of array bound checking, which was not taken into account by the platform-independent parts of the back-end, leading to incorrect liveness analysis and wrong register allocation. #10354 fixes this by informing the platform-independent parts of the back-end that some platform-specific instructions can raise. In passing, it refactors similar code that was duplicating platform-independent calculations (of which instructions are pure) in platform-dependent files.

I spent quality time with the Jenkins continuous integration system at Inria, integrating a new Mac Mini M1. For unknown reasons, Jenkins ran the CI script in x86-64 emulation mode, so we were building and testing an x86-64 version of OCaml instead of the intended ARM64 version. A bit of scripting later (8b1bc01c3) and voilà, arm64-macos is properly tested as part of our CI.

Currently, I'm reading the "safe points" proposal by Sadiq Jaffer (#10039) and the changes on top of this proposed by Damien Doligez. It's a necessary step towards Multicore OCaml, so we really need to move forward on this one. It's a nontrivial change involving a new static analysis and a number of tweaks in every code emitter, but things are starting to look good here.

@mshinwell (Mark Shinwell)

I did a first pass of review on the safe points PR (#10039) and significantly simplified the proposed backend changes. I've also been involved in discussions about a new function-level attribute to cause an error if safe points (including allocations) might exist within a function's body, to make code that currently assumes this robust. There will be a design document for this coming in due course.

I fixed the random segfaults that were occurring on the RISC-V Inria CI worker (#10349).

In Flambda 2 land we spent two person-days debugging a problem relating to Infix_tag! We discovered that the code in OCaml 4.12 onwards for traversing GC roots in static data ("caml_globals") is not correct if any of the roots are closures. This arises in part because the new compaction code (#9728) has a hidden invariant: it must not see any field of a static data root more than once (not even via an Infix_tag). As far as we know, these situations do not arise in the existing compiler, although we may propose a patch to guard against them. They arise with Flambda 2 because in order to compile statically-allocated inconstant closures (ones whose environment is partially or wholly computed at runtime) we register closures directly as global roots, so we can patch their environments later.

@garrigue (Jacques Garrigue)

I have been working on a number of PRs fixing bugs in the type system, which are now merged:

  • #10277 fixes a theoretical bug in the principality of GADT type inference (#10383 applies only in -principal mode)
  • #10308 fixes an interaction between local open in patterns and the new syntax for introducing existential type variables
  • #10322 is an internal change using a normal reference inside of a weak one for backtracking; the weak reference was an optimization when backtracking was a seldom used feature, and was not useful anymore
  • #10344 fixes a bug in the delaying of the evaluation of optional arguments
  • #10347 cleans up some code in the unification algorithm, after a strengthening of universal variable scoping
  • #10362 fixes a forgotten normalization in the type checking algorithm

Some are still in progress:

  • #10348 improves the way expansion is done during unification, to avoid some spurious GADT related ambiguity errors
  • #10364 changes the typing of the body of the cases of pattern-matchings, allowing to warn in some non-principal situations; it also uncovered a number of principality related bugs inside the the type-checker

Finally, I have worked with Takafumi Saikawa (@t6s) on making the representation of types closer to its logical meaning, by ensuring that one always manipulate a normalized view in #10337 (large change, evaluation in progress).

@let-def (Frédéric Bour)

For some time, I have been working on new approaches to generate error messages from a Menhir parser.

My goal at the beginning was to detect and produce a precise message for the ‘let ;’ situation:

let x = 5;
let y = 6
let z = 7

LR detects an error at the third ‘let’ which is technically correct, although we would like to point the user at the ‘;’ which might be the root cause of the error. This goal has been achieved, but the prototype is far from being ready for production.

The main idea to increase the expressiveness and maintainability of error context identification is to use a flavor of regular expressions. The stack of a parser defines a prefix of a sentential form. Our regular expressions are matched against it. Internal details of the automaton does not leak (no reference to states), the regular language is defined by the grammar alone. With appropriate tooling, specific situations can be captured by starting from a coarse expression and refining it to narrow down the interesting cases.

Now I am focusing on one specific point of the ‘error message’ development pipeline: improving the efficiency of ‘menhir –list-errors’. This command is used to enumerate sentences that cover all erroneous situations (as defined by the LR grammar). On my computer and with the OCaml grammar, it takes a few minutes and quite a lot of RAM. Early results are encouraging and I hope to have a PR for Menhir soon. The performance improvement we are aiming for is to make the command almost real time for common grammars and to tackle bigger grammars by reducing the memory needs. For instance, in the OCaml case, the runtime is down from 3 minutes to 2–3 seconds and memory consumption goes from a few GiB down to 200 MiB.

Daniel Bünzli asked and gasche replied

> […] @Ekdohibs’ project camlboot , which is a “bootstrap-free”
> implementation of OCaml able to compile the OCaml compiler itself. It currently targets OCaml 4.07 for various
> reasons. We were able to do a full build of the OCaml compiler, and check that the result produces bootstrap
> binaries that coincide with upstream bootstraps. This gives extremely strong confidence that the OCaml bootstrap is
> free from “trusting trust” attacks. For more details, see our draft paper.

Something that is not clear to me (but I read quickly) is the impact of `guile` itself being not bootstrapped yet. Could there be a very elaborate attack (with probability 0 of existing) on both the guile and ocaml bootstrap or is there something in the whole scheme that prevents it ?

Yes, currently Guile needs to be trusted, and it would be possible that a bootstrapping virus in Guile would break our correctness result. (It would need to reproduce itself through our compiler and interpreter that were written after Guile itself, but I think in theory this could be done with an almost-infinitely clever program analysis.) Of course, an attack at the source level (inserting malicious source, instead of malicious binaries) is also possible anywhere in the chain. Our main reason for using Guile is that this is the high-level language community most active on debootstrapping-towards-the-metal (through the Guix connection), so we believe it is more likely to manage debootstrapping and maintain it in the longer run.

(The seed that Guile depends on is its macro-expander, which is written using macros itself. In theory one may perform the macro-expansion of the expander, and then manually review the two versions to verify the absence of attack there.)

After so many years, I discover 'Str.bounded_full_split regexp str n'

UnixJunkie said

This is so useful and powerful:

#require "str";;
Str.bounded_full_split (Str.regexp "[()]") "toto (titi, tata (et tutu)) vont au parc (en courant)" 1024;;
- : Str.split_result list =
[Str.Text "toto "; Str.Delim "("; Str.Text "titi, tata "; Str.Delim "(";
 Str.Text "et tutu"; Str.Delim ")"; Str.Delim ")"; Str.Text " vont au parc ";
 Str.Delim "("; Str.Text "en courant"; Str.Delim ")"]

Still finding hidden pearls in the stdlib after so many years! :slight_smile:

Parser for the Scala programming language?

Deep in this thread, Yoann Padioleau announced

I ended up porting the recursive descent parser in the Scala compiler to OCaml … I think it was the fastest way to get a working parser from OCaml …

https://github.com/returntocorp/pfff/blob/develop/lang_scala/parsing/Parser_scala_recursive_descent.ml

Old CWN

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.