OCaml Weekly News
Hello
Here is the latest OCaml Weekly News, for the week of November 15 to 22, 2022.
Table of Contents
cohttp 6.0.0~alpha0 released
Marcello Seri announced
I’d like to announce the release of the first alpha version of the new
cohttp
. If you are not familiar with it, cohttp
is an OCaml library
for creating HTTP clients and servers. It has a portable HTTP parser, and implementations using various asynchronous
programming libraries.
This experimental release introduces a number of changes in cohttp’s internals and in the libraries provided. In particular:
- the new
http
library provides essential type definitions used incohttp
and also includes a new, extremely fast, http parser. The library is designed to have practically no dependencies and make it easy for other packages to easily interoperate with cohttp. cohttp-server-lwt-unix
is a new lwt server backend that does not rely on conduit and uses direct lwt-io access. This should resolve most of the latency issues encountered with the oldcohttp-lwt-unix
library.cohttp-curl
is a newlibcurl
-based (viaocurl
) library, also providing anlwt
and anasync
backend.cohttp-eio
useseio
to leverage new features from multicore ocaml, it also include new very fast parser to fully benefit from eio at all levels.
All of this comes with a large number of fixed bugs, improved testing, new long-awaited features (like the http
cache for cohttp-lwt-unix
) and lots of internal refactoring and cleanup.
We would like to thank @rgrinberg @mefyl @anuragsoni @BikalGurung and all the people that contributed with issues, discussions, PRs and design ideas leading to this release.
LINKS
- Project page: https://github.com/mirage/ocaml-cohttp
- Documentation for the alpha version needs to be generated locally with
dune build @doc
CHANGES
- cohttp-eio: ensure “Host” header is the first header in http client requests (bikallem cohttp-eio(client): Ensure “Host” header is the first header in requests mirage/ocaml-cohttp#939)
- cohttp-eio: add TE header in client. Check TE header is server (bikallem Add TE header in client. Check TE header in server mirage/ocaml-cohttp#941)
- cohttp-eio: add User-Agent header to request from Client (bikallem cohttp-eio(client): add User-Agent header mirage/ocaml-cohttp#940)
- cohttp-eio: add Content-Length header to request/response (bikallem cohttp-eio: Add Content-Length header to request/response mirage/ocaml-cohttp#929)
- cohttp-eio: add cohttp-eio client api - Cohttp_eio.Client (bikallem cohttp-eio Client module mirage/ocaml-cohttp#879)
- http: add requires_content_length function for requests and responses (bikallem cohttp-eio Client module mirage/ocaml-cohttp#879)
- cohttp-eio: use Eio.Buf_write and improve server API (talex5 cohttp-eio: improve server API mirage/ocaml-cohttp#887)
- cohttp-eio: update to Eio 0.3 (talex5 Update for Eio 0.3 mirage/ocaml-cohttp#886)
- cohttp-eio: convert to Eio.Buf_read (talex5 cohttp-eio: use Eio.Buf_read mirage/ocaml-cohttp#882)
- cohttp lwt client: Connection cache and explicit pipelining (madroach Explicit pipelining and connection cache mirage/ocaml-cohttp#853)
- http: add Http.Request.make and simplify Http.Response.make (bikallem mseri http: add Request.make mirage/ocaml-cohttp#878)
- http: add pretty printer functions (bikallem http: add pretty printer functions mirage/ocaml-cohttp#880)
- New eio based client and server on top of the http library (bikallem RFC: eio-only version of cohttp mirage/ocaml-cohttp#857)
- New curl based clients (rgrinberg feature: Curl Based client mirage/ocaml-cohttp#813)
- cohttp-curl-lwt for an Lwt backend
- cohttp-curl-async for an Async backend
- Completely new Parsing layers for servers (anuragsoni Use new request parser mirage/ocaml-cohttp#819)
- Cohttp now uses an optimized parser for requests.
- The new parser produces much less temporary buffers during read operations in servers.
- Faster header comparison (gasche cohttp.headers: faster faster comparison mirage/ocaml-cohttp#818)
- Introduce http package containing common signatures and structures useful for compatibility with cohttp - and no dependencies (rgrinberg feature: introduce http package mirage/ocaml-cohttp#812)
- async(server): allow reading number of active connections (anuragsoni async(server): allow reading number of active connections mirage/ocaml-cohttp#809)
- Various internal refactors (rgrinberg, mseri, fix: fast comparison of request/response mirage/ocaml-cohttp#802, feature: introduce http package mirage/ocaml-cohttp#812, http, cohttp: refactor Header module tests mirage/ocaml-cohttp#820, refactor: remove some Stringext usage mirage/ocaml-cohttp#800, refactor: explicit signature for async mirage/ocaml-cohttp#799, refactor: stop relying on name mangling mirage/ocaml-cohttp#797)
- http (all cohttp server backends): Consider the connection header in response in addition to the request when deciding on whether to keep a connection alive (anuragsoni, http: Consider the user provided response when considering connection keep-alive mirage/ocaml-cohttp#843)
- The user provided Response can contain a connection header. That header will also be considered in addition to the connection header in requests when deciding whether to use keep-alive. This allows a handler to decide to close a connection even if the client requested a keep-alive in the request.
- async(server): allow creating a server without using conduit (anuragsoni Allow creating cohttp-async servers with existing reader/writers mirage/ocaml-cohttp#839)
- Add
Cohttp_async.Server.Expert.create
andCohttp_async.Server.Expert.create_with_response_action~that can be used to create a server without going through Conduit. This allows creating an async TCP server using the Tcp module from ~Async_unix
and lets the user have more control over how theReader.t
andWriter.t
are created.
- Add
- http(header): faster
to_lines
andto_frames
implementation (mseri bench(http): optimize to_frames and to_lines functions mirage/ocaml-cohttp#847) - cohttp(cookies): use case-insensitive comparison to check for
set-cookies
(mseri Cookies: case-insensitive check of set-cookies mirage/ocaml-cohttp#858) - New lwt based server implementation:
cohttp-server-lwt-unix
: this new implementation does not depend on conduit and has a simpler and more flexible API - async: Adapt cohttp-curl-async to work with core_unix.
BREAKING CHANGES
- refactor: move opam metadata to dune-project (rgrinberg refactor: move opam metadata to dune-project mirage/ocaml-cohttp#811)
- refactor: deprecate Cohttp_async.Io (rgrinberg refactor: deprecate Cohttp_async.Io mirage/ocaml-cohttp#807)
- fix: move more internals to Private (rgrinberg fix: move more internals to Private mirage/ocaml-cohttp#806)
- fix: deprecate transfer encoding field (rgrinberg fix: deprecate transfer encoding field mirage/ocaml-cohttp#805)
- refactor: deprecate Cohttp_async.Body_raw (rgrinberg refactor: deprecate Cohttp_async.Body_raw mirage/ocaml-cohttp#804)
- fix: deprecate more aliases (rgrinberg fix: deprecate more aliases mirage/ocaml-cohttp#803)
- refactor: deprecate connection value(rgrinberg refactor: deprecate connection value mirage/ocaml-cohttp#798)
- refactor: deprecate using attributes (rgrinberg refactor: deprecate using attributes mirage/ocaml-cohttp#796)
- cleanup: remove cohttp-{curl,server}-async (rgrinberg fix: stop installing binaries mirage/ocaml-cohttp#904)
- cleanup: remove cohttp-{curl,server,proxy}-lwt (rgrinberg fix: stop installing binaries mirage/ocaml-cohttp#904)
- fix: all parsers now follow the spec and require
\r\n
endings. Previously, the\r
was optional. (rgrinberg, fix: require \r\n in all parsers mirage/ocaml-cohttp#921) cohttp-lwt-jsoo
: do not instantiateXMLHttpRequest
object on boot (mefyl Do not systematically create a XMLHttpRequest object. mirage/ocaml-cohttp#922)
Constant string pattern matching
Ian asked
Let’s say I write something like this:
match lexeme with | "foo00" -> Foo00 | "foo01" -> Foo01 ... | "foo50" -> Foo50 | _ -> FooError
(My real keywords are not uniform like this.) Does this compile to the obvious straight linear search, or is there any cleverness (like a constant hidden hashtable perhaps)? Or should I have a hashtable myself?
gasche replied
The compiler explorer shows assembly output, but often other intermdiate representations are more readable (to me at least). The example of @copy is
let test x = match x with | "barfoofoo00" -> 1 | "barfoofoo01" -> 2 | "barfoofoo02" -> 3 | "barfoofoo03" -> 4 | "barfoofoo04" -> 5 | "barfoofoo05" -> 6 | "barfoofoo06" -> 7 | "barfoofoo07" -> 8 | "barfoofoo08" -> 9 | "barfoofoo09" -> 10 | "barfoofoo10" -> 11 | "barfoofoo11" -> 12 | "barfoofoo12" -> 13 | "barfoofoo13" -> 14 | "barfoofoo14" -> 15 | "barfoofoo15" -> 16 | "barfoofoo16" -> 17 | "barfoofoo17" -> 18 | "barfoofoo18" -> 19 | "barfoofoo19" -> 20 | _ -> -42
and the -dcmm
output is as follows: (remember that integers are tagged in this representation, so for example
39
, 2*19+1, is the encoding of 19
in the source.)
(function camlTest__test_267 (x/269: val) (catch (let size/274 (>>u (load_mut int (+a x/269 -8)) 10) (if (!= size/274 2) (exit 1) (let cell/272 (load_mut int (+a x/269 0)) (if (== cell/272 8027225910085312866) (let cell/273 (load_mut int (+a x/269 8)) (if (< cell/273 288230376155197551) (if (< cell/273 288230376155001199) (if (< cell/273 288230376154935407) (if (== cell/273 288230376154869871) 3 (if (== cell/273 288230376154870127) 23 (exit 1))) (if (== cell/273 288230376154935407) 5 (if (== cell/273 288230376154935663) 25 (if (== cell/273 288230376155000943) 7 (exit 1))))) (if (< cell/273 288230376155066735) (if (== cell/273 288230376155001199) 27 (if (== cell/273 288230376155066479) 9 (exit 1))) (if (== cell/273 288230376155066735) 29 (if (== cell/273 288230376155132015) 11 (if (== cell/273 288230376155132271) 31 (exit 1)))))) (if (< cell/273 288230376155328879) (if (< cell/273 288230376155263087) (if (== cell/273 288230376155197551) 13 (if (== cell/273 288230376155197807) 33 (exit 1))) (if (== cell/273 288230376155263087) 15 (if (== cell/273 288230376155263343) 35 (if (== cell/273 288230376155328623) 17 (exit 1))))) (if (< cell/273 288230376155394415) (if (== cell/273 288230376155328879) 37 (if (== cell/273 288230376155394159) 19 (exit 1))) (if (== cell/273 288230376155394415) 39 (if (== cell/273 288230376155459695) 21 (if (== cell/273 288230376155459951) 41 (exit 1)))))))) (exit 1))))) with(1) -83))
alan then added
@gasche has a comprehensive comment on how OCaml compiles pattern matching on r/ProgrammingLanguages.
gasche then said
Indeed! Thanks for the mention. Reposting below.
The strategy used by the OCaml compiler is not naive, but not hashing either. It will compile a switch on strings (with a default case) into a tree of switches on characters (in fact words, see below), by reading characters at specific positions in the string. It was implemented by Benoît Vaugon and Luc Maranget.
The position to read is selected as the least discriminating position: we count the number of different values (among the string patterns) for each position, and choose the (leftmost) position with the smallest number of possible values. The idea is that these positions must be tested anyway to eliminate the default case, so you may as well check them first.
For example if you are matching on “aa” vs “ab” (or something else, the default case), and you know that your input has size 2, you may generate either:
switch s[0]: case 'a': switch s[1]: case 'a': goto <case "aa"> case 'b': goto <case "ab"> default: goto <default case> default: goto <default case>
or
switch s[1]: case 'a': switch s[0]: case 'a': goto <case "aa"> default: goto <default case> case 'b': switch s[0]: case 'a': goto <case "ab"> default: goto <default case> default: goto <default case>
and the first approach is more interesting, it performs the same tests for each non-default value with shorter code.
This is counter-intuitive: check the least discriminating position first. It is more intuitive at first to check the most discriminating position.
Regarding size: you start by switching on the size of the string, then you are left with groups of the same size, where the same positions are valid.
Regarding characters: instead of looking up strings characters by characters, you can read whole machine word at
once, except at the very end of the string. In the case of {"one", "two", "three"}
, with the 0-ended
representation of C, a single 32bit read is enough to distinguish the three values, so you need a single position
switch (after switching on the size first).
Dune’s Style Guide
Rudi Grinberg announced
I’ve recently added contribution guidelines to dune’s code base to help onboard potential contributors to dune. It’s been suggested that it would be useful to share them with the wider community so this is the purpose of this post.
Many guidelines are taken straight from Jane Street’s internal rules which shouldn’t be a surprise as the project was started by a JST employee. The intent of sharing these is not to evangelize our coding style to other projects, but perhaps inspire some of you to pick and choose the rules you like and maybe share some of your own.
Jane Street, compiler development, and open-source
Yaron Minsky said
@reisenberg wrote a summary of our recent efforts on the compiler, and @sid asked a question about how this impacts Jane Street’s publicly released software:
How would the newer versions of the libraries that progressively use more and more of custom language be made available to the OCaml community?
This is a good question, given that it may well take years to upstream some of our compiler changes, and some of them might not be accepted by upstream at all.
In some sense, we’re in this world already. We’ve already made changes to Base, our standard library, to use the
local mode in order to better support stack allocation. And this has already made it in to our public release, as
you can see if you look for the [@local]
annotations in the code below.
https://github.com/janestreet/base/blob/master/src/list.mli#L208
Our strategy here was to use the new feature in a way that doesn’t break the syntax, so it could be smoothly included in our public code. This works for modes because modes can be erased without changing the semantics of the code.
This approach isn’t always possible. Consider the include functor
syntax that was mentioned in @reisenberg’s post:
module List = struct type 'a t = | Nil | Cons of 'a * 'a t let mapi t ~f = ... include functor Make_map end
This can’t just be desugared away, as it happens, and so we simply block ourselves from using this feature within publicly released code.
There’s an obvious tension here, since blocking ourselves from using new features makes it more painful to open
source things. Part of our intent here is to prioritize the upstreaming of things that cause these problems.
include functor
is an example of a feature that’s pretty small, and we think is generally quite useful, so we
intend to propose it upstream soon.
I suspect unboxed types will be more like include functor
than like modes, in that it’s going to be hard to use it
in a way that’s compatible with the public release. We haven’t yet really worked through what the tradeoffs will be
there.
In any case, our public release code is important to us, and we’re going to continue to release new versions. We value the fact that it’s helpful to the community, but it’s also more directly valuable to us. In particular, it makes it easier for us to reuse work done by other people, since people spend the time and effort to build libraries that interoperate well with our code (notably, with Async). And also, our public release helps people take into account our usage patterns when working on the compiler or other important community tools like Merlin or OCamlformat.
Dune 3.6.0
Etienne Millon announced
Dear dune users, It is my pleasure to announce that dune 3.6.0 is now available on opam :tada:. Here’s the changelog - I reused the same classification as in the previous announce for dune 3.5.0. Thanks again to all the contributors including bug reporters.
dune executable
This lists features of the “dune” executable itself. Upgrading dune will bring in these changes. We consider these changes safe, but it is difficult to define what a breaking change is for a command-line tool (for example, some error messages change). It is important to note that just upgrading the dune executable is not supposed to change how dune interprets existing projects. If just upgrading dune breaks compilation, it is a bug in dune, please report it!
- Added
- Introduce a
$ dune ocaml top-module
subcommand to load modules directly without sealing them behind the signature. (#5940, @rgrinberg) - Revive
$ dune external-lib-deps
under$ dune describe external-lib-deps
. (#6045, @moyodiallo) - Extend the promotion CLI to a
dune promotion
group:dune promote
is moved todune promotion apply
(the former still works) and the newdune promotion diff
command can be used to just display the promotion without applying it. (#6160, fixes #5368, @emillon) - Build progress status now shows number of failed jobs (#6242, @Alizter)
- Allow promoting into source directories specified by
subdir
(#6404, fixes #3502, @rgrinberg) - Support
CLICOLOR
andCLICOLOR_FORCE
to enable/disable/force ANSI colors. (#6340, fixes #6323, @MisterDA). - Create a fake socket file
_build/.rpc/dune
on windows to allow rpc clients to connect using the build directory. (#6329, @rgrinberg)
- Introduce a
- Fixed
- Forbid multiple instances of dune running concurrently in the same workspace. (#6360, fixes #236, @rgrinberg)
- Make dune describe workspace return the correct root path (#6380, fixes #6379, @esope)
- Fix running inline tests in bytecode mode (#5622, fixes #5515, @dariusf)
(lang dune 3.6)
This lists changes if you opt into the new (lang dune 3.6)
version in your dune-project file. For this too, these
are changes that we consider safe, but they can require changes to your dune files. For example, sandboxing is
enabled in more places, which means that you might have to be more precise in expressing your dependencies. Please
reach out on the issue tracker if you have trouble fixing your dune file or if something does not seem to be
possible anymore.
- Added
- Add
(glob_files <glob>)
and(glob_files_rec <glob>)
terms to thefiles
field of theinstall
stanza (#6250, closes #6018, @gridbugs) - Allow
Byte_complete
binaries to be installable (#4837, @AltGr, @rgrinberg) - Allow
:standard
in the(modules)
field of thecoq.pp
stanza (#6229, fixes #2414, @Alizter)
- Add
- Fixed
- Allow absolute build directories to find public executables. For example, those specified with
(deps %{bin:...})
(#6326, @anmonteiro) - [ctypes] do not mangle user written names in the ctypes stanza (#6374, fixes #5561, @rgrinberg)
- Forbid private libraries with
(package ..)
set from depending on private libraries that don’t belong to a package (#6385, fixes #6153, @rgrinberg) - [ctypes] always re-run
pkg-config
because we aren’t tracking its external dependencies (#6052, @rgrinberg) - [ctypes] remove dependency on configurator in the generated rules (#6052, @rgrinberg)
- Fix passing of flags to dune coq top (#6369, fixes #6366, @Alizter)
- Prevent crash if absolute paths are used in the install stanza and in recursive globs. These cases now result in a user error. (#6331, @gridbugs)
- Allow absolute build directories to find public executables. For example, those specified with
My learnings about monads and state monads
David Wong announced
I wrote about monads recently here: https://cryptologie.net/article/578/simple-introduction-to-monads-in-ocaml/
and state monads here: https://cryptologie.net/article/581/state-monads-in-ocaml/
thought this might be useful to someone trying to learn these as well
hyphenrf then said
I see you do toplevel-open
in your code, which is generally only good for namespaced modules not identifiers.
A more idiomatic way to access definitions is to refer to them directly by their fully-qualified names, or do a
local-open
.
open Mod let f x = a ... let g x = b ... let h x = c ...
becomes
let f x = Mod.a ... let g x = let open Mod in b ... let h x = Mod.(c ...)
so for example:
let res = Monad.( let* a = Some 5 in let* b = Some 6 in return (a + b) )
Confero 0.1.1
“Petter A. Urkedal announced
Confero implements the Unicode Collation Algorithm (UCA), currently built for Unicode 15.0.0. It also provides the Default Unicode Collation Element Table (DUCET), which implements a language-agnostic collation order.
For most use-cases, it should suffice to link with confero
and confero.ducet
and use the single entry point
Confero.collate
. For a drop-in replacement for String.compare
, pass ~total:true
, otherwise it will disagree
with (=)
due to normalization. If you don’t link with confero.ducet
, the default collation will be based on
Unicode codepoints. The API allows you to take more control of which collation mapping is used, and to evaluate
separate stages of the UCA, if needed.
I haven’t looked into localizing collation, but it should be possible to create a custom mapping which calls the DUCET mapping as a fall-back. Note, however, that the collation elements are not stable across Unicode versions. CLDR should of interest to those who want to look into this.
The API documentation is not online yet, but I’ll post a link when it gets indexed on ocaml.org.
B·o·B, an universal and secure peer-to-peer file-transfer in OCaml
Calascibetta Romain announced
I am very pleased to announce the experimental distribution of Bob available on this website.
Bob is an OCaml-based file-sharing application. For the Robur team, this is our first application using the Esperanto project (announced some time ago) allowing us to distribute a single binary that works on almost all platforms (including Windows).
This software also uses, as far as its relay is concerned, a unikernel developed with MirageOS and deployed by yours truly.
Finally, the distribution of the binary or the relay is offered by the reproducible infrastructure developed by the Robur team and available here: https://builds.robur.io/
The website (also a unikernel) contains all the information about Bob, its use, its configuration, etc. A series of Questions/Answers is also available there.
We would like to say that the project is experimental. Even though we are ready to deliver the software in due form, we are aware of some bugs and we will continue to improve the software. However, the practical case, its use and user feedback are needed at this stage - and that is why we are announcing its availability.
A series of articles to understand the whole development and deployment process is available:
- Esperanto, when OCaml meets Cosmopolitan
- Spoke, an implementation of SPAKE2+EE in OCaml
- The Bob’s state machine with GADTs
- The Robur’s reproducible infrastructure
- Albatross, to deploy your unikernel
If you are interested in the project and want to help us maintain all that it entails or if you consider what we do to be super cool, you can donate here.
The Robur team
Old CWN
If you happen to miss a CWN, you can send me a message and I’ll mail it to you, or go take a look at the archive or the RSS feed of the archives.
If you also wish to receive it every week by mail, you may subscribe online.