OCaml Weekly News

Previous Week Up Next Week

Hello

Here is the latest OCaml Weekly News, for the week of April 20 to 27, 2021.

Table of Contents

docs.ocaml.pro : an OCaml Documentation Hub

Fabrice Le Fessant announced

We are pleased to announce that we just published the first version of the OCaml Documentation Hub on:

https://docs.ocaml.pro

The OCaml Documentation Hub can be used to browse the sources and the documentations of more than 2000 opam packages, following links between them when useful. This is a work-in-progress, and we are working on improving it with many more features, such as source annotations with types, full-text and type-driven searches, improvements in the general readability of documentation, etc.

The site is generated using an open-source tool called digodoc, available on:

https://github.com/OCamlPro/digodoc

Digodoc is able to build a map of an opam switch, with links between files, opam packages, ocaml libraries, meta packages and ocaml modules. It is also able to generate documentation using odoc with cross-indexes between all these kinds of packages.

We welcome feedback and contributions! Enjoy !

Simon Cruanes said and Anil Madhavapeddy added

Great work on this site, and I love the domain name as well ;-)

The cross linking between packages is fantastic.

As a bit of background on why documentation cross-linking has taken so long, there is a lonnnggg history intertwined with many people's contributions to opam, build systems (ocamlbuild and dune), conventions (findlib and odig) and of course odoc itself. The major milestones along the way have been:

  • odoc 1.0, first began in 2014 as a quick project to pull together typing information from cmt[i] files, but which ran into the problem that it needs a consistent set of compiled cmt files to actually work, and so needs help from external tools to pull that set of compiled libraries together.
  • odig, which pulls together multiple opam packages (and a filesystem layout for metadata) and runs odoc on then. This allowed for the creation of https://docs.mirage.io a few years ago which cross-references a smaller number of packages
  • opam-repo itself has had better and better bulk builds over the years to ensure that we can actually automatically compile all the artefacts needed for docs builds, thanks to efforts like health check and ocurrent.
  • odoc 2.0, which featured a multi-year rewrite of the OCaml module resolver and introduced a new output IR. This forthcoming release was presented in this OCaml 2020 talk by @jonludlam.

And now with all these pieces in place, the OCaml documentation spring has arrived! The OCamlPro one posted here as the first of the "new batch" of mass documentation indexers, and I'm aware of concurrent efforts by the odoc/ocaml.org maintainer teams to push a central one out to ocaml.org, as well as by the MirageOS team who are refreshing docs.mirage.io with the latest and greatest. I'm sure when the dust has settled on all these indexers we can look for common pieces, but for now it's lovely to see so much innovation happening at pace.

For the community: now is the time to fix your docstrings in your libraries, as there will many cool tools parsing and processing them, and rendering them into all kinds of output formats!

To the odoc contributors, thank you! The journey to get to this documentation site started here seven years ago:

commit ef91571cab31d9ece7af965ed52eaaff57a12efc
Author: Leo White <lpw25@cl.cam.ac.uk>
Date:   Thu Oct 16 19:20:18 2014 +0100

    Initial commit

@lefessan one thing I'm not sure about in your site is the "copyright library authors" claim. That's murky legal ground – it's worth establishing if the odoc HTML has gone through a compilation process and so is no longer copyright the authors (just as a binary output is not copyright the original source code). If the output is copyright the authors, then they have reasonable grounds to claim that you should also reproduce the copyright notice and other license restrictions. Personally, I prefer to claim that there is no copyright to the original authors in odoc output, and sidestep this issue.

Fabrice Le Fessant replied

Thanks @avsm , all these projects were indeed important milestones towards the creation of this site. However, I wouldn't want this history perspective to give the wrong feeling that building this site was easy, it is the result of a very good, long and hard work by the team at OCamlPro to make it work despite a road paved with many obstacles. It also benefited from OCamlPro's long history of innovative projects for the OCaml community, that lead for example in the past to Opam, Try-OCaml, Memprof/Memthol, Opam-builder, Learn-OCaml, the Typerex tools (ocp-indent, ocp-index, ocp-build, etc.) and more recently opam-bin and drom.

As I said, this is a work-in-progress, and there are many features that we will be adding in the next months to make this website much easier to navigate, for users to rapidely reach the information that matters for them. We hope it will be inspirational for all the other developers who are working on similar projects, and we are looking forward to using their projects soon too!

Daniel Bünzli said

I'd just like to stress that odig documents OCaml package installs regardless of the package manager used as long the install structure follows these conventions (which are automatically followed by dune installs) .

Also for people using my packages, I'd just like to mention they may miss important documentation bits on https://docs.ocaml.pro until that issue is resolved.

Much later in the thread, Kiran Gopinathan said

It's not quite the same as hoogle, but merlin has a functionality to search for functions by type signature - the feature doesn't seem to get much attention apparently - probably the interface is a little lacking, but with some extra elisp tuning, it can work quite smoothly:

3c2d1c63fac7cbd7dd1bb5b9a406589e031cb795.gif

Yawar Amin then added

The command line for this:

ocamlmerlin single search-by-polarity -position 0 -query '-int +string'

(To search for values of type int -> string.)

Decompress 1.4.0

Charles Edouard Lecat announced

Greetings everyone,

I am happy to announce the new release of decompress 1.4.0, available for installation via OPAM. Decompress is a library containing a pure OCaml implementation of several compression algorithms:

  • RFC1951
  • Zlib
  • Gzip
  • LZO

It's goal is to provide several algorithms for both the inflation and the deflation of objects, in the form of a stream API allowing to call the chosen algorithm one bit at a time. Such behavior allows for an easy use of decompress in situations where we would not be able to have the input in one go, or where we would like to output the result in a non blocking way. This new release comes with several improvements to the documentation and bug fixes, but even more, with a whole new implementation for the rfc 1951 and zlib algorithms.

Non-stream implementation for rfc 1951 and zlib

Up to this day, decompress was used in several projects like ocaml-git. However, as time passed by, it appeared that in some cases, the current implementation of decompress was not the optimal solution: As useful as a stream implementation is, it requires to save a lot of information about the state of the compression, in order to resume it once we have enough input.

This is why, in some cases where we would be sure that we have our whole input in one go, we might want to avoid all of these side-costs, and directly go to the point.

State of the art: libdeflate

This new problematic in mind, we have started thinking about the existing implementations of these algorithms which were also bypassing the stream behavior. One implementation that proved to be a suitable example for our problem, was the library libdeflate, an implementation in C. It's main advantages being: a better compression ratio than zlib and with faster runtime.

It was used as the solid base for the OCaml implementation provided by this new release.

OCaml version of libdeflate, performances and use cases

Inheriting the logic of libdeflate, the new implementation now has a better compression ratio, while being slightly faster at it. On the other side, the decompression is way faster, with 33% of speed increase in most tested cases: On the ~book2 (from the Calgary corpus) file:

  • decompress (stream): 15 Mb/s (deflation), 76 Mb/s (inflation), ratio: 42.46 %
  • decompress (non-stream): 17 Mb/s (deflation), 105 Mb/s (inflation), ratio: 34.66 %

Now that this is in place, the users of decompress will be able to choose between the two versions, according to their needs. In the case of ocaml-git, the vast majority of the git objects are small and will be compressed in one go. This is why we updated with the new implementation when possible.

Writing optimized code and profiling it

One of the biggest concerns of this release was to be able to produce optimized code. The base code being coded in C, a lot of sub-optimal behavior where ported in the OCaml version: for and while loops, references everywhere, mixes of struct and union., it needed a lot of clean up.

This is why once the main iteration was done, we have spent several weeks profiling the code base, using the OCaml library landmarks, flamegraph or simply the linux binary perf. This work, sometimes tedious, proved to be helpful and healthy for both the harmonization of the code and it's performances.

Decompress & MirageOS

Compression algorithms are a really important piece in many projects, and operating systems do not avoid this. decompress was coded from the start with the idea of being used in the much larger project MirageOS.

This release is another opportunity to broaden MirageOS’s reach, by providing one more algorithm to it’s stack, allowing us to specialise even more the unikernels that would have a need for inflation/deflation algorithms. This more restrictive implementation, as we need to have the whole input in one go, will allow us to take advantage of the situation and give more flexibility for the user.

The positive aspects of this release will most likely show up soon enough, as we make use of decompress to its full potential

elliptic curves - maintainable and verified (full stack, from primitives to TLS)

Hannes Mehnert announced

over the last month I worked on upgrading the cryptography stack for OCaml and MirageOS. I just published a blog post. Enhancments of OCaml-TLS (usenix security paper from 2015) and X.509 are in place.

The main achievement after TLS 1.3 support (since May 2020, 0.12.0) is that elliptic curve certificates are now supported. Elliptic curve cryptography uses fiat. The X509 implementation now supports PKCS 12 (used by browsers and other software (e.g. OpenVPN) to bundle certificates and private keys).

Get mirage-crypto-ec, x509 0.13.0 and tls 0.13.1 (all available in the opam-repository). Discussion and feedback appreciated.

First release of Docteur, an opiniated read-only file-system for MirageOS

Calascibetta Romain announced

I'm glad to announce the first release of docteur, a simple tool to make and use (in read-only) a "file-system" for MirageOS. As you know, with MirageOS, we don't have sockets, kernel space or even file-descriptor. It's not possible to manipulate files standalonely and many primitives commonly available with the unix module don't exists in our space.

Therefore, it is difficult to imagine making a website that displays local files or a database system. But in our spirit of separation of services, it becomes possible for your unikernel to communicate over the network to a "file system" or a database.

For quite some time we have been experimenting with a file system external to our unikernel called Git. This is the case of pasteur which saves the pastes in a Git repository. It is also the case of unipi or Canopy which display the content of a Git repository and can resynchronize with it using a hook. Or the case of our primary DNS server whose zone file comes from a Git repository - we can then trace all the changes on this file.

However, we have several limitations:

  1. it requires the Git repository to load into memory in your unikernel
  2. it requires a communication (external with GitHub or internal in a private network)

The persistent aspect is very important. We should always be able to launch a unikernel and not lose the data if our system shuts down.

The mutable aspect (modify a file) is useful in some cases but not in others. As for unipi for example (a simple static web site), the difference between resynchronizing with a hook or restarting the unikernel with a new version of your filesystem is minor.

Docteur as a second solution

This is where Doctor comes in. It solves both of our problems by offering the generation of a file system from scratch:

  • a Git repository (local or available on a service)
  • a specific folder

Doctor is able to create a complete representation of a folder and to compress it at such a ratio that a generation of the documentation of several OPAM packages with all their versions making 14 Gb is reduced to an image of only 280 Mb!

Such a high compression ratio is in particular due to a double level of compression by decompress and duff. For more details, Docteur just generates a slightly modified PACK file with carton.

Then, Docteur proposes a simple library which makes available 2 ways to manipulate this image for your unikernel:

  1. a way that is fast but with a consequent boot time
  2. a slower way but with no cost to the boot time

The first way will simply "analyze" the image to re-extract the layout of your file system. Then it uses the ART data-structure to save this layout. So, whenever you want a specific file and according to ART benchmarks, you have access to the content very quickly.

The problem remains the analysis which takes place at boot time and which can take a very long time (it depends essentially on the number of files you have). There can also be an impact on memory usage as the ART data structure is in memory - the more files there are, the bigger the structure is.

The second method is more "silly". Each time you request a file, we will have to rebuild the entire path and therefore deserialize several objects (like folders). The advantage is that we don't analyze the image and we don't try to maintain a layout of your file system.

Example

Docteur is meant to be simple. The generation of the image is done very simply by the command make:

$ docteur.make -b refs/heads/main https://github.com/dinosaure/docteur disk.img
$ docteur.make -b refs/heads/main git@github.com:dinosaure/docteur disk.img
$ docteur.make -b refs/heads/main git://github.com/dinosaure/docteur disk.img
$ docteur.make -b refs/heads/main file://$(pwd)/dev/docteur disk.img

Then, Docteur proposes 2 supports: Unix & Solo5. For Unix, you just have to name explicitly the image file to use. For the case of Solo5 (and thus of virtualization). You just have to find a name for a "block device" and to reuse this name with the Solo5 "tender" specifying where the image is.

$ cd unikernel
$ mirage configure -t unix --disk disk.img
$ make depends
$ mirage build
$ ./simple --filename README.md
$ cd unikernel
$ mirage configure -t hvt --disk docteur
$ make depends
$ mirage build
$ solo5-hvt --block:docteur=disk.img -- simple.hvt --filename README.md

Finally, Docteur proposes another tool that checks (and analyzes) an image to give you the version of the commit used (if the image comes from a Git repository) or the hash of your file system produced by the calculation of a Merkle tree.

$ docteur.verify disk.img
commit  : ad8c418635ca6683177c7ff3b583e1ea5afea78f
author  : "Calascibetta Romain" <romain.calascibetta@gmail.com>
root    : bea10b6874f51e3f6feb1f9bcf3939933b2c4540

Merge pull request #11 from dinosaure/fix-tree-expanding

Fix how we expand our file-system

Conclusion

Many times people ask me for a purpose in MirageOS such as a website or a particular service. I think that Docteur shows one essential thing about MirageOS, it is a tool and an ecosystem. But it's not an endpoint that is concretized in a specific application.

Docteur is not THE solution to our problems and answers a specific use case. What is important to note is not what Docteur does but the possibility for our ecosystem and our tools to allow the development of Docteur. As it allows the development of a trillion applications!

As such, I say to those people to "play" with MirageOS if they want to learn more. Our goal is not to show you applications that you could then deploy easily (even if we are working on this aspect too) but to give you the possibility to imagine your OS (independently from our vision)!

And if you try, we'll be happy to help you!

Ocaml-solidity, a new OCaml library for Solidity

OCamlPro announced

We are pleased to announce our new OCaml library, ocaml-solidity ! Ocaml-solidity is a program manipulation library that provides a Solidity parser and typechecker.

Our library is made for developers on Solidity code analysis, it builds a typechecked AST that can be analyzed with a provided visitor. Please note that our parser and typecheck conforms mostly to Solidity 0.7, inline assembly is not supported. Take a look at our documentation.

You can test it and report bugs just here!

Migrating to floatarray (blog post)

Nicolás Ojeda Bär announced

At LexiFi we recently migrated our codebase to use floatarray in place of float array in order to disable the "flat float array" mode in the compiler. If you are interested in finding out more about how we did it, we wrote a blog post about it https://www.lexifi.com/blog/ocaml/floatarray-migration/. Enjoy!

Old CWN

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.