OCaml Weekly News

Previous Week Up Next Week

Hello

Here is the latest OCaml Weekly News, for the week of June 15 to 22, 2021.

Table of Contents

First releases of dirsp-exchange: auditable variant of Signal Protocol and ProScript-to-OCaml translator

jbeckford announced

I'm pleased to announce the first release of dirsp-exchange, available today from the Opam repositories.

The intent of the [dirsp] libraries is to provide software engineers with auditable source code that has some level of safety assurance (typically proofs) from security researchers.

The first libraries are:

  • dirsp-exchange-kbb2017 0.1.0 - The KBB2017 protocol for securing a two-party conversation. Similar to Signal Protocol v3 and Olm Cryptographic Ratchet.
  • dirsp-ps2ocaml 0.1.0 - A ProScript to OCaml translator. ProScript is an executable subset of JavaScript that can be formally verified.

and a couple more supporting libraries.

dirsp-exchange-kbb2017 has a build process that generates its own OCaml code using dirsp-ps2ocaml on formally verified ProScript source code.

The canonical example for dirsp-exchange-kbb2017 is:

module P       = Dirsp_proscript_mirage.Make()
module ED25519 = P.Crypto.ED25519
module K       = Dirsp_exchange_kbb2017.Make(P)
module U       = K.UTIL

(* Alice sends a message to Bob *)
let aliceSessionWithBob = T.newSession (* ... supply some keys you create with ED25519 and U ... *) ;;
let aliceToBobSendOutput = T.send
  aliceIdentityKey
  aliceSessionWithBob
  (P.of_string "Hi Bob!")

(* Now you can send the output "aliceToBobSendOutput" from Alice to Bob.
   Let's switch to Bob's computer. He gets notified of a new message using a notification library of
your choosing, and then does ...  *)

let bobSessionWithAlice = T.newSession (* ... supply some keys ... *);;
let bobFromAliceReceiveOutput = T.recv
  bobIdentityKey
  bobSignedPreKey
  bobSessionWithAlice
  theEncryptedMessageBobReceivedFromAlice
assert (bobFromAliceReceiveOutput.output.valid)
Format.printf "Bob just received a new message: %s\n"
  (bobFromAliceReceiveOutput.plaintext |> P.to_bytes |> Bytes.to_string)

These are early releases, especially dirsp-ps2ocaml.

Online docs are at https://diskuv.github.io/dirsp-exchange

Feedback, contributions and downloads are very welcome!

Job offer: 3 year research engineer in static analysis of OCaml programs at Inria Rennes

Benoit Montagu announced

as part of a project between Inria and Nomadic Labs, we are offering a 3 year research engineer position, to work on static analysis for OCaml programs. The position will start in October in the Celtique Inria research team, in the vibrant city of Rennes, France. If you are a talented OCaml programmer, if you are interested in static analysis, or if you simply want to know more about this project, please contact me!

The detailed job description is here: https://jobs.inria.fr/public/classic/fr/offres/2021-03821

Please feel free to transfer this announce to people that you think could be interested.

IRC channels available on libera.chat

Deep in this thread, Romain Calascibetta announced

Just to let you know that I spent a time to re-implement the IRC protocol in OCaml and to deploy a simple MirageOS as a logger to save discussions into a Git repository. The bot is currently deployed, the explanation is available here: https://github.com/dinosaure/cri/tree/master/unikernel And used for #mirage@irc.libera.chat

It's a nice example about MirageOS/unikernel and I may deploy one to save #ocaml@irc.libera.chat as whitequark already does with her bot.

Set up OCaml 2.0.0-beta

Sora Morimoto announced

Hopefully, this will be the last release before stable 2.0.0. This release allows you to add multiple custom repositories, which enables testing with multicore and beta repository.

- name: Use Multicore OCaml
  uses: ocaml/setup-ocaml@v2
  with:
    ocaml-compiler: ocaml-variants.4.12.0+domains+effects
    opam-repositories: |
      multicore: https://github.com/ocaml-multicore/multicore-opam.git
      default: https://github.com/ocaml/opam-repository.git

First release of Jsonxt - a set of JSON parsers and writers

Stephen Bleazard announced

Jsonxt provides a number of JSON parsers and writers for RFC 8259 compliant JSON as well as non-standard extensions supported by Yojson. Features include

  • RFC 8259 compliant when in strict and basic mode
  • Performance focused especially for files and strings
  • Support for standard and extended JSON tree types:
    • Strict follows a strict interpretation of RFC 8259 with all numbers represented as floats.
    • Basic extends the strict type to include convenience types while maintaining RFC compliance. This is compatible with Yojson's Basic type
    • Extended adds additional non-standard types including tuples and variants and is not RFC compliant. This is compatible with Yojson's Safe type
  • A number of different parsers including
    • A standard JSON tree parser for various sources including string, file and channel
    • A Stream parser that returns a stream of raw JSON tokens.
    • A monad based parser compatible with async
  • Writers including
    • File and string writers
    • A monad based writer that is compatible with async
    • A stream writer that converts a stream of JSON tokens
  • Support for streaming JSON via the Stream module
  • Standard interfaces including Yojson compatibility
  • Support for ppx_deriving_yojson and ppx_yojson_conv via Yojson compatibility

The package is available via opam, with documentation on github.io. The source can be found at github/jsonxt

mula 0.1.0, ML's radishal Universal Levenshtein Automata library

Ifaz Kabir announced

I'm happy to announce the release of my library mula. The package uses Universal Levenshtein Automata (ULA) to not only check if a word is within a certain edit distance of another, but to also output what the edit distance is! It uses the automata themselves to calculate edit distances. A fun use case for this is that we can feed a set of words to the automaton and immediately rank the words by their edit distance.

Mula supports both the standard Levenshtein edit distance as well as the Demarau-Levenshtein distance which counts transpositions of two adjacent characters as a single edit. I also support getting live error counts, so you can feed part of a string into an automaton, and get the minimum number of errors that have occurred so far.

I currently have matching working using non-deterministic ULA, but I have partially started the work toward the deterministic versions. It should be possible to pre-compute the DFAs for up to edit distance 3 and pack it with the library, never needing to be recomputed because the Universal Automata are independent of the input strings. But the non-deterministic automata support very large edit distances: (Sys.int_size - 1)/2, so they have value on their own.

This library came about from a desire to add a "did you mean" feature to a toy compiler, but not wanting to write the kind of dynamic programming code that you can find in the OCaml compiler [1] or merlin/spelll [2,3].

You can find the library here and the documentation here. It's not on opam yet, but I have submitted a pull request.

Happy OCamling!

References:

  1. Edit distance in the OCaml compiler. https://github.com/ocaml/ocaml/blob/e5e9c5fed56efdd67601e4dbbaebeb134aee361c/utils/misc.ml#L516.
  2. Edit distance in merlin. https://github.com/ocaml/merlin/blob/444f6e000f6b7dc58dac44d6ac096fc0e09894cc/src/utils/misc.ml#L527
  3. Edit distance in spelll. https://github.com/c-cube/spelll/blob/3da1182256ff2507a0be812f945a7fe1a19adf9b/src/Spelll.ml#L26

Ifaz Kabir then added

Some details:

I followed the paper by Touzet [1] as much as possible. If you take a look at the code, you'll see a a lot of +1's for 1-indexing. This was to keep the implementation as close to the paper as possible! (If you do want to check the implementation against the paper, note that the paper has a typo in Definition 2). For the Demarau-Levenshtein automaton, I adapted Figure 9 from Mitankin's thesis [2]. I'm convinced that my adaptation works, but my adaptation of Touzet's subsumption relation for Demarau-Levenshtein might be slightly sub-optimal. If you have question about the adaptation, feel free to ask!

mula does not completely replace c-cube's spelll package. In particular I don't support any indexs, etc. But there are some interesting differences in the automata they use. (w stands for the base word here)

  1. The spelll package creates the Levenshtein Automaton for a single string/word (LA_w), mula uses Universal Levenshtein Automata (ULA).
  2. Spelll computes a DFA from a non-deterministic automaton that uses eplison transitions. ULA do not have epsilon transitions, but for transitions it looks ahead into the base word w. Additionally the NFA's states/transitions are computable on the fly, so there is no need to store the NFA in memory.
  3. Spelll's automata transitions using characters. mula computes a bitvector from an input character to transition from states to states. (Computing the bitvector is where the look ahead comes in).
  4. Spelll's automata return true~/~false, and uses a separate function to calculate edit distances. Mula uses the automaton itself to calculate edit distances, the outputs have type int option. (LA_w can be modified to support this though!)

References:

  1. On the Levenshtein Automaton and the Size of the Neighborhood of a Word. Hélène Touzet https://hal.archives-ouvertes.fr/hal-01360482/file/LATA2016.pdf
  2. Universal Levenstein Automata: Building and Properties. Petar Nikolaev Mitankin. https://store.fmi.uni-sofia.bg/fmi/logic/theses/mitankin-en.pdf

New release of mlcuddidl, the OCaml interface to the CUDD BDD library

nberth announced

I'm pleased to write this first release announcement for the mlcuddidl package.

These bindings to the CUDD BDD library were initially written by Bertrand Jeannet and have been around as an OPAM package for quite some time now. The source code is now hosted on framagit.

This release of version 3.0.7 mostly ports the package to OCaml versions ≥ 4.10.

first release of orf: OCaml Random Forests

UnixJunkie announced

I finished implementing a classifier and regressor using Random Forests (seminal paper: https://link.springer.com/article/10.1023/A:1010933404324):

https://github.com/UnixJunkie/orf

Some caveats:

  • this is somewhat slow; especially the classifier (and I don’t know so much how to accelerate it; probably two orders of magnitude slower than sklearn).
  • this is not super generic (int IntMap sparse features only; i.e. a sparse vector of integers represents a sample).

The package is now available in opam (opam install orf).

Two interfaces are exposed:

RFC (for classification) https://github.com/UnixJunkie/orf/blob/master/src/RFC.mli

RFR (for regression) https://github.com/UnixJunkie/orf/blob/master/src/RFR.mli

The test file shows some usage examples: https://github.com/UnixJunkie/orf/blob/master/src/test.ml

If you want to help, I tried to flag a few things for the near future: https://github.com/UnixJunkie/orf/issues

If you use it and if it is useful to you, I would be happy to know.

Old CWN

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.