OCaml Weekly News
Hello
Here is the latest OCaml Weekly News, for the week of June 09 to 16, 2020.
First release of monolith
François Pottier announced
It is my pleasure to announce the first release of Monolith.
Monolith offers facilities for testing an OCaml library (for instance, a data
structure implementation) by comparing it against a reference implementation.
It uses a form of black-box testing, and relies on afl-fuzz
for efficiency.
The user must describe what types and operations the library provides. Under the best circumstances, this requires 2-3 lines of code per type or operation. The user must also provide a reference implementation of the library.
Then, like a monkey typing on a keyboard, Monolith attempts to exercise the library in every possible way, in the hope of discovering a scenario where the library behaves incorrectly. If such a scenario is discovered, it is printed in the form of an OCaml program, so as to help the user reproduce the problem.
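The idea of checking a candidate against a reference can be sketched in plain OCaml. This is a minimal illustration that does not use Monolith's API or afl-fuzz: the `Reference` and `Candidate` modules, the `op` type and `run_scenario` are all hypothetical names, and the standard `Random` module stands in for the fuzzer.

```ocaml
(* Hypothetical sketch: this is NOT Monolith's API. It only illustrates
   comparing a candidate against a reference under random operations. *)

(* Reference implementation: a stack as a list held in a reference. *)
module Reference = struct
  type t = int list ref
  let create () : t = ref []
  let push (s : t) x = s := x :: !s
  let pop (s : t) =
    match !s with
    | [] -> None
    | x :: tl -> s := tl; Some x
end

(* Candidate implementation: a stack backed by Stdlib.Stack. *)
module Candidate = struct
  type t = int Stack.t
  let create () : t = Stack.create ()
  let push (s : t) x = Stack.push x s
  let pop (s : t) = Stack.pop_opt s
end

type op = Push of int | Pop

let random_op () = if Random.bool () then Push (Random.int 100) else Pop

(* Apply [n] random operations to both implementations. Return [None] if
   they always agree, or the trace of operations up to the first mismatch. *)
let run_scenario n =
  let r = Reference.create () and c = Candidate.create () in
  let rec go i trace =
    if i = n then None
    else
      let op = random_op () in
      let trace = op :: trace in
      let agree =
        match op with
        | Push x -> Reference.push r x; Candidate.push c x; true
        | Pop -> Reference.pop r = Candidate.pop c
      in
      if agree then go (i + 1) trace else Some (List.rev trace)
  in
  go 0 []

let () =
  Random.init 42;
  match run_scenario 1000 with
  | None -> print_endline "no discrepancy found"
  | Some trace ->
    Printf.printf "mismatch after %d operations\n" (List.length trace)
```

A real tool would also print the failing trace as a replayable program, which is what the announcement describes Monolith doing.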
At this time, a tutorial is not yet available. There is, however, API documentation and a number of demos.
Repository: https://gitlab.inria.fr/fpottier/monolith
API Documentation: http://cambium.inria.fr/~fpottier/monolith/doc/monolith/Monolith/index.html
Installation:
opam update
opam install monolith
Sylvain Conchon joined OCamlPro's team
OCamlPro announced
Sylvain Conchon joined OCamlPro's team as Formal Methods CSO. He created Alt-Ergo and has been teaching OCaml in universities for about 20 years. He shares thoughts on interactions between industry and research labs, and his vision of formal methods and OCaml as a language for industry. Read his interview on our blog: https://www.ocamlpro.com/2020/06/05/interview-sylvain-conchon-cso-on-formal-methods/
First release of streaming
Rizo announced
It is my pleasure to announce the first public release of streaming, a library for building efficient, incremental data processing pipelines that compose and don't leak resources.
I built streaming as a result of many experiments with different streaming and iteration models for OCaml. There are multiple packages on OPAM that share some of the goals of streaming (we even have Stdlib.Seq now!), but none of them combine (1) excellent performance, (2) safe resource handling and (3) pure functional style for combinators.
Streaming solves these problems by implementing three basic and independent models: sources, sinks and flows. They represent different parts of the pipeline that correspond to producing, consuming and transforming elements. These models can be defined and composed independently to produce reusable "streaming blocks".
The library defines a central Stream model that relies on sources, sinks and flows. This model is a push-based iterator with performance characteristics similar to the iter iterator, which has type ('a -> unit) -> unit, and is known for being very efficient. But unlike iter, it has a pure functional core (no need to use mutable state and exceptions for flow control!) and can handle resource allocation and clean up in a lazy and deterministic way. All of this while having a slightly better performance for common stream operations.
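To make the comparison concrete, here is a minimal sketch of an iterator of type ('a -> unit) -> unit. It is written from scratch for illustration and is neither the iter package's code nor streaming's: the names `of_list`, `map`, `fold` and `take` are assumptions. It shows where mutable state and exceptions come in.

```ocaml
(* A push-based iterator: the producer calls the consumer on each element. *)
type 'a iter = ('a -> unit) -> unit

let of_list (xs : 'a list) : 'a iter = fun k -> List.iter k xs

let map (f : 'a -> 'b) (it : 'a iter) : 'b iter = fun k -> it (fun x -> k (f x))

(* Folding needs mutable state, because the consumer returns unit. *)
let fold (f : 'acc -> 'a -> 'acc) (init : 'acc) (it : 'a iter) : 'acc =
  let acc = ref init in
  it (fun x -> acc := f !acc x);
  !acc

(* Stopping early needs an exception: the producer has no other way to
   learn that the consumer is done. *)
let take (n : int) (it : 'a iter) : 'a iter = fun k ->
  let exception Stop in
  let count = ref 0 in
  if n > 0 then
    try
      it (fun x ->
        k x;
        incr count;
        if !count >= n then raise Stop)
    with Stop -> ()

(* Multiply by 10, keep the first three elements, and sum them. *)
let total =
  of_list [1; 2; 3; 4; 5] |> map (fun x -> x * 10) |> take 3 |> fold ( + ) 0

let () = Printf.printf "%d\n" total (* prints 60 *)
```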
For those who are curious about the performance characteristics of streaming and other models, I created a dedicated repository for stream benchmarks: https://github.com/rizo/streams-bench. In particular, it includes a few simple benchmarks for Gen, Base.Sequence, Stdlib.Seq, Iter, Streaming.Stream and Streaming.Source.
The library should soon be published on opam. In the meantime, I invite you to read the docs and explore the code:
- Library documentation: https://odis-labs.github.io/streaming
- Github project: https://github.com/odis-labs/streaming
Guillaume Bury asked
That's great! From the benchmarks, it looks like you hit a really good implementation!
I've looked (maybe a bit fast) at the API documentation, and it is admittedly a bit outside the scope of streams/iterators, but I was wondering if there was some proper way to:
- connect a sink to a source to create some loop
- have some kind of fixpoint on streams
I guess it would always be possible to use some references and/or some complex functions to encode these into the provided API, but I was wondering if there was a clean way to do it.
For a bit of context and explanation, what I have in mind is the case of a program (let's say a type-checker or
something close to the idea) with a persistent state, that should operate over a stream of inputs, which are
top-level phrases, and produce some outputs, for instance print some result for each correctly type-checked statement
(and an error otherwise).
The type-checker would basically be a function of type ('input * 'state) -> ('output * 'state), and starting from an initial state, it would process an input element (giving the output to some sink), and then the next input element would be processed with the state that was reached after processing the previous element: the state would reach the sink of the flow, and then be inserted back into the source.
Separately, imagine the language being type-checked has a notion of include, then one of the steps of the flow would be to expand each include into a stream of inputs/phrases, but each of the phrases in this stream would need to be expanded, so a simple flat_map/flatten is not enough.
I already have a custom implementation that handles these features, but I was wondering whether I could use streaming to handle most of the code linking all of the steps ^^
Rizo replied
if there was some proper way to:
- connect a sink to a source to create some loop
- have some kind of fixpoint on streams
Regarding the first point: yes! That's exactly the point of the Stream module. You see, sources are pull-based
abstractions, while sinks are push-based. Source's type essentially says something like "I might give you some data,
if you ask", while sink's type is the opposite "I might take some data, if you give it to me". They are completely
and intentionally decoupled; it is Stream's role to drive the computation by pulling data from sources and pushing it
into sinks. So the easiest way to connect them is:
Stream.(from source |> into sink)
Of course, that's not very useful per se, but it illustrates my point. Take a look at the Stream.from code to see the implementation
of the loop you're asking for. It does some extra work to ensure that resources are correctly handled, but it should
be clear what the loop is doing.
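The shape of such a driver loop can be sketched as follows. The library keeps its stream types abstract, so the `source` and `sink` types below are hypothetical stand-ins, and `connect`, `source_of_list` and `sum_until` are illustrative names rather than library functions. Resource handling is left out entirely.

```ocaml
(* Hypothetical representations, for illustration only: these are not the
   library's actual definitions. *)

(* Pull-based: "I might give you some data, if you ask." *)
type 'a source = unit -> 'a option

(* Push-based: "I might take some data, if you give it to me." *)
type ('a, 'r) sink = {
  init : 'r;              (* initial accumulator *)
  push : 'r -> 'a -> 'r;  (* feed one element *)
  full : 'r -> bool;      (* true when the sink wants no more input *)
}

(* A source that yields the elements of a list one at a time. *)
let source_of_list (xs : 'a list) : 'a source =
  let rest = ref xs in
  fun () ->
    match !rest with
    | [] -> None
    | x :: tl -> rest := tl; Some x

(* The driver: pull from the source and push into the sink until the
   source is exhausted or the sink is full. *)
let connect (src : 'a source) (snk : ('a, 'r) sink) : 'r =
  let rec loop acc =
    if snk.full acc then acc
    else
      match src () with
      | None -> acc
      | Some x -> loop (snk.push acc x)
  in
  loop snk.init

(* A sink that sums integers and is full once the sum reaches [limit]. *)
let sum_until limit =
  { init = 0; push = ( + ); full = (fun acc -> acc >= limit) }

let result = connect (source_of_list [1; 2; 3; 4; 5]) (sum_until 6)

let () = Printf.printf "%d\n" result (* prints 6: stops after 1 + 2 + 3 *)
```

Because the sink reports fullness, the loop stops pulling as soon as the sum reaches the limit, and the remaining elements are never requested from the source.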
The stream types in the library are currently abstract because I didn't want to commit to a particular representation
just yet. If this is a problem for your use case, let me know, I'll expose them in a Private module.
Regarding the second point: I'm not sure what you mean in practice by "fixpoint on streams". I guess the one thing
that could help implement something like that is the Stream.run function. It allows you to continue reading elements from a source even after a sink is filled by returning a leftover stream. This stream can be used with Stream.run repeatedly.
Alternatively there's also Flow.through, which consumes input trying to fill sinks repeatedly and produces their aggregated values as a stream. Super useful for
things like streaming parsing. Might even help with your use-case for top-level phrases.
On a more general note though, the type ('input * 'state) -> ('output * 'state) looks a lot like a Mealy machine. Streaming.Sink is a Moore machine, which is slightly less general because the output values do not depend on input values, only on the state.
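The difference between the two kinds of machine can be written down as types. This is a minimal sketch with hypothetical record definitions (`mealy`, `moore` and the helper functions are illustrative names, not part of streaming):

```ocaml
(* Mealy machine: each output depends on the current state and the input. *)
type ('i, 'o, 's) mealy = { m_init : 's; m_step : 'i * 's -> 'o * 's }

(* Moore machine: the output is a function of the state alone. *)
type ('i, 'o, 's) moore = {
  init : 's;
  step : 's -> 'i -> 's;
  output : 's -> 'o;
}

(* Run a Mealy machine over a list, collecting one output per input. *)
let run_mealy m inputs =
  let outputs, _final =
    List.fold_left
      (fun (outs, s) i ->
        let o, s' = m.m_step (i, s) in
        (o :: outs, s'))
      ([], m.m_init) inputs
  in
  List.rev outputs

(* Run a Moore machine over a list, reading the output off the final state. *)
let run_moore m inputs = m.output (List.fold_left m.step m.init inputs)

(* Mealy example: emit the running total after each input. *)
let running_total = { m_init = 0; m_step = (fun (i, s) -> (s + i, s + i)) }

(* Moore example: count the inputs seen so far. *)
let counter = { init = 0; step = (fun s _ -> s + 1); output = (fun s -> s) }

let totals = run_mealy running_total [1; 2; 3] (* [1; 3; 6] *)
let count = run_moore counter ["a"; "b"; "c"]  (* 3 *)
```

A type-checker of type ('input * 'state) -> ('output * 'state) fits the `m_step` field directly: the state threaded through the fold plays the role of the persistent environment.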
I thought about exposing different kinds of sinks in streaming, but wanted to make sure that the common use cases are covered first. I'll keep your case in mind for future versions of the library.
Senior software engineer at Asemio in Tulsa, OK
Simon Grondin announced
We are Asemio and our team of data scientists, software engineers, architects, and management consultants are working together to achieve a nationwide data ecosystem for social good.
You’ll be working on the Asemio Community Integration Platform. It features state-of-the-art privacy-preserving, pre-processing and pipeline management, as well as record linkage technology.
The back end is written in OCaml. The front end is compiled from OCaml to JavaScript and uses a modern MVC framework. The work you’ll be doing will touch numerous technical disciplines, including cryptography, distributed systems, language design and implementation, data analytics, and data visualizations.
We prefer candidates willing to relocate, but we could make an exception for an exceptional candidate.
For more information or to apply, please refer to our SE listing: https://stackoverflow.com/jobs/401383/ocaml-senior-software-engineer-asemio
Other OCaml News
From the ocamlcore planet blog
Here are links from many OCaml blogs aggregated at OCaml Planet.
Old CWN
If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.
If you also wish to receive it every week by mail, you may subscribe online.