OCaml Weekly News

Hello

Here is the latest OCaml Weekly News, for the week of July 29 to August 05, 2014.

The ML Family workshop: program and the 2nd call for participation
Core Suite 111.25.00
findlib-1.5.2
OCaml 4.02 || arm64 || ppc64le
Why AVL-tree?
ocaml-gettext 0.3.5
Other OCaml News

The ML Family workshop: program and the 2nd call for participation

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-07/msg00185.html

oleg announced:

Higher-order, Typed, Inferred, Strict: ACM SIGPLAN ML Family Workshop
Thursday September 4, 2014, Gothenburg, Sweden

Call For Participation and Program         http://okmij.org/ftp/ML/ML14.html

Early registration deadline is August 3. Please register at
https://regmaster4.com/2014conf/ICFP14/register.php

The workshop is conducted in close cooperation with the
OCaml Users and Developers Workshop  http://ocaml.org/meetings/ocaml/2014/
taking place on September 5.

*** Program with short summaries  ***
(The online version links to the full 2-page abstracts)

* Welcome 09:00
* Session 1: Module Systems 09:10 - 10:00

1ML -- core and modules as one (Or: F-ing first-class modules)
Andreas Rossberg
  We propose a redesign of ML in which modules are first-class
  values. Functions, functors, and even type constructors are one and
  the same construct. Likewise, no distinction is made between
  structures, records, or tuples, including tuples over types. Yet,
  1ML does not depend on dependent types, and its type structure is
  expressible in terms of plain System F-omega, in a minor variation
  of our F-ing modules approach. Moreover, it supports (incomplete)
  Hindley/Milner-style type inference. Both is enabled by a simple
  universe distinction into small and large types.

Type-level module aliases: independent and equal
Jacques Garrigue; Leo White
  We promote the use of type-level module aliases, a trivial extension
  of the ML module system, which helps avoiding code dependencies, and
  provides an alternative to strengthening for type equalities.

* Session 2: Beyond Hindley-Milner 10:30 - 11:20

The Rust Language and Type System (Demo)
Felix Klock; Nicholas Matsakis
  Rust is a new programming language for developing reliable and
  efficient systems. It is designed to support concurrency and
  parallelism in building applications and libraries that take full
  advantage of modern hardware. Rust's static type system is safe and
  expressive and provides strong guarantees about isolation,
  concurrency, and memory safety.
  Rust also offers a clear performance model, making it easier to
  predict and reason about program efficiency. One important way it
  accomplishes this is by allowing fine-grained control over memory
  representations, with direct support for stack allocation and
  contiguous record storage. The language balances such controls with
  the absolute requirement for safety: Rust's type system and runtime
  guarantee the absence of data races, buffer overflows, stack
  overflows, and accesses to uninitialized or deallocated memory. In
  this demonstration, we will briefly review the language features
  that Rust leverages to accomplish the above goals, focusing in
  particular on Rust's advanced type system, and then show a
  collection of concrete examples of program subroutines that are
  efficient, easy for programmers to reason about, and maintain the
  above safety property.

Doing web-based data analytics with F# (Informed Position)
Tomas Petricek; Don Syme
  With type providers that integrate external data directly into the
  static type system, F# has become a fantastic language for doing
  data analysis. Rather than looking at F# features in isolation, this
  paper takes a holistic view and presents the F# approach through a
  case study of a simple web-based data analytics platform.

* Session 3: Verification 11:40 - 12:30

Well-typed generic smart-fuzzing for APIs (Experience report)
Thomas Braibant; Jonathan Protzenko; Gabriel Scherer
  In spite of recent advances in program certification, testing
  remains a widely-used component of the software development cycle.
  Various flavors of testing exist: popular ones include unit testing,
  which consists in manually crafting test cases for specific parts of
  the code base, as well as quickcheck-style testing, where instances
  of a type are automatically generated to serve as test inputs.
  These classical methods of testing can be thought of as internal
  testing: the test routines access the internal representation of the
  data structures to be checked. We propose a new method of external
  testing where the library only deals with a module interface. The
  data structures are exported as abstract types; the testing
  framework behaves like regular client code and combines functions
  exported by the module to build new elements of the various abstract
  types. Any counter-examples are then presented to the
  user. Furthermore, we demonstrate that this methodology applies to
  the smart-fuzzing of security APIs.

Improving the CakeML Verified ML Compiler
Ramana Kumar; Magnus O. Myreen; Michael Norrish; Scott Owens
  The CakeML project comprises a mechanised semantics for a subset of
  Standard ML and a formally verified compiler. We will discuss our
  plans for improving and applying CakeML in four directions:
  optimisations, new primitives, a library, and verified applications.

* Session 4: Implicits 14:00 - 14:50
        
Implicits in Practice (Demo)
Nada Amin; Tiark Rompf
  Popularized by Scala, implicits are a versatile language feature
  that are receiving attention from the wider PL community. This demo
  will present common use cases and programming patterns with
  implicits in Scala.

Modular implicits
Leo White; Frédéric Bour
  We propose a system for ad-hoc polymorphism in OCaml based on using
  modules as type-directed implicit parameters.

* Session 5: To the bare metal 15:10 - 16:00

Metaprogramming with ML modules in the MirageOS (Experience report)
Anil Madhavapeddy; Thomas Gazagnaire; David Scott; Richard Mortier
  In this talk, we will go through how MirageOS lets the programmer
  build modular operating system components using a combination of
  functors and metaprogramming to ensure portability across both Unix
  and Xen, while preserving a usable developer workflow.

Compiling SML# with LLVM: a Challenge of Implementing ML on a Common
Compiler Infrastructure
Katsuhiro Ueno; Atsushi Ohori
  We report on an LLVM backend of SML#. This development provides
  detailed accounts of implementing functional language
  functionalities in a common compiler infrastructure developed mainly
  for imperative languages. We also describe techniques to compile
  SML#'s elaborated features including separate compilation with
  polymorphism, and SML#'s unboxed data representation.

* Session 6: No longer foreign 16:30 - 18:00

A Simple and Practical Linear Algebra Library Interface with Static
Size Checking (Experience report)
Akinori Abe; Eijiro Sumii
  While advanced type systems--specifically, dependent types on
  natural numbers--can statically ensure consistency among the sizes
  of collections such as lists and arrays, such type systems generally
  require non-trivial changes to existing languages and application
  programs, or tricky type-level programming. We have developed a
  linear algebra library interface that guarantees consistency (with
  respect to dimensions) of matrix (and vector) operations by using
  generative phantom types as fresh identifiers for statically
  checking the equality of sizes (i.e., dimensions).  This interface
  has three attractive features in particular.  (i) It can be
  implemented only using fairly standard ML types and its module
  system. Indeed, we implemented the interface in OCaml (without
  significant extensions like GADTs) as a wrapper for an existing
  library.  (ii) For most high-level operations on matrices (e.g.,
  addition and multiplication), the consistency of sizes is verified
  statically.  (Certain low-level operations, like accesses to
  elements by indices, need dynamic checks.)  (iii) Application
  programs in a traditional linear algebra library can be easily
  migrated to our interface. Most of the required changes can be made
  mechanically.  To evaluate the usability of our interface, we ported
  to it a practical machine learning library (OCaml-GPR) from an
  existing linear algebra library (Lacaml), thereby ensuring the
  consistency of sizes.

SML3d: 3D Graphics for Standard ML (Demo)
John Reppy
  The SML3d system is a collection of libraries designed to support
  real-time 3D graphics programming in Standard ML (SML). This paper
  gives an overview of the system and briefly highlights some of the
  more interesting aspects of its design and implementation.

* Poster presentation, at the joint poster session with the OCaml workshop
Nullable Type Inference
Michel Mauny; Benoit Vaugon
  We present a type system for nullable types in an ML-like
  programming language. We start with a simple system, presented as an
  algorithm, whose interest is to introduce the formalism that we
  use. We then extend it as system using subtyping constraints, that
  accepts more programs. We state the usual properties for both
  systems. This is work in progress.


Program Committee

Kenichi Asai             Ochanomizu University, Japan
Matthew Fluet            Rochester Institute of Technology, USA
Jacques Garrigue         Nagoya University, Japan
Dave Herman              Mozilla, USA
Stefan Holdermans        Vector Fabrics, Netherlands
Oleg Kiselyov (Chair)    University of Tsukuba, Japan
Keiko Nakata             Tallinn University of Technology, Estonia
Didier Remy              INRIA Paris-Rocquencourt, France
Zhong Shao               Yale University, USA
Hongwei Xi               Boston University, USA

Core Suite 111.25.00

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-07/msg00191.html

Jeremie Dimino announced:

I am pleased to announce the 111.25.00 release of the Core suite.

With this release all packages are now compatible with OCaml 4.02.

The following packages were upgraded:

- async_extra
- async_kernel
- async
- async_parallel
- async_unix
- core_extended
- core_kernel
- core
- custom_printf
- jenga
- ocaml_plugin
- patdiff
- patience_diff
- sexplib
- textutils

Files and documentation for this release are available on our
website and all packages are in opam:

https://ocaml.janestreet.com/ocaml-core/111.25.00/individual/
https://ocaml.janestreet.com/ocaml-core/111.25.00/doc/

Here is list of changes for this version:

## 111.25.00

### async

- add a dns example

### async_extra

- Removed `lazy` from the core of `Log`.
- Made `Log.Message.t` have a stable `bin_io`.

The `Stable.V1` is the current serialization scheme, and `Stable.V0`
is the serialization scheme in 111.18.00 and before, which is needed
to talk to older systems.
- Changed `Rpc` to return `Connection_closed` if a connection ends
before a response makes it to the caller.

Previously, the dispatch output was never determined.

Also, removed an unused field in one of the internal data structures
of Async RPC.
- In `Versioned_rpc`, added `version:int` argument to `implement_multi
` functions.
- In `Versioned_rpc`, the `Pipe_rpc.Make` functors now return an
additional output functor.

`Register'` is like `Register` but has in its input module:

```ocaml
val response_of_model :
Model.response Queue.t -> response Queue.t Deferred.t
```

rather than

```ocaml
val response_of_model : Model.response -> response
```

This is analogous to `Pipe.map'` and `Pipe.map`.
- Added to `Log` a `V2` stable format and better readers for
time-varying formats.
- In `Log`, added an optional `?time:Time.t` argument to allow callers
to pass in the logged time of an event rather than relying on
`Time.now ()`.

### async_kernel

- Fixed `Clock.run_at_intervals` to make the initial callback at an
interval multiple.

Previously, if `start` was in the past, `f` would run immediately
rather than waiting for an interval of the form `start + i * span`.
Now it always waits for an interval, even the first time it runs.

### async_parallel

- improve error handling

### async_unix

- Added `Unix.Addr_info` and `Name_info`, which wrap `getaddrinfo` and
`getnameinfo`.
- Improved the scheduler's error message when the thread pool is
stuck.

### core

- Added `Gc.disable_compaction` function.
- Added `Time.to_string_abs_trimmed`, which prints a trimmed time and
takes a `zone` argument.
- Fixed `unix_stubs.c` to suppress a warning when building with some
versions of gcc.
- Changed `Time.Zone` to allow the zoneinfo location to be specified
by an environment variable.

Closes #40
- Fix compatibility with 4.02

### core_extended

- Moved `Quickcheck` from `core`.
- Added [Int.gcd].

### core_kernel

- Fix build on FreeBSD

Closes #10
- Added functions to `Container` interface: `sum`, `min_elt`,
`max_elt`.

```ocaml
(** Returns the sum of [f i] for i in the container *)
val sum
: (module Commutative_group.S with type t = 'sum)
-> t -> f:(elt -> 'sum) -> 'sum

(** Returns a min (resp max) element from the collection using the
provided [cmp]
function. In case of a tie, the first element encountered while
traversing the
collection is returned. The implementation uses [fold] so it has the
same
complexity as [fold]. Returns [None] iff the collection is empty. *)
val min_elt : t -> cmp:(elt -> elt -> int) -> elt option
val max_elt : t -> cmp:(elt -> elt -> int) -> elt option
```
- Made `Core_hashtbl_intf` more flexible. For instance supports
modules that require typereps to be passed when creating a table.

Address the following issues:

The type `('a, 'b, 'z) create_options` needs to be consistently used
so that `b` corresponds with the type of data values in the returned
hash table. The type argument was wrong in several cases.

Added the type `('a, 'z) map_options` to `Accessors` so that
map-like functions -- those that output hash tables of a different
type than they input -- can allow additional arguments.
- Fixed a bug in `Dequeue`'s `bin_prot` implementation that caused it
to raise when deserializing an empty dequeue.
- Made `Container.Make`'s interface match `Monad.Make`.
- Deprecated infix `or` in favor of `||`.
- Simplified the interface of `Arg` (which was already deprecated in
favor of `Command`).
- Replaced `Bag.fold_elt` with `Bag.filter`.
- `Memo.general` now raises on non-positive `cache_size_bound`.
- Removed `Option.apply`.
- Removed `Result.call`, `Result.apply`.
- Moved `Quichcheck` to `core_extended`.

It should not be used in new code.

### custom_printf

- Fix 4.02 compatibility.

### jenga

- Switched to un-version-numbered API.
- Renamed `Tenacious_sample_lib.Tenacious` to
`Tenacious_sample_lib.Tenacious_sample` to avoid conflicts in the
public release.
- Write `buildable_targets.list` (on alias `.info`).

### ocaml_plugin

- ignore more warnings by default

### patdiff

- add a `?file_names` argument to `Compare_core.diff_strings`

### patience_diff

- refactoring and more unit tests

### sexplib

- Fix compatibility with OCaml 4.02

findlib-1.5.2

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-08/msg00001.html

Gerd Stolpmann announced:

This version includes a few fixes concerning the -ppx option (e.g.
support it in the toploop).

More documentation and download:
http://projects.camlcity.org/projects/findlib.html

OCaml 4.02 || arm64 || ppc64le

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-08/msg00011.html

Richard Jones announced:

Just a friendly note for anyone who is trying to build OCaml or
OCaml packages for any or all of the following:

 - OCaml 4.02
 - i686
 - x86-64
 - arm64
 - ppc64le

I have been fixing many issues in the Fedora packages.  You can look
at the patches or build changes I have needed to apply in Fedora by
going to:

  http://pkgs.fedoraproject.org/cgit/

and using the search box in the top right corner to search for the
package name
(eg: ocaml-mikmatch : http://pkgs.fedoraproject.org/cgit/ocaml-mikmatch.git/ )

Then look for the "master" branch, click "tree", and you will see the
build recipe we use (*.spec file) and the patches we apply.

Patches which are relevant for upstream are usually sent upstream,
*if* I can easily find someone who will receive them (especially
without too much hassle, such as requiring sign-ups).

Why AVL-tree?

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-08/msg00012.html

A long time ago, Xavier Leroy said ... Diego Olivier Fernandez Pons recently replied:

> At the time Set was written, no efficient algorithms for whole-set
> 
> operations (union, intersection, difference, etc) were known for
> red-black trees. I'm not sure they are known today.

The answer is no and I will try to explain.

When doing a set union, at some point you will have to merge two
disjoint sets represented each by a tree

[1 .. 10] union [15 .. 22]

At that point you need to know the "size" of each tree because if both
trees are the same size you just need to append them

return Node ([1..10], [15..22])

while if one tree is bigger than the other, you will need to cut that
tree into smaller pieces and reorganize them (a "rotation").

divide [1..10] into [1..5], [6..10]
return Node([1..5], Node ([6..10], [15..22]))

The AVL implementation of OCaml gives you the "size" of each tree, its
the extra int in the constructor 

'a tree = Empty | Node 'a tree * 'a * 'a tree * int <--- here

it is equal to the longest path in the tree

f = function E -> 0 | Node (l, _, r, h) -> 1 + max (f l, fr)

If you implement Weight-balanced trees, that value is the cardinal of
the set represented by the tree.

If you use red-black or "traditional AVL" representation, that extra
bit just tells you if the tree is leaning towards the left or towards
the right, but doesn't tell you any information about the global
"size" of the tree. Which means you cannot implement set union
directly. You have to traverse full tree in O(n) to recompute that
information.

There is at least one library in Haskell that implements AVL trees
that way, it uses -1 | 0 | 1 information for most operations and
recomputes the longest path for all trees when doing set operations,
then discards that info.

There is also a way to implement red black trees to do set operations
just like (non-traditional) AVLs. It is described in Olivié H.J A new
class of balanced search trees : half balanced search trees, RAIRO
Informatique Théorique 16, 51 71 - also in his PhD Thesis.

Basically a tree is red-black when shortest-branch * 2 >=
longest-branch (cf. Constructing Red-Black trees, Ralf Hinze 1999 -
easier to find than Olivié's works)

So you can implement a tree structure that memorizes both the shortest
and the longest branch in each node

'a tree = E | N of 'a tree * 'a * 'a tree * int * int <---- shortest
and longest

From there you just follow the same scheme than the AVL implementation
in the OCaml lib, you just need to cope with a couple of triple
rotation cases.

> As for performance of insert/lookup operations, Jean-Christophe
> Filliâtre has measurements showing that OCaml's 2-imbalance AVL
> trees perform better than red-black trees. It all depends on your ratio
> of insertions to lookups, of course. But the 2-imbalance trick makes
> a big difference with textbook AVL trees.

This is a bit more difficult but in short "red-black trees do not
implement truly red-black trees" and "all implementations of red-black
trees implement a different approximation of red-black trees".

Any implementation of red-black trees that uses the 1 bit trick and
where insertions are in log (n) is not surjective. You can understand
that as meaning
(1) - some red-black trees will never be built by the balancing
algorithm
(2) - sometimes you will give the balancing algorithm a red-black tree
(= has a provable red-black coloring) and it will still rebalance it
instead of noticing it's already balanced and returning it untouched.

The reason is insertion in red-black trees using the 1 bit trick is
linear. Meaning if I want a balancing algorithm that never falls into
(1) or (2) I cannot avoid linear time insertion. The result is all
algorithms found in the literature are log(n) approximations of the
full algorithm. The only algorithm that is truly logarithmic is again
Olivié's thanks to it's implicit representation of the red-black
coloring.

From there, comparing performance of AVLs with "red-black trees" in
general makes no sense as the person that implemented them may have
implemented any arbitrary approximation of the full algorithm. This
translates in the code as a random number of "cases" to check for
balancing : 4 for Okasaki, 27 in the CLR, etc. With double, triple,
quadruple rotations, according to how "deep" the author was willing to
go. Because what these guys are doing is "unfolding" the linear
behavior of the algorithm into a constant and falling back into
rebuilding any tree they are given for all other cases.

In any case, some benchmarks I did very, very, very longtime ago
showed AVL and Weight balanced are good all-purpose implementations.
The advantage of WB being it gives you the cardinality of the set in O
(1). In some sense, AVL is the log of WB (in the same way the height
of a tree is the log of its size).

ocaml-gettext 0.3.5

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-08/msg00014.html

Sylvain Le Gall announced:

ocaml-gettext provides support for internationalization of OCaml
program. It uses the gettext convention and file format through a pure
OCaml implementation or a C library binding.

This release fixes a problem with OCaml 4.02. Thanks to Richar W. M.
Jones for the patch.

You can download it here:
https://forge.ocamlcore.org/frs/download.php/1433/ocaml-gettext-0.3.5.tar.gz

Other OCaml News

From the ocamlcore planet blog:

Thanks to Alp Mestan, we now include in the OCaml Weekly News the links to the
recent posts from the ocamlcore planet blog at &lt;http://planet.ocaml.org/&gt;.

The ML Workshop looks fantastic:
  https://blogs.janestreet.com/the-ml-workshop-looks-fantastic/?utm_source=rss&utm_medium=rss&utm_campaign=the-ml-workshop-looks-fantastic

Software Engineer (OCaml) at Cryptosense (Full-time):
  http://functionaljobs.com/jobs/8730-software-engineer-ocaml-at-cryptosense

Generalist Engineer at Esper (Full-time):
  http://functionaljobs.com/jobs/8728-generalist-engineer-at-esper

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.

Alan Schmitt