OCaml Weekly News

Hello

Here is the latest OCaml Weekly News, for the week of March 13 to 20, 2018.

New caml-list mirror - https://inbox.ocaml.org/caml-list
BAP 1.4 Release
Ocaml-git, git design and implementation
first release of orxgboost (gradient-boosted trees from R usable in OCaml)
OCaml job at Be Sport Paris
What is the reason of separation of module implementation and signatures in OCaml?
Other OCaml News

New caml-list mirror - https://inbox.ocaml.org/caml-list

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2018-03/msg00040.html

Nicolás Ojeda Bär announced:

I am happy to announce a new caml-list mirror, accessible at:

  https://inbox.ocaml.org/caml-list

Some advantages with respect to the official archive (sympa):

  - threads are not broken at month boundaries;
  - powerful search;
  - more usable interface (in my opinion);
  - it is easy to get a copy of the full archive: git clone --mirror
https://inbox.ocaml.org/caml-list.

The new mirror is powered by https://public-inbox.org/, where you can
go to learn more about it.

If you run into trouble or have any comments at all, do not hesitate
to get in touch either directly or via caml-list.

BAP 1.4 Release

Archive: https://discuss.ocaml.org/t/ann-bap-1-4-release/1711/1

Ivan Gotovchits announced:

After six months of active development, the BAP Team is proud to announce the
release of the Binary Analysis Platform 1.4.

The Carnegie Mellon University Binary Analysis Platform ([CMU BAP][0]) is a
reverse engineering and program analysis platform that works with binary code
and doesn't require the source code. BAP supports multiple architectures: ARM,
x86, x86-64, PowerPC, and MIPS. BAP disassembles and lifts binary code into the
RISC-like BAP Instruction Language (BIL). Program analysis is performed using
the BIL representation and is architecture independent in a sense that it will
work equally well for all supported architectures. The platform comes with a set
of tools, libraries, and plugins. The main purpose of BAP is to provide a
toolkit for implementing automated program analysis. BAP is written in OCaml and
it is the preferred language to write analysis, we have bindings to C, Python,
and Rust. The Primus Framework also provides a Lisp-like DSL for writing program
analysis tools.

The new release brings quite a few new features and several bug fixes. All
summarized below. We would like to especially thank Anton Kochkov (aka
[XVilka][1]) for contributing the MIPS lifter, and [SoftSec Lab][2] for
extensive testing and verification of our lifter semantics.

BAP v1.4 can be installed via the OCaml Package Manager ([opam][3]) and NixOS
package manager ([nix][4]). You can also use prebuilt deb, rpm, or tgz
[packages][5] or build BAP manually from sources. A good selection of
[docker][6] and [vagrant][7] recipes is also available, with some prebuilt
docker images uploaded to [DockerHub][8].

BAP is a moving target, so we are also encouraging everyone to use our rolling
releases opam repository, that will give you access to the newest features and
bug fixes as soon as they got merged into the master branch. Just add it to your
opam with
```
opam repo add bap-testing \
  https://github.com/BinaryAnalysisPlatform/opam-repository.git#testing
```

# Release Notes

## Features
* MIPS and MIPS64 lifters
* PowerPC and PowerPC64 lifters
* LLVM 5.0 compatibility
* BARE Binary Analysis Rule Engine
* New Taint Analysis Framework
* Primus Lisp 2.0 with symbols and methods
* Recipes
* Primus Test Framework
* Dataflow and Abstract Interpretation Framework
* Progress Reports and Profilers
* New primitives for BML

## Bug fixes

* Incorrect error handling in x86 lifter
* Failure to decode ICC binaries
* Fixes equiv type in Graphlib
* Unhardcodes llvm backed in the linear sweep disassembler
* Fixes the memory printer
* Fixes handling relocations in reconstructor
* Fixes race condition in the source merge procedure
* Restores the source-type command line option
* Proper handling of tail calls in IR lifter
* Fixes segment registers in mov instruction
* Fixes xor in the BIL simplfication procedure
* Fixes flag calculation in the x86 sub instruction
* Fixes numerous missed sign extensions in x86 lifter
* Adds modulo operation to x86 rot/rol instructions
* Fixes operands order in the x86 xadd instruction
* Fixes segment duplication

[0]: https://github.com/BinaryAnalysisPlatform/bap
[1]: https://github.com/XVilka
[2]: https://softsec.kaist.ac.kr/
[3]: https://opam.ocaml.org/packages/bap
[4]: https://github.com/NixOS/nixpkgs/pull/36194
[5]: https://github.com/BinaryAnalysisPlatform/bap/releases
[6]: https://github.com/BinaryAnalysisPlatform/bap/tree/master/docker
[7]: https://github.com/BinaryAnalysisPlatform/bap/tree/master/vagrant
[8]: https://hub.docker.com/u/binaryanalysisplatform/

Ocaml-git, git design and implementation

Archive: https://discuss.ocaml.org/t/ocaml-git-git-design-and-implementation/1718/1

Calascibetta Romain announced:

May be you don't know but we have a pure implementation of `git` in OCaml. This
big work started by @samoht a few years ago will be change a lot internally and
on the API to fit on MirageOS and, more generally, as a server software.

The final goal is to use (but it's already done in some cases like
[Canopy](https://github.com/Engil/Canopy)) `ocaml-git` as a data-store of a
MirageOS with [`Irmin`](https://github.com/mirage/irmin). By this way, it's easy
to tracking any update of your data-store, `revert` them (if you find any
problem) and replicate to an usual `git` repository (on GitHub for example).

It stills an experimental project which we have open questions about design,
interface and implementation. The good news is we are able to change before the
expected big release (which add lot of stuffs on `ocaml-git` and fix some bugs
about memory consumption).

However, find the better way for any use-case it's not easy for a single or two
brains. So currenlty, we have some pending questions about design and
implementation which need feedback. As a good community manager, i try to
explain at any step my choice, the context and possibilities. For a long time,
these discussions did in OCamllabs (because they are particularly technical) but
we across a big step when `ocaml-git` can be use now.

This topic could be a good place to report any strong update which breaks API,
semantic and a good way to explain step by step what is **really** `git`
internally (specification, details, limitations and so on ...). Finally, it
could be a good entry point to participate in any way (question, explanation,
documentation, or c0d3) to this big project.

And, as the first design question, you can follow this issue:
https://github.com/mirage/ocaml-git/issues/286

And because I'm currently on the MirageOS retreat in Marrakech, I would like to
say: good hacking!

PS: this big project is a part of a huge project, MirageOS. So, for many people,
it's difficult to see links between `ocaml-git` and others project (like
`irmin`, `wodan`, etc.) and constraints. But don't be afraid, we are here to
help to understand globally all.

first release of orxgboost (gradient-boosted trees from R usable in OCaml)

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2018-03/msg00048.html

Francois BERENGER announced:

I created a thin wrapper around the xgboost R package.

The code is here:
https://github.com/UnixJunkie/orxgboost

The interface is close to my precedent SVM package (orsvm-e1071).

Background:
https://en.wikipedia.org/wiki/Gradient_boosting

Paper:
Chen, Tianqi, and Carlos Guestrin.
"Xgboost: A scalable tree boosting system."
Proceedings of KDD'16. ACM, 2016.
DOI: 10.1145/2939672.2939785.

OCaml job at Be Sport Paris

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2018-03/msg00051.html

Be Sport announced:

Be Sport is looking for programmers familiar with functional languages, for
a development and research work.

Send an email to jobs@besport.com if you want to apply!

The company
Be Sport is a start-up company with sound funding, with a team of 20 people
today. We have a very ambitious development program and we believe in the
use of modern technologies coming from research to build strong basis for
our apps. We work in close relationship with research laboratories (CNRS,
Inria, univ. Paris Diderot …). Our main focus at the moment are languages
and typing for Web and mobile apps, recommendation algorithms for social
networks, classification algorithms.
Our first app is a social network dedicated to sport.
Be Sport premises are located in the center of Paris.

Work
We are looking for developers to take part in the development of some
features of our apps.
The developers will be integrated in the programming team: participation in
the writing of specifications, implementation (client / server), testing …
They will initially work on improving existing features, before
progressively taking the lead on some components.

Skills
Good general developers, knowing OCaml or other functional languages (or
willing to learn) are welcome. Other skills related to the implementation
of a social network are also welcome: machine learning, database, search
engines, etc.
Candidates must have some expertise on some of the following technologies:

- Typed functional languages, especially OCaml (and Ocsigen
Js_of_ocaml/Eliom)
- MacOS, iOS
- Databases
- Machine learning
- Web programming (CSS, Javascript…)

-- 

http://www.besport.com/

*Vincent Balat*
vincent@besport.com +33 6 32 01 13 04

215 rue Saint-Denis 75002 Paris
www.besport.com

https://www.facebook.com/besportdotcom/
https://www.instagram.com/besport_official/  https://twitter.com/BeSport
https://www.linkedin.com/company/2419257

What is the reason of separation of module implementation and signatures in OCaml?

Archive: https://discuss.ocaml.org/t/what-is-the-reason-of-separation-of-module-implementation-and-signatures-in-ocaml/1735/29

Deep in this huge thread, Ivan Gotovchits said:

In the ML module system, modules represent [abstract data types][0] with
existential types, as shown in the [foundational work by Mitchel and
Plotkin][1]. Compare with conventional languages, such Java or C++, where
abstract data types are (poorly) modeled with classes (and interfaces) that bind
together the nominal abstraction with the set of methods (operations). The ML
module system does not invent any ad-hoc constructs, such as classes, but relies
on mathematics to deliver proper definitions that are well-tested by time. In
the ML module system, structures denote [mathematical objects][3], and their
types are denoted by the [signatures][2].

The separation between abstraction and implementation is the essential part of
modular programming in particular and reasoning in general. Properly chosen
abstractions reduce the amount of information that we need to reason about and
allow us to build complex systems from smaller parts. One of the
responsibilities of the modular system in programming languages is to protect
the abstractions by ensuring that modules depend on abstractions, not
implementations. Consider Python, Common Lisp, and many other dynamically typed
languages that do not protect the abstractions as they do not provide mandatory
information hiding mechanisms. As a project evolve, the diffusion process rots
through the module binaries, that essentially leads to projects that are hard to
maintain and hard to understand.

Of course, the ML module system is not the only mechanism for implementing
abstract data types. We have also classes and interfaces (as in Java,C++),
another option is to use type classes as in Haskell (they all basically differ
in the way how the represent polymorphism - that's a completely different
topic). But in any case, just having types of definitions, without providing a
mechanism to define types of mathematical structures (i.e., sets of operations)
is not enough.

Whether or not to have a separate mli file for signatures that is a design
question. I personally like it, though it poses some technical problems and
doesn't play well with namespaces. In OCaml, you can consider mli files as a
shorthand and even consider them optional. Some projects (e.g., ocamlbuild)
define all their abstractions (module types) in one ml file, that is used then,
in different implementations. Although that's not common today, it's a viable
option.

[0]: https://en.wikipedia.org/wiki/Abstract_data_type
[1]:
http://delivery.acm.org/10.1145/50000/45065/p470-mitchell.pdf?ip=128.237.154.21&id=45065&acc=ACTIVE%20SERVICE&key=A792924B58C015C1%2E5A12BE0369099858%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1521472465_27441f74c41007e12833a005cbd35aad

[2]: https://en.wikipedia.org/wiki/Signature_(logic)

[3]: https://en.wikipedia.org/wiki/Mathematical_structure

Xavier Leroy then added:

Many very good points were raised already, so I'll just add a couple of
historical notes and personal preferences.

The "one compilation unit = one .mli interface file + one .ml implementation
file" design goes back to Caml Light and was taken from Modula-2, Wirth's
excellent Pascal successor. As previously mentioned, it works great with
separate compilation and parallel "make".

But the main point in favor, in my opinion, is that it clearly separates the
user's view of the module (the .mli file) from the implementor's view (the .ml
file). In particular, documentation comments on how to use the module go to the
.mli file; comments about the implementation go to the .ml file. This stands in
sharp contrast with most of the Java code I've written and read, where comments
are an awful mixture of documentation comments and implementation comments, and
serious IDE support is needed to see just the user's view of a class and
nothingelse.

Also, in the .mli file declarations can be listed in the most natural /
pedagogical order, starting with the most useful functions and finishing with
less common ones, while definitions in .ml files must come in bottom-up
dependency order.

OCaml combines this Modula-2 approach to compilation units with a Standard
ML-like language of modules, featuring nested structures, functors, and multiple
views of structures. The latter is a rarely-used but cool feature whereas a
given structure can be constrained by several signatures to give different views
on the implementation, e.g. one with full type abstraction for use by the
general public and another with more transparent types for use by "friend"
modules. Again, and perhaps even more than in Modula-2, it makes sense to
separate structures (implementations) from signatures (interfaces) that control
how much of the implementation is visible.

Other OCaml News

From the ocamlcore planet blog:

Here are links from many OCaml blogs aggregated at OCaml Planet,
http://ocaml.org/community/planet/.

Release of Alt-Ergo 2.1.0
 http://www.ocamlpro.com/2018/03/14/release-of-alt-ergo-2-1-0/

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.

Alan Schmitt