Previous week Up Next week

Hello

Here is the latest OCaml Weekly News, for the week of August 05 to 12, 2014.

  1. containers 0.3.3
  2. Bookaml 1.0
  3. strategies to deal with huge in-memory "object" graphs?
  4. OCaml HTML parsing & manipulation
  5. ppx_import 0.1
  6. Other OCaml News

containers 0.3.3

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-08/msg00019.html

Simon Cruanes announced:
I am very pleased to announce the first official release of
containers[1], a BSD-licensed extension to the compiler standard library
that focused on modularity, efficiency and a small footprint,
for OCaml >= 4.01. In particular, no pack, and almost every modules are
self-contained (i.e., if you link one module, you don't pay for the
other ones).  The purpose is not to compete with Batteries or Core,
which will always be more exhaustive, but to provide a lightweight
and modular approach.

You can find the API online [2].

Containers is split into several parts:
- a "core" library, just called "containers"; every module it contains
  is prefixed with "CC"[3] and lives in the global namespace.
  Most modules provide iterators (especially sequence [4]).
  In a nutshell:

  * CCList, with tail-recursive version of fold_right, append and map,
    and other new functions
  * CCOpt, CCPair, CCFun, providing useful combinators for pairs,
    options (map, >>=), functions (id,const,compose)
  * combinators for printing, equality, comparisons, random generators, etc.
  * CCError, an error monad (with polymorphic variants, same as D. Bünzli)
  * other data structures: CCArray, CCVector, CCTrie, CCMultiMap,
    CCMultiSet, CCHeap, bitvectors

- a tiny string library, in "containers.string" (module
  Containers_string) with a few algorithms on strings.
- a heap of unstable, experimental things ("containers.misc") that
  shouldn't be used in production, but might one day be promoted to core.

Contributions and bugreports/fixes are very welcome, the library is
still very young[5]. Although some might think it's bad practice, you can
also copy individual modules (.mli + .ml) into your own project if for
some reason you want to avoid dependencies.

Cheers!

[1] https://github.com/c-cube/ocaml-containers
[2] http://cedeela.fr/~simon/software/containers/
[3] as in "core containers", or "companion cube" if you suspect me of
    megalomania
[4] https://github.com/c-cube/sequence
[5] some other attempts at writing a container library:
    http://img.myconfinedspace.com/wp-content/uploads/2007/07/shipping_gone_wrong.jpg
      

Bookaml 1.0

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-08/msg00021.html

Dario Teixeira announced:
I'm happy to announce release 1.0 of Bookaml [1], a simple library for
validating ISBNs, gathering information about a book given its ISBN, or to
find any number of books matching given search criteria. Bookaml is closely
tied to the Amazon Product Advertising API [2], which it uses internally
for retrieving book information.

For more information, please visit the project's homepage,
or read the blog post that announced the release [3].

And of course, Bookaml is already available on OPAM.

Best regards,
Dario Teixeira

[1] http://bookaml.forge.ocamlcore.org/
[2] https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
[3] http://nleyten.com/post/2014/08/07/Announcing-Bookaml-1.0
      

strategies to deal with huge in-memory "object" graphs?

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-08/msg00038.html

Yoann Padioleau asked:
I have an application that is gradually creating a graph (using ocamlgraph) and 
the amount of memory it is using is around 3 or 4 Gb (my machine has 
74Gb of RAM). There are lots of nodes and edges in this graph. The problem is that building
this graph takes a huge amount of time. As the build progresses, it gets slower
and slower. My guess is that the “object” graph is getting really huge and
so the Gc needs to explore each time even more. I’ve tried things like

  (* see www.elehack.net/michael/blog/2010/06/ocaml-memory-tuning *)
  Gc.set { (Gc.get()) with Gc.minor_heap_size = 4_000_000 };
  (* goes from 5300s to 3000s for building db for www *)
  Gc.set { (Gc.get()) with Gc.major_heap_increment = 8_000_000 };
  Gc.set { (Gc.get()) with Gc.space_overhead = 300 };


but it does not really help. It is still really slow.

In the past I sometimes use the Marshall module to reduce the number of “objects”,
but it forces me to rewrite quite a lot the code. 

Is there a way to partition the heap so that for instance in my case
all the graph related things are put in a different area that the Gc
does not have to explore each time.  I’d like a minor heap, major
heap, and then a
do_not_gc_this_heap_it_is_only_growing_there_is_no_garbage_here_to_collect.
      
Gabriel Kerneis replied:
Isn't it what Ancient is (was) for? Designed in times where the
keyword was swap rather than inconceivably large amount of RAM, but
the same idea should work for both situations I guess.

http://caml.inria.fr/cgi-bin/hump.en.cgi?contrib=538
      
Gerd Stolpmann also replied:
The latter can be accomplished by setting space_overhead to a large
value (maybe 1E6). Also set max_overhead to 1E6 to avoid compactions.

Once you have built the graph, you can move the whole beast (provided it
is read-only) to a non-GC-managed area with either Ancient (simpler to
use, fewer features), or Ocamlnet's Netmulticore. But you really can
only move the whole graph, with everything that is reachable from it.
Any mutation will kill the program.
      

OCaml HTML parsing & manipulation

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-08/msg00042.html

Jacques du Preez asked and Christophe Troestler replied:
> I've been searching for an OCaml library to parse HTML, and then be able to
> query and manipulate it similar to jQuery.
> 
> The JSoup Java library, http://jsoup.org, allows me to do this. Is there
> something like this for OCaml?

Nethtml in ocamlnet partly does what you need (you can easily write
recursive functions to extract the desired data from the HTML tree).
      

ppx_import 0.1

Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-08/msg00049.html

Peter Zotov announced:
I'm pleased to announce ppx_import 0.1. It will be available shortly
in OPAM.

ppx_import is a syntax extension that allows to pull in types or
signatures from compiled interface files.

For example:

# type loc = [%import: Location.t];;
type loc = Location.t = { loc_start : Lexing.position; loc_end :
Lexing.position; loc_ghost : bool; }
# module type Hashable = [%import: (module Hashtbl.HashedType)];;
module type Hashable = sig type t val equal : t -> t -> bool val
hash : t -> int end

See more documentation on GitHub:
https://github.com/whitequark/ppx_import
      

Other OCaml News

From the ocamlcore planet blog:
Thanks to Alp Mestan, we now include in the OCaml Weekly News the links to the
recent posts from the ocamlcore planet blog at http://planet.ocaml.org/.

Digits:
  http://shayne-fletcher.blogspot.com/2014/08/digits.html

Announcing Bookaml 1.0:
  http://nleyten.com/post/2014/08/07/Announcing-Bookaml-1.0

Release of Bookaml 1.0:
  https://forge.ocamlcore.org/forum/forum.php?forum_id=907
      

Old cwn

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.


Alan Schmitt