Hello
Here is the latest OCaml Weekly News, for the week of July 15 to 22, 2014.
Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-07/msg00059.html
Continuing this thread from last week, Benjamin Canou replied to Alain Frisch:> This topic of how to specify which ppx processors to run, and > avoiding multiple processes, is indeed still largely opened. > > I don't see what's the benefit of restricting the processors to > subtrees. It's easy enough for each processor to traverse extension > nodes it doesn't support (this is the behavior of the default > mapper). And I don't think it's a good idea to introduce > a composition model different from the successive application of > different processors on the entire tree. I.e. function composition, > which is quite well understood and easy to reason about. In > particular, you only need to understand the behavior of each > processor to predict what the composition will do, not exactly how > each processor is implemented (and which state it carries across the > tree, internally). My belief is that there is room for a less ambitious, plug-in based (statically or dynamically linked) annotation mechanism, on top of PPX. For instance, if we consider the current use of camlp4, we can assume that most ppxs will probably just define some annotation on some specific kind of AST node in order to rewrite them and / or insert auxiliary code, without carrying so much state. Having a common mechanism for registering such common simple tasks and assigning to names to them could be useful, without breaking the model. In practice, an annotation would simply be declared as an OCaml module calling predefined specific registration functions. Such a module, linked with a predefined main module would produce a stand alone ppx binary processing only this annotation. However, it gives the possibility to compose annotations more finely, without changing the semantics. As I mentioned previously, It could limit the number of tree rewrites, but it could have other advantages such as detecting annotation name clashes. We launched this thread because we thought that such a mechanism has more chances to be adopted if previously discussed and understood by build systems, but perhaps the best solution is that we propose such a ppx helper and then wait and see. > > With ocamlfind 1.5, requiring a package when compiling a file adds > the required -ppx flags in addition to the -I flags. If we want to > avoid multiple process, one could create a meta ppx driver which > dynamically loads and runs other drivers (specified as .cmxs files). > To be able to use such as meta driver from ocamlfind, ocamlfind > needs to know about how to build each ppx processor (i.e. which > libraries should be invoked). Dynamic linking might not be the best > solution, though: it is not available on all platforms, and it > requires all libraries on which ppx processor depend to provide > a corresponding .cmxs. The alternative is to have ocamlfind link > a specific meta driver statically (using its knowledge of package > dependencies) for each actual combination of ppx to be used, relying > on an internal cache to avoid linking the same driver repeatedly, of > course. The next step is to create not ppx drivers (that > incorporate multiple precessors), but compiler drivers (to avoid > completely extra process creations and marshaling of the AST). If > this is encapsulated in ocamlfind (or a similar tool), this can > still be used by any build system which currently relies on > ocamlfind. > > Specifying ppx requirements in the source code is a different topic. > It might be a good direction, but then I don't see why this should > be restricted to ppx requirements and not -I flags. It would seem > logical to specify package requirements, which would add both -I > and -ppx flags (and maybe more). I agree on this. Annotations give us the possibility to make OCaml files more self content and build-system independent. I though see a distinction between compiler directed pragmas and build-system directed ones. > Actually, it would have been more important to specify package > requirements for Camlp4 processors, since this knowledge might be > required by tools that are not under the control of your build > system, such as your editor/IDE (to load the corresponding syntactic > support). Since the concrete syntax doesn't change anymore with ppx > processors, it seems less critical to specify them in the source > code (I'd say, not more and not less than general package > requirement). Well, one of the use cases of extension nodes is to integrate external notations into literals such as [%json{| ... |}]. I believe IDEs could still use a little help to know how to format these, instead of showing plain OCaml strings.Alain Frisch then said:
> My belief is that there is room for a less ambitious, plug-in based > (statically or dynamically linked) annotation mechanism, on top of PPX. > For instance, if we consider the current use of camlp4, we can assume > that most ppxs will probably just define some annotation on some > specific kind of AST node in order to rewrite them and / or insert > auxiliary code, without carrying so much state. Having a common > mechanism for registering such common simple tasks and assigning to > names to them could be useful, without breaking the model. The registration mechanism exists: a ppx processor is supposed to call Ast_mapper.register. By default, this runs directly the main program to support for ppx protocol for this single processor, but you can very well override the behavior of Ast_mapper.register by calling Ast_mapper.register_function first. You can look at https://github.com/alainfrisch/ppx_drivers for an experiment of creating such a driver. But this notion of registration is quite independent on how multiple processors interact. I'm yet to be convinced that a composition mode more fine-grained that function composition is worth the extra mental overhead. It would also be necessary to decide between top-down and bottom-up rewriting, none of them being the best pick for all processors. Typically, a macro expander would want other processors to apply to the result of macro expansions (i.e. top-down rewriting style), while a type-conv-like processor might prefer a bottom-up rewriting (so that the macro expander can run first on type declarations before it takes control). Concerning the notion of state in ppx processors, I've written the following ppx processors: - sedlex (a lexer generator): it needs to remember about all lexer definitions in the current unit to be able to generate some global function definitions (shared by various lexers) (for instance, we don't want to generate definitions used only by code that would be excluded by a conditional compilation processor) - ppx_metaquot: some extension nodes with very short names %e, %t, %p are recognized only in the context of an enclosing extension node such as %expr ( https://github.com/alainfrisch/ppx_tools/blob/master/ppx_metaquot.ml ) I expect most other ppx processors to be slightly more complex than purely local transformations, which makes it even more difficult to reason on local composition (and "purely local" term rewriting systems are already hard to reason about their composition). > Well, one of the use cases of extension nodes is to integrate external > notations into literals such as [%json{| ... |}]. I believe IDEs could > still use a little help to know how to format these, instead of showing > plain OCaml strings. Indeed, although I think this is going to be a rather rare use of extension points / quoted strings.
Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-07/msg00092.html
Alain Frisch asked:GADTs allow to restrict existential type variables to being an instance of a row type, as in: class type c = ... type s = | EX: ((#c as 'a) -> unit) * (unit -> 'a) -> ex I'm wondering if it is possible to simulate such restricted existential quantification with first-class module and private row types. Something like: module type S = sig type t = private #c val f: t -> unit val g: unit -> t end let foo (type t_) (f : (#c as t_) -> unit) (g : unit -> t_) = let module M = struct type t = t_ let f = f let g = g end in (module M : S) This does not work, of course, because of the "... as t_". Is there a local work-around? If not, I'm wondering if if would be easy (and make sense) to introduce a form for introducing locally private row types: let foo (type t_ = private #c) (f : t_ -> unit) (g : unit -> t_) = ...Leo White then replied:
There is a work-around, but it is quite convoluted: https://sympa.inria.fr/sympa/arc/caml-list/2012-10/msg00131.html There is also a feature request for the feature you propose: http://caml.inria.fr/mantis/view.php?id=6137Alain Frisch then replied:
Very clever trick. Thanks! One can actually use the GADT only to carry the constraint and pass the value argument(s) separately. This means the GADT definition remains very simple even if we need to add many fields: ============================================================= class type c = object end module type SIG = sig type t = private #c val f: t -> unit val g: unit -> t end type 'a aux = Aux: (#c as 'a) aux let create f g = let helper (type u) (Aux : u aux) f g = (module struct type t = u let f = f let g = g end : SIG) in helper Aux f g (* or even: *) let create f g = let module Aux = struct type 'a t = E: (#c as 'a) t end in begin fun (type u) (Aux.E : u Aux.t) f g -> (module struct type t = u let f = f let g = g end : SIG) end Aux.E f g =============================================================
Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-07/msg00098.html
Continuing the thread from last week, Damien Doligez said:First note that we are not breaking backward compatibility: you can always use the -unsafe-string flag to compile your dusty code. On 2014-07-05, at 13:04, Gerd Stolpmann wrote: > So my scenario is quite low-level: I/O, and C interfaces. As you said, bigarrays are the best suited for that kind of code. But that's not a good reason to make all strings as heavy as bigarrays. If you need bigarrays, by all means use bigarrays in your code, not String or Bytes. On 2014-07-05, at 13:24, Gerd Stolpmann wrote: > Well, the complexity can be reduced a bit by using phantom types: > > type string = [`String] stringlike > type bytes = [`Bytes] stringlike > > and then just define function-by-function what is permitted: This is almost the same as our first version, which we discarded as too complex and not compatible enough (as you noted, because of unresolved type variables). But it might make a come-back. On 2014-07-08, at 14:24, Gerd Stolpmann wrote: > It will create confusion even with actively maintained code bases. What > could help here is very clear communication when the change will be the > standard behavior, and how the migration will take place. Currently, it > feels like a big experiment - hey, let's users tentatively enable it, > and watch out for problems. OK, we need to be clearer on the "how" (in an nutshell, the default will switch from -unsafe-string to -safe-string at some point in the future when we feel that enough of the existing code has been updated). As for the "when", we can't tell because that depends a lot on how fast the community updates its code. Hopefully no more than three years. Possibly as soon as 4.03.0. > There could also be a section in > the manual explaining the new behavior, and how to convert code. That's a good idea. > Right, that's the good side of it. (Although the danger is quite > theoretical, as most programmers seem to intuitively follow the rule > "don't mutate strings you did not create". I've never seen this kind of > bug in practice.) What about programmers who deliberately trigger the bug (aka "attackers", in a security setting)? It's not just about how unlikely a bug is, but also whether it can be exploited. > For instance, there is one module in OCamlnet where a regexp is directly > run on an I/O buffer (generally, you need to do some primitive parsing > on I/O buffers before you can extract strings, and that's where > stringlike would be nice to have). Without stringlike, I would have to > replace that regexp somehow. If stringlike is polymorphic, you will need a new regexp library that operates on stringlike. We cannot update the current regexp library to use stringlike because that would introduce polymorphism and unresolved type variables, and that might break some of the code that used to run on 1.03... On 2014-07-14, at 19:45, Richard W.M. Jones wrote: > That would imply removing incorrect functions like String.uppercase > and String.lowercase. First, we mark them deprecated. Then we wait a very long time before we actually remove them from (if ever).Alain Frisch also commented:
> http://blog.camlcity.org/blog/bytes1.html Coming back to motivating example of this post. Lexing provides: val from_channel : in_channel -> lexbuf val from_string : string -> lexbuf val from_function : (bytes -> int -> int) -> lexbuf In particular, from_function expects you to write to a buffer, so it's pretty clear that its callback must accept a "bytes", not a "string". There is no place for a (string -> int -> int) -> lexbuf function. Concerning from_string: this function copies the string to an internal buffer. This is purely implemented on the OCaml side without any unsafe features. We could avoid this copy because we know that the generated lexers won't actually modify the buffer in that case, but it would be very difficult to do this without using an unsafe feature, even if we had some sort of generalization of bytes and string. We would instead need a completely different implementation (which would not use "stringable" to make the "source" (string or "stream") explicit in the lexbuf datastructure. We could also provide an extra from_bytes function, but it can currently be implemented by composing Bytes.to_string and Lexing.from_string. Are you concerned only by the performance overhead of this approach (two copies)? If so, the same argument would apply to the current implementation of from_string, and we would need to switch to a different approach, for which it's not clear that "stringable" would be a big help (see above). Before doing anything like that, it would be interesting to evaluate the exact overhead. It could very well be negligible/acceptable for most cases compared to the cost of actual lexing.
Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-07/msg00100.html
Oliver Bandel announced:Just found this via ycombinator: http://tech.esper.com/2014/07/15/why-we-use-ocaml/
Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-07/msg00103.html
Dmitrii Kosarev announced:Lablqt will help you to create GUI applications with using QtQuick 2 and OCaml. It includes PPX syntax extension and ocamlfind package. It's available in OPAM already. Tutorial: http://kakadu.github.io/lablqt/tutorial2.html . Kind regards, Dmitrii Kosarev a.k.a. Kakadu P.S. Many thanks to guys who helped me in improving of the tutorial.
Archive: https://sympa.inria.fr/sympa/arc/caml-list/2014-07/msg00107.html
Yoriyuki Yamagata announced:I write a blog post http://yoriyuki.info/en/blog/2014/07/18/unicode/ which proposes inclusion of Unicode strings in OCaml standard distribution. The reason for this proposal can be summarized as follows. 1 Type for human readable text is too important to left out from the standard distribution, in particular from the beginner's perspective 2 This enhances interpolability of Unicode processing libraries 3 This suppresses the current dangerous practice that raw UTF-8 encoded string is used for Unicode string. I hope this stimulates the further discussion of human readable texts in OCamlTörök Edwin then replied:
Just my two cents: I don't think that iterating on codepoints is the fundamental operation for Unicode strings, there are too many pitfalls to be aware of (unless you are writing a low-level Unicode library such as normalization, regular expression matching, etc.). From a user's perspective I think you need mostly these operations: * create a valid UTF-8 string (reject invalid ones) * be able to put Unicode strings in standard containers (i.e. comparison and hash functions) * transform Unicode strings (normalize, case-fold) * Unicode regular expressions (UTS#18) * Unicode text segmentation (UTS#29) * Unicode collation algorithms (UTS#10) The regular expressions could be used to match/replace/split the Unicode string just as you would use String.index/String.sub, and it could be used to find/access the Unicode properties of unicode characters. And Unicode text segmentation can be used to define more useful iterators than codepoint-by-codepoint. If performance is a concern specialized implementations could be provided for commonly used expressions. Now of course this is only my view and someone else would consider other Unicode features to be more important. Also Unicode evolves, there could be bugs in the implementation, etc. so I don't think this is suitable to be baked in the compiler-provided standard library. At least such a "high-level" Unicode library should be prototyped and fully implemented outside the compiler (based on the already existing uu* libraries perhaps). Then see what problems people face when using it, and then tested out in a "standard library" such as Batteries or Core, and only then be made part of the compiler's standard library. Given that recently there's been a tendency to split-out functionality from the OCaml distribution (camlp4, ocamlbuild, etc.) I don't think that adding such complicated and evolving Unicode algorithms to the standard library is the way to go. I'd rather see the standard library deprecate/remove the ASCII-specific interfaces (or move them to a submodule). As for the compiler itself it could provide some useful functionality, not sure if it is required: * warn when string literals are not valid UTF-8 * support for \u * if there will be [`String|`Bytes ] stringlike there could be a 'type unicode = `Unicode stringlike' and some way to make literal strings be of type unicode, with the actual unicode implementation left to a user library
Thanks to Alp Mestan, we now include in the OCaml Weekly News the links to the recent posts from the ocamlcore planet blog at http://planet.ocaml.org/. Building an ARMy of Xen unikernels: http://openmirage.org/blog/introducing-xen-minios-arm Using Irmin to add fault-tolerance to the Xenstore database: http://openmirage.org/blog/introducing-irmin-in-xenstore Merge sort: http://shayne-fletcher.blogspot.com/2014/07/merge-sort.html Reductions in computability theory from a constructive point of view: http://math.andrej.com/2014/07/19/reductions-in-computability-theory-from-a-constructive-point-of-view/ Simple top-down development in OCaml: https://blogs.janestreet.com/simple-top-down-development-in-ocaml/?utm_source=rss&utm_medium=rss&utm_campaign=simple-top-down-development-in-ocaml Coq is hiring a specialized engineer for 2 years: http://coq.inria.fr/coq-is-hiring-a-specialized-engineer-for-2-years OCamlPro Highlights: May-June 2014: http://www.ocamlpro.com/blog/2014/07/16/monthly-2014-06.html
If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.
If you also wish to receive it every week by mail, you may subscribe online.