Here is the latest Caml Weekly News, for the week of 11 to 18 October, 2005.
Archive: http://thread.gmane.org/gmane.comp.lang.caml.general/30758Thomas Fischbacher asked and Xavier Leroy answered:
> Actually, this brings me to a question I wanted to ask for a long time: > while I never used this so far, I just assumed that OCaml does provide > hash tables where keys are compared w.r.t. "being the same" ('==' , that > is), rather than only hash tables where keys are compared for "being > equal" (say, '='). Easily definable using the functorial interface to hash tables: module MyEqHashTable = Hashtbl.Make(struct type t = my_type_for_keys let equal = (==) let hash = Hashtbl.hash end) > Of course, "EQ hash tables" have to be treated in a slightly special way > when talking about stop© GCing. No, not "of course". You're thinking of using the address of a memory block as its hash value. This indeed requires GC support. But you can get the semantics you want (keys compared by reference) while still computing the hash code structurally. This is what the code snippet above does.Thomas Fischbacher then asked and Xavier Leroy answered:
> Hm, okay. I agree that this does give me the same semantics. > But doesn't this degrade expected lookup performance to an O(n) list > lookup if I put a lot of stuff into the hash which is (=) but not (==)? Yes, of course. It depends on your application. For things like hash-consing, this approach (standard hash function + comparison finer than (=)) can still work well. Alternatively, you can always put unique integers in the data type used as key.
Archive: http://thread.gmane.org/gmane.comp.lang.caml.general/30781Mark Benecke asked and Xavier Leroy answered:
> When I compile my program with the ocamlopt compiler in the windows OS., the > following error message is shown: > > gcc: @C:\DOKUME~1\karinu\LOKALE~1\Temp\camlresp8f8d86: No such file or > directory > gcc: @C:\DOKUME~1\karinu\LOKALE~1\Temp\camlresp0e9864: No such file or > directory > Error during linking > > does any one has any idea? You should install the Cygwin version of the Mingw compilers, not the standalone MSYS/Mingw compilers. The latter do not understand the @responsefile notation.
Archive: http://thread.gmane.org/gmane.comp.lang.caml.general/30784Richard Jones asked and Yoann Padioleau answered:
> I'm trying to optimise a program which is using a large amount of > memory and consequently thrashing. > > The core of the program is an iteration over a list of something like > a million elements which consumes about 1/2 gig of RAM. The iteration > is: > > List.iter ( > fun row -> > (* put row into database and forget about it *) > ) rows; > (* no further references to rows after this *) > > This is the stdlib implementation of List.iter. Should the garbage > collector be able to collect the part of the list which has been > iterated over, during the iteration? At the moment it doesn't look > like it's doing so. Because rows is still accessible after the List.iter so it is normal that it is not garbage collected. I had the same kind of problem and to optimize it I choose to produce the elements of rows lazily (but then I had another problem with the Lazy modudle where elements were not garbage collected so I use my own lazy module (simple via closure) and it works perfectly well).Richard Jones answered and Gerd Stolpmann said:
> > Because rows is still accessible after the List.iter so it is normal > > that it is not garbage collected. > > I agree that rows is "accessible", but it's not actually used. My > understanding is that the GC would be prevented from considering the > list for collection if the pointer to the head of the list (ie. rows) > was stored on the heap or in a register somewhere. Would this be the > case here? > > > I had the same kind of problem and to optimize it I choose to > > produce the elements of rows lazily (but then I had another problem > > with the Lazy modudle where elements were not garbage collected so I > > use my own lazy module (simple via closure) and it works perfectly > > well). > > Unfortunately this isn't really an option here. The rows list comes > from a huge XML doc which is parsed by PXP and passed through some > complex post-processing; PXP doesn't support incremental processing of > XML docs, and the post-processing would be tricky to convert too. PXP has a pull parser. You get the XML document as a lazy stream of XML events. I don't know your document format, but if it is something like <document> <record>...</record> <record>...</record> ... lots of them ... </document> I would recommend using the pull parser, and then create XML trees for the individual records only (you can mix both styles).John Skaller answered the OP:
let rows = ref  in (* build up the list of rows here *) let rec doit rows = match rows with |  -> () | h :: t -> install_in_db h; doit t (* tail call *) in doit (let r = !rows in rows := ; r)Olivier Andrieu also answered the OP:
> This is the stdlib implementation of List.iter. Should the garbage > collector be able to collect the part of the list which has been > iterated over, during the iteration? At the moment it doesn't look > like it's doing so. " Short answer: with ocamlc, no. With ocamlopt, yes. " cf. these informatives messages by Xavier: http://groups.google.com/group/fa.caml/msg/f194a3240d240e71 http://groups.google.com/group/fa.caml/msg/d36cb040d0d87ca6Seth J. Fogarty said, Richard Jones asked, and Frederic van der Plancke answered:
> > No, because you have bound rows to a name. Now, I believe if rows is > > returned by a function, and is NOT bound by name in that function, it > > can be garbage collected. I.E. > > Ah OK ... I'm interested though: why does binding a value to a name > cause problems? Surely at this level (ocamlopt generated code) there > ought to be no difference between a named value and an unnamed one? Perhaps the problem is not the name, but the fact that when you write List.iter f rows the compiler isn't going to go great lengths in optimizing out the reference to rows before the call to List.iter, since the optimisation probably isn't free and the compiler (and compiler writers) don't know in advance how worthwhile it would be. If the call to List.iter was a tail call the compiler would probably be forced to optimise rows out. Hence my 0.02 cents worth untested idea: you might try and use tail call optimisation to get rid of that reference. say let do_work () = let rows = ... in List.iter (...) rows
Archive: http://thread.gmane.org/gmane.comp.lang.caml.general/30792Ian Zimmerman asked and Jacques Garrigue answered:
> Are Ocaml's VM threads preemptive? That is, does the virtual machine > actually have a timer and give a slice to each thread? Or does a thread > run until it's blocked? The timer signal is used, so they are preemptive. This is however only true for threads running ocaml code. There is no way to switch thread during C calls, so that all potentially blocking C calls should be manually converted to non-blocking ones (this is already done for the standard library and the Unix module.) This can also create problems when somebody else needs the timer signal (for instance the Graphics module.) System threads are most costly (they use system resources), but they simplify greatly working with C.Ian Zimmerman then asked and Jacques Garrigue answered:
> The reason I asked was: when writing a library that uses C extension > code and a global data structure in the C code, can I just use system > thread primitives for synchronization (that would be pthreads since > I am only thinking about Unix type systems at least ATM) and not worry > about VM threads? > > From your post the answer seems to be yes. That's even stronger than that: if you do not plan to run some pure C threads concurently with ocaml, you usually don't need any synchronization, either for system or VM threads. With VM threads, you may only change thread when executing ocaml code, so all your C code is single-threaded. With system threads, you have to release the ocaml lock by calling enter_blocking_section before any other ocaml thread may run. If you don't want to be interrupted, just don't release it. So you don't need to bother with synchronization as long as you don't introduce concurrency intentionally on the C side. (But this might be what you intend to do...)
Archive: http://thread.gmane.org/gmane.comp.lang.caml.general/30798Gerd Stolpmann announced:
For your information, I just released this software written in O'Caml. Gerd ------------------------------------------------------------------------- UMLMON is a monitor for User Mode Linux (UML) that sets up a runtime environment for UML and controls the execution of UML. UMLMON uses a quite advanced client/server architecture, is rock-stable, and even has a web interface. The newest version of UMLMON, 1.0.1, has just been released. It is now available under the terms of the GPL. UMLMON is a feature-rich monitor: - UMLMON creates and maintains a runtime environment for UML. This includes a chroot jail and the automatic setup of needed resources. - UMLMON is a daemon with an RPC interface. One can control the daemon and the running UML instance by sending commands over this interface. The RPC interface works locally and over the network. - Of course, one can start and stop the UML instance over the UMLMON daemon. - UMLMON can establish bidirectional connections to the virtual consoles and the virtual serial lines. This means one can directly have an interactive session on the virtual machine without having to rely on networking. - UMLMON can log the output of virtual consoles. Even log rotation is supported. - UMLMON has administrative functions to create and manipulate virtual disks. - UMLMON eases the setup of virtual networking. - UMLMON is a secure environment. Although the daemon runs as root, it is written in a _safe_ programming language such that it is impossible that buffer overflows or other flaws of low-level languages allow attackers to break in. - There is a command-line interface and a web interface. Due to restricted resources the latter is currently only available in German language. A multilingual version is planned, though. I hope you see also the advantages of such a daemon. You can get rid of all the hackish scripts that are typically used in a UML setup, and have a clean and powerful solution that grows with your requirements. The UMLMON homepage is at http://www.gerd-stolpmann.de/umlmon . There are binaries for LSB-2.0 (Linux Standard Base). See http://www.gerd-stolpmann.de/buero/umlmon/umlmon-1.0.1-install.txt.en for instructions where to get them and how to install them. Source code: http://www.ocaml-programming.de/packages/umlmon-1.0.1.tar.gz . It is quite difficult to build UMLMON on a typical Linux distro because most prerequisite libraries are missing, or are too old. The easiest way is to go with GODI, http://godi.ocaml-programming.de , which includes a lot of scripts to automate the build of programs written in O'Caml. Just install the packages apps-umlmon and apps-umlmon-web. Problems, questions, etc. are answered by me, please contact me privately <firstname.lastname@example.org>. UMLMON is an Open Source product by Informatikbuero Gerd Stolpmann. Commercial support setting up the daemon and the whole UML system can be bought, especially here in Germany.
Here is a quick trick to help you read this CWN if you are viewing it using vim (version 6 or greater).
If you know of a better way, please let me know.
If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.
If you also wish to receive it every week by mail, you may subscribe online.