Martin Elsman | 1 Mar 1994 16:35
Picon

Documentation for the type-checker for Caml Light

Hello Caml-list listener's

Does anybody know if there exist any documentation of the type-
checker of Caml Light. 'The ZINC Experiment: An Economical
Implementation of The ML-Language' by Xavier Leroy, 1990 does
not include type-checking :-(.

I'm trying to attach equality type variables to Caml Light
together with imperative type variables and overloaded
builtin operators. The code of Caml Light is not 'just' the
Hindley/Milner/Robinson kind of thing, though there are
similarities. What are dangerous type variables and why does
the type checker include two unification algorithms?

Best regards

Martin Elsman

-----------------------------------------------------------------------
Martin Elsman                       The Technical University of Denmark
E-Mail: mael <at> id.dth.dk                   Department of Computer Science
-----------------------------------------------------------------------

Vale'rie Me'nissier-Morain | 1 Mar 1994 18:55
Picon
Picon
Favicon

Some questions and proposals...

>	2) This is a question mainly to CAML and Camllight implementors: has
>	anybody thought of a way to migrate in an automatic, or at least semi-
>	automatic way, modules from one of the systems to the other? I
>	have this huge quantity of modules in CAML wanting to be
>	compiled in Camllight...!

In fact the difficulties to port Caml programs to Caml-Light seem to
me of three orders:

1. 
   a. Functions written in the prelude file of Caml that are not
defined by Caml-Light: this is easy to correct, just try to compile
and for each unbound name, go in Caml prelude and copy-paste the
definition in a compatibility file loaded before your programs, for
example 
  b. Functions that are curryfied in only one version, generates a
typical typechecking error, so it seems easy to change
  c. Functions with arguments in another order, either generates a
typechecking error and it is easy to change, or for example functions
like "move src dest" changed in "move dest src", and it seems very
difficult to detect 
  d. Same function, same arguments, but not same result, for example 
"index" function for which in one version the first character is
number 1 and in the other version it is number 0

2. Functions that use the fact that the order of evaluation of Caml is
from left to right: it seems very difficult to automatize this phase

3. Functionalities that are different in Caml and Caml-Light like
parsers&printers vs. grammars or some differences in arithmetic, it
(Continue reading)

Xavier Leroy | 1 Mar 1994 20:45
Picon

Re: Documentation for the type-checker for Caml Light

> The code of Caml Light is not 'just' the
> Hindley/Milner/Robinson kind of thing, though there are
> similarities.

There are a number of tricks in the efficient implementation of type
inference that you won't find in Hindley, Milner, nor Robinson, indeed.
Some are described in Cardelli's paper "Basic polymorphic type-checking",
Science of computer programming, 8(2). 

The Caml Light typechecker also uses "levels" on type variables to
implement generalization efficiently. This folklore trick is described
in Weis and Leroy's book "Le langage Caml" (in French, unfortunately),
and also (but much more abstractly) in some publications by Didier
Remy. I believe Didier is on the list, so maybe he can provide more
references.

> What are dangerous type variables

A type analysis to prevent polymorphic mutable structures (e.g.
polymorphic references), similar in purpose to imperative type
variables but much superior in my opinion. See my POPL 91 paper and
my PhD thesis (available on
ftp.inria.fr:INRIA/Projects/cristal/Xavier.Leroy).

> and why does
> the type checker include two unification algorithms?

I don't understand what you're alluding to. Where is the second algorithm?

- Xavier Leroy
(Continue reading)

Didier Remy | 2 Mar 1994 15:37
Picon
Picon
Favicon

Re: Documentation for the type-checker for Caml Light

> The Caml Light typechecker also uses "levels" on type variables to
> implement generalization efficiently. This folklore trick is described
> in Weis and Leroy's book "Le langage Caml" (in French, unfortunately),
> and also (but much more abstractly) in some publications by Didier
> Remy. I believe Didier is on the list, so maybe he can provide more
> references.

What Xavier Leroy refers to is described in the paper [1]  together with the
addition of an equational  theory on ML  types (abstract at the  end of this
message). 

The idea is to mark all nods & variables of types with integers that keep
track of the first occurrence of those nods & variables in the typing context.
The unification algorithm has to update the marks when merging two types.
Generalization of a type can then be done in time proportional to the number
of nods & variables that have been introduced since the most recent LET
instead of checking through the whole type environment (that might be very
large).

The unification algorithm of Caml Light reuses similar marks, but they are
not updated at the same place. 

    Didier.

----------------

[1]   Didier R\'emy.  "Extending ML Type System with a Sorted Equational
      Theory".  INRIA Research Report 1766, year 1992.

      Can be retrieved by anonymous ftp as eq-theory-on-types.{dvi,ps}.Z at
(Continue reading)

Pierre Weis | 3 Mar 1994 12:06
Picon
Picon

bigloo

[Version anglaise]
Bigloo, a new optimizing Caml compiler is now available by anonymous ftp from
ftp.inria.fr (192.93.2.54), directory lang/caml-light, file
bcl1.6.0.0unix.tar.Z.

Bigloo is bootstrapped and fully compliant with the Caml Light 0.6
compiler and the Caml Light libraries (including camlyacc, camllex).
It can be considered as an alternative to the regular camlc compiler.
In particular, it has successfully compiled many complex Caml Light
programs, including camlc and the Coq system (the ``calculus of
constructions'', a proof assistant).

Bigloo is running on the following unix platforms:
- SPARC (1, 2, 10) under Bsd and Solaris 2.xx
- SONY-NEWS (mips r3000)
- IRIS indigo (mips r3000)
- SUN 3/60
- DEC Station 3100 (mips r3000)
- HP-PA (730)
- PC-linux (i486)

To install and use Bigloo, you need an ANSI C compiler.

For further informations contact {Manuel.Serrano,Pierre.Weis} <at> inria.fr.

[French version]
Bigloo est un nouveau compilateur optimisant pour Caml. Il est
disponible par ftp anonyme sur le site ftp.inria.fr (192.93.2.54), re'pertoire
lang/caml-light (fichier bcl1.6.0.0unix.tar.Z).

(Continue reading)

Pierre Weis | 16 Mar 1994 12:08
Picon
Picon

Bigloo compilation

Although Bigloo is very easy to use, it is not so evident to switch
from regular camlc compilation to Bigloo compilation: you have to
adapt the make files.
This message is intended to help you in this process: it gives you a
very simple way to build a new make file, adapted to the compilation
using Bigloo.

Makefile template:
------------------
We give here a makefile template to build applications using
the Bigloo compiler. It uses a perl script, camloodep, to automatically
find out the depencies between modules. Thus, to build your
application, you have to complete the template by filling the make
variables OBJS and DEST:
OBJS is the exhaustive list of object files used in the application
(remember that this are .o files in bigloo, instead of .zo files when
compiling with camlc).
DEST is the name you want to give to your application.

Then you have to type in the two following shell commands:
1) First
   make depend
   that completes the makefile (which is supposed to be named Makefile,
   and should be writable), with all the dependencies found in the Caml
   source files.
2) Second
   make
   that compiles all your source files and build the target executable program.

Compilation flags for Bigloo:
(Continue reading)

John Harrison | 16 Mar 1994 18:01
Picon
Picon
Favicon

Structure sharing


My research work is in theorem proving and I'm quite interested in the impact
on efficiency of structure-sharing the datatypes representing terms etc.

Has anyone tried implementing global hash consing in CAML? (That is, ensuring
that you never keep two distinct copies of any "cons cell".) Presumably this
would make the system slower in most applications (the faster EQ for structural
equality testing being overwhelmed by the slowness of constructors). But it
might make it much more usable on really small systems, which is consonant with
the CAML Light ideology.

Alternatively, how well would it work if I used the "hashtbl" library and did
everything myself for the datatypes I'm interested in? Am I making things non
GC-able by putting them in a hash table? How do the hash tables in "hashtbl"
work exactly?

Thanks,

  John Harrison (jrh <at> cl.cam.ac.uk)

  Hardware Verification Group
  University of Cambridge Computer Laboratory
  New Museums Site
  Pembroke Street
  Cambridge CB2 3QG
  England.

Pierre Weis | 16 Mar 1994 20:43
Picon
Picon

Sharing

Hi John,

As far as I know nobody have ever tried to implement global hash
consing in CAML. However structure-sharing on terms handled by the coq
system have been used once. It revealed to be interesting but not
decisive for efficiency.

On the other hand, there exists at least one implementation of ML
featuring hash-consing: HiML written by Jean Goubault
(J.Goubault <at> frcl.bull.fr). He showed that this can be feasible and efficient.

If you don't want to share everything ever allocated in the memory but
some well identified objects, you can perfectly use hash tables. The
problem is, as you mentioned it, that objects you put in a global hash
table cannot be reclaimed by the garbbage collector, since they are in
used (in the hash table!). This is not a problem, if you already know
that these object are structures or parts of structures already used
elsewhere. That's why sharing was used in coq for theories, that were
necessarily manipulated by the theorem prover.

Finally, the hash tables in the "hashtbl" module are not magic: they
are written in Caml, and you can read the source code which is in the
standard distribution. We do not claim that these hash tables got
the most efficient implementation: their implementation is reasonably
good and works very fine in most cases. But since you have the source
code, you can adapt this code to your particular problem, it a very
common practice to redefine some of the library functions to tune it
to a particular use.

Pierre Weis
(Continue reading)

Xavier Leroy | 16 Mar 1994 21:26
Picon

Re: Structure sharing

To complement Pierre's answer:

> But [hash-consing] might make it much more usable on really small
> systems, which is consonant with the CAML Light ideology.

Generally speaking, the design of Caml Light goes a long way to get
compact code, but does not try very hard to represent data compactly.
For instance, cons cells are 3 words instead of 2. In practice, the
only system where more compact data representations would make a big
difference is the 8086 PC, with its 640k limit; as soon as, say, 2M
are available (which is the case on all Macintoshes and PCs nowadays),
the current representations seems to be unproblematic.

> Alternatively, how well would it work if I used the "hashtbl" library
> and did everything myself for the datatypes I'm interested in? Am I
> making things non GC-able by putting them in a hash table?

Yes. By programming directly in C and using special features of the
current garbage collector (finalized objects and the fact that objects
allocated in the old generation are never relocated), you could get a
more efficient implementation of hash-consing that does not prevent
garbage collection. That's definitely non-trivial.

> How do the hash tables in "hashtbl" work exactly?

The only bit of magic is the hashing primitive, which walks the data
structure representing any Caml Light value (in the same way as
generic structural equality) and computes a hash value based on the
nodes encountered. Apart from this, it's standard dynamic hashing.

(Continue reading)

Chet Murthy | 17 Mar 1994 10:21
Picon
Picon

Re: Structure sharing


Hi, John,

I also did some work recently with structure-sharing in Coq.
Specifically, I wrote a caml-light generic hash-conser which walks the
in-memory representation in an unsafe manner, "re-sharing" a
datastructure.  Since I don't have type information, I can't detect
the fact that something's mutable, so this only works for purely
applicative data.

Whatever.  Anyway, I had an application where I wanted to read
something in off of disk, and then "un-share" it completely - walk it
completely, so I could hash the leaves of the datastructure into a
symbol table.  But then, well, the datastructure is unshared - I ought
to re-share it, no?  Well, hash-consing was a modest loss on a
reasonable-sized benchmark.

That is, for Constructions at least, the datastructures are simple
enough, and small enough, that hash-consing costs more than it is
worth when it is done "globally"

On the other hand, when I did it on the output of the type-checker,
and the term which I fed as input (coupled, so that the type-checker's
output will be hash-consed "onto" the term being type-checked), I got
a slight speedup.

All this to say that it probably isn't a big win either way, in
caml-light, for Coq.

--chet--
(Continue reading)


Gmane