This repository has been archived by the owner on Feb 2, 2023. It is now read-only.

Compiler or Language projects to work on

Runhang Li edited this page Aug 22, 2017 · 12 revisions

NB: This page is sporadically maintained. There is no guarantee that the suggested projects here will be merged upstream when completed. To avoid disappointment, consider contacting the OCaml developers before starting work on anything listed here.

See Getting started hacking the compiler for OCaml installation instructions and other important details.

Please keep this page updated: if you're working on an issue, edit the Who is working on this field; if you choose to work on something that's not currently listed, it would be helpful if you could add it.

Collaboration is strongly encouraged! Please indicate if you would be willing to act as a mentor, etc.

If you complete a project, or notice that a project has been completed, please move it to the Compiler or Language projects previously worked on wiki page.


New: Added 16th May 2017

In an effort to highlight some areas needing attention, we've added these jobs:

  • #1152 Code Review or testing: this generalises dtimings to show allocations and top heap size. (Mentor: Mark Shinwell, difficulty: ★★★☆☆)
  • #1080 needs solving to figure out if SpaceTime should be built for bytecode-only libraries. (Mentor: Mark Shinwell, David Allsopp, difficulty: ★★☆☆☆)
  • Implement Unix._exit and change the testcase in this MPR (Mentor: Mark Shinwell, difficulty: ★★☆☆☆)
  • Figure out how to get built-in identifiers like Sys_error in the documentation, from MPR#3468 -- only part 2 needs doing (Mentor: David Allsopp, difficulty: ★★★☆☆).
  • Document the environment variable CAML_DEBUG_SOCKET in MPR#6504 (Mentor: David Allsopp, difficulty: ★☆☆☆☆)
  • Look through Mantis and fix issues marked as bugs (not "features" etc).
  • Code review for GH PRs.
  • Multicore Good First Tasks.

Prior to 16th May 2017

These are a mixture of jobs/projects, some of which are up to date and others less so. Worth a look through :)

FLambda projects

The compiler-hacking label in the flambda repository includes additional suggestions.

Test/benchmark FLambda on existing OCaml programs

  • Expertise: ★☆☆☆☆
  • Time: ★★☆☆☆
  • Mentor: Leo
  • What needs to be done: The FLambda branch adds a whole new optimising phase to the OCaml compiler, intended to improve inlining in particular. It is being actively tested and benchmarked in order for it to be merged upstream in the next couple of weeks (in time for the OCaml 4.03 release). This effort would be greatly helped by testing and benchmarking existing OCaml programs with FLambda and OCaml trunk. Flambda can be found here in the flambda_trunk branch. Any issues or regressions can be reported here.

Support segmentation faults in the FLambda branch

  • Expertise: ★★★★☆

  • Time: ★★☆☆☆

  • Issue: https://github.com/OCamlPro/flambda-task-force/issues/24

  • Mentor: Mark/Leo

  • Who is working on this:

  • See the issue for details.

    It's sometimes useful for debugging purposes to have a way of reliably triggering segfaults. The current approach (coercing 0 to a pointer and then writing to it) is statically detected as an error with the flambda branch, so a new approach is needed.

Bugs & Features

Some bugs and features to work on:

Empty record and variant types

  • Expertise: ★★★☆☆
  • Time: ★★★☆☆
  • Mantis: Mantis #7583
  • Mentor:
  • Who is working on this: @objmagic
  • What needs to be done: Support empty record and empty variant types.
  • Status:
    • [08/19] Grammar has been extended. Need to work on later phases.
    • [08/21] If we treat type t = {} as a record type, it seems impossible to determine the ty_expected of let x = {}. The ty_expected of a record value is determined when unifying record labels, but an empty record has no labels to unify, so ty_expected cannot be determined.
    • diff

Warn when a let rec function is not recursive

  • Expertise: ★★★☆☆

  • Time: ★★★☆☆

  • Mantis:

  • Mentor:

  • Who is working on this: @objmagic

  • What needs to be done: Currently, there is no warning when a function bound with let rec never calls itself. This can lead to behavior that is difficult to debug. For example:

    # let foo x = print_int x;;
    val foo : int -> unit = <fun>
    # let rec loop = function
        | [] -> ()
        | x::xs -> foo (x);; (* Recursive call missed? *)
    val loop : int list -> unit = <fun>

The following warning should be displayed:

```ocaml
Warning ??: Recursive function 'loop' is not invoked recursively.
```

There is a range of possibilities for warning about groups of functions bound with 'let rec'. For example, it might be helpful to issue a warning indicating that the following group

    let rec x = fun () -> (x; ())
    and y = fun () -> (z; ())
    and z = fun () -> (y; ())

can be broken up into two cliques:

    let rec x = fun () -> (x; ()) in
    let rec y = fun () -> (z; ())
    and z = fun () -> (y; ())
  • Status:
    • Unused rec warning is implemented in Warning 39.
    • Suggesting a grouping could run an SCC (strongly connected components) analysis on the existing pat_slot_list to identify cliques. However, this requires collecting the identifier (Ident.t) of every pattern in pat_slot_list, which is a problem because the validity of let-rec patterns is only checked at a later stage (bytecode compilation); this piece of code also shows how to collect identifiers from let-rec patterns. One possible solution is to move the validity check for let-rec patterns to an earlier stage (type-checking); see GPR#556.
    • diff
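
The clique idea from the status notes can be tried outside the compiler. The following is a minimal standalone sketch, not compiler code: it assumes each binding in a let rec group has already been reduced to its name plus the names its body mentions, and the names binding, refs and sccs are hypothetical. It uses Tarjan's algorithm to split the group into strongly connected components; a component of size one whose binding does not mention itself would correspond to a binding that does not need rec.

```ocaml
(* Hypothetical summary of one binding in a let rec group:
   its name and the names mentioned in its body. *)
type binding = { name : string; refs : string list }

(* Tarjan's algorithm: returns the strongly connected components of the
   reference graph, dependencies first. *)
let sccs (group : binding list) : string list list =
  let index = Hashtbl.create 16 in      (* name -> discovery index *)
  let lowlink = Hashtbl.create 16 in    (* name -> lowest reachable index *)
  let on_stack = Hashtbl.create 16 in   (* name -> currently on the stack? *)
  let stack = ref [] in
  let counter = ref 0 in
  let result = ref [] in
  let in_group n = List.exists (fun b -> b.name = n) group in
  (* Only references to other members of the group are edges. *)
  let refs_of n =
    match List.find_opt (fun b -> b.name = n) group with
    | Some b -> List.filter in_group b.refs
    | None -> []
  in
  let lower v candidate =
    Hashtbl.replace lowlink v (min (Hashtbl.find lowlink v) candidate)
  in
  let rec visit v =
    Hashtbl.replace index v !counter;
    Hashtbl.replace lowlink v !counter;
    incr counter;
    stack := v :: !stack;
    Hashtbl.replace on_stack v true;
    List.iter
      (fun w ->
        if not (Hashtbl.mem index w) then begin
          visit w;
          lower v (Hashtbl.find lowlink w)
        end else if Hashtbl.find on_stack w then
          lower v (Hashtbl.find index w))
      (refs_of v);
    (* v is the root of a component: pop the component off the stack. *)
    if Hashtbl.find lowlink v = Hashtbl.find index v then begin
      let rec pop acc =
        match !stack with
        | [] -> acc
        | w :: rest ->
          stack := rest;
          Hashtbl.replace on_stack w false;
          if w = v then w :: acc else pop (w :: acc)
      in
      result := pop [] :: !result
    end
  in
  List.iter (fun b -> if not (Hashtbl.mem index b.name) then visit b.name) group;
  List.rev !result

(* The x / y / z group from the example above. *)
let cliques =
  sccs
    [ { name = "x"; refs = [ "x" ] };
      { name = "y"; refs = [ "z" ] };
      { name = "z"; refs = [ "y" ] } ]
(* cliques = [["x"]; ["y"; "z"]] *)
```

Inside the compiler the nodes would be Ident.t values collected from the patterns rather than strings, which is where the ordering problem described in the status notes comes in.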

Add support for [@tailrec] annotations

  • Expertise: ★★★☆☆

  • Time: ★★★☆☆

  • Mantis:

  • Mentor:

  • Who is working on this:

  • What needs to be done: The new [@tailcall] annotation makes it possible to ensure that individual calls are in tail position. However, it's often more convenient to annotate function bindings (e.g. let rec groups) rather than individual calls. For example, we might annotate foldl as follows:

    let rec foldl [@tailrec] = fun op acc -> function
       [] -> acc
     | x :: xs -> try foldl op (op x acc) xs with Not_found -> assert false

    or perhaps

    let rec foldl = fun op acc -> function
       [] -> acc
     | x :: xs -> try foldl op (op x acc) xs with Not_found -> assert false
    [@@tailrec]

    The suggested approach is to implement [@tailrec] in terms of [@tailcall].

  • Status:
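
For comparison, here is how the existing per-call [@tailcall] annotation can be used today. This is only an illustration of the current attribute, not of the proposed [@tailrec]. The recursive call below really is in tail position, so it compiles without a warning. In the foldl examples above, the recursive call sits inside try ... with and is therefore not a tail call, which is the kind of mistake the annotation is meant to catch (the compiler reports it as Warning 51).

```ocaml
(* The recursive call is the last thing the function does, so the
   [@tailcall] expectation is satisfied and no warning is issued. *)
let rec foldl op acc = function
  | [] -> acc
  | x :: xs -> (foldl [@tailcall]) op (op x acc) xs

let total = foldl (fun x acc -> x + acc) 0 [ 1; 2; 3 ]
```

A [@tailrec] annotation on the binding could, as suggested, be implemented by treating every self-call in the body as if it carried [@tailcall].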

Support explicit polymorphic types in value specifications

  • Expertise: ★★★☆☆

  • Time: ★★★☆☆

  • Mantis: None

  • Mentor: Leo

  • Who is working on this:

  • What needs to be done: Add support for the following syntax

    val id: 'a . 'a -> 'a

    which would mean the same as:

    val id: 'a -> 'a

    This should also be the default way to print out value specifications. This proposal should help users to appreciate the difference between the (universally quantified) type variables in value specifications and the (unification) type variables in regular type annotations (e.g. (add : 'a -> 'a)).

  • Status:
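
The distinction the proposal wants to make visible already exists for let-bindings, where explicit quantification is supported. A small sketch (the names add and id are only for illustration): in an ordinary annotation, 'a is a unification variable and may be instantiated, whereas with 'a. the definition must be polymorphic.

```ocaml
(* 'a is a unification variable here: it is silently instantiated to int,
   and add ends up with type int -> int. *)
let add : 'a -> 'a = fun x -> x + 1

(* Explicitly quantified: the body must work for every 'a. *)
let id : 'a. 'a -> 'a = fun x -> x

(* Rejected by the type checker if uncommented, because the body
   forces 'a to be int:
   let bad : 'a. 'a -> 'a = fun x -> x + 1 *)
```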

Document the -opaque option of ocamlopt and add OCAMLPARAM support

  • Expertise: ★☆☆☆☆
  • Time: ★☆☆☆☆
  • Mantis: 6955
  • Mentor:
  • Who is working on this:
  • What needs to be done: Add a description of the new -opaque option (added in this pull request) to the OCaml manual and to the ocamlopt man page. Add support for setting this option using the OCAMLPARAM environment variable.
  • Status:

Convert camlp4 extensions to use extension points

  • Expertise: ★★★☆☆
  • Time: ★★★☆☆
  • Mantis: None
  • Mentor:
  • Who is working on this: ???
  • What needs to be done: OCaml 4.02 adds support for extension points as a partial replacement for camlp4. A number of popular camlp4 extensions (e.g. pa_macro) would be very suitable for reimplementation using extension points. Peter Zotov has written a guide to writing extension points which could be a good starting point.
  • Status:

Archive manipulation tool

  • Expertise: ★★☆☆☆
  • Time: ★★★☆☆
  • Mantis: 2375
  • Mentor: ?
  • Who is working on this:
  • What needs to be done: It would be useful to have a tool that could extract, list, and describe object files (.cmo) within an archive file (.cma), in the spirit of GNU ar.
  • Status:

Improve time printing and parsing functions

Give location information for warnings in ocamldoc

  • Expertise: ★☆☆☆☆

  • Time: ★★★☆☆

  • Mantis: 5901

  • Mentor: ???

  • Who is working on this: @superbobry

  • What needs to be done: Currently ocamldoc reports warnings for problems such as unresolved cross-references, but does not report the location in the source where the problem occurred:

    $ cat w.ml
    (** the function {!bar} is called by {!foo} *)
    let bar x = x + 1
    
    (** the function {!foo} calls {!bear} *)
    let foo x = bar x
    $ ocamldoc w.ml
    Warning: Element bear not found
    

    The ocamldoc code needs to be updated to propagate location information and include it in warning messages.

  • Status: see comments on Mantis.

Signatured open command

  • Expertise: ★★★☆☆

  • Time: ★★★☆☆

  • Mantis: None

  • Mentor: Leo

  • Who is working on this: @stedolan, @objmagic

  • What needs to be done: Add a version of open which accepts a path and a signature, and only adds the members of the module that are in the signature into the environment.

    module type S = sig val x : int end
    
    module M = struct let x = 3 let y = 4 end
    
    open (M : S)
    
    (* Only M.x has been added to the environment as x *)
  • Status: There is a patch (signatured-open) that allows the above code. However, it does not yet take account of changes in the ordering of module items, so code such as

    module type S = sig val y : int val x : int end
    
    module M = struct let x = 3 let y = 4 end
    
    open (M : S)
    
    let z = x - y

    is not handled correctly (and can result in a segmentation fault).

  • Status: https://www.cl.cam.ac.uk/~jdy22/papers/extending-ocamls-open.pdf
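
Until such a construct is available, a similar effect can be approximated with a constrained module alias. This is a workaround sketch rather than the proposed feature: the module name Restricted is hypothetical, and unlike open (M : S) it also adds that module name to the environment. It uses the reordered signature from the status note, where the coercion from M to S has to reorder the fields.

```ocaml
module type S = sig val y : int val x : int end

module M = struct let x = 3 let y = 4 end

(* Coerce M to S under a new name, then open the restricted view. *)
module Restricted : S = M

open Restricted

(* x and y refer to M.x and M.y despite the different order in S. *)
let z = x - y
```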

Shrinkwrap optimisation for native code

  • Expertise: ★★★★☆
  • Time: ★★☆☆☆
  • Mantis:
  • Mentor: Stephen
  • Who is working on this: Stephen?
  • What needs to be done: Prologue/epilogue code need only be inserted around the parts of a function's body that actually use the stack. This would make the common case of caml_apply{1,2,3,N} a bit shorter.
  • Status:

Expressions like 3#0;; are ignored by the toplevel and the compiler

  • Expertise: ?
  • Time: ?
  • Mantis: https://caml.inria.fr/mantis/view.php?id=6604
  • Mentor:
  • Who is working on this: marek
  • What needs to be done: #[lineno] is actually a lexer directive that consumes the rest of the line. It seems to be mostly used as # [lineno] [filename], so the idea is to make the [filename] argument mandatory and report all other invocations as warnings.
  • Status: I've managed to find my way around the codebase and I've compiled the stricter version of the directive, which works for me. If it also gets through make bootstrap, I'll open a pull request with the proposed change. GitHub PR 931

Large projects

Here's a list of larger compiler-related projects. We've outlined what needs to be done, and welcome contributions.