Skip to content

Model design

Thomas Refis edited this page Dec 17, 2018 · 2 revisions

Outline

The model is the library that does most of the heavy lifting for odoc. Its aim is to provide descriptions of the interfaces of compilation units that are accurate according to OCaml’s module system semantics. This document describes the intended design of the model. Many of the details differ from what is currently implemented.

Paths

Most of the model’s work involves accurately determining which definition a path refers to. To do this, it tries to use very accurate representations of the different kinds of path present in an OCaml compilation unit. The representations try to ensure that all paths point to a definition of the appropriate kind. The representations also keep track of which paths have been “resolved”. A path is only “resolved” once we have verified that the corresponding definition exists in the place that we expect it.

The representations of paths are in the Paths_types and Paths modules.

Roots

The model tries to be completely independent of the “root” type used to represent compilation units – in fact it used to be polymorphic over it. This allows the rest of odoc the freedom to decide how different units are represented. This avoids the model making any assumptions about what compilation units might be present in the documentation. It also allows the root to contain information about where a particular unit is located (e.g. what package it is in).

Identifiers

Identifiers represent concrete definitions within compilation units. When generating documentation these correspond to the actual URL that a path should link to. They are pretty simple, consisting of various projections originating eventually from a single [Root] element.

Paths

Paths represent the concept of “path” within the OCaml language. It is these that appear in the representations of OCaml definitions (e.g. in descriptions of type expressions). Each resolved path refers to a single identifier, but a single identifier might have many possible paths.

Since we only handle interfaces, there are only paths for modules, module types, types and class types. Unlike identifiers, paths are quite complicated. They support simple projections, functor applications and a variety of book-keeping paths which dictate how a particular path should be interpreted as an identifier. The book-keeping paths are discussed in the sections on hidden paths, canonical paths, module alias substitutions and functor argument substitutions.

Fragments

Fragments represent the fragments of OCaml paths that appear in with type and with module constructs in the module language. A fragment does not, on its own, correspond to an identifier. It instead represents how one identifier can be extended to become a second identifier.

Fragments contain similar constructors to paths, although they do not need to deal with functor application which simplifies things a bit.

References

References represent paths written within documentation comments. Each resolved reference refers to a single identifier, but a single identifier might have many possible references.

References have similar constructors to paths, but with additional projections to handle references to other kinds of definition, and no support for functor application (yet).

Note that references must either be to a compilation dependency, a unit within the same package, or they must explicitly mention the package in which the target definition resides.

Hidden paths

Many definitions are not intended to be visible to the user. We treat any definition whose identifier contains __ as hidden, as well as supporting explicit hiding constructs in documentation comments. The model tries to replace paths to hidden definitions with paths to equivalent non-hidden definitions. In order to do this, it marks any resolved path or reference to a hidden definition with a Hidden path constructor. This constructor optionally includes an alternative path that can be substituted for the hidden one.

Canonical paths

Some definitions are intended to be hidden and always replaced by a specific alternative definition. Here the model allows all paths to a particular definition to be replaced with an equivalent reference. To support this it marks any resolved path or reference to such a definition using a Canonical path constructor. This constructor contains the original path and the reference it should be replaced with. If the reference cannot be resolved then the original path is still used.

Module alias substitutions

To (dramatically) reduce the size of interfaces the model does not expand interfaces for module aliases. So if the module Foo is defined as an alias for the module Bar then there is no page for Foo. This means that any paths to the definition of Foo should instead refer to the identifier for Bar. One exception to this rule is if the definition of Bar is hidden, in which case Foo is given its own page. To support this the model marks any resolved paths or reference to a module alias with an Alias path constructor. This constructor contains the original path of the alias and the path of which it is an alias.

Functor argument substitutions

Due to the dependent nature of module types, some paths involving functor application require substitution of the functor argument to make a valid identifier. For example, given:

module F (X : sig module type S end) : sig module N : X.S end

module M : sig module type S = sig type t end end

then the path F(M).N.t should correspond to the identifier for M.S.t. To represent these cases the model would mark the path of F(M).N with a Subst path constructor, which includes the path of the module F(M).N along with the path of the module type M.S that it should be replaced with when projecting from it. There is also a similar SubstAlias path constructor for replacing a module path with another module path. Unlike the case of module aliases above, this replacement must be carried out even if the replacement path is hidden because without replacement there is no valid path.

Forward paths

Module alias paths created with -no-alias-deps option may point to compilation units that are further forward in the dependency graph than the unit that contains them. Such paths cannot be resolved until the “link” phase of generating interfaces. In earlier phases they are explicitly marked with a Forward constructor.

In full generality, there is no complete way to handle forward paths since they essentially behave as dynamically bound names. To avoid this we only allow forward paths within a package, and mark any other forward paths as unresolved.

Compilation units

Compilation unit interfaces are represented by a set of mutually recursive types. Each kind of definition has it’s own type. These types contain the identifier of the definition, any associated documentation, and a description of the definition itself.

The representations of compilation unit interfaces are in the Lang module.

Expansions

The descriptions of modules, module types, includes, classes and class types, also contain an optional “expansion” field. This contains a normalized version of the definition that directly contains its constituent components. For example, given:

module M : sig type t end

module N = M

the description of the N module will essentially be = M, whilst its expansion would be a representation of sig type t = M.t end.

Compiling and linking phases

Creating descriptions of compilation unit interfaces is done in two phases: “compile” and “link”.

Compiling

The compile phase requires all compilation dependencies of the unit to have already been “compiled”. This allows most paths and expansions to be done, since those are following the OCaml semantics and the compilation dependencies are all the modules which the OCaml type-checker needed to read to understand the file. The main exception is forward paths, where we don’t follow OCaml’s (slightly insane) semantics and so may require additional units outside of the compilation dependencies.

Reading OCaml files

The compile phase first reads an OCaml file (e.g. .cmti, .cmi, .cmt) and creates an initial description of the interface with paths and references unresolved and no expansions. This part is handled by the modules Cmi, Cmti and Cmt.

Resolving backward paths

Using the “compiled” descriptions of the unit’s compilation dependencies, the model can resolve all backwards paths. Forward paths and references are left until the “link” phase.

Expansion

Using the “compiled” descriptions of the unit’s compilation dependencies, the model can add expansions for modules, module types, classes and class types. As mentioned above, we don’t add expansions for module aliases to keep down the size of the interfaces.

Linking

The linking phase requires the following units to have been “compiled” (but not “linked”):

  • All units in the same package as the current unit
  • All units in the same package as a compilation dependency of the current unit
  • All units in packages that are mentioned by name in references

Compared to the dependencies for “compile”, dependencies for the “linking” step are defined at the granularity of the package and include packages mentioned by name in references.

Note that odoc provides a command which, given a “compiled” unit, lists all the packages that need to have been “compiled” before you can “link” that unit.

Resolving forward paths

Forward paths are resolved during linking. Since we treat forward paths that go across a package boundary as unresolved, any legal destination of a forward path will be “compiled” before the linking step.

Resolving references

References are resolved during linking. Since forward references that cross package boundaries must name the destination package explicitly, any legal destination of a reference will be “compiled” before the linking step.

Abstracting aliases containing hidden paths

As mentioned earlier, any module alias to a hidden path is replaced by its expansion. Note that a path containing a Hidden node where the right-hand side of the node is resolved is not considered hidden – since we can just use the non-hidden right-hand side instead of the hidden path.

This is done during linking because the “hiddenness” of a path can depend on forward paths (due to forward paths in module aliases) or references (due to canonical paths).

Hidden paths includes “self-canonical aliases”, which are aliases to a path with a canonical element that points back to the original alias. For example, given

(** @canonical N *)
module M : sig ... end

module N = M

the definition of N (= M) is a “self-canonical alias” and as such is treated as an alias to a hidden path.

In addition to hidden module aliases, there are also hidden type aliases, module type aliases, etc. For definitions that can have expansions we replace the alias with the expansion. For definitions that cannot have expansions we remove the alias to leave an abstract definition.

Components data structure

The bulk of the work done by the model should be done using the “components” data structure. This data structure is an optimized representation of a module’s interface that supports path resolution, reference resolution and module expansion. These operations could all be done directly using the compilation unit interface’s data, however that is very inefficient so using a specialized structure that takes advantage of laziness and memoization is needed for reasonable performance.

Module operations

There are a number of core module system operations that must be supported by the components data structure.

Signatures

A constructor is needed to build a components value from the description of a module signature. This requires iterating over the definitions in the signature.

Aliases and module type paths

A constructor is needed to build a components value to represent an alias to a module path or a module whose type is a module type path.

Projections

Projections allow a sub-module to be extracted from the components of a module. There are two kinds of projections here: identifier projections and path projections.

Identifier projection vs. path projection

Consider the following module definitions:

module M : sig

  type t

  module N : sig

    type s = t

  end

  (* A *)

end

(* B *)

There are two different views on the module type of M.N, the view at (* A *) which is:

sig type s = t end

and the view at (* B *) which is:

sig type s = M.t end

the difference between the two views is that in the first the projection of t from M occurs within the identifier (and so is not visible to the user), whilst in the second view the projection of t from M occurs withing the path. We call the first view of M.N the identifier projection, and the second view the path projection.

Renaming

Renaming occurs when a module type is used in a different location. For example in,

module type S = sig type t end

module M : S

we do a renaming when we copy the components of S to M. This involves changing the identifiers associated with the definitions in S, which is analogous to generating fresh identifiers as the OCaml compiler would do in this situation.

Substitution

Substitution occurs during functor application. It involves replacing all uses of the identifier of the functor parameter with the path of the functor argument.

Strengthening

Strengthening occurs when a module alias is expanded to its constituent components. It involves recursively finding abstract definitions in the module and extending them with aliases to the corresponding definitions in the aliased module. For example, given

module M : sig type t type s = int end

then = M will expand to sig type t type s = int end strengthened with M, which is:

sig type t = M.t type s = int end
Extension

Extension of a module type with additional equations happens when using with type = and with module =.

Destructive substitution

Destructive substitution is for handling with type := and with module :=

Memoization and laziness

There are many places where memoization and laziness can improve the performance of the components data structure. We describe a few important ones below.

Memoizing lookups

All lookups of identifiers and paths can be memoized by memoizing lookups of roots, projections from components and functor applications of components.

It is worth noting that whilst constructing the components of a particular module there will be lookups of sub-components of that module, which could cause an infinite loop. Such issues can be avoided by having a local environment that maps local identifiers to the corresponding memoized components, and passing that environment around whilst constructing the original components structure.

Lazy construction of components

The construction of components from signatures and identifiers can be done lazily. This avoids needing to process an entire compilation unit in order to access a single sub-component within that unit. (This also helps with the infinite loop problem mentioned above).

Lazily applying recursive operations

The recursive operations (e.g. renaming, substitution, strengthening) can be done lazily. This is particularly important for renaming and substitution. Looking up a sub-component of a module only requires performing renamings and substitutions on the sub-component being looked up, however if these recursive operations are performed eagerly then the entire module will be processed unnecessarily.

Clone this wiki locally