Add first draft of URL specification, refs #93 #216

Open
wants to merge 2 commits into
base: master
362 changes: 362 additions & 0 deletions solutions/documentation/url-specification.md
@@ -0,0 +1,362 @@
# URI Specification v1.0.0

- [Introduction notes](#introduction-notes)
- [Indexable Object](#indexable-object)
  - [Attributes](#attributes)
    - [name](#name)
    - [pod](#pod)
    - [kind](#kind)
    - [subkinds](#subkinds)
    - [categories](#categories)
- [Indexable object types](#indexable-object-types)
  - [Primary indexable objects](#primary-indexable-objects)
    - [Pod blocks](#pod-blocks)
    - [Name attribute](#name-attribute)
      - [Kind Type](#kind-type)
      - [Kind Language and Programs](#kind-language-and-programs)
    - [Multiple pod blocks](#multiple-pod-blocks)
  - [Secondary indexable objects](#secondary-indexable-objects)
    - [Kind::Routine](#kindroutine)
      - [Attributes](#attributes-1)
    - [Kind::Syntax](#kindsyntax)
      - [Attributes](#attributes-2)
    - [Kind::Reference](#kindreference)
      - [Attributes](#attributes-3)
      - [URI setting](#uri-setting)
- [URI setting](#uri-setting-1)
- [URI rewriting](#uri-rewriting)
- [Examples](#examples)
  - [Get all URLs from Primary objects](#get-all-urls-from-primary-objects)
  - [Get all URLs from Secondary objects](#get-all-urls-from-secondary-objects)
  - [Classification of secondary objects by name](#classification-of-secondary-objects-by-name)

## Introduction notes

Review comment (Contributor):

Probably add here some general principles. What we want from this URL scheme, things like compatibility, testability, things like that.

This specification sets common guidelines and rules for URLs in the [official documentation](https://docs.raku.org). It must be implemented by the tools that generate the HTML pages of the documentation site (currently [Documentable](https://github.com/Raku/Documentable)). The behavior described here is the one **currently** implemented in `Documentable:ver<2.0.0>`.

`Documentable` already has some tests for URL generation, but they are scattered and insufficient, so a dedicated set of tests will have to be created once this specification is finished, possibly as a JSON spec file, as [Mustache](https://github.com/mustache/spec/tree/master/specs) does.

The official documentation contains many different kinds of pages, from pages generated directly from a single [source file](https://docs.raku.org/type/Associative) to pages generated by grouping certain parts of several [source files](https://docs.raku.org/routine/(%7C),%20infix%20%E2%88%AA). To represent this information in a manageable way, we use certain data structures, naming conventions and metadata, all of them based on **indexable objects**.

## Indexable Object

An indexable object is a set of information documenting one thing, or several related things, that can be referred to. For example, an indexable object can be the documentation for a certain [type](https://github.com/Raku/doc/blob/master/doc/Type/Any.pod6), for a [method](https://github.com/Raku/doc/blob/aec4740ded31770c799b5e236d9e5d423b8f988b/doc/Type/Any.pod6#L19-L34) or for a [tutorial](https://github.com/Raku/doc/blob/master/doc/Language/grammar_tutorial.pod6). Even [references](https://github.com/Raku/doc/blob/master/doc/Type/Any.pod6#L102) are indexable objects.

We can extract information from these objects and attach additional information to them, in order to classify them and create URIs that refer to them. **All indexable objects** share the following attributes, with different values:

Review comment (Contributor):

Every indexable object gets its own URI?

Review comment (Author):

Yep. Though URIs of secondary objects are not unique (all secondary objects with the same name are grouped to form a single page).

### Attributes

#### name

A relatively short string that names the object, for instance `Any`, `X::AdHoc` or `101-basics`. See [indexable object types](#indexable-object-types).

#### pod

The pod representing the indexable object. This pod *does not have to be* a `=begin pod ... =end pod` block, but it *must* be a [Pod::Block](https://docs.raku.org/type/Pod::Block) or an array containing `Pod::Block`s.

#### kind

One value from a **fixed** list, giving a coarse classification of what the pod represents.

~~~perl6
enum Kind (Type, Language, Programs, Syntax, Routine, Reference);
~~~

The first three cannot easily be deduced from the indexable object, so they must be **specified by the user**, whereas the last three can be deduced without much trouble. Set one of the following on the indexable object, depending on the documentation it represents:

- `Type`: the documentation is about a class, a role or an enum.
- `Language`: the documentation is related to the language itself.
- `Programs`: the documentation describes a program, for instance a debugger.

Automatically deduced:

- `Syntax`: the documentation is related to a `twigil`, `constant`, `variable`, `quote` or `declarator`<sup>1</sup>.
- `Routine`: the documentation is related to a `sub`, `method`, `term`, `routine`, `submethod`, `trait`, `infix`, `prefix`, `postfix`, `circumfix`, `postcircumfix` or `listop`.
- `Reference`: the documentation is just an `X<>` element.

#### subkinds

This is used as a more granular classification of indexable objects, based on the contents of the documentation. The value of these subkinds depends on the kind of the indexable object:

- `Type`: specified by the user.
- `Language`: specified by the user.
- `Programs`: specified by the user.
- `Routine`: deduced from the pod. A list containing a subset of these values: `infix, prefix, postfix, circumfix, postcircumfix, listop, sub, method, term, routine, submethod, trait`.
- `Syntax`: deduced from the pod. A list containing a subset of these values: `twigil, constant, variable, quote, declarator`<sup>2</sup>.
- `Reference`: indexable objects of this kind always have the same `subkinds` value: `['reference']`.

#### categories

This is also used as a more granular classification of indexable objects. However, this classification is not based entirely on the contents of the documentation. Its value also depends on the kind of the indexable object:

- `Type`: specified by the user.
- `Language`: specified by the user.
- `Programs`: specified by the user.
- `Routine`: same as `subkinds`, unless `subkinds` contains one of the following values: `infix, prefix, postfix, circumfix, postcircumfix, listop`. In that case, `categories` is always `['operator']`.
- `Syntax`: same as `subkinds`<sup>3</sup>.
- `Reference`: `categories` is not set for indexable objects of this kind.
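
The deduction rules for `Routine` and `Syntax` can be sketched as follows. This is only an illustration under stated assumptions: `classify-subkind` and the three lists are hypothetical names, not part of Documentable, and the sketch uses the `operator` spelling given in the `Kind::Routine` section. `Reference` objects and the `=headn X<>` case from the footnotes are not covered.

~~~perl6
# Hypothetical helper (not part of Documentable): derive kind, subkinds
# and categories of a secondary indexable object from a single subkind.
my @operator-subkinds = <infix prefix postfix circumfix postcircumfix listop>;
my @routine-subkinds  = <sub method term routine submethod trait>;
my @syntax-subkinds   = <twigil constant variable quote declarator>;

sub classify-subkind(Str $subkind) {
    # Operators are routines whose category is always 'operator'.
    if $subkind (elem) @operator-subkinds {
        return %( kind => 'routine', subkinds => [$subkind], categories => ['operator'] );
    }
    # Other routines: categories is the same as subkinds.
    if $subkind (elem) @routine-subkinds {
        return %( kind => 'routine', subkinds => [$subkind], categories => [$subkind] );
    }
    # Syntax: categories is the same as subkinds.
    if $subkind (elem) @syntax-subkinds {
        return %( kind => 'syntax', subkinds => [$subkind], categories => [$subkind] );
    }
    return Nil;  # not a subkind listed in this specification
}

say classify-subkind('infix')<categories>;   # [operator]
say classify-subkind('twigil')<kind>;        # syntax
~~~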

## Indexable object types

### Primary indexable objects

#### Pod blocks

A primary indexable object is created from a `pod block`. A pod block is just a pod structure like this one:

~~~perl6
=begin pod
...
=end pod
~~~

But that is not a *valid* one. For a pod block to be a primary indexable object, it must comply with some rules:

- It must have a `=TITLE`.
- It must have a `=SUBTITLE`.
- It must contain three different key/value pairs following the format: `:kind(<string>) :subkind(<string>) :category(<string>)`.
  - `:kind` must be exactly one of the stringified versions of the first three `Kind`s: `:kind("type")`, `:kind("language")` or `:kind("programs")`.
  - `:subkind` is an arbitrary string.
  - `:category` is an arbitrary string.

So, a valid primary indexable object is something like this:

~~~perl6
=begin pod :kind("Language") :subkind("Language") :category("migration")
=TITLE Perl to Raku guide - functions
=SUBTITLE Builtin functions in Perl to Raku
=end pod
~~~

These key/value pairs set the values of [subkinds](#subkinds) and [categories](#categories) for the first three kinds.

Review comment (Contributor):

Are those values fixed by this document?

#### Name attribute

The `name` attribute depends on the kind specified by the user in the primary indexable object:

##### Kind Type

In this case, the last word of the `=TITLE` element is taken as the name. So, if we have the following primary indexable object:

~~~perl6
=begin pod :kind("Type") :subkind("class") :category("basic")
=TITLE class Any
=SUBTITLE Thing/object
class Any is Mu {}
=end pod
~~~

Its name would be `Any`.

##### Kind Language and Programs

In this case the `=TITLE` element is arbitrary, so a name cannot be deduced from it. The name of the file, without its extension, is used instead. So, if we have the following primary indexable object:

~~~perl6
=begin pod :kind("Language") :subkind("Language") :category("migration")
=TITLE Perl to Raku guide - functions
=SUBTITLE Builtin functions in Perl to Raku
=end pod
~~~

stored in `/SomeDirectory/perl-raku-guide.pod6`, its name would be `perl-raku-guide`.
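
The two naming rules above can be sketched as a single routine. This is a minimal sketch, not Documentable's implementation: `name-for` and its named parameters are hypothetical, and the path `doc/Type/Any.pod6` is used only for illustration.

~~~perl6
# Hypothetical helper: the routine and parameter names are illustrative.
sub name-for(Str :$kind!, Str :$title!, Str :$path!) {
    # Kind::Type: the last word of the =TITLE element.
    return $title.words.tail if $kind.lc eq 'type';
    # Language and Programs: the file name without its extension.
    return $path.IO.basename.subst(/ '.' <-[.]>+ $ /, '');
}

say name-for(:kind<Type>, :title('class Any'), :path('doc/Type/Any.pod6'));
# Any
say name-for(:kind<Language>, :title('Perl to Raku guide - functions'),
             :path('/SomeDirectory/perl-raku-guide.pod6'));
# perl-raku-guide
~~~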

#### Multiple pod blocks

Several primary indexable objects of `Kind::Type` can be written in the same file as follows:

~~~perl6
=begin pod :kind("Type") :subkind("class") :category("basic")
=TITLE class Any
=SUBTITLE Thing/object
class Any is Mu {}
=end pod

=begin pod :kind("Type") :subkind("enum") :category("basic")
=TITLE enum Bool
=SUBTITLE Logical Boolean
=end pod
~~~

Review comment (Contributor):

Don't know what to make of this. If this document is a URL generation document, it should include all of it, not the directory part. Right now, the name of the file is taken directly from the name of the primary file. What happens in this case?

Review comment (Author):

Multi class files are only supported for types, so they have different names. I will clarify that.

They will be treated as two independent primary indexable objects.

### Secondary indexable objects

A primary indexable object can contain a lot of documentation; for instance, [Any](https://github.com/Raku/doc/blob/master/doc/Type/Any.pod6) has a very long list of methods. In order to give more granular documentation, we can extract certain parts of that pod and create more indexable objects.

#### Kind::Routine

To detect those parts, we use [pod headers](https://docs.raku.org/type/Pod::Heading). Not all pod headers are valid, though: they must follow one of these formats:

- `[T|t]he <single-name> <subkind>`
- `<subkind> <name>`

where

- `<subkind>` is one element of `infix, prefix, postfix, circumfix, postcircumfix, listop, sub, method, term, routine, submethod, trait`.
- `<single-name>` is a single word (without spaces).
- `<name>` can be formed by several words separated by spaces.

##### Attributes

- `kind` is set to `Kind::Routine`.
- `name` is set to `<single-name>` or `<name>`.
- `subkinds` is set to `(<subkind>)`.
- `categories`:
  - If the subkind is one of `infix, prefix, postfix, circumfix, postcircumfix, listop`, it is set to `("operator")`.
  - If the subkind is one of `sub, method, term, routine, submethod, trait`, it is set to the same value as `subkinds`.

#### Kind::Syntax

To detect those parts, we use [pod headers](https://docs.raku.org/type/Pod::Heading). Not all pod headers are valid, though: they must follow one of these formats:

Review comment (Contributor):

Don't say "if necessary". It's going to be used in the URL fragment. So this should be part of the URL specification too.

- `[T|t]he <single-name> <subkind>`
- `<subkind> <name>`

where

- `<subkind>` is one element of `twigil constant variable quote declarator`.
- `<single-name>` is a single word (without spaces).
- `<name>` can be formed by several words separated by spaces.

##### Attributes

- `kind` is set to `Kind::Syntax`.
- `name` is set to `<single-name>` or `<name>`.
- `subkinds` is set to `(<subkind>)`.
- `categories`: will be set to same value as `subkinds`.

`=headn X<>`<sup>1</sup><sup>2</sup><sup>3</sup> is also a valid header.
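
The two header formats shared by `Kind::Routine` and `Kind::Syntax` can be recognised with a pair of regexes. This is a sketch under stated assumptions: `parse-heading` is a hypothetical name, the sample headings are invented for illustration, and the `=headn X<>` case is deliberately ignored.

~~~perl6
# All subkinds that make a heading valid (routine and syntax ones).
my @subkinds = <infix prefix postfix circumfix postcircumfix listop
                sub method term routine submethod trait
                twigil constant variable quote declarator>;

# Hypothetical helper: returns the name and subkind found in a heading
# text, or Nil when the heading follows neither format.
sub parse-heading(Str $heading) {
    # Format 1: "[T|t]he <single-name> <subkind>"
    if $heading ~~ / ^ <[Tt]> 'he' \s+ (\S+) \s+ (\S+) $ / {
        return %( name => ~$0, subkind => ~$1 ) if ~$1 (elem) @subkinds;
    }
    # Format 2: "<subkind> <name>", where <name> may contain spaces
    if $heading ~~ / ^ (\S+) \s+ (.+) $ / {
        return %( name => ~$1, subkind => ~$0 ) if ~$0 (elem) @subkinds;
    }
    return Nil;
}

say parse-heading('method sum')<name>;              # sum
say parse-heading('trait is export')<name>;         # is export
say parse-heading('The anon declarator')<subkind>;  # declarator
say parse-heading('Introduction').defined;          # False
~~~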

#### Kind::Reference

These secondary indexable objects come from `X<>` elements (see [Pod::FormattingCode](https://docs.raku.org/type/Pod::FormattingCode)). They have to be written as follows:

~~~perl6
X<text|meta>
~~~

`meta` is a string containing several groups of words separated by `;`, with the words inside each group separated by `,`. For instance: `foo, bar; w`. Raku would interpret that `meta` attribute as `[ [foo bar] [w] ]`, that is, a list containing two lists: one with two elements and another with a single element.

From a single `X<>` element, several secondary indexable objects can be created, one for every group of words found in `meta`. For instance:

~~~perl6
X<text|a;b,c;d>
~~~

Would be interpreted as if you had typed:

~~~perl6
X<text|a>
X<text|b,c>
X<text|d>
~~~

`text` or `meta` can be empty strings, but not both at the same time, so `X<|meta>` and `X<text>` are valid references.

##### Attributes

In all cases, `kind` and `subkinds` are set to `Kind::Reference` and `['reference']` respectively. The `categories` attribute is not set in these indexable objects.

The value of `name` depends on `meta`:

- `meta` is an empty string: `meta` is set to `[text]`, so the element is interpreted as `X<text|text>`.
- `meta` has only one element: `name` is set to the stringified version of `meta`. So `X<|a>` would get the name `a`.
- `meta` has more than one element: `name` is set to the last element, followed by the remaining elements in parentheses. So `X<|a,b,c>` would get the name `c (a b)`.
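
The grouping and naming rules for `X<>` elements can be sketched as follows. `parse-meta` and `reference-names` are hypothetical helpers, not Documentable routines. Trimming whitespace around words and skipping empty groups are assumptions of this sketch.

~~~perl6
# Split the meta part of X<text|meta> into groups of words:
# groups are separated by ';', words inside a group by ','.
sub parse-meta(Str $meta) {
    my @groups;
    for $meta.split(';') -> $group {
        my @words = $group.split(',').map(*.trim).grep(*.chars);
        @groups.push(@words) if @words;   # one entry per non-empty group
    }
    return @groups;
}

# One name per group, that is, one per secondary indexable object.
sub reference-names(Str $text, Str $meta) {
    my @groups = parse-meta($meta);
    # Empty meta: behave as if X<text|text> had been written.
    @groups.push([$text]) unless @groups;
    return @groups.map(-> @words {
        @words.elems == 1
            ?? @words[0]
            !! @words[*-1] ~ ' (' ~ @words[0 .. *-2].join(' ') ~ ')'
    }).List;
}

say reference-names('text', 'a,b,c')[0];   # c (a b)
say reference-names('text', '')[0];        # text
say reference-names('text', 'a;b,c;d').elems;   # 3
~~~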

##### URI setting

The URI of these indexable objects depends on the primary indexable object where the reference was found. The URI is formed as follows:

~~~perl6
"{$origin.uri}#index-entry-{$meta}-{$index-text}"
~~~

Review comment (Contributor):

Maybe a table that clarifies how any URI is generated?

Review comment (Author):

I have not added a table yet because there are over 60 valid headers and references. Nonetheless, I have added a list of tests to check that the document describes the current behavior in Documentable:ver<2.0.1>

where:

- `$origin.uri` is the URI assigned to the primary indexable object where the reference was found.
- `$meta` is the groups found in `meta`, joined with `-`.
- `$index-text` is `text`.

## URI setting

All indexable objects have an associated *Uniform Resource Identifier* or URI. It is formed based on the common attributes of all indexable objects, as follows:

~~~perl6
"/{$kind.lc}/{$name}"
~~~

## URI rewriting

`Raku` accepts a huge range of symbols, so the `name` attribute can sometimes be awkward from a URI perspective. To generate valid URLs, `name` is altered by making these replacements:

~~~perl6
/ => $SOLIDUS
% => $PERCENT_SIGN
^ => $CIRCUMFLEX_ACCENT
# => $NUMBER_SIGN
' ' => _
~~~
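
Combining the URI format with the replacement table gives something like the sketch below. `rewrite-name` and `uri-for` are hypothetical names. Replacing one character at a time, in a single pass, is an assumption of this sketch rather than something the specification states.

~~~perl6
# Replacement table from this section; single quotes keep the '$' literal.
my %replacements =
    '/' => '$SOLIDUS',
    '%' => '$PERCENT_SIGN',
    '^' => '$CIRCUMFLEX_ACCENT',
    '#' => '$NUMBER_SIGN',
    ' ' => '_';

# Hypothetical helper: rewrite a name character by character, so that
# text introduced by a replacement is never rewritten again.
sub rewrite-name(Str $name) {
    $name.comb.map(-> $char { %replacements{$char} // $char }).join
}

# Hypothetical helper: "/{$kind.lc}/{$name}" with the rewritten name.
sub uri-for(Str $kind, Str $name) {
    "/{$kind.lc}/{rewrite-name($name)}"
}

say uri-for('Type', 'X::AdHoc');     # /type/X::AdHoc
say uri-for('Routine', '%');         # /routine/$PERCENT_SIGN
say uri-for('Routine', 'is export'); # /routine/is_export
~~~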

Review comment (Member):

I never really liked these, any thoughts on changing them to something else? For example, CIRCUMFLEX_ACCENT can be simplified to CARET, but maybe there's an even better way?

Review comment (Author):

To tell you the truth, I do not know. Some of those characters are there because they are not valid in paths. So maybe they should only be changed in the filename of the HTML page and redirected from the tool serving the pages. With a dynamic site, we would not need those I think.


## Examples

This specification is intended to be independent of the tool used but, as this is the first version, it is based entirely on the behavior of `Documentable:ver<2.0.0>`. The following examples let you check the concepts explained above for yourself:

##### Get all URLs from Primary objects

~~~perl6
use Documentable:ver<2.0.0>;
use Documentable::Registry:ver<2.0.0>;

my $registry = Documentable::Registry.new(
    :topdir("doc"),
    :dirs(DOCUMENTABLE-DIRS),
    :!verbose,
);

$registry.compose;

say $registry.documentables.map({.url});
~~~

##### Get all URLs from Secondary objects

~~~perl6
use Documentable:ver<2.0.0>;
use Documentable::Registry:ver<2.0.0>;

my $registry = Documentable::Registry.new(
    :topdir("doc"),
    :dirs(DOCUMENTABLE-DIRS),
    :!verbose,
);

$registry.compose;

say $registry.definitions.map({.url});
~~~

##### Classification of secondary objects by name

~~~perl6
use Documentable:ver<2.0.0>;
use Documentable::Registry:ver<2.0.0>;

my $registry = Documentable::Registry.new(
    :topdir("doc"),
    :dirs(DOCUMENTABLE-DIRS),
    :!verbose,
);

$registry.compose;

my %routine-documents = $registry.lookup("routine", :by<kind>).categorize({.name});
my %syntax-documents = $registry.lookup("syntax", :by<kind>).categorize({.name});

say %routine-documents<⊅>;
~~~

<sup>1</sup> Certain headers (`=headn X<>`) are classified as `Syntax` too, but there is no logical reason to mark those headers as `Syntax`, so this needs to be fixed. This behavior is inherited from the old `htmlify.p6`.

<sup>2</sup> Additionally, `=headn X<>` is an indexable object whose subkinds are its meta part. So, for instance, `=headn X<|foo>` is an indexable object of kind `Syntax` with subkinds set to `('foo')`. This also has to be changed but, once again, this behavior is inherited from `htmlify.p6`.

<sup>3</sup> What footnote <sup>2</sup> describes for `subkinds` also applies to `categories`.

Review comment (Member, @AlexDaniel, Aug 1, 2020):

Shouldn't this document also talk about backward compatibility? (Cool URIs don't change)

Review comment (Author):

I do not think so. In addition, backward compatibility is going to be kind of hard because right now URIs are generated very weirdly in some cases. In any case, most URIs are not going to change. What will change (I think) is fragment generation.

Comment:

JJ just told me about this PR. Some comments:

  • Why is the document written in Markdown and not POD6?
  • Why is the document written specifically for Documentable when discussing the Raku Documentation system? OK, I know that I'm a bit biased, as I have just completed a different way of rendering the whole of the Raku documentation system using Collection. However, surely a Raku specification should be renderer neutral, in the same way as Raku is compiler neutral.
  • For the most part, the link problems that I see are that they are not output neutral. For example, if we have an `X<>` element, the target is rendered by Pod::To::HTML as `index-entry_Lets_index_this_item`. This is not specified.
  • Another problem is that an `X<>` element containing `B<>` formatting leads to a target of `index-entry_<strong>This_item</strong>`, which is pure HTML and not Markdown. The specification should define how the target should be referred to by another file in an output neutral manner.
  • I would like to know what benefits there are to Secondary documents. I think they should be eliminated because they cause fragile links.