Spec this Pod cache #1

JJ opened this issue May 6, 2020 · 12 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@JJ
Owner

JJ commented May 6, 2020

The main intention of this cache is to be a plug-compatible replacement for Pod::To::Cached in Documentable. Pod::To::Cached stopped working with the Raku 2020.02 release, which led to opening this other issue in Pod::To::Cached, where @niner showed us a different, and much better, way to deal with the cache.

Actually, these issues are unrelated to each other, kinda, but one thing led to the other, and Documentable has more than 40 open issues that could use a simple interface to document caches.

That is what I intend to use here.
But first, let's spec what we really use in Documentable from Pod::To::Cached:

  • .new to create a new cache in a particular directory, or load it. It returns a cache object.
  • .pod adds a new file to the cache, the equivalent of "installing" it using CUR for traditional modules.
  • .list-files shows a list of files actually in the cache.
  • .update-cache updates the cache by loading only new modules.

My understanding is that the last one is not going to be necessary, and that .pod, which returns an array with the compiled pod, is all we're going to need.

So what we will do is to

  • Create a Role with this interface, so that other plug-in replacements can be built for this.
  • Create a Class that does that role.
  • Reuse all tests from Pod::To::Cached that involve those three methods.
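As a rough illustration of the role-plus-class plan above, here is a minimal sketch. The names `Pod::Cache::Interface` and `Pod::Cache::InMemory` are hypothetical and not part of any existing module; the `:dir` and `:pod-file-path` named arguments are assumed from how the cache is called elsewhere in this thread, and the in-memory class stores a placeholder string instead of really compiled pod.

```raku
# Hypothetical interface sketch; names are illustrative only.
role Pod::Cache::Interface {
    # Compile (or load from the cache) one pod file and return its pod.
    method pod(Str :$pod-file-path!) { ... }
    # Names of the files currently known to the cache.
    method list-files() { ... }
}

# Toy class that does the role. It keeps everything in memory and
# stores a placeholder instead of the compiled pod tree.
class Pod::Cache::InMemory does Pod::Cache::Interface {
    has Str $.dir is required;
    has %!compiled;

    method pod(Str :$pod-file-path!) {
        # Only "compile" the first time a given file is requested.
        %!compiled{$pod-file-path} //= ["compiled: $pod-file-path"];
    }

    method list-files() { %!compiled.keys.sort.List }
}

my $cache = Pod::Cache::InMemory.new(:dir("t/test-doc/.precomp"));
$cache.pod(:pod-file-path("t/test-doc/HomePage.pod6"));
say $cache.list-files;
```

Because the role only stubs the methods, any class that does it has to provide both of them, which is what would let other plug-in replacements be swapped in.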
@JJ JJ added help wanted Extra attention is needed question Further information is requested labels May 6, 2020
antoniogamiz added a commit that referenced this issue May 10, 2020
@antoniogamiz
Collaborator

So after adding the first approach to the implementation we have almost all the desired functionality. It caches the pods and recompiles them when a pod is changed. You can check this with the following script:

use Pod::To::Cache;

constant DIR="t/test-doc/";
my $cache = Pod::To::Cache.new(:dir(DIR~".precomp"));

$cache.pod(:pod-file-path(DIR~"HomePage.pod6"));

And running:

# the pod is not compiled
export RAKUDO_MODULE_DEBUG=1 && raku -Ilib test.p6 &> first-execution.log
# the compiled version is used
export RAKUDO_MODULE_DEBUG=1 && raku -Ilib test.p6 &> second-execution.log
# the pod needs to be recompiled
export RAKUDO_MODULE_DEBUG=1 && raku -Ilib test.p6 &> third-execution.log

The results of those commands can be found here.

  • In the first execution, the pod needs to be compiled, see L99-L107

  • In the second execution, the pod is already compiled so we only need to load it, see L54-L58

  • In the third execution, we have modified the pod file so it needs to be recompiled (this is done automatically by the CompUnit!!), see L99-L111.

@antoniogamiz
Collaborator

I'm still trying to figure out how to get a list of names from a CompUnit::PrecompilationRepository::Default object. If there is no way, we will need to store the names in a file. Or maybe we do not need that feature, I am not sure at this moment.

If we only need to know a specific file exists in the cache we only need to check the Handle object, so it would be quite straightforward.

@JJ
Owner Author

JJ commented May 10, 2020 via email

@JJ
Owner Author

JJ commented May 10, 2020 via email

@antoniogamiz
Collaborator

Great job, Antonio. Alternatively, we can use the same format and file that is used by CUR. Not sure how that's created...
-- JJ

Mmm, and what format is that? I had thought of a basic text file with the names separated by "\n". CompUnit does not store any information about the names of the compiled files.
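For illustration, the plain-text index mentioned here could be as small as the sketch below. `save-names` and `load-names` are hypothetical helpers, not part of Pod::To::Cache or CompUnit, and the index file location is an arbitrary choice for the example.

```raku
# Hypothetical helpers: keep a plain-text index of cached file names,
# one name per line.
sub save-names(IO() $index, @names) {
    $index.spurt: @names.sort.unique.join("\n") ~ "\n";
}

sub load-names(IO() $index) {
    # A missing index simply means an empty cache.
    $index.e ?? $index.lines.grep(*.chars).List !! ();
}

# Example index location; any writable path would do.
my $index = $*TMPDIR.add("pod-cache-names.txt");
save-names($index, <Language/b.pod6 Language/a.pod6>);
.say for load-names($index);
```

The obvious drawback is that the index can drift out of sync with what is actually in the precompilation store, so it would have to be rewritten every time a file is added.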

@niner

niner commented May 10, 2020 via email

@JJ
Owner Author

JJ commented May 10, 2020 via email

@JJ
Owner Author

JJ commented May 10, 2020

@niner the use case now is that we need simply to know which files have changed, so that we only update the cache for those files. We could simply run over the whole file set and load every one of them, but that would probably be slower than just checking a file. Even simply a list of files will avoid the slow step of going over the filesystem and checking every single file.

@JJ
Owner Author

JJ commented May 10, 2020

@niner

niner commented May 10, 2020

No, that accesses a CompUnit::Repository::Installation's database.

I don't see why you would need to enumerate the cache contents to check for changed files. You may as well enumerate the source files and use that list to check against the cache.

I don't think just loading all the files would be all that slow however. MoarVM does lazy deserialization, so loading a bytecode file doesn't actually do all that much. Most of the time will be spent on locating the precomp file and checking if it's still up to date - which you will need to do anyway.

If you really insist on duplicating the functionality to check up-to-dateness by yourself, you can still enumerate the source files, create the checksums and compare against the value in the CompUnit::PrecompilationUnit's .checksum. But I really don't think it's worth the trouble.
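A minimal sketch of the "enumerate the source files" step, under some assumptions: `pod-sources` is a hypothetical helper, the `pod6` extension is taken from the test script earlier in this thread, and the state recorded per file is its modification time rather than a checksum, since that needs only core `IO::Path` methods.

```raku
# Hypothetical helper: walk a directory tree and map every .pod6 file
# to its modification time.
sub pod-sources(IO() $dir) {
    my @todo = $dir;
    my %state;
    while @todo {
        for @todo.shift.dir -> $entry {
            # Skip hidden entries such as a .precomp directory.
            next if $entry.basename.starts-with('.');
            if $entry.d {
                @todo.push: $entry;
            }
            elsif $entry.extension eq 'pod6' {
                %state{$entry.Str} = $entry.modified;
            }
        }
    }
    %state
}

# Example call; "t/test-doc" is the directory used by the test script.
if "t/test-doc".IO.d {
    my %sources = pod-sources("t/test-doc");
    say "{.key}: {.value}" for %sources.sort;
}
```

The resulting list of names is what would then be checked against the cache, one file at a time.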

@JJ
Owner Author

JJ commented May 10, 2020 via email

@finanalyst

@niner I wrote Pod::To::Cached after trying to discuss these issues with you two years ago. And in particular the use of timestamps.
P2C is a hack and time has caught up with it.

Here is more detail about the use case.

  • We have some 1500 files in the Pod docs directory. It takes a considerable amount of time to compile all of the files.
  • Once compiled, it takes a shorter time to render the compiled pod tree to HTML. So caching the compiled Pod sources is essential.
  • There are regular small changes to some of them, but not all of them. Let's call this Taint, a tainted file has been altered.
  • Sometimes, one file is split into two, sometimes the name of a file is changed, sometimes the file is moved from one directory to another. Let's call the path/filename = Name.
  • New files can be added.
  • We can detect when a Pod source has changed by looking at its timestamp. This is an attribute kept by the operating system and so is available very quickly.
  • It is therefore very rapid to get an enumeration of all the Pod source Names and their timestamps. Presumably the Precomp code calculates the checksum of the input?
  • The following cases are possible. Here State could be either a timestamp, or a checksum.
    a) The Pod source exists in the cache Cache(Name) == Source(Name), Cache(State) == Source(State) -- leave cache unchanged.
    b) The Pod source has changed since being Cached, Cache(Name) == Source(Name), Cache(State) != Source(State).
    (b1) The New Source compiles without error. Cache is renewed.
    (b2) The new Pod source does not compile! So leave the existing Pod Source in the Cache, issue an error.
    c) The Pod source does not exist in cache, so add to cache.
    d) There are Cache(Name)'s that no longer match Source(Name)'s (because the filename has been changed, or the file has been moved from one directory to another). Garbage collection.

The last point is the main reason that cache enumeration is needed. If it is accepted that there is no garbage collection, then the cache will need to be purged and rebuilt periodically, so as to prevent the accumulation of orphan elements in the cache.
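The cases listed above can be sketched as a comparison of two Name => State maps. This is only an illustration under assumptions: `classify` is a hypothetical helper, States are compared as strings (so they could be checksums or stringified timestamps), and the compile-error case (b2) is not modelled because it only shows up when the changed file is actually recompiled.

```raku
# Hypothetical helper: sort every Name into one of the cases above.
# %cache and %source both map Name => State.
sub classify(%cache, %source) {
    my %plan = unchanged => [], changed => [], new => [], orphan => [];
    for %source.keys.sort -> $name {
        if !(%cache{$name}:exists) {
            %plan<new>.push: $name;          # not in cache: add it
        }
        elsif ~%cache{$name} eq ~%source{$name} {
            %plan<unchanged>.push: $name;    # same Name, same State
        }
        else {
            %plan<changed>.push: $name;      # same Name, different State
        }
    }
    # Names only the cache knows about: candidates for garbage collection.
    %plan<orphan>.append: (%cache.keys (-) %source.keys).keys.sort;
    %plan
}

my %cache  = 'Language/a.pod6' => 'c1', 'Language/b.pod6' => 'c2',
             'Type/old.pod6'   => 'c3';
my %source = 'Language/a.pod6' => 'c1', 'Language/b.pod6' => 'c9',
             'Type/new.pod6'   => 'c4';
my %plan = classify(%cache, %source);
say "$_: %plan{$_}.join(', ')" for <unchanged changed new orphan>;
```

Only the orphan list needs the cache to be enumerable; the other three cases can be decided while walking the source files.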
