Spec this Pod cache #1

JJ · 2020-05-06T15:31:40Z

The main intention of this cache is to be a plug-compatible replacement for Pod::To::Cached in Documentable. Pod::To::Cached stopped working as of Raku 2020.02, which led to opening this other issue in Pod::To::Cached, where @niner showed us a different, and much better, way to deal with the cache.

Actually, these issues are only loosely related to each other, but one thing led to the other, and Documentable has more than 40 open issues that could use a simple interface to document caches.

That is what I intend to use here.
But first, let's spec what we really use in Documentable from Pod::To::Cached:

.new to create a new cache in a particular directory, or load it. It returns a cache object.
.pod adds a new file to the cache, the equivalent of "installing" it using CUR for traditional modules.
.list-files shows a list of files actually in the cache.
.update-cache updates the cache by loading only new modules.

My understanding is that the last one is not going to be necessary, and that .pod, which returns an array with the compiled pod, is all we're going to need.

So what we will do is:

Create a Role with this interface, so that other plug-in replacements can be built for this.
Create a Class that does that role.
Reuse all tests from Pod::To::Cached that involve those three methods.
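A minimal sketch of what such a role and a class that does it might look like. All names here (PodCache, InMemoryPodCache, the %!store attribute) are hypothetical and chosen only for illustration; the toy class stores a placeholder instead of compiling anything, so it only shows the shape of the interface, not the real implementation.

```raku
# Hypothetical interface: stubbed methods ({ ... }) must be provided
# by any class that does this role.
role PodCache {
    # Return the compiled pod for one source file, compiling it if needed.
    method pod(Str:D :$pod-file-path!) { ... }
    # Return the names of the files currently held in the cache.
    method list-files() { ... }
}

# Toy in-memory stand-in; a real class would precompile and store on disk.
class InMemoryPodCache does PodCache {
    has Str $.dir;      # cache directory (unused by this toy class)
    has %!store;        # file name => "compiled" pod

    method pod(Str:D :$pod-file-path!) {
        # Placeholder for compilation: store an array once, reuse it afterwards.
        %!store{$pod-file-path} //= ["compiled: $pod-file-path"];
    }

    method list-files() {
        %!store.keys.sort.list
    }
}

my $cache = InMemoryPodCache.new(:dir("t/test-doc/.precomp"));
$cache.pod(:pod-file-path("t/test-doc/HomePage.pod6"));
say $cache.list-files;
```

Because the role only stubs the methods, any other plug-in replacement can be swapped in as long as it provides .pod and .list-files.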


antoniogamiz · 2020-05-10T11:26:18Z

So after adding the first approach to the implementation, we have almost all the desired functionality. It caches the pods and recompiles them when a pod is changed. You can check this with the following script:

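use Pod::To::Cache;

constant DIR = "t/test-doc/";
# Create the cache in DIR/.precomp, or load it if it already exists.
my $cache = Pod::To::Cache.new(:dir(DIR ~ ".precomp"));

# Compile HomePage.pod6, or load its precompiled version from the cache.
$cache.pod(:pod-file-path(DIR ~ "HomePage.pod6"));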

And running:

# the pod is not compiled
export RAKUDO_MODULE_DEBUG=1 && raku -Ilib test.p6 &> first-execution.log
# the compiled version is used
export RAKUDO_MODULE_DEBUG=1 && raku -Ilib test.p6 &> second-execution.log
# the pod needs to be recompiled
export RAKUDO_MODULE_DEBUG=1 && raku -Ilib test.p6 &> third-execution.log

The output of those commands can be found here.

In the first execution, the pod needs to be compiled; see L99-L107.
In the second execution, the pod is already compiled, so we only need to load it; see L54-L58.
In the third execution, we have modified the pod file, so it needs to be recompiled (this is done automatically by the CompUnit!!); see L99-L111.

antoniogamiz · 2020-05-10T11:29:07Z

I'm still trying to figure out how to get a list of names from a CompUnit::PrecompilationRepository::Default object. If there is no way, we will need to store the names in a file. Or maybe we do not need that feature; I am not sure at this moment.

If we only need to know whether a specific file exists in the cache, we only need to check the Handle object, so it would be quite straightforward.

JJ · 2020-05-10T11:31:57Z

Great job, Antonio. Alternatively, we can use the same format and file that is used by CUR. Not sure how that's created...

JJ · 2020-05-10T11:32:20Z

We need that feature to list the files that are currently in the cache, it's part of the spec.

antoniogamiz · 2020-05-10T12:30:24Z

> Great job, Antonio. Alternatively, we can use the same format and file that is used by CUR. Not sure how that's created...
> -- JJ

Mmm, and what format is that? I had thought of a basic text file with the names separated by "\n". CompUnit does not store any information about the names of the compiled files.

niner · 2020-05-10T12:42:58Z

On Sunday, 10 May 2020 13:32:31 CEST, Juan Julián Merelo Guervós wrote:
> We need that feature to list the files that are currently in the cache, it's part of the spec.

But what would that be used for? A cache is just a way to speed things up. You usually know what you hope to find in the cache. Few cache implementations let you enumerate cache keys, as there are just no good use cases for this and it makes the implementation much harder and/or creates performance issues.

JJ · 2020-05-10T15:28:40Z

It's just the way we do it now. We can probably make do without it, but we'd have to refactor Documentable, which is something we'd like to avoid for the time being...

JJ · 2020-05-10T15:59:19Z

@niner the use case now is simply that we need to know which files have changed, so that we only update the cache for those files. We could simply run over the whole file set and load every one of them, but that would probably be slower than just checking a file. Even a simple list of files would avoid the slow step of going over the filesystem and checking every single file.

JJ · 2020-05-10T16:05:43Z

@niner can we use something like this? https://github.com/ugexe/zef/blob/7460e526f8facd35d8864fd4f2e1d557afcc6a04/lib/Zef/Client.pm6#L664-L676

niner · 2020-05-10T16:12:16Z

No, that accesses a CompUnit::Repository::Installation's database.

I don't see why you would need to enumerate the cache contents to check for changed files. You may as well enumerate the source files and use that list to check against the cache.

I don't think just loading all the files would be all that slow however. MoarVM does lazy deserialization, so loading a bytecode file doesn't actually do all that much. Most of the time will be spent on locating the precomp file and checking if it's still up to date - which you will need to do anyway.

If you really insist on duplicating the functionality to check up-to-dateness by yourself, you can still enumerate the source files, create the checksums and compare against the value in the CompUnit::PrecompilationUnit's .checksum. But I really don't think it's worth the trouble.
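A minimal sketch of the "enumerate the source files" step, using only core IO::Path methods. The sub name pod-sources, the pod6 extension filter and the temporary directory layout are assumptions made for illustration; comparing each file against the cache (or against a checksum) would happen on the resulting list and is not shown here.

```raku
# Recursively collect every file under $dir with the given extension.
sub pod-sources(IO::Path:D $dir, Str :$ext = 'pod6') {
    gather for $dir.dir -> $entry {
        if $entry.d {
            # Descend into subdirectories and pass their results along.
            take $_ for pod-sources($entry, :$ext);
        }
        elsif $entry.extension eq $ext {
            take $entry;
        }
    }
}

# Build a small throwaway tree to walk (hypothetical file names).
my $root = $*TMPDIR.add("pod-cache-demo-{$*PID}");
mkdir $root.add('Language');
$root.add('HomePage.pod6').spurt("=begin pod\n=end pod\n");
$root.add('Language').add('intro.pod6').spurt("=begin pod\n=end pod\n");
$root.add('notes.txt').spurt("not pod\n");

my @sources = pod-sources($root).sort(*.Str);
say .basename for @sources;

# Clean up the throwaway tree.
.unlink for @sources;
$root.add('notes.txt').unlink;
rmdir $root.add('Language');
rmdir $root;
```

Each entry in the list is an IO::Path, so it can be handed directly to whatever loads or checks the cached version of that file.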

JJ · 2020-05-10T16:19:18Z

On Sun, 10 May 2020 at 18:12, niner wrote:

> No, that accesses a CompUnit::Repository::Installation's database.

But that's kinda my point. Wouldn't having such a database speed things up, even a little bit?

> I don't think just loading all the files would be all that slow however. MoarVM does lazy deserialization, so loading a bytecode file doesn't actually do all that much. Most of the time will be spent on locating the precomp file and checking if it's still up to date - which you will need to do anyway.

That's not so much what worries me; it's the physical act of recursively entering the directories and locating all files. It's probably not terribly slow by itself, but it's still 500 files to locate.

> If you really insist on duplicating the functionality to check up-to-dateness by yourself, you can still enumerate the source files, create the checksums and compare against the value in the CompUnit::PrecompilationUnit's .checksum. But I really don't think it's worth the trouble.

Well, that's what I was thinking about doing, but it's the "enumerating the source files" part that could be slower than we would like. Although, come to think of it, we still need to scan for new files even if we store a list to check against... So I see your point now. We don't need that function, we don't need that list... Thanks!

finanalyst · 2020-06-13T23:08:15Z

@niner I wrote Pod::To::Cached after trying to discuss these issues with you two years ago. And in particular the use of timestamps.
P2C is a hack and time has caught up with it.

Here is more detail about the use case.

We have some 1500 files in the Pod docs directory. It takes a considerable amount of time to compile all of the files.
Once compiled, it takes a shorter time to render the compiled pod tree to HTML. So caching the compiled Pod sources is essential.
There are regular small changes to some of them, but not all of them. Let's call this Taint: a tainted file has been altered.
Sometimes one file is split into two, sometimes the name of a file is changed, and sometimes a file is moved from one directory to another. Let's call the path/filename the Name.
New files can be added.
We can detect when a Pod source has changed by looking at its timestamp. This is an attribute kept by the operating system and so is available very quickly.
It is therefore very rapid to get an enumeration of all the Pod sources' Names and their timestamps. Presumably the Precomp code calculates the checksum of the input?
The following cases are possible. Here State could be either a timestamp or a checksum.
a) The Pod source exists in the cache: Cache(Name) == Source(Name), Cache(State) == Source(State) -- leave cache unchanged.
b) The Pod source has changed since being cached: Cache(Name) == Source(Name), Cache(State) != Source(State).
(b1) The new Pod source compiles without error. The cache is renewed.
(b2) The new Pod source does not compile! So leave the existing Pod source in the cache and issue an error.
c) The Pod source does not exist in the cache, so add it to the cache.
e) There are Cache(Name)s that no longer match any Source(Name), because the filename has been changed or the file has been moved from one directory to another. These need garbage collection.

The last point is the main reason that cache enumeration is needed. If it is accepted that there is no garbage collection, then the cache will need to be purged and rebuilt periodically, so as to prevent the accumulation of orphan elements in the cache.
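The cases above can be sketched as a comparison of two Name => State maps. This is only an illustration under stated assumptions: the sub name classify-cache, the category names and the 'v1'/'v2' State values are hypothetical, State is compared as a plain string, and the compile step of cases b1/b2 is left out.

```raku
# %source and %cache both map Name => State (timestamp or checksum).
sub classify-cache(%source, %cache) {
    my (@unchanged, @stale, @missing);
    for %source.keys.sort -> $name {
        my $cached = %cache{$name};
        if    !$cached.defined           { @missing.push: $name }    # case c
        elsif $cached eq %source{$name}  { @unchanged.push: $name }  # case a
        else                             { @stale.push: $name }      # case b
    }
    # Case e: names in the cache with no matching source (renamed or moved).
    my @orphaned = (%cache.keys (-) %source.keys).keys.sort;
    %( :@unchanged, :@stale, :@missing, :@orphaned )
}

my %source = 'Language/intro.pod6' => 'v2',
             'Type/Str.pod6'       => 'v1',
             'Type/New.pod6'       => 'v1';
my %cache  = 'Language/intro.pod6' => 'v1',
             'Type/Str.pod6'       => 'v1',
             'Type/Old.pod6'       => 'v1';

my %plan = classify-cache(%source, %cache);
for <unchanged stale missing orphaned> -> $kind {
    say "$kind: %plan{$kind}";
}
```

Only the orphaned list requires knowing what is in the cache; the other three categories can be found by walking the sources alone, which is the distinction the discussion above turns on.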

JJ added the "help wanted" (Extra attention is needed) and "question" (Further information is requested) labels on May 6, 2020

antoniogamiz added a commit that referenced this issue May 10, 2020

Add some files for testing, #1

27614fb

antoniogamiz added a commit that referenced this issue May 10, 2020

Add first approach to cache implementation #1

954f638

antoniogamiz added a commit that referenced this issue May 10, 2020

Fix nqp dependency #1

da0168e

antoniogamiz added a commit that referenced this issue May 10, 2020

Replace test pod set by the one use by Pod::To::Cached #1

775217a

antoniogamiz added a commit that referenced this issue May 10, 2020

Add basic info to META6 #1

e83e386

antoniogamiz added a commit that referenced this issue May 10, 2020

Add some tests from Pod::To::Cached #1

debbe42

antoniogamiz added a commit that referenced this issue May 10, 2020

Add setup for correct filenames and directories #1

0bd8fb9

antoniogamiz added a commit that referenced this issue May 10, 2020

Add File::Directory::Tree to META6 #1

ebe0cf8
