Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(#2764): caching logic changed, added StHash, StHashTest, fix tests #2771

Closed
wants to merge 2 commits into from

Conversation

Yanich96
Copy link
Contributor

@Yanich96 Yanich96 commented Jan 12, 2024

Closes #2764
Closes #2744
Closes #2743
Closes #2727

The target of tasks was fixed caching logic for XML program files, because the old logic worked with errors. Now caching uses hash code from node '/program/objects', using algorithm MD5.

What was done:

  1. Class StHash was created to add the program hash code to the XML file header.
  2. Class StHashTest was created to check adding the program hash code to the XML file header.
  3. Disabled tests were unlocked in OptCachedTest, OptimizeMojoTest, ShakeMojoTest.
  4. The schema XML was changed to add the program hash code to the XML file header.
  5. In classes EOSintax and PhySintax the hash code is added to the XML file header. Tests, checking it, was added.

PR-Codex overview

This PR focuses on adding a hash attribute to the XML representation of EO programs.

Detailed summary

  • Added hash attribute to the XMIR schema.
  • Added containsHash test for PhiSyntax and EoSyntax classes.
  • Added Xsline dependency.
  • Added StHash class for generating and checking hash codes.
  • Added StHashTest for testing the StHash class.

The following files were skipped due to too many changes: eo-parser/src/main/java/org/eolang/parser/StHash.java, eo-maven-plugin/src/test/java/org/eolang/maven/optimization/OptCachedTest.java

✨ Ask PR-Codex anything about this PR by commenting with /codex {your question}

@Yanich96
Copy link
Contributor Author

@volodya-lombrozo check please

@@ -175,6 +176,17 @@ void parsesDefinition() throws IOException {
);
}

@Test
void containsHash() throws IOException {
MatcherAssert.assertThat(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yanich96 Could you add explanation message, please?

@@ -47,6 +47,7 @@
*
* @since 0.1
*/
@SuppressWarnings("PMD.TooManyMethods")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yanich96 Is this necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@volodya-lombrozo Tests failed without it.

* @return String hash of this XML.
* @throws NoSuchAlgorithmException If fails.
*/
public String getHash() throws NoSuchAlgorithmException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yanich96 Maybe it's better to name this method value()? The class name already mentioned that it is a hash.
Moreover, compound names is code smell: https://www.yegor256.com/2015/01/12/compound-name-is-code-smell.html

*/
private static XML updatedProgram(final XML xml)
throws NoSuchAlgorithmException, ImpossibleModificationException {
final String hash = new StHash.Hash(xml).getHash();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* @param xml XML.
* @return XML representation of program.
*/
private static XML updatedProgram(final XML xml)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yanich96 Maybe it's better to leave just program(final XML xml)?
https://www.yegor256.com/2015/01/12/compound-name-is-code-smell.html

@@ -199,9 +205,26 @@ private static XML program(final ZonedDateTime time, final String something) {
.add("program")
.attr("name", "main")
.attr("time", time.format(DateTimeFormatter.ISO_INSTANT))
.add("objects")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yanich96 Why do you need this row?

ZonedDateTime.parse(time.get()).toInstant().truncatedTo(ChronoUnit.MINUTES)
);
if (Files.exists(path) && hash.isPresent()) {
res = new XMLDocument(path).xpath("/program/@hash").stream().findFirst()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yanich96 We use OptCached in order to avoid repeated unnecessary computations and increase the compilation speed. However the solution you provided might make the optimization even longer. Let's me explain.

In order to read /program/@hash from the path you need to read the entire file. This operation is extremely long. I even bet that if you disable caching now, the compilation will be faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@volodya-lombrozo
Yes, but I have 3 points:

  1. File won't be read from disk each time, thanks page cache https://habr.com/ru/companies/smart_soft/articles/228937/
  2. In most languages reading from cache is faster, than compilation. Isn't it true for EO lang?
  3. If reading from cache takes longer, then compilation.. so why the cache exists?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yanich96, I conducted a simple benchmark to compare the time spent in OptCache.apply for the returnsFromCacheIfXmlAlreadyInCache test in the master branch (the old solution) and your solution. The results I obtained are as follows:

Test Scenario Master Branch (ms) Your Branch (ms)
returnsFromCacheIfXmlAlreadyInCache Test 3654, 3680, 3693, 3700, 3862 4281, 4602, 4378, 4364, 4504

As you can see, the "page cache" doesn't provide much improvement. Please note that in the test, we are using an extremely small program example (only a few lines). Real programs will be much larger, and the performance gap will likely be more significant.

Below is the test code, where I modified the test to include time measurement.

Master branch:

@Test
void returnsFromCacheIfXmlAlreadyInCache(@TempDir final Path tmp) throws IOException {
    final XML program = OptCachedTest.program(ZonedDateTime.now());
    OptCachedTest.save(tmp, program);
    final long start = System.currentTimeMillis();
    XML apply = null;
    for (int i = 0; i < 10_000; ++i) {
        apply = new OptCached(
            path -> {
                throw new IllegalStateException("This code shouldn't be executed");
            },
            tmp
        ).apply(program);
    }
    final long finish = System.currentTimeMillis();
    Logger.info(
        this,
        "Optimization took %d ms",
        finish - start
    );
    MatcherAssert.assertThat(
        "We expected that the program will be returned from the cache.",
        apply,
        Matchers.equalTo(program)
    );
}

Your branch:

@Test
void returnsFromCacheIfXmlAlreadyInCache(@TempDir final Path tmp)
    throws IOException, ImpossibleModificationException, NoSuchAlgorithmException {
    final XML program = OptCachedTest.updatedProgram(
        OptCachedTest.program(ZonedDateTime.now())
    );
    OptCachedTest.save(tmp, program);
    final long start = System.currentTimeMillis();
    XML apply = null;
    for (int i = 0; i < 10_000; ++i) {
        apply = new OptCached(
            path -> {
                throw new IllegalStateException("This code shouldn't be executed");
            },
            tmp
        ).apply(program);
    }
    final long finish = System.currentTimeMillis();
    Logger.info(
        this,
        "Optimization took %d ms",
        finish - start
    );
    MatcherAssert.assertThat(
        "We expected that the program will be returned from the cache.",
        apply,
        Matchers.equalTo(program)
    );
}

@volodya-lombrozo
Copy link
Member

@Yanich96 As for other questions:

  1. It's true for eo too
  2. Reading from the cache used to be faster compared to optimization. However, it's questionable for your solution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants