Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI: separate established families from non-established ones #94

Closed

Conversation

mbercx
Copy link
Member

@mbercx mbercx commented May 8, 2021

Fixes #92
Fixes #93

Two small commits that fix the issues related to the workaround we have set up for users which have issues using the automated install because they cannot download the files with these commands. See the commit messages for more info.

Demo that it works:

$ aiida-pseudo install sssp --download-only
Info: downloading selected pseudopotentials archive...  [OK]
Info: downloading selected pseudopotentials metadata...  [OK]
Success: `archive.tar.gz` written to the current directory.
Success: `metadata.json` written to the current directory.
$ ls
archive.tar.gz  metadata.json
$ aiida-pseudo install family -F pseudo.family.sssp -P pseudo.upf archive.tar.gz SSSP/1.1/PBE/efficiency
Info: unpacking archive and parsing pseudos...  [OK]
Success: installed `SSSP/1.1/PBE/efficiency` containing 85 pseudopotentials
$ aiida-pseudo family cutoffs set -s normal -u Ry SSSP/1.1/PBE/efficiency metadata.json 
Success: set cutoffs for `SsspFamily<SSSP/1.1/PBE/efficiency>` with the stringency `normal`.

@mbercx mbercx requested a review from sphuber May 8, 2021 06:52
@mbercx mbercx force-pushed the fix/92/install-family-sssp_pseudodojo branch 3 times, most recently from ad98e2e to c4a0d37 Compare May 8, 2021 07:13
@mbercx
Copy link
Member Author

mbercx commented May 8, 2021

@sphuber if you agree with the implementation I'll add another commit with updates to the documentation.

@sphuber
Copy link
Contributor

sphuber commented May 8, 2021

As I wrote in #92 does this now also work for Pseudo Dojo? I think their metadata files have yet another format. In fact it is not a plain JSON file but is an archive. It needs to be parsed with PseudoDojoFamily.parse_djrepos_from_archive. I am wondering if this is the way to go, because this means we have to start adding family specific code in the generic install, while that specific code is already present in the specific command. And this will only get worse if new standard family types will get added.

So more and more I am starting to think if we shouldn't just add optional flags --archive and --metadata to the family specific install commands to skip downloading them. This will prevent code duplication. We can then even store something in the description of the family to say which file was not downloaded by taken from local disk. Since we also record the md5 in the description we can compare later to the "official" md5s of the archive and metadata files. Of course this is all assuming the user doesn't accidentally or maliciously change the description.

@mbercx
Copy link
Member Author

mbercx commented May 8, 2021

That's fair, I wasn't aware about the different metadata format for pseudo-dojo. Will adjust!

@mbercx
Copy link
Member Author

mbercx commented May 9, 2021

After giving it some more thought, I fully agree with the principle that we should try to make sure that when the user installs a pseudopotential family with established protocols using the CLI, the label, pseudos and recommended cutoffs represent the official ones. To adhere to this, I propose the following:

  • Only a pseudopotential family that is "established" [1] will receive support for automated installs with its own class (e.g. SsspFamily/PseudoDojoFamily) and CLI commands (e.g. install sssp/install pseudo-dojo). These families can only be installed with their supported CLI commands. For users which have trouble with the automated install, we provide the --download-only option (done), as well as the --archive and --metadata options (or maybe merge them into one option with two arguments?) which both have to be specified to install the pseudopotential family. There are strict checks on the format of these files to make sure they correspond to the official ones.
  • Based on the same principle of preserving the integrity of the established pseudopotentials, the family cutoffs set command can not be used to set the recommended cutoffs of an SsspFamily or PseudoDojoFamily. If users want to install the SSSP pseudos with their own set of recommended cutoffs, they will have to install them from the archive using install family as a CutoffsPseudoPotentialFamily.
  • The user can install non-established pseudopotential families with the install family command, and set their recommended cutoffs using family cutoffs set. I think here we can keep the change in c4a0d37, i.e. only filter the cutoffs so the validation doesn't raise for extraneous keys.

If you agree, I'll add these notes to the "Design" section of the tutorial, and tell users to open an issue in case they want to request support for a new set of "established" pseudopotentials.

[1] A pseudopotential family is considered established when it has a set of rigorously tested pseudopotentials with convergence tests which have been published or are publicly available (also see #78).

@sphuber
Copy link
Contributor

sphuber commented May 10, 2021

Fully agree with all of what you wrote in the last comment.

@mbercx mbercx changed the title Fix/92/install family sssp pseudodojo CLI: separate established families from non-established ones May 10, 2021
@mbercx mbercx force-pushed the fix/92/install-family-sssp_pseudodojo branch from c4a0d37 to 98282a8 Compare May 10, 2021 22:50
@mbercx
Copy link
Member Author

mbercx commented May 10, 2021

@sphuber will update test and finalize docs once we are agreed on the implementation. I thought that having a --from-dir option would easier to use than having to specify both --archive and --metadata. However, I do realise that perhaps the user might change the filename when moving the archive and metadata to the machine they want to install the pseudos on, so perhaps it's still best to split the option.

@mbercx mbercx force-pushed the fix/92/install-family-sssp_pseudodojo branch 2 times, most recently from 314101d to e627750 Compare May 11, 2021 13:27
@sphuber
Copy link
Contributor

sphuber commented May 11, 2021

@sphuber will update test and finalize docs once we are agreed on the implementation. I thought that having a --from-dir option would easier to use than having to specify both --archive and --metadata. However, I do realise that perhaps the user might change the filename when moving the archive and metadata to the machine they want to install the pseudos on, so perhaps it's still best to split the option.

My first instinct was also to have --archive and --metadata, although it will make validation of options slightly more involved. For example making sure not any of them are specified in conjunction with --download-only. For flexibility, let's go with the separate flags though. Then we could even think to support specifying just one of the two. Let's say to just download the metadata but take the archive from disk.

My biggest concern though is how we make sure that archives and metadata passed through these flags are actually "correct". I guess that we put their md5 in the description so we could check afterwards. But should we be more restrictive and have a hash table of "correct" md5s? Then again, I am not sure if we are running into the same trap where we are just making life hard for well-meaning citizens.

Also, how would the label of the family be specified when using --archive and --metadata? If the filenames can be arbitrary, we cannot parse it from them. Should the user than use the -v/-f/-p flags to indicate the configuration indirectly? That seems convoluted. But then that would mean we would have to add a parameter to specify the label, but we don't really want to allow anything here but only the canonical correct family names 🤯

Maybe all these problems go away by actually going with the --from-dir suggestion you made and requiring that the filenames are exactly as they would have been generated with the --download-only option. I think this restriction is of no real consequence to the user (I hope) but actually saves us a lot of design problems and questions

@mbercx
Copy link
Member Author

mbercx commented May 11, 2021

Yeah, even with the --from-dir options the user can mess up the version/functional/protocol option and install the pseudo family with the wrong label. If the priority is to try and ensure as well as possible that the integrity of the established families is preserved, and the only reason we have this workaround is to deal with faulty connections for the automated install, perhaps we should have the --download-only option install put the archive and metadata in one install pack that also contains a file that specifies the version/functional/protocol. Then we can have an option --from-download that only accepts such a pack in case it has all the right filenames, and takes the label from there?

@sphuber
Copy link
Contributor

sphuber commented May 12, 2021

Then we can have an option --from-download that only accepts such a pack in case it has all the right filenames, and takes the label from there?

Yes! That must be it. I was thinking that we could simply take the label from the individual filenames since that is the case for the SSSP, but that might not necessarily be the case and is definitely not the case in general (already doesn't hold for PD). It also makes a lot of sense given the reason we put this in to have the --download-only and --from-download pair.

One remaining question is then the format of the downloaded pack that is consumed by --from-download. We cannot use the filename to encode the label into (at least not literally) since the label contains forward slashes. Maybe we could have --download-only create a .tar.gz file with the name {some_name}.aiida_pseudo and inside are all the necessary downloaded files, plus a JSON that includes the md5's of the downloaded files and the values of the CLI that were used in the --download-only call, so version, protocol etc. This way we can reconstruct the label from those values (or maybe just include the label literally in the JSON 🤔 ) and we can validate the integrity of the files.

Paranoia mode engaged: If we really want to prevent people from messing with it, we can encrypt the JSON file so it is not easy to modify the data and checksums in there. For some reason I keep coming back to this idea that we want to make sure the right things gets installed, but if people really want to install an incorrect version they can always circumvent our measures and really that would be their problem. So forget this idea 😄

@mbercx
Copy link
Member Author

mbercx commented May 12, 2021

For some reason I keep coming back to this idea that we want to make sure the right things gets installed, but if people really want to install an incorrect version they can always circumvent our measures and really that would be their problem. So forget this idea 😄

We could still make it more difficult for the user to defile the integrity of the SSSP and Pseudo-Dojo families by blocking the Python API methods for changing the recommended cutoffs. Or, if we want to be more lenient, at least protect the established stringencies. :)

And then we could also encrypt the recommended cutoffs in the database so it's more difficult to do any kind of postgres hack. 😁

@sphuber
Copy link
Contributor

sphuber commented May 12, 2021

And then we could also encrypt the recommended cutoffs in the database so it's more difficult to do any kind of postgres hack

With the source code being openly accessible, this would be easily decrypted though :)

@mbercx
Copy link
Member Author

mbercx commented May 12, 2021

With the source code being openly accessible, this would be easily decrypted though :)

Alright, I guess we'll let them hack the postgres database at their leisure. 😉

Still, having "protected stringencies" for established pseudopotential families/libraries might be a good idea?

@sphuber
Copy link
Contributor

sphuber commented May 14, 2021

Still, having "protected stringencies" for established pseudopotential families/libraries might be a good idea?

Maybe. I couldn't find the details in the discussion above. What would this look like in terms of interface and implementation?

@mbercx
Copy link
Member Author

mbercx commented Jun 11, 2021

Maybe. I couldn't find the details in the discussion above. What would this look like in terms of interface and implementation?

I'll leave this for a different PR, still have to (re-)think about this a bit. I'll make the changes re --from-download now, since this PR has gotten a bit more urgency due to a new use case: Faster installs of pseudopotentials in docker images such as the AiiDAlab docker stack.

Once this is complete I'll open an issue where we can discuss the protected stringencies idea.

mbercx added 3 commits June 11, 2021 07:12
Currently the `family cutoffs set` command requires that the JSON file
that contains the cutoffs _only_ has the cutoff keys specified for each
element. This is because the cutoffs are validated by the
`RecommendedCutoffMixin.validate_cutoffs()` method, which doesn't allow
for extraneous keys.

However, the  `.json`  file could have been adapted by the user from e.g. the
SSSP `metadata.json` or some other dictionary that also contains other keys.
As long as the cutoff keys are defined, we shouldn't raise an error just because
there are other keys present.

Here we first filter the data loaded from the JSON file for the cutoffs,
before passing the dictionary to the `set_cutoffs` method.
One of the motivations of having dedicated and automated CLI commands to
install established pseudpotential families like the SSSP or the
Pseudo-dojo normconserving pseudopotentials is to try and prevent that
the pseudopotentials or recommended cutoffs deviate from the official
ones. However, the `family cutoffs set` command currently allows the
user to change the recommended cutoffs easily through the CLI.

In order to prevent this, we block usage of the command for `SsspFamily`
and `PseudoDojoFamily`'s, by adding an `exclude` input argument to the
`PseudoPotentialFamilyParam`, where the entry points of the classes that
should be disallowed can be provided. This is then used to adapt the
option decorator for the `PSEUDO_POTENTIAL_FAMILY` input argument of
`family cutoffs set`.

Note that the user _can_ still install e.g. the SSSP with their own
recommended cutoffs using the `install family` command as a
`CutoffsPseudoPotentialFamily`. However, the `SsspFamily` class is
reserved for pseudopotentials installed with the automated install
command.
Our workaround for installing an "established" family like a `SsspFamily`
or `PseudoDojoFamily` required using the `install family` command, but
this is explicitly forbidden since we do not want the user to be able to
install an arbitrary set of pseudopotentials as one of these classes.
This means there is currently no way to install the archive and metadata
obtained with the `--download` option of the automated install commands.

Here, we add the `--from-dir` option to the automated install commands
to allow the user to install the downloaded pseudopotentials and set the
recommended cutoffs automatically for an `SsspFamily` or
`PseudoDojoFamily`.
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mbercx . Fine to get this merged today but there are a few minor things to touch up. Also I think it should be rebased again because it contains changes to the docs that should already be in master. If you want me to fix these things because you are busy with the tutorial, let me know, I can quickly fix it.

aiida_pseudo/cli/params/types.py Show resolved Hide resolved
aiida_pseudo/cli/params/types.py Show resolved Hide resolved
aiida_pseudo/cli/params/types.py Show resolved Hide resolved
assert sorted(family.get_cutoff_stringencies()) == sorted(['low', 'normal'])

# Invalid cutoffs structure
filepath.write_text(json.dumps({'Ar': {'cutoff_rho': 300, 'cutoff_wfc': 300, 'GME': 'moon'}}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by this test. I thought the whole idea of one of the commits was to allow for extraneous keys, like GME in this example, and not have the command except because of it, but simply ignore it. So what is going on here? Why should this command fail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seems the command was actually failing for a different reason (i.e. not all elements being present). This has now been fixed in #132

@mbercx
Copy link
Member Author

mbercx commented Jun 11, 2021

Oh, wait, I'm still working on adapting the PR for the --from-download option. But I'll already integrate your suggestions if they are still applicable, thanks!

@mbercx mbercx force-pushed the fix/92/install-family-sssp_pseudodojo branch from e627750 to f178ea2 Compare June 11, 2021 15:53
@mbercx
Copy link
Member Author

mbercx commented Jun 11, 2021

@sphuber the PR isn't super urgent: it's probably only a small benefit in speed for the AiiDAlab docker stack deployment. I haven't managed to get to your comments, but have already implemented a first draft of using the --from-download install.

Will fix tests and update documentation once you agree with the implementation. A little demo:

$ aiida-pseudo install sssp --download-only
Info: downloading selected pseudopotentials archive...  [OK]
Info: downloading selected pseudopotentials metadata...  [OK]
$ aiida-pseudo install sssp --from-download SSSP_1.1_PBE_efficiency.aiida_pseudo 
Info: unpacking archive and parsing pseudos...  [OK]
Success: installed `SSSP/1.1/PBE/efficiency` containing 85 pseudopotentials
$ aiida-pseudo install sssp --from-download SSSP_1.1_PBE_efficiency.aiida_pseudo 
Critical: SsspFamily<SSSP/1.1/PBE/efficiency> is already installed

Perfectly fine to continue working on this once I'm back from holiday. :)

@sphuber
Copy link
Contributor

sphuber commented Jun 11, 2021

Perfectly fine to continue working on this once I'm back from holiday. :)

I'm also leaving this weekend, so let's wrap this up when we're both back.

@unkcpz
Copy link
Member

unkcpz commented Jul 27, 2021

Wondering there might be a rare case that --download-only success exit but have incomplete files downloaded. The broken file has the wrong md5 value which will fail the install step. I encounter this somehow with bad network conditions but was unable to reproduce it today.

Do we need to provide a way that the files can be downloaded separately from the browser or by wget in CLI, checking the md5 manually, and then install it.

@sphuber
Copy link
Contributor

sphuber commented Jul 27, 2021

Wondering there might be a rare case that --download-only success exit but have incomplete files downloaded. The broken file has the wrong md5 value which will fail the install step. I encounter this somehow with bad network conditions but was unable to reproduce it today.

Do we need to provide a way that the files can be downloaded separately from the browser or by wget in CLI, checking the md5 manually, and then install it.

I would have also thought that it wouldn't be possible to have the download return successfully but the file actually being corrupt. However, I remember another user that reported this behavior and they also suffered from bad connections. Seems this really is possible then. If we cannot make the requests call more robust or check that it is correct intrinsically, the only solution would be for us to check ourselves but this means we would have to hardcode the expected md5. If the md5 ever were to change of the remote files (which has happened before) then existing versions of aiida-pseudo will be broken. Only solution would be to just print a warning if md5 is incorrect and have user retry the download. We shouldn't let the command fail or prevent from continuing.

@sphuber
Copy link
Contributor

sphuber commented Oct 9, 2022

@mbercx Think something may have gone wrong in your rebasing. Could you have a look and fix this branch before I review it?

@mbercx
Copy link
Member Author

mbercx commented Oct 10, 2022

@sphuber I'm splitting up this PR in separate ones. Still have to open the final PR that adds the --from-download option, but I'll already close this one because it will have been superseded by those separate PRs.

@mbercx mbercx closed this Oct 10, 2022
@mbercx mbercx deleted the fix/92/install-family-sssp_pseudodojo branch September 3, 2023 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants