Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure ip_country #494

Open
dnwiebe opened this issue Aug 26, 2024 · 0 comments
Open

Restructure ip_country #494

dnwiebe opened this issue Aug 26, 2024 · 0 comments

Comments

@dnwiebe
Copy link
Collaborator

dnwiebe commented Aug 26, 2024

Background

At the moment, all the ip_country logic is contained in the ip_country crate. This seems to make logical sense, but it represents a potential problem.

The reason for the masq_lib crate is to get all our dependencies flowing in one direction: masq_lib doesn't depend on anything else in the codebase, and everything else in the codebase (if necessary) depends on masq_lib and on nothing else in the codebase. This keeps us from developing circular dependencies.

However, at the moment there's a direct dependency from node to ip_country. That shouldn't be.

ip_country consists of two kinds of production code: the kind that serializes the CSV file from DBIP into the source file containing the IPv4 and IPv6 compressed datastores, and the kind that uses that data for daily Node operations. The first kind of code is outside the Node, and should be run only by developers and only during the build process--and then only if the compressed datastores don't already exist. The second kind of code is shipped with the Node and run by users. This is a fairly clear-cut distinction.

However, the tests are a bit trickier, because tests of the deserialize code use the serialize logic to create something to deserialize, and tests of the serialize code use the deserialize logic to assert on the serialization.

Task

Find a reasonable way to arrange the ip_country code so that dependencies link only from subprojects to masq_lib, but all the existing ip_country tests still run--or are replaced by tests that test the same things.

Suggestions

  1. The generated dbip_country.rs file should not be part of the GitHub repo. Whenever it's needed, build scripts should download the CSV from DBIP and generate the file: however, this should only happen when it's necessary. We don't need to spend extra build time regenerating a file that will turn out exactly the same as the one it's replacing, and we shouldn't pester DBIP with large-volume downloads any more often than necessary.
  2. If the absence of dbip_country.rs causes compilation problems, we can put a version of dbip_country.rs in the GitHub repository that contains no data, but only functions that panic when called, where the panic message explains how to generate the code containing the real data that will overwrite it.
  3. It's not clear yet whether downloading 25MB from DBIP every time an Actions job runs is un-neighborly or not. If it turns out to be, we should investigate installing a regular (every six months? Three?) download from DBIP to some datastore we control (S3?) and have our Actions jobs and local builds download from that instead.
@kauri-hero kauri-hero transferred this issue from MASQ-Project/MASQ-Node-issues Sep 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 📋 Backlog
Development

No branches or pull requests

1 participant