Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanups from #30 #38

Merged
merged 9 commits into from
Dec 8, 2024
Merged

Cleanups from #30 #38

merged 9 commits into from
Dec 8, 2024

Conversation

jurraca
Copy link
Contributor

@jurraca jurraca commented Nov 20, 2024

Following up on #30:

  • Make dict indexed by IP version
  • Subnet indexes are both integers
  • refactor some of the logic on inclusion
  • rename included function to contains_row
  • pull out the extra file to dataframe logic into its own function
  • add a basic test for the BaseNetworkIndex class
  • address pylint stuff

Copy link
Collaborator

@fjahr fjahr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactorings look very good already, thanks!

But I am not sure how to execute the test and I think we could use some documentation on that. Maybe you add a small README in the tests folder to describe how to execute the tests. Here is what I get when I run pytest tests/merge_base_class_test.py from the project root dir:

_______________________________________________________________________________________________________ ERROR collecting tests/merge_base_class_test.py _______________________________________________________________________________________________________
ImportError while importing test module '/Users/XXX/projects/python/kartograf/tests/merge_base_class_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/[email protected]/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/merge_base_class_test.py:3: in <module>
    from kartograf.merge import BaseNetworkIndex
E   ModuleNotFoundError: No module named 'kartograf'

def test_base_dict_create():
base = BaseNetworkIndex()
state = base.get()
assert state == {4: {}, 6: {}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is a bit pedantic but I don't think it's great testing style to expose the internals. Ideally the tests would check the internal through the interface we need and we wouldn't need a special function (get) to expose the internals. It's fine for me to keep it for now but I would consider it a code smell if we can not refactor the internals without changing the test. My alternative suggestion would be to randomly check 10 keys of IPv4 and 10 keys of the IPv6 dict via the inclusion function and ensure that no result is ever returned. For IPv4 it could even be all 255 and wouldn't take much time either.

Copy link
Contributor Author

@jurraca jurraca Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not pedantic, thanks! I modified the test to check for one IPv4 row, the logic is equally valid for 1 or 10 rows I think. Still need to expand it to cover IPv6 though.

state = base.get()
network = "10.10.0.0/16"
base.update(network)
assert state[4][10]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe assert here on the actual value we expect?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And similar to what I wrote above the test could work with adding the network and then checking inclusion without looking at the internals.

@jurraca jurraca force-pushed the merge-cleanup branch 2 times, most recently from 3113502 to 8c476bc Compare November 25, 2024 11:29
@jurraca
Copy link
Contributor Author

jurraca commented Nov 25, 2024

Thanks. Added pytest to dev dependencies, and the tests/__init__.py empty file it needs to find the project modules, as well as a note in the README.

@fjahr
Copy link
Collaborator

fjahr commented Nov 26, 2024

I tried to reproduce the results of the latest map in asmap-data with 8c476bc but I got this error:

--- Merging RPKI and IRR data ---

Parse base file to dictionary
Traceback (most recent call last):
  File "/Users/FJ/projects/python/kartograf/./run", line 95, in <module>
    Kartograf.map(args)
  File "/Users/FJ/projects/python/kartograf/kartograf/kartograf.py", line 83, in map
    merge_irr(context)
  File "/Users/FJ/projects/python/kartograf/kartograf/timed.py", line 10, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/FJ/projects/python/kartograf/kartograf/merge.py", line 71, in merge_irr
    general_merge(
  File "/Users/FJ/projects/python/kartograf/kartograf/merge.py", line 147, in general_merge
    base.update(pfx)
  File "/Users/FJ/projects/python/kartograf/kartograf/merge.py", line 36, in update
    root_net = int(str(pfx).split(":", maxsplit=1)[0])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '2a11'

@fjahr
Copy link
Collaborator

fjahr commented Nov 26, 2024

An idea for another test based on this: You could extract all the first numbers from the IPv6 space in a recent run and use that as a testing body for the IPv6 logic where IPv4 is using all numbers exhaustively (or generates something random).

@jurraca
Copy link
Contributor Author

jurraca commented Nov 27, 2024

I tried to reproduce the results of the latest map in asmap-data with 8c476bc but I got this error:

🤦‍♂️ my bad, fixed

An idea for another test based on this:

The problem with randomizing inputs is if there's a problem with the test logic it results in a flaky test instead of straight failing. I was looking at a library like hypothesis which might be good for generating all the test cases for networks.

I added tests for format_pfx and get_root_network to test the handling of IP addr/network inputs, though the overall flow of how we catch invalid prefixes or network seems to need more followup work. If we test the functions that wrap ipaddress.ip_network(net) and catch errors, it should be a good start to coverage.

@fjahr
Copy link
Collaborator

fjahr commented Dec 8, 2024

The problem with randomizing inputs is if there's a problem with the test logic it results in a flaky test instead of straight failing. I was looking at a library like hypothesis which might be good for generating all the test cases for networks.

I am not suggesting to randomize any inputs. Just taking all the numbers from a recent run an make a test with these that is completely static, it would just be a pretty exhaustive fixture file that should catch regressions like the previous. The fixture file doesn't have to be updated often, maybe once every few months or just whenever someone volunteers to do it.

@fjahr
Copy link
Collaborator

fjahr commented Dec 8, 2024

tACK

All looks good to me now, I did extensive testing, reproduced multiple recent runs and did two new runs.

@fjahr fjahr merged commit 7c67bb7 into asmap:master Dec 8, 2024
1 check passed
@jurraca jurraca deleted the merge-cleanup branch December 9, 2024 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants