Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkgset.FileCache: implement __contains__ for efficiency #267

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions kobo/pkgset.py
Original file line number Diff line number Diff line change
Expand Up @@ -229,6 +229,9 @@ def __getitem__(self, name):
def __setitem__(self, name, value):
self.file_cache[os.path.abspath(name)] = value

def __contains__(self, item):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must say, the mismatch between getitem/setitem and contains is concerning.

Keys are converted into absolute paths in the getter/setter but they are NOT converted during a contains check.
This means the basic contract of "inserting an object into a container means the container contains it" is violated:

from kobo.pkgset import FileCache

fc = FileCache()

fc['/y'] = object()
# Is the thing we just inserted in there?
# Of course it is...
assert '/y' in fc

# How about now?
fc['x'] = object()
# Nope - CRASH! I asked for 'x' but it stored '$PWD/x'
# and did not do the same transformation during 'contains'.
assert 'x' in fc  

I initially thought that your change introduced this problem, however the problem is already present on the current foo in self.file_cache path making use of __iter__, the code just wasn't visible.

I guess we should not attempt to fix that in this change, since it might be backwards-incompatible.
But if you are looking at code using this class, maybe you want to take a look at it with this bug in mind.
Any code using this class and not providing absolute paths as input has the potential to behave very strangely.

return item in self.file_cache

def __iter__(self):
return iter(self.file_cache)

Expand Down
13 changes: 13 additions & 0 deletions tests/test_pkgset.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@

from __future__ import print_function
import unittest
import unittest.mock

import os
import warnings
Expand Down Expand Up @@ -218,6 +219,18 @@ def test_iter(self):
self.assertTrue(self.file1 in items)
self.assertTrue(self.file2 in items)

def test_contains(self):
open(self.file1, "w").write("hello\n")
open(self.file2, "w").write("hello\n")

self.cache = FileCache()
wrap1 = self.cache.add(self.file1)
wrap2 = self.cache.add(self.file2)

with unittest.mock.patch("kobo.pkgset.FileCache.__iter__", side_effect=ValueError):
self.assertTrue(self.file1 in self.cache)
self.assertTrue(self.file2 in self.cache)

def test_remove_by_file_path(self):
self.test_add_two_different_files()
self.cache.remove(self.file1)
Expand Down
Loading