Remove ItemProvider’s Response dependency #151
I’ve opened a separate PR for the docs fix.
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## master #151 +/- ##
==========================================
- Coverage 85.85% 85.38% -0.48%
==========================================
Files 14 14
Lines 813 821 +8
==========================================
+ Hits 698 701 +3
- Misses 115 120 +5
tests/test_providers.py
"""
# The fact that no exception is raised below proves that a Response
# parameter is not required by ItemProvider.
provider(set(), request)
Hey! Are there existing tests which ensure that
1. the original issue is fixed, and
2. some potential new issues don't appear?
Regarding (2), I was thinking about the following:
class MySpider(scrapy.Spider):
    def parse(self, response: DummyResponse, item: Product):
        # ....

@handle_urls("example.com")
class MyPage(ItemPage[Product]):
    response: HttpResponse
i.e. the callback is annotated with DummyResponse, so we start passing DummyResponse to the provider; the callback itself does not use the response, but a real response is still needed to create the page object that returns the item.
OK, so you definitely picked up on an issue, and the “solution” I came up with is getting messy, so I would like to discuss it before I move forward further with it, because I might be missing a better solution.
The “solution” consists of having 2 separate item provider classes, one for responseless items and one for responseful items.
Things get more complicated, though. To properly determine if an item needs a response, we need to get the page object for the item, and then check if the dependencies of that page object (which might include other items) are provided by a provider that requires a response. Moreover, we need to take the request (URL) into account, as that can determine which page object is used for an item.
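To make the check described above concrete, here is a rough, self-contained sketch. It is not scrapy-poet or web-poet code: `RULES`, `RESPONSE_DEPENDENT`, `page_for`, `needs_response`, and all the classes are hypothetical stand-ins, and page object dependencies are read from class annotations purely for illustration.

```python
from typing import get_type_hints
from urllib.parse import urlparse


class HttpResponse:
    """Stand-in for a dependency that requires a real downloaded response."""


class Price:
    """Hypothetical item."""


class Product:
    """Hypothetical item."""


class PricePage:
    # Needs the real response.
    response: HttpResponse


class ProductPage:
    # Needs no response directly, only another item.
    price: Price


class ApiProductPage:
    # No dependencies at all.
    pass


# Hypothetical registry: (domain, item class) -> page object class.
RULES = {
    ("example.com", Product): ProductPage,
    ("example.com", Price): PricePage,
    ("api.example.com", Product): ApiProductPage,
}

# Hypothetical set of dependencies whose provider requires a response.
RESPONSE_DEPENDENT = {HttpResponse}


def page_for(url, item_cls):
    """Pick the page object class for an item, taking the URL into account."""
    return RULES.get((urlparse(url).netloc, item_cls))


def needs_response(item_cls, url, _seen=None):
    """Return True if building item_cls for url transitively needs a response."""
    seen = set() if _seen is None else _seen
    if item_cls in seen:  # guard against cyclic item dependencies
        return False
    seen.add(item_cls)
    page_cls = page_for(url, item_cls)
    if page_cls is None:  # not an item we know how to build
        return False
    for dep in get_type_hints(page_cls).values():
        if dep in RESPONSE_DEPENDENT or needs_response(dep, url, seen):
            return True
    return False
```

In this sketch, `needs_response(Product, "https://example.com/p/1")` has to walk from `ProductPage` to `Price` to `PricePage` before it finds `HttpResponse`, while the same item on `api.example.com` resolves to a page object with no response dependency. That is the per-request, recursive lookup that makes the two-provider approach messy.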
To be honest, it kind of feels like there should be no item provider, just as there is no page object provider, and instead item resolution should be moved closer to the core, and work the same as page object resolution, by somehow making andi realize how to resolve item dependencies. But I am not very familiar with the code base, and I am afraid of wasting too much time exploring in that direction.
> it kind of feels like there should be no item provider, just as there is no page object provider, and instead item resolution should be moved closer to the core, and work the same as page object resolution, by somehow making andi realize how to resolve item dependencies.
Yeah, that would solve scrapy-plugins/scrapy-zyte-api#91 automatically (AFAIK).
Unnecessary thanks to #175
Resolves #150.