Remove ItemProvider’s Response dependency #151

Closed
wants to merge 5 commits into from

Conversation

Gallaecio
Member

Resolves #150.

@Gallaecio Gallaecio requested review from kmike and wRAR June 15, 2023 11:57
@Gallaecio
Member Author

I’ve opened a separate PR for the docs fix.

@codecov

codecov bot commented Jun 15, 2023

Codecov Report

Merging #151 (cad58fb) into master (aba2b74) will decrease coverage by 0.48%.
The diff coverage is 85.71%.

❗ Current head cad58fb differs from pull request most recent head b41b306. Consider uploading reports for the commit b41b306 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #151      +/-   ##
==========================================
- Coverage   85.85%   85.38%   -0.48%     
==========================================
  Files          14       14              
  Lines         813      821       +8     
==========================================
+ Hits          698      701       +3     
- Misses        115      120       +5     
Files Coverage Δ
scrapy_poet/page_input_providers.py 95.58% <100.00%> (-4.42%) ⬇️
scrapy_poet/api.py 96.00% <80.00%> (-4.00%) ⬇️

... and 6 files with indirect coverage changes

"""
# The fact that no exception is raised below proves that a Response
# parameter is not required by ItemProvider.
provider(set(), request)
Member

@kmike kmike Jun 21, 2023

Hey! Are there existing tests which ensure that

  1. the original issue is fixed, and
  2. some potential new issues don't appear?

Regarding (2), I was thinking about the following:

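import scrapy
from scrapy_poet import DummyResponse
from web_poet import HttpResponse, ItemPage, handle_urls

# Product stands for any item class; its import is omitted here.


class MySpider(scrapy.Spider):
    def parse(self, response: DummyResponse, item: Product):
        ...  # the callback never uses the response


@handle_urls("example.com")
class MyPage(ItemPage[Product]):
    response: HttpResponse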

That is, we now pass a DummyResponse to the provider because the callback does not use the response, but a real response is still needed to create the page object that returns the item.
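To make the conflict concrete, here is a minimal sketch that uses plain stand-in classes instead of the real scrapy-poet and web-poet types. The helpers `callback_skips_download` and `page_needs_response` are hypothetical names invented for this illustration, not functions from either library. The sketch only shows that both checks can be true at once for the same request.

```python
import inspect


# Stand-ins for illustration only; these are NOT the real library classes.
class Response: ...
class DummyResponse(Response): ...
class HttpResponse: ...
class Product: ...


class MyPage:
    """Page object that returns a Product and needs a real response."""

    def __init__(self, response: HttpResponse):
        self.response = response


def parse(response: DummyResponse, item: Product):
    """Callback that declares it does not use the response."""


def callback_skips_download(callback) -> bool:
    # Hypothetical check: the first parameter is annotated as DummyResponse.
    first = next(iter(inspect.signature(callback).parameters.values()))
    ann = first.annotation
    return isinstance(ann, type) and issubclass(ann, DummyResponse)


def page_needs_response(page_cls) -> bool:
    # Hypothetical check: some constructor parameter is an HttpResponse.
    params = inspect.signature(page_cls.__init__).parameters
    return any(p.annotation is HttpResponse for p in params.values())


# Both are true, so looking only at the callback signature is not enough.
conflict = callback_skips_download(parse) and page_needs_response(MyPage)
```

If `conflict` is true, a provider that trusts the callback annotation alone would build `MyPage` without the response it needs.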

Member Author

@Gallaecio Gallaecio Nov 8, 2023

OK, so you definitely picked up on an issue, and the “solution” I came up with is getting messy, so I would like to discuss it before going any further, because I might be missing a better solution.

The “solution” consists of having 2 separate item provider classes, one for responseless items and one for responseful items.

Things get more complicated, though. To properly determine if an item needs a response, we need to get the page object for the item, and then check if the dependencies of that page object (which might include other items) are provided by a provider that requires a response. Moreover, we need to take the request (URL) into account, as that can determine which page object is used for an item.
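The check described above could look roughly like the following sketch. Everything here is an assumption made for illustration: the `RULES` registry, the `page_for` and `needs_response` helpers, and the stand-in classes are hypothetical and do not reflect how scrapy-poet or andi actually resolve dependencies.

```python
import inspect


# Stand-in classes for illustration only.
class Response: ...
class Product: ...
class Price: ...


class PricePage:
    def __init__(self, response: Response):
        self.response = response


class ProductPage:
    # Depends on another item, which in turn needs a response.
    def __init__(self, price: Price):
        self.price = price


class StaticProductPage:
    def __init__(self):
        pass


# Hypothetical registry: (URL substring, item class) -> page object class.
RULES = {
    ("example.com", Product): ProductPage,
    ("example.com", Price): PricePage,
    ("static.example", Product): StaticProductPage,
}

# Dependencies assumed to come from a provider that requires a response.
RESPONSE_DEPENDENT = {Response}


def page_for(url, item_cls):
    """Return the page object class registered for this URL and item."""
    for (pattern, cls), page_cls in RULES.items():
        if cls is item_cls and pattern in url:
            return page_cls
    return None


def needs_response(url, cls, _seen=None):
    """Walk the dependency tree of cls for the given URL."""
    seen = _seen if _seen is not None else set()
    if cls in RESPONSE_DEPENDENT:
        return True
    if cls in seen:  # guard against dependency cycles
        return False
    seen.add(cls)
    # Items are built by a page object; other classes are inspected directly.
    target = page_for(url, cls) or cls
    params = inspect.signature(target.__init__).parameters
    deps = [
        p.annotation
        for name, p in params.items()
        if name != "self" and p.annotation is not inspect.Parameter.empty
    ]
    return any(needs_response(url, dep, seen) for dep in deps)
```

With these stand-ins, `needs_response("https://example.com/p/1", Product)` is true because `ProductPage` depends on `Price`, whose page needs a `Response`, while the same item requested from a `static.example` URL resolves to a page with no such dependency. This is why the URL has to be part of the decision.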

To be honest, it kind of feels like there should be no item provider, just as there is no page object provider, and instead item resolution should be moved closer to the core, and work the same as page object resolution, by somehow making andi realize how to resolve item dependencies. But I am not very familiar with the code base, and I am afraid of wasting too much time exploring in that direction.

Member

> it kind of feels like there should be no item provider, just as there is no page object provider, and instead item resolution should be moved closer to the core, and work the same as page object resolution, by somehow making andi realize how to resolve item dependencies.

Yeah, that would solve scrapy-plugins/scrapy-zyte-api#91 automatically (AFAIK).

@Gallaecio
Member Author

Unnecessary thanks to #175

@Gallaecio Gallaecio closed this Dec 26, 2023

Successfully merging this pull request may close these issues.

ItemProvider always requires Response
3 participants