
Adds 8 new scrapers for Maranhão #1283

Open: wants to merge 1 commit into main


trevineju
Member

The spiders added in this PR were generated by a code-generation script, so the whole PR must be reviewed in full detail and also tested.

It is up to the reviewer to go through the contribution validation checklist.


jjpaulo2 commented Sep 25, 2024

@trevineju I'll do this review.


jjpaulo2 commented Sep 25, 2024

At first glance, all of them ran correctly except ma_pindare_mirim and ma_vila_nova_dos_martirios, which failed only when scraping the most recent entry.

I'm debugging it to figure out what the cause might be.

2024-09-25 14:08:05 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.vilanovadosmartirios.ma.gov.br/diariooficial.php?dtini=24/09/2024&dtfim=25/09/2024> (referer: None)
Traceback (most recent call last):
  File "/Users/jusbrasil/.virtualenvs/querido-diario-dvpf/lib/python3.10/site-packages/scrapy/utils/defer.py", line 279, in iter_errback
    yield next(it)
  File "/Users/jusbrasil/.virtualenvs/querido-diario-dvpf/lib/python3.10/site-packages/scrapy/utils/python.py", line 350, in __next__
    return next(self.data)
  File "/Users/jusbrasil/.virtualenvs/querido-diario-dvpf/lib/python3.10/site-packages/scrapy/utils/python.py", line 350, in __next__
    return next(self.data)
  File "/Users/jusbrasil/.virtualenvs/querido-diario-dvpf/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/Users/jusbrasil/.virtualenvs/querido-diario-dvpf/lib/python3.10/site-packages/scrapy/spidermiddlewares/offsite.py", line 28, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "/Users/jusbrasil/.virtualenvs/querido-diario-dvpf/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/Users/jusbrasil/.virtualenvs/querido-diario-dvpf/lib/python3.10/site-packages/scrapy/spidermiddlewares/referer.py", line 352, in <genexpr>
    return (self._set_referer(r, response) for r in result or ())
  File "/Users/jusbrasil/.virtualenvs/querido-diario-dvpf/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/Users/jusbrasil/.virtualenvs/querido-diario-dvpf/lib/python3.10/site-packages/scrapy/spidermiddlewares/urllength.py", line 27, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "/Users/jusbrasil/.virtualenvs/querido-diario-dvpf/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/Users/jusbrasil/.virtualenvs/querido-diario-dvpf/lib/python3.10/site-packages/scrapy/spidermiddlewares/depth.py", line 31, in <genexpr>
    return (r for r in result or () if self._filter(r, response, spider))
  File "/Users/jusbrasil/.virtualenvs/querido-diario-dvpf/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/Users/jusbrasil/dev/open-source/querido-diario/data_collection/gazette/spiders/base/adiarios_v1.py", line 26, in parse_pagination
    last_page_number = self.get_last_page_number(response)
  File "/Users/jusbrasil/dev/open-source/querido-diario/data_collection/gazette/spiders/base/adiarios_v1.py", line 75, in get_last_page_number
    last_page_index = max(page_numbers)
ValueError: max() arg is an empty sequence
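The traceback bottoms out in `max(page_numbers)` receiving an empty list inside `get_last_page_number`, which presumably happens when the requested date range returns no gazettes and therefore no pagination links. The sketch below is a hypothetical guarded variant, not the code in `adiarios_v1.py`: it assumes the page numbers have already been extracted into a list of ints (the real method takes a Scrapy response) and that pages are numbered from 1. It relies on the `default` keyword of Python's built-in `max()` to return `None` instead of raising `ValueError`.

```python
from typing import Iterable, List, Optional


def get_last_page_number(page_numbers: Iterable[int]) -> Optional[int]:
    """Return the highest page index, or None if there are no pages.

    Hypothetical guarded variant: a bare max(page_numbers) raises
    ValueError on an empty sequence, as in the traceback above.
    """
    return max(page_numbers, default=None)


def pages_to_request(page_numbers: Iterable[int]) -> List[int]:
    """List the page indexes to fetch, assuming numbering starts at 1."""
    last_page = get_last_page_number(page_numbers)
    if last_page is None:
        # No pagination links: treat the date range as having no gazettes.
        return []
    return list(range(1, last_page + 1))


if __name__ == "__main__":
    print(pages_to_request([]))         # []
    print(pages_to_request([2, 1, 3]))  # [1, 2, 3]
```

Whether the empty case should be silenced like this or left to surface as an error is a project decision; in this thread the reviewers treated the error as expected behavior.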

Records

ma_matoes_do_norte_complete.csv
ma_matoes_do_norte_complete.log
ma_matoes_do_norte_interval.csv
ma_matoes_do_norte_interval.log
ma_matoes_do_norte_yesterday.csv
ma_matoes_do_norte_yesterday.log
ma_paco_do_lumiar_complete.csv
ma_paco_do_lumiar_complete.log
ma_paco_do_lumiar_interval.csv
ma_paco_do_lumiar_interval.log
ma_paco_do_lumiar_yesterday.csv
ma_paco_do_lumiar_yesterday.log
ma_pedreiras_complete.csv
ma_pedreiras_complete.log
ma_pedreiras_interval.csv
ma_pedreiras_interval.log
ma_pedreiras_yesterday.csv
ma_pedreiras_yesterday.log
ma_pindare_mirim_complete.csv
ma_pindare_mirim_complete.log
ma_pindare_mirim_interval.csv
ma_pindare_mirim_interval.log
ma_pindare_mirim_yesterday.csv
ma_pindare_mirim_yesterday.log
ma_santa_luzia_do_parua_complete.csv
ma_santa_luzia_do_parua_complete.log
ma_santa_luzia_do_parua_interval.csv
ma_santa_luzia_do_parua_interval.log
ma_santa_luzia_do_parua_yesterday.csv
ma_santa_luzia_do_parua_yesterday.log
ma_trizidela_do_vale_complete.csv
ma_trizidela_do_vale_complete.log
ma_trizidela_do_vale_interval.csv
ma_trizidela_do_vale_interval.log
ma_trizidela_do_vale_yesterday.csv
ma_trizidela_do_vale_yesterday.log
ma_vargem_grande_complete.csv
ma_vargem_grande_complete.log
ma_vargem_grande_interval.csv
ma_vargem_grande_interval.log
ma_vargem_grande_yesterday.csv
ma_vargem_grande_yesterday.log
ma_vila_nova_dos_martirios_complete.csv
ma_vila_nova_dos_martirios_complete.log
ma_vila_nova_dos_martirios_interval.csv
ma_vila_nova_dos_martirios_interval.log
ma_vila_nova_dos_martirios_yesterday.csv
ma_vila_nova_dos_martirios_yesterday.log


jjpaulo2 left a comment


After talking with @trevineju, we concluded that the error seen in ma_pindare_mirim and ma_vila_nova_dos_martirios is expected, since some municipalities do not publish a gazette on every date.

Given that, everything looks good.

Labels: none yet
Projects: Status: novo (new)
2 participants