Commit

Squashed commit of the following:
commit b95f6c9
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 05:18:37 2024 +0200

    Update config.example.yaml

commit 8a7bd0b
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 04:58:13 2024 +0200

    Move config example

commit 2cb20f9
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 04:57:02 2024 +0200

    Minor change in README

commit 47800f6
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 04:54:00 2024 +0200

    Make redis async

commit 7f03ae6
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 04:53:43 2024 +0200

    Implement scraper bot logic

commit ac08eea
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 04:52:46 2024 +0200

    Fix result entry hash

commit 166f8aa
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 03:44:06 2024 +0200

    Renaming

commit 6e3fc3f
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 03:43:31 2024 +0200

    Create class for results

commit 381af6e
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 03:40:02 2024 +0200

    Make settings persistent

commit 3d45491
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 02:15:45 2024 +0200

    Fix sigint exit

commit 396870a
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 00:48:05 2024 +0200

    Use playwright in scraper task

commit 7764364
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 00:47:42 2024 +0200

    Minor change

commit 43512a0
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 00:46:52 2024 +0200

    Use async notify version

commit 78b4027
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 00:18:43 2024 +0200

    Add playwright installer in docker

commit 411b8f7
Author: Roberto Bochet <[email protected]>
Date:   Wed Jun 26 00:16:28 2024 +0200

    Update settings

commit a51daa6
Author: Roberto Bochet <[email protected]>
Date:   Tue Jun 25 07:13:42 2024 +0200

    Get daemonize also via cli

commit 1fabcde
Author: Roberto Bochet <[email protected]>
Date:   Tue Jun 25 07:13:22 2024 +0200

    Update dependencies

commit d4ca5eb
Author: Roberto Bochet <[email protected]>
Date:   Tue Jun 25 07:11:39 2024 +0200

    Force v in version tag for the ci

commit ee9dfe6
Author: Roberto Bochet <[email protected]>
Date:   Tue Jun 25 07:11:21 2024 +0200

    Remove old files

commit 0aa63ed
Author: Roberto Bochet <[email protected]>
Date:   Tue Jun 25 07:09:32 2024 +0200

    Update settings
RobertoBochet committed Jun 26, 2024
1 parent e598060 commit 1c818db
Showing 33 changed files with 668 additions and 397 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build-container.yml
@@ -3,7 +3,7 @@ name: build-container
 on:
   push:
     tags:
-      - '?[0-9]+.[0-9]+.[0-9]+'
+      - 'v[0-9]+.[0-9]+.[0-9]+'
 
 jobs:
   build-container:
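
With this change the workflow runs only for tags that start with `v`. As a rough illustration, the filter can be approximated with an anchored Python regular expression. This assumes that `.` is matched literally and that `+` means "one or more of the preceding character" in the workflow filter syntax. The helper name `triggers_build` is hypothetical, and this sketch is not GitHub's own matcher.

```python
import re

# Approximation of the workflow tag filter 'v[0-9]+.[0-9]+.[0-9]+'.
# Assumption: '.' is literal in the filter, so it is escaped in the regex.
TAG_PATTERN = re.compile(r"v[0-9]+\.[0-9]+\.[0-9]+")


def triggers_build(tag: str) -> bool:
    """Return True if the whole tag name matches the filter (hypothetical helper)."""
    return TAG_PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    # Print the decision for a few sample tag names.
    for tag in ("v1.2.3", "1.2.3", "v1.2", "v1.2.3-rc1"):
        print(tag, triggers_build(tag))
```

Under these assumptions a tag such as `v1.2.3` matches, while a bare `1.2.3` no longer does.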
6 changes: 5 additions & 1 deletion Dockerfile
@@ -17,12 +17,16 @@ COPY . .
 RUN poetry build --format wheel
 
 
-FROM python:3.12-alpine
+FROM python:3.12-slim
 
 VOLUME /app
 
 COPY --from=compiler /app/dist/*.whl /
 
 RUN pip3 install --no-cache-dir -- *.whl
 
+RUN playwright install --with-deps firefox
+
+ENV SB__BROWSER__TYPE="firefox"
+
 ENTRYPOINT python3 -m scraper_bot
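
The `ENV` line suggests that settings can be overridden through environment variables. The following is a minimal sketch of how a name like `SB__BROWSER__TYPE` could map onto a nested setting. It assumes that `SB__` is the application prefix and that `__` separates nested keys; neither is confirmed by this diff. `settings_from_env` is a hypothetical helper, not the project's actual loader.

```python
import os
from typing import Any, Mapping

PREFIX = "SB__"    # assumed prefix for application settings
DELIMITER = "__"   # assumed separator between nested keys


def settings_from_env(environ: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Collect prefixed variables into a nested dict (hypothetical helper)."""
    settings: dict[str, Any] = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # unrelated variable, e.g. PATH
        # "SB__BROWSER__TYPE" -> ["browser", "type"]
        keys = name[len(PREFIX):].lower().split(DELIMITER)
        node = settings
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return settings


if __name__ == "__main__":
    print(settings_from_env({"SB__BROWSER__TYPE": "firefox", "PATH": "/usr/bin"}))
```

The sketch does not handle conflicting names (for example `SB__BROWSER` together with `SB__BROWSER__TYPE`); a real loader would need to.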
4 changes: 2 additions & 2 deletions README.md
@@ -22,7 +22,7 @@ As alternative, you can build by yourself the python package or the container
 ### Fast deploy (docker-compose)
 
 1. [Create a telegram bot](https://core.telegram.org/bots#3-how-do-i-create-a-bot) and retrieve its token
-2. Download `config.yaml` and put into `/etc/scraperbot` folder
+2. Download `config.example.yaml` and rename it to `config.yaml`
 3. Change the configuration follow the [guidelines](#configuration)
 4. Download `docker-compose.yaml`
 5. Start the scraper with `docker-compose`
@@ -44,4 +44,4 @@ Furthermore you can get the config json schema from command line with `--config-
 scraper_bot --config-schema
 ```
 
-You can also find a configuration example in `config.yaml`.
+You can also find a configuration example in `config.example.yaml`.
42 changes: 42 additions & 0 deletions config.example.yaml
@@ -0,0 +1,42 @@
+#######################
+# Example config.yaml #
+#######################
+# This file contains an example config
+# intended to find real estate ads.
+# In particular, we look for an apartment
+# in Milano with at least three rooms
+notifications:
+  message: |
+    # [{{title}}]({{url}})
+    {% if location %}📍 *{{location}}*{% endif %}
+    {% if price %}💶 *{{price}}€*{% endif %}
+    {% if size %}📐 *{{size}}m²*{% endif %}
+  format: markdown
+  channels:
+    # A list of apprise-supported channels
+    # where the scraped entities have to be sent
+    - "tgram://{YOUR_BOT_TOKEN}/{CHAT_ID1}"
+    - "tgram://{YOUR_BOT_TOKEN}/{CHAT_ID2}"
+    - message: "Found a new ad at {{url}}"
+      format: "text"
+      uri: "discord://webhook_id/webhook_token"
+tasks:
+  - name: "immobiliare.it"
+    url: "https://www.immobiliare.it/affitto-case/lodi/?criterio=rilevanza&localiMinimo=3"
+    target: |
+      [...document.querySelectorAll("li.in-searchLayoutListItem")].map(t =>({
+        url: t.querySelector("a.in-listingCardTitle")?.href,
+        title: t.querySelector("a.in-listingCardTitle")?.innerText,
+        price: t.querySelector(".in-listingCardPrice span")?.innerText,
+        size: t.querySelector(".in-listingCardFeatureList__item:nth-child(2) span")?.innerText.replace(/[^0-9]+/g,"")
+      }))
+  - name: "mioaffitto"
+    url: "https://www.mioaffitto.it/search?provincia=50&poblacion=67355"
+    target: |
+      [...document.querySelectorAll(".property-list .propertyCard:not(.property-alternative)")].map(t=> ({
+        url: t.querySelector("a")?.href,
+        title: t.querySelector("a p")?.innerText,
+        price: t.querySelector(".propertyCard__price--value")?.innerText.replace(/[^0-9]+/g,""),
+        size: t.querySelector(".propertyCard__details li:has(.fa-size-o)")?.innerText.replace(/[^0-9]+/g,""),
+        location: t.querySelector(".propertyCard__location p")?.innerText
+      }))
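
Both `target` expressions normalise sizes (and, for mioaffitto, prices) by deleting every non-digit character with `.replace(/[^0-9]+/g,"")`. The same step can be sketched in Python, for instance for post-processing scraped values. The function name `digits_only` and the sample strings are made up for illustration and do not come from the project.

```python
import re
from typing import Optional


def digits_only(text: Optional[str]) -> Optional[str]:
    """Drop every character that is not an ASCII digit (hypothetical helper).

    None is passed through, mirroring how `?.` yields undefined in the
    JavaScript expressions when an element is missing.
    """
    if text is None:
        return None
    return re.sub(r"[^0-9]+", "", text)


if __name__ == "__main__":
    # Print the normalised form of a few sample values.
    for sample in ("€ 1.200/mese", "85 m²", None):
        print(repr(sample), "->", repr(digits_only(sample)))
```

Note that this approach also removes decimal separators, so a value such as `1.200,50` would collapse to `120050`.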
38 changes: 0 additions & 38 deletions config.yaml

This file was deleted.

156 changes: 126 additions & 30 deletions poetry.lock

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions pyproject.toml
@@ -17,9 +17,7 @@ classifiers=[
 
 [tool.poetry.dependencies]
 python = "^3.12"
-beautifulsoup4 = ">=4.10.0,<4.11.0"
 redis = "^4.6.0"
-requests = "^2.32.3"
 ischedule = ">=1.2.2,<1.3.0"
 pyyaml = ">=6.0,<7.0"
 pydantic = "^2.7.4"
@@ -28,6 +26,8 @@ termcolor = "^2.4.0"
 urllib3 = "^2.2.2"
 apprise = "^1.8.0"
 jinja2 = "^3.1.4"
+playwright = "^1.44.0"
+playwright-stealth = "^1.0.6"
 
 
 [tool.poetry.group.dev.dependencies]