An abandoned pet project built to get acquainted with the Go programming language. Some good practices are not followed here, but the project works and is ready to use.
You need to prepare the `pricewatcher.db` database before using the app.
First, populate the `shops` table:

- `id` - unique identifier;
- `shops.title` - shop name; e.g., aliexpress, ozon, etc.;
- `title_x_path`, `picture_x_path`, `price_x_path` - XPath of the HTML element that stores the data to be parsed; it can be copied, for example, from the Google Chrome HTML inspector.
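For example, a shop row can be added with the `sqlite3` CLI. The schema below is only a sketch inferred from the field list above (the real project's types and constraints may differ), and the XPath values are placeholders:

```shell
# Sketch: create the shops table if it is missing (schema inferred from
# the field descriptions above) and insert one shop row.
# The XPath values are placeholders -- copy real ones from the browser inspector.
sqlite3 pricewatcher.db <<'SQL'
CREATE TABLE IF NOT EXISTS shops (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    title_x_path TEXT,
    picture_x_path TEXT,
    price_x_path TEXT
);
INSERT INTO shops (title, title_x_path, picture_x_path, price_x_path)
VALUES ('ozon', '//h1', '//img[1]', '//span[@id="price"]');
SQL
```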
Second, populate the `items` table with the data you want to parse:

- `id` - unique identifier;
- `items.title` - name of the item you want to parse; e.g., ps5, iphoneN, etc.;
- `link` - URL of the item page in a web store;
- `shop_id` - `id` of the corresponding record from the `shops` table.
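Similarly, an item row can be added with the `sqlite3` CLI; again, the schema is only inferred from the descriptions above, and the link is a made-up example URL:

```shell
# Sketch: create the items table if it is missing (schema inferred from
# the field descriptions above) and insert one item.
# shop_id should match the id of a row in the shops table; the link here
# is a placeholder URL.
sqlite3 pricewatcher.db <<'SQL'
CREATE TABLE IF NOT EXISTS items (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    link TEXT NOT NULL,
    shop_id INTEGER
);
INSERT INTO items (title, link, shop_id)
VALUES ('ps5', 'https://example.com/product/ps5', 1);
SQL
```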
Third, run the app: `go run main.go`.
Results can be found in the `prices` table.
By the way, there should be a `log/` directory next to `main.go`:

- `%YYYY-mm-dd_hh:mm%.log` - the output of the `main.go` execution;
- `%timestamp%/%shop name%/%item title%/` - a directory containing the received screenshot and HTML files of the page being parsed.
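If that directory does not exist yet, create it once before the first run:

```shell
# The app writes its logs (and the per-run screenshot/html directories)
# under log/; create the directory next to main.go before the first run.
mkdir -p log
```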
There is also a Dockerfile, which I've checked only once :]
The `main()` function already contains a (commented-out) usage of a cron library for Go. You can add it to the dependencies and uncomment that section in `main()`.
Instead, I used crontab on my Ubuntu server. Run `crontab -e` and add this line at the end (note the absolute path):

```
0 */3 * * * cd /root/workspace/priceparser && bash launch.sh
```

Create `launch.sh` with the following content:

```bash
#!/bin/bash
cd /root/workspace/priceparser/
go run main.go >> "log/$(date +%Y-%m-%d_%H:%M).log"
```
I almost forgot: chromedp requires Chrome to be installed on your system :] On a Linux server you just need headless Chrome; I think I used a GitHub gist to install it on my Ubuntu server.
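For reference, one common way to get Chrome onto a Debian/Ubuntu server (an assumption on my part; the gist may have done it differently) is to install Google's official package:

```shell
# Assumption: Debian/Ubuntu with apt; run as root or prefix with sudo.
# Downloads Google's official Chrome package and installs it together
# with its dependencies; chromedp can then run it in headless mode.
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
apt install -y ./google-chrome-stable_current_amd64.deb
google-chrome --version
```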
- [x] add table to store parsed values;
- [x] add connection from Go to the SQLite db;
- [x] read/write data from/to the SQLite db;
- [x] receive shop list from db;
- [x] receive shop items from db;
- [x] fill the database with the source data to be parsed;
- [x] select and prepare data to parse;
- [x] parse only one element (price, w/o title, description, etc.) from the source;
- [x] can't receive data from aliexpress;
- [x] prepare a structure to store parsed data into the db;
- [x] remove letters from the parsed price;
- [x] do not forget to store the original (unfiltered) price in the db;
- [x] add parsed data into the db;
- [x] store screenshot and log paths into the db;
- [x] use a proxy;
- [ ] deploy via docker;
- [ ] use this cron library;
- [ ] can use datadog to check logs;
- [ ] add concurrent execution (no need to parse data sequentially);
- [ ] increase timeout between requests to the same domains;
- [ ] need to switch emulated clients periodically;