A serverless web browser that crawls websites and compares pages on a schedule, built with Serverless, AWS Lambda, Headless Chrome and Selenium!
What it can do | Prerequisites | Configuration | Deploy
- Runs Headless Chrome in a serverless environment.
- Finds web elements by XPath and compares their text.
- Sends notifications/emails if something has changed.
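The XPath lookup in the list above can be illustrated with a small sketch. It is not the project's code: it uses the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath and needs well-formed markup) as a stand-in for Selenium's `driver.find_element(By.XPATH, ...)`. The page snippet and the helper name `text_at_xpath` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page snippet. In the real function the page
# would be rendered by Headless Chrome and queried through Selenium.
PAGE = """
<html>
  <body>
    <div id="product">
      <span class="price">$19.99</span>
    </div>
  </body>
</html>
"""


def text_at_xpath(markup, xpath):
    """Return the stripped text of the first element matching xpath, or None."""
    root = ET.fromstring(markup.strip())
    element = root.find(xpath)
    if element is None:
        return None
    return (element.text or "").strip()


# An element that exists yields its text; a missing element yields None.
price = text_at_xpath(PAGE, ".//div[@id='product']/span[@class='price']")
missing = text_at_xpath(PAGE, ".//span[@class='stock']")
print(price)
print(missing)
```

Returning `None` for a missing element keeps "element not found" distinct from "element found but empty", which matters when you are waiting for an element to appear.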
- You need an AWS account, along with its `aws_access_key_id` and `aws_secret_access_key`. If you don't have an account yet, sign up for one, and then follow the AWS instructions to get the access key ID and secret access key.
- Install Node and make sure `npm` is available in your path.
- Install Docker.
- Install Serverless: `npm install -g serverless`
- Configure Serverless with AWS, using the `aws_access_key_id` and `aws_secret_access_key` from the first step: `sls config credentials -p aws -k aws_access_key_id -s aws_secret_access_key`
- Clone this project: `git clone https://github.com/LeiShi1313/serverless-web-differ.git`
- Install the dependencies: `cd serverless-web-differ && npm install`
There are a couple of things you need to configure before the function can run in the cloud. First, create a file called `config.yml` and copy everything from `config.yml.example` into it. Inside `config.yml`:
- `events`: Defines how often the function runs. Most of the time, you want a single entry like `- schedule: rate(1 hour)` or `- schedule: cron(0 * * * ? *)`, both of which run the function every hour.
- `server_chan`/`sendgrid`/`ifttt`/`pushbullet`: These are the predefined ways to notify you when something has changed on the websites you are interested in. Read the comments in the file to find out how to get the security keys.
- `websites`: This is the main part of the configuration. `websites` can have multiple entries, each representing a website you want to check periodically. `url` and `xpath` are self-explanatory; if you don't know how to get an XPath, try searching for "xpath [THE BROWSER YOU ARE USING]" and you should find plenty of information. `original` can be omitted if you are expecting a web element to appear; otherwise the function compares the EXACT text of `original` with the text of the web element located by `xpath`.
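The role of `original` can be sketched as a small decision rule. This is a minimal illustration rather than the project's actual implementation: the function name `has_changed` is hypothetical, and treating a missing element as a change when `original` is set is an assumption, since the text above only specifies the exact-text comparison.

```python
from typing import Optional


def has_changed(original: Optional[str], current: Optional[str]) -> bool:
    """Decide whether a watched element counts as changed.

    original: the expected text from the config, or None if it was omitted.
    current:  the text found at the XPath, or None if no element matched.
    """
    if original is None:
        # No baseline configured: we are waiting for the element to appear.
        return current is not None
    # Baseline configured: only an exact text match counts as "unchanged".
    # Assumption: an element that is missing is also reported as a change.
    return current != original


# Waiting for an element to appear (original omitted).
print(has_changed(None, None))
print(has_changed(None, "In stock"))

# Exact-text comparison, including whitespace and case.
print(has_changed("In stock", "In stock"))
print(has_changed("In stock", "in stock"))
```

Because the comparison is exact, stray whitespace or a change in capitalization on the page is enough to trigger a notification, so copy `original` from the page as it is rendered.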
Deploying is simple. When everything is set up, run `sls deploy` and 🎉🎉🎉
- Add a Chinese version of the README.
- For each `website`, deploy a Lambda function instead of visiting all websites in one function.
- Make `notify.py` another Lambda function.
- Add more ways to notify.
- Maybe move `notify.py` to another project and make it a submodule here?
- Provide a Headless Chrome Docker image.
- Options for other cloud providers: Google Cloud, Aliyun, etc.