Note: This repository is intended only as a study project.
An experimental scraper that retrieves place detail page data from the new Park4Night website. It is built with Node.js and the Playwright API, and stores the results in Supabase.
The enqueuePlaceList function calls the getPlaceIdList function, which reads the range.json file and downloads the requested range of IDs from a Supabase table called places.
This table contains all the available place IDs from Park4Night. They were retrieved from an old, now-removed public endpoint and converted from JSON to SQL rows.
A file named queueList.json will be created, containing a list of IDs to be scanned. The extractData function will then be enqueued to process each ID.
Once the dequeue process is completed, the program will execute the updateRange function to download the next set of IDs to be scanned.
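The flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the repository's implementation: the `IdRange` shape, the `nextRange`, `drainQueue`, and `runBatch` helpers, and the `fetchIds` and `extract` callbacks are all hypothetical names introduced for the example. The real project reads range.json, queries Supabase, and drives Playwright.

```typescript
// Hypothetical shape for range.json; the real file layout may differ.
interface IdRange {
  from: number;
  to: number;
}

// Advance an inclusive range by a fixed step (compare UPDATE_RANGE in the .env file).
function nextRange(range: IdRange, step: number): IdRange {
  return { from: range.to + 1, to: range.to + step };
}

// Process every ID in the queue with at most `concurrent` tasks in flight.
async function drainQueue(
  queue: number[],
  concurrent: number,
  extract: (id: number) => Promise<void>,
): Promise<void> {
  const pending = [...queue];
  const worker = async (): Promise<void> => {
    while (pending.length > 0) {
      const id = pending.shift();
      if (id === undefined) break;
      await extract(id);
    }
  };
  const workerCount = Math.min(concurrent, pending.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}

// One full cycle: fetch the IDs for the range, drain the queue, return the next range.
async function runBatch(
  range: IdRange,
  step: number,
  concurrent: number,
  fetchIds: (range: IdRange) => Promise<number[]>,
  extract: (id: number) => Promise<void>,
): Promise<IdRange> {
  const queue = await fetchIds(range); // stands in for getPlaceIdList and queueList.json
  await drainQueue(queue, concurrent, extract); // stands in for the enqueued extractData calls
  return nextRange(range, step); // stands in for updateRange
}
```

Keeping the range arithmetic separate from the queue makes it possible to test each step without a database or a browser.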
To retrieve data such as contact information, you need to be logged in.
You can set your PHPSESSID in the storageState file or use the login file to set the session dynamically. (Please note that the provided file is currently an example.)
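As an illustration of the first option, the sketch below builds a storage state object containing only the PHPSESSID cookie. The object shape follows Playwright's storageState JSON format. The `buildSessionState` helper, the cookie domain, and the cookie flags are assumptions made for the example, so copy the real values from your browser session.

```typescript
// Cookie shape as used in Playwright's storageState JSON.
interface SessionCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
  httpOnly: boolean;
  secure: boolean;
  sameSite: "Strict" | "Lax" | "None";
}

interface SessionState {
  cookies: SessionCookie[];
  origins: { origin: string; localStorage: { name: string; value: string }[] }[];
}

// Hypothetical helper: wrap a PHPSESSID value in a storage state object.
// The flags below (httpOnly, secure, sameSite) are assumed, not taken from the site.
function buildSessionState(sessionId: string, domain: string): SessionState {
  return {
    cookies: [
      {
        name: "PHPSESSID",
        value: sessionId,
        domain,
        path: "/",
        expires: -1,
        httpOnly: true,
        secure: true,
        sameSite: "Lax",
      },
    ],
    origins: [],
  };
}

const sessionState = buildSessionState("your-session-id", "www.park4night.com");
console.log(JSON.stringify(sessionState, null, 2));
```

The printed JSON can be saved to a file and passed to Playwright through the `storageState` option of `browser.newContext()`.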
Download the p4n-scaper repo and run:
npm install
To run this project, define the following environment variables in your .env file (see the env.example file):
BASE_URL = https://www.park4night.com
BASE_PLACE_PAGE_URL = place
BASE_LOGIN_URL = auth/login
P4N_USERNAME
P4N_PASSWORD
SUPABASE_KEY
SUPABASE_URL
UPDATE_RANGE = 5000
CONCURRENT = 5
Disable JavaScript to scrape much faster
JAVASCRIPT = true
Enable only the scrape modules you need, or add your own
No image-retrieval module is currently provided
GET_TITLE = true
GET_CONTACTS = true
GET_ADDRESS = true
GET_USEFUL_INFORMATION = true
GET_SERVICES = true
GET_ACTIVITIES = true
GET_LOWER_RATING_IDS = false
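One way to turn these variables into typed settings is sketched below. It is an assumption-laden illustration rather than the project's actual loader: the `readFlag`, `readInt`, `enabledModules`, and `placeUrl` helpers are hypothetical, and the URL layout (BASE_URL, then BASE_PLACE_PAGE_URL, then the place ID) is a guess based on the variable names.

```typescript
// Raw key/value pairs, for example process.env after the .env file has been loaded.
type Env = Record<string, string | undefined>;

// Treat only the string "true" (case-insensitive) as enabled.
function readFlag(env: Env, key: string, fallback = false): boolean {
  const raw = env[key];
  return raw === undefined ? fallback : raw.trim().toLowerCase() === "true";
}

// Parse a positive integer, falling back when the value is missing or invalid.
function readInt(env: Env, key: string, fallback: number): number {
  const parsed = Number.parseInt(env[key] ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Module switches as listed in the .env file.
const MODULE_FLAGS = [
  "GET_TITLE",
  "GET_CONTACTS",
  "GET_ADDRESS",
  "GET_USEFUL_INFORMATION",
  "GET_SERVICES",
  "GET_ACTIVITIES",
  "GET_LOWER_RATING_IDS",
] as const;

// Return the names of the modules whose flag is set to true.
function enabledModules(env: Env): string[] {
  return MODULE_FLAGS.filter((key) => readFlag(env, key));
}

// Assumed URL layout: <BASE_URL>/<BASE_PLACE_PAGE_URL>/<id>.
function placeUrl(env: Env, id: number): string {
  const base = env["BASE_URL"] ?? "";
  const page = env["BASE_PLACE_PAGE_URL"] ?? "";
  return `${base}/${page}/${id}`;
}
```

Passing the environment in as a parameter, instead of reading a global, keeps the helpers easy to test with a plain object.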
npm run start
npm run test
To convert GeoJSON data to vector tiles, use Tippecanoe from Mapbox.
In the json_to_geojson folder you will find an index.ts with a launchConversion function that takes a JSON file of places and transforms it into .geojson spatial data with the following properties:
title,
place_id,
code,