Skip to content

Script that scrapes Hong Kong supermarket pricing data and pushes it to MongoDB

License

Notifications You must be signed in to change notification settings

prashcr/supermarket-scraping

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 

Repository files navigation

supermarket-scraper

Script that scrapes HK supermarket pricing data and dumps it into MongoDB

Gets data from http://www3.consumer.org.hk/pricewatch/supermarket/index.php?keyword=&lang=en

Examples

This is what an example document might look like.

Each document stores product information, and maintains an array of price objects, which are generated each time data is acquired.

{
  "_id" : "P000000038",
  "brand" : "Garden",
  "category" : "Cakes",
  "name" : "Chiffon Cake - Chocolate Flavour 60g",
  "prices" : [
    {
      "date" : ISODate("2015-09-27T08:50:58.309Z"),
      "marketplace" : {
        "price" : "6.80",
        "discount" : true
      },
      "parknshop" : {
        "price" : "6.80",
        "discount" : true
      },
      "wellcome" : {
        "price" : "6.80",
        "discount" : true
      }
    }
  ]
}

_id allows you to look up the product page for an individual product at: http://www3.consumer.org.hk/pricewatch/supermarket/detail.php?itemcode=_id

TODO

Consider moving the price data and product data into two seperate collections, so that price data can be queried more efficiently, without a collection scan.

License

MIT License © 2015 Prashanth Chandra

About

Script that scrapes Hong Kong supermarket pricing data and pushes it to MongoDB

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published