Dynamic Web Content Scraping with Requests HTML

Beautiful Soup, combined with requests, is very powerful for scraping data from static websites. However, if the content on a site is loaded dynamically via JavaScript, the HTML returned by a plain request will not contain it, and we need a tool that can execute the page's scripts, such as Selenium or requests-html. Here is a quick example that I put together using Python's requests-html module.

from requests_html import HTMLSession

url = 'https://www.beerwulf.com/en-gb/c/all-beers?segment=Beers&catalogCode=Beer_1'

s = HTMLSession()
r = s.get(url)

# render() runs the page's JavaScript in a headless browser. sleep=1 pauses for one second after the initial render, giving late-loading content time to appear before we start scraping.
r.html.render(sleep=1)
products = r.html.xpath('//*[@id="product-items-container"]', first=True)

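# Follow every link found inside the product container. A separate variable is used for the detail page so the listing response `r` is not overwritten.
for link in products.absolute_links:
    detail = s.get(link)
    name = detail.html.find('.product-detail-info-title > h1', first=True).text
    subtext = detail.html.find('.product-subtext > span', first=True).text
    print(name, subtext)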

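In the example above, s.get(url) is an ordinary HTTP request. It is the r.html.render() call that launches a headless Chromium browser under the hood (requests-html downloads it the first time render() is used), executes the page's JavaScript, and replaces the HTML with the rendered result. From there we can query the DOM elements and scrape the content that we want.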