PyScraping

PyScraping is a universal web-scraping utility for Python, built with simplicity in mind.

Installation

Install PyScraping with pip:

pip install pyscraping

Example

All scraping functionality can be accessed either as a function call or a property call. The example below reads the title using a function call:

from pyscraping.PyScraping import PyScraping

pyscraping = PyScraping("https://google.com")

print(pyscraping.title())

Documentation

`<head>` tags

Get Website Title

Scraping the title from a website is simple.

pyscraping.title()

Get Meta Charset

To access the defined charset, you can use the following method:

pyscraping.charset()

Get Meta Viewport

In some cases, such as the viewport and the meta keywords, the string represents an array and is returned as such:

pyscraping.viewport()

If you need the original viewport string, use viewportString:

pyscraping.viewportString()
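
To illustrate what "returned as an array" can mean for a comma-separated meta value, here is a minimal sketch that splits such a string into a Python list. It is not PyScraping's own code: the helper name `split_meta_content` is hypothetical, and the exact format returned by viewport() may differ.

```python
def split_meta_content(content: str) -> list[str]:
    """Split a comma-separated meta content string into trimmed, non-empty parts."""
    # Hypothetical helper for illustration; not part of PyScraping.
    return [part.strip() for part in content.split(",") if part.strip()]


viewport_parts = split_meta_content("width=device-width, initial-scale=1")
print(viewport_parts)  # ['width=device-width', 'initial-scale=1']
```

The same approach would apply to a keywords string such as "python, scraping, html".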

Get Canonical URL

The canonical URL, if given, can be accessed as shown in the example below:

pyscraping.canonical()

Get Meta Content-Type

To access the content type, you can use the following method:

pyscraping.contentType()

Get Meta CSRF Token

The CSRF token method assumes that the token is stored in a meta tag with the name "csrf-token". This is the default for Laravel. You can access it using the following code:

pyscraping.csrfToken()
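
For readers who want to see what a lookup of a meta tag named "csrf-token" involves, the sketch below does it with Python's standard html.parser module. This is an assumption-laden illustration, not how PyScraping is implemented; `CsrfMetaParser` and `find_csrf_token` are hypothetical names.

```python
from html.parser import HTMLParser


class CsrfMetaParser(HTMLParser):
    """Record the content of the first <meta name="csrf-token"> tag."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_csrf_meta = tag == "meta" and attributes.get("name") == "csrf-token"
        if is_csrf_meta and self.token is None:
            self.token = attributes.get("content")


def find_csrf_token(html):
    """Return the token string, or None if the page has no such meta tag."""
    # Hypothetical helper for illustration; not part of PyScraping.
    parser = CsrfMetaParser()
    parser.feed(html)
    parser.close()
    return parser.token


sample = '<html><head><meta name="csrf-token" content="abc123"></head></html>'
print(find_csrf_token(sample))  # abc123
```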

Get Meta Author, Description and Image

The following example shows the extraction of three attributes:

the Meta Author,
the Meta Description and
the Meta Image URL

pyscraping.author()
pyscraping.description()
pyscraping.image()

Get Meta Keywords

The keywords meta tag naturally represents an array, so its content is split for your convenience:

pyscraping.keywords()

Alternatively, you can access the original keyword string:

pyscraping.keywordString()

Get Meta Open-Graph (OG) Data

Open Graph data can be fetched for properties such as the following:

og:site_name
og:type
og:title
og:description
og:url
og:image

# Example
pyscraping.openGraph("og:title")

# All
pyscraping.openGraph()
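
The "one property or all properties" pattern above can be sketched with the standard library as follows. This is only an illustration under stated assumptions: `OpenGraphParser` and `open_graph` are hypothetical names, and the return types of PyScraping's openGraph() are not specified in this README.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> tags into a dictionary."""

    def __init__(self):
        super().__init__()
        self.data = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.data[prop] = attributes.get("content")


def open_graph(html, key=None):
    """Return one Open Graph value if key is given, otherwise all of them."""
    # Hypothetical helper for illustration; not part of PyScraping.
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.data if key is None else parser.data.get(key)


sample = (
    '<head><meta property="og:title" content="Example Page">'
    '<meta property="og:type" content="website">'
    '<meta name="description" content="ignored"></head>'
)
print(open_graph(sample, "og:title"))  # Example Page
print(open_graph(sample))  # {'og:title': 'Example Page', 'og:type': 'website'}
```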

Get Meta Twitter Card

Parsing the Twitter Card works similarly:

twitter:card
twitter:title
twitter:description
twitter:url
twitter:image

# Example
pyscraping.twitterCard("twitter:title")

# All
pyscraping.twitterCard()

`<body>` tags

Get Headings by Level

In some cases, you may want to retrieve all headings of a particular level. The example below shows how to do so:

pyscraping.h1()
pyscraping.h2()
pyscraping.h3()
pyscraping.h4()
pyscraping.h5()
pyscraping.h6()
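
As a rough picture of what collecting headings by level involves, the sketch below gathers the text of every heading of one level with the standard html.parser module. It is not PyScraping's implementation; `HeadingParser` and `headings` are hypothetical names, and the assumption that each heading is returned as a plain text string is mine.

```python
from html.parser import HTMLParser


class HeadingParser(HTMLParser):
    """Collect the text content of all headings of a single level."""

    def __init__(self, level):
        super().__init__()
        self.target = f"h{level}"
        self.headings = []
        self._buffer = None  # text chunks while inside a target heading

    def handle_starttag(self, tag, attrs):
        if tag == self.target:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.target and self._buffer is not None:
            self.headings.append("".join(self._buffer).strip())
            self._buffer = None


def headings(html, level):
    """Return the text of every <hN> heading for the given level N."""
    # Hypothetical helper for illustration; not part of PyScraping.
    parser = HeadingParser(level)
    parser.feed(html)
    parser.close()
    return parser.headings


sample = "<h1>Title</h1><h2>First <em>part</em></h2><p>text</p><h2>Second</h2>"
print(headings(sample, 2))  # ['First part', 'Second']
```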

Get all Paragraphs

The following example will return a list of all paragraphs (<p>-tags) on the website:

pyscraping.p()

Get Unordered Lists

The following example will return a list of all unordered lists (<ul>-tags) on the website:

pyscraping.ul()

Get Ordered Lists

The following example will return a list of all ordered lists (<ol>-tags) on the website:

pyscraping.ol()

Get all Image URLs

The following example parses a web-page for images and returns absolute image URLs as an array.

pyscraping.images()
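
Returning absolute URLs means resolving each image's src against the page address. A minimal sketch of that step, using urllib.parse.urljoin from the standard library, might look like this. The names `ImageSrcParser` and `image_urls` and the example URLs are hypothetical, and PyScraping may resolve URLs differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcParser(HTMLParser):
    """Collect the src of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # urljoin leaves absolute URLs untouched and resolves relative ones.
                self.urls.append(urljoin(self.base_url, src))


def image_urls(html, base_url):
    """Return absolute image URLs found in the given HTML."""
    # Hypothetical helper for illustration; not part of PyScraping.
    parser = ImageSrcParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


sample = (
    '<img src="/logo.png">'
    '<img src="pics/cat.jpg">'
    '<img src="https://cdn.example.org/a.gif">'
)
for url in image_urls(sample, "https://example.com/blog/post.html"):
    print(url)
# https://example.com/logo.png
# https://example.com/blog/pics/cat.jpg
# https://cdn.example.org/a.gif
```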

Get all Images with Details

If you need more details, the following request gives you access to the attributes of each image tag:

pyscraping.imagesDetails()

Get all Links

The following example parses a web-page for any links and returns an array of absolute URLs:

pyscraping.links()

Get all Links with Details

If you need more details, you can access them in the same way as for images. The example below retrieves the detailed data for the links on the page:

pyscraping.linksDetails()
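
This README does not list which fields the detailed link data contains. The sketch below shows one plausible shape, built with the standard library, so that picking out the first link is easy to picture. Everything here is an assumption: the names `LinkDetailParser` and `link_details` and the keys "url", "text", "title" and "rel" are hypothetical and may not match what linksDetails() returns.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkDetailParser(HTMLParser):
    """Collect one dictionary per <a href="..."> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._current = None  # link being read, or None outside a link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            href = attributes.get("href")
            if href:
                self._current = {
                    "url": urljoin(self.base_url, href),
                    "text": "",
                    "title": attributes.get("title"),
                    "rel": (attributes.get("rel") or "").split(),
                }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def link_details(html, base_url):
    """Return a list of dictionaries describing each link with an href."""
    # Hypothetical helper for illustration; not part of PyScraping.
    parser = LinkDetailParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


sample = (
    '<a href="/about" title="About us" rel="nofollow noopener"> About </a>'
    '<a name="x">anchor</a>'
)
first_link = link_details(sample, "https://example.com/")[0]
print(first_link["url"])  # https://example.com/about
```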

Custom XPath Selectors

The following examples of custom selectors should be seen as a starting point for any custom information you need to scrape.

pyscraping.filter(element, attribute)

Example

pyscraping.filter('div', 'class="container"')
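
To show how an element name and an attribute test could translate into an XPath query, here is a minimal sketch using xml.etree.ElementTree from the standard library. It is an assumption about the general idea, not PyScraping's implementation: `filter_elements` is a hypothetical name, ElementTree supports only a limited subset of XPath, and it requires well-formed markup, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET


def filter_elements(markup, element, attribute):
    """Find elements by tag name and an attribute test such as 'class="container"'."""
    # Hypothetical helper for illustration; not part of PyScraping.
    root = ET.fromstring(markup)
    # Build an XPath expression like .//div[@class="container"]
    xpath = f".//{element}[@{attribute}]"
    return root.findall(xpath)


sample = (
    "<html><body>"
    '<div class="container"><p>Hello</p></div>'
    '<div class="sidebar"><p>Menu</p></div>'
    "</body></html>"
)
matches = filter_elements(sample, "div", 'class="container"')
print(matches[0].find("p").text)  # Hello
```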

Donate

Contact me

Contact me via email: [email protected]. I welcome your input and suggestions.

rioagungpurnomo/pyscraping