lmfmaier/meta-scraper

 
 


Meta Scraper

Meta Scraper parses meta information from an HTML page.

Installation

Via Composer:

composer require tomaj/meta-scraper

How to use

Example:

use Tomaj\Scraper\Scraper;
use Tomaj\Scraper\Parser\OgParser;

$scraper = new Scraper();
$parsers = [new OgParser()];
// Fetch the HTML yourself, then pass it to the scraper together with the parsers.
$meta = $scraper->parse(file_get_contents('http://www.google.com/'), $parsers);
var_dump($meta);

Alternatively, use the parseUrl method, which fetches the page internally using the Guzzle library:

use Tomaj\Scraper\Scraper;
use Tomaj\Scraper\Parser\OgParser;

$scraper = new Scraper();
$parsers = [new OgParser()];
// parseUrl downloads the page and then runs the given parsers on it.
$meta = $scraper->parseUrl('http://www.google.com/', $parsers);
var_dump($meta);

Parsers

The package includes two parsers, and you can create your own by implementing the Tomaj\Scraper\Parser\ParserInterface interface.

Included parsers:

  • Tomaj\Scraper\Parser\OgParser - based on og meta attributes in the HTML
  • Tomaj\Scraper\Parser\SchemaParser - based on the schema JSON structure
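
As a rough sketch of what a custom parser might look like: this README does not show the signature of ParserInterface, so the example below declares its own stand-in interface (ExampleParserInterface) with a single hypothetical parse(string $content): array method. The names ExampleParserInterface and TitleTagParser and the array return shape are assumptions for illustration only, not the package's API; check the real interface before implementing it.

```php
<?php

// Stand-in for Tomaj\Scraper\Parser\ParserInterface. The real method
// signature is not shown in this README, so this shape is an assumption.
interface ExampleParserInterface
{
    /** @return array<string, string> meta fields found in the HTML */
    public function parse(string $content): array;
}

// Hypothetical parser that reads the <title> tag and the meta description.
final class TitleTagParser implements ExampleParserInterface
{
    public function parse(string $content): array
    {
        $meta = [];
        if (trim($content) === '') {
            return $meta; // nothing to parse
        }

        $doc = new DOMDocument();
        // Collect libxml warnings internally instead of emitting them,
        // since real-world HTML is rarely perfectly valid.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($content);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $titles = $doc->getElementsByTagName('title');
        if ($titles->length > 0) {
            $meta['title'] = trim($titles->item(0)->textContent);
        }

        foreach ($doc->getElementsByTagName('meta') as $tag) {
            if (strtolower($tag->getAttribute('name')) === 'description') {
                $meta['description'] = trim($tag->getAttribute('content'));
                break;
            }
        }

        return $meta;
    }
}

$html = '<html><head><title>Example page</title>'
    . '<meta name="description" content="A short summary."></head><body></body></html>';

$meta = (new TitleTagParser())->parse($html);
var_dump($meta);
```

A parser written against the real interface would be passed to the scraper in the $parsers array, in the same way as OgParser in the examples above.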

You can combine these parsers. Data that the first parser does not find is filled in with data from the second parser.

use Tomaj\Scraper\Scraper;
use Tomaj\Scraper\Parser\SchemaParser;
use Tomaj\Scraper\Parser\OgParser;

$scraper = new Scraper();
// Order matters: SchemaParser runs first, OgParser fills in what is missing.
$parsers = [new SchemaParser(), new OgParser()];
$meta = $scraper->parseUrl('http://www.google.com/', $parsers);
var_dump($meta);
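
The fallback behaviour of combined parsers can be pictured with plain arrays. This is only an illustrative sketch: the function name mergeWithFallback and the array-based representation of parser results are assumptions, and the package's internal merge logic may differ.

```php
<?php

/**
 * Merge parser results in order: a later result only fills fields that
 * earlier results left missing or empty.
 *
 * @param array<int, array<string, ?string>> $results one array per parser
 * @return array<string, string>
 */
function mergeWithFallback(array $results): array
{
    $merged = [];
    foreach ($results as $result) {
        foreach ($result as $field => $value) {
            $missing = !isset($merged[$field]) || $merged[$field] === '';
            if ($missing && $value !== null && $value !== '') {
                $merged[$field] = $value;
            }
        }
    }
    return $merged;
}

// Hypothetical outputs of two parsers run against the same page.
$fromSchema = ['title' => 'Schema title', 'description' => null];
$fromOg = [
    'title' => 'OG title',
    'description' => 'OG description',
    'image' => 'https://example.com/a.png',
];

$meta = mergeWithFallback([$fromSchema, $fromOg]);
var_dump($meta);
```

Here the title comes from the first result, while the description and image are taken from the second because the first did not provide them.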
