
Detects the charset from the HTTP header's charset and the HTML meta charset, and automatically converts to UTF-8.


diggin/Diggin_Http_Charset


Diggin_Http_Charset

Automatically converts response bodies to UTF-8.


The charset is detected from the HTTP header's charset and the HTML meta charset.

(Several charsets, such as SJIS-win, TIS-620 and others, are handled with extra care.)
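To illustrate the idea, here is a minimal sketch of header-then-meta detection followed by conversion. It is not the Diggin_Http_Charset API: `detectCharset`, `normalizeCharset` and `toUtf8` are hypothetical names, and both the priority order (header first, meta second) and the alias table are assumptions made for the example. It only covers charsets that mbstring knows.

```php
<?php
// Hypothetical helper (not the library's API): find a charset name,
// assuming the Content-Type header takes priority over the HTML meta tag.
function detectCharset($contentType, $html)
{
    // 1. charset parameter of the header, e.g. "text/html; charset=Shift_JIS"
    if (preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', (string) $contentType, $m)) {
        return $m[1];
    }
    // 2. <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return $m[1];
    }
    return null; // unknown; a fuller implementation might sniff the bytes instead
}

// Hypothetical alias table (an assumption): pages labelled Shift_JIS or EUC-JP
// often use Windows vendor extensions, so map them to mbstring's superset names.
function normalizeCharset($charset)
{
    $aliases = array(
        'shift_jis' => 'SJIS-win',
        'x-sjis'    => 'SJIS-win',
        'euc-jp'    => 'eucJP-win',
    );
    $key = strtolower($charset);
    return isset($aliases[$key]) ? $aliases[$key] : $charset;
}

// Convert a body to UTF-8 using the detected charset; bodies that are already
// UTF-8 or have no detectable charset are returned unchanged.
function toUtf8($body, $contentType)
{
    $charset = detectCharset($contentType, $body);
    if ($charset === null || strcasecmp($charset, 'UTF-8') === 0) {
        return $body;
    }
    // mb_convert_encoding warns (PHP 7) or throws ValueError (PHP 8) for unknown charsets.
    return mb_convert_encoding($body, 'UTF-8', normalizeCharset($charset));
}
```

For example, `toUtf8("\x93\xfa\x96\x7b", 'text/html; charset=Shift_JIS')` turns the Shift_JIS bytes for "日本" into their UTF-8 form.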

This library is intended for use in web scraping.

Requirements

  • PHP 5.3 or later
  • mbstring and iconv extensions

Usage

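  1. Wrap the response object:
<?php
// Assumes an autoloader for the Diggin and Zend classes is already registered.
use Diggin\Http\Charset\WrapperFactory;
$url = 'http://example.com/'; // the page to scrape
$client = new Zend\Http\Client($url);
$response = $client->send();
// Wrap the response: getBody() on the wrapper returns the body converted to UTF-8.
$response = WrapperFactory::factory($response);
echo $response->getBody();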

See demos/Diggin/Http/Charset for more examples.
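Wrapping works because the wrapper keeps the response's interface while changing what getBody() returns. The sketch below shows that decorator idea in isolation; it is one possible design, not how WrapperFactory is actually implemented. `SimpleResponse`, `RawResponse` and `Utf8ResponseWrapper` are hypothetical stand-ins (not Zend's response API), and this simplified version only reads the charset from the Content-Type header.

```php
<?php
// Hypothetical minimal response interface (stands in for a real HTTP response class).
interface SimpleResponse
{
    public function getBody();
    public function getContentType();
}

// Hypothetical raw response holding the undecoded bytes from the server.
class RawResponse implements SimpleResponse
{
    private $body;
    private $contentType;

    public function __construct($body, $contentType)
    {
        $this->body = $body;
        $this->contentType = $contentType;
    }

    public function getBody() { return $this->body; }
    public function getContentType() { return $this->contentType; }
}

// Decorator: same interface as the wrapped response, but getBody() yields UTF-8.
class Utf8ResponseWrapper implements SimpleResponse
{
    private $inner;
    private $converted = null; // cached so the conversion runs only once

    public function __construct(SimpleResponse $inner)
    {
        $this->inner = $inner;
    }

    // Other calls are passed straight through to the wrapped response.
    public function getContentType()
    {
        return $this->inner->getContentType();
    }

    public function getBody()
    {
        if ($this->converted === null) {
            $charset = 'UTF-8'; // assumption: treat a missing charset as UTF-8
            if (preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', (string) $this->inner->getContentType(), $m)) {
                $charset = $m[1];
            }
            $this->converted = mb_convert_encoding($this->inner->getBody(), 'UTF-8', $charset);
        }
        return $this->converted;
    }
}
```

Because callers only see the interface, code that already calls getBody() keeps working after the response is wrapped.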

Guzzle & Goutte

guzzle-plugin-AutoCharsetEncodingPlugin provides support for Guzzle 3.

Usage with Behat, by @MugeSo

Technical Information

Diggin_Http_Charset is based on HTMLScraping.

License

Diggin_Http_Charset is licensed under the LGPL (GNU Lesser General Public License).

Similar library

TODOs

  • Handle content types other than text/html.
  • Better APIs, following the ZF2 coding standard.
  • Support more charsets :-\
