Consider using DOMDocument recovery mode #73
Comments
The problem is that false results could lead to subsequent errors in parsing and handling of the entire feed. Maybe it's an option to inject your own |
You can use the recovery mode yourself:
// Import by URI
$httpClient = Zend\Feed\Reader\Reader::getHttpClient();
$response = $httpClient->get(
    'https://github.com/zendframework/zend-feed/releases.atom'
);
$xmlString = $response->getBody();
// Create DOMDocument
$dom = new DOMDocument;
$dom->recover = true;
$dom->loadXML(trim($xmlString));
// Detect type
$type = Zend\Feed\Reader\Reader::detectType($dom);
// Create reader
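if (0 === strpos($type, 'rss')) {
    $reader = new Zend\Feed\Reader\Feed\Rss($dom, $type);
} elseif (0 === strpos($type, 'atom')) {
    $reader = new Zend\Feed\Reader\Feed\Atom($dom, $type);
} else {
    // Without this branch $reader would be undefined below
    throw new RuntimeException("Unsupported feed type: {$type}");
}
var_dump($reader->getTitle()); // "Release notes from zend-feed" |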
Thanks for the help! This is indeed what I ended up doing: |
@Isinlor Can you provide a link to a feed which is malformed and needs the recovery mode? |
Here is one example: http://itbrokeand.ifixit.com/atom.xml
Code I used for testing:
<?php
// Collect libxml errors internally instead of emitting PHP warnings
$libxmlErrflag = libxml_use_internal_errors(true);
// Note: libxml_disable_entity_loader() is deprecated as of PHP 8.0
$oldValue = libxml_disable_entity_loader(true);
$dom = new \DOMDocument;
//$dom->recover = true; // Allows parsing of slightly malformed feeds
$status = $dom->loadXML(file_get_contents("http://itbrokeand.ifixit.com/atom.xml"));
if (!$status) {
    // Build error message
    $error = libxml_get_last_error();
    if ($error instanceof \LibXMLError && $error->message != '') {
        $error->message = trim($error->message);
        $errormsg = "DOMDocument cannot parse XML: {$error->message}";
    } else {
        $errormsg = "DOMDocument cannot parse XML: Please check the XML document's validity";
    }
    throw new Exception($errormsg);
} |
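For comparison, here is a minimal sketch of a "strict first, recover on failure" loader built on the same DOMDocument calls as the test script above. The function name `loadFeedXml` and its return shape are hypothetical and not part of zend-feed. It leaves out `libxml_disable_entity_loader()` because that function is deprecated as of PHP 8.0, and it only assumes the dom and libxml extensions are available.

```php
<?php
/**
 * Hypothetical helper (not part of zend-feed): parse strictly first and
 * retry with libxml recovery mode only if the strict parse fails.
 *
 * @return array{0: DOMDocument, 1: LibXMLError[]} the document and the
 *         libxml errors reported by the parse that produced it
 */
function loadFeedXml(string $xml): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    try {
        $dom = new DOMDocument();
        if (!$dom->loadXML($xml)) {
            // Strict parse failed: drop its errors and retry on a fresh
            // document with recovery enabled.
            libxml_clear_errors();
            $dom = new DOMDocument();
            $dom->recover = true;
            if (!$dom->loadXML($xml) || $dom->documentElement === null) {
                $error = libxml_get_last_error();
                $detail = $error instanceof LibXMLError
                    ? trim($error->message)
                    : "Please check the XML document's validity";
                throw new RuntimeException("DOMDocument cannot parse XML: {$detail}");
            }
        }

        // The return value is evaluated before the finally block runs.
        return [$dom, libxml_get_errors()];
    } finally {
        // Restore the caller's libxml error handling state.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}
```

A caller can treat a non-empty error list as a sign that the feed was repaired rather than well-formed, and log or reject it accordingly. Whether a recovered document is trustworthy enough to hand to the reader classes is a separate decision that this sketch does not make.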
@Isinlor |
I think your initial reaction was correct.
I missed it when I was working on it myself. But indeed, even though I'm really curious how Firefox handles it, because I have no issues if I open: |
@Isinlor |
https://blog.noredink.com/rss
There were some problems, but now I have not found anything.
http://itbrokeand.ifixit.com/atom.xml
Problem is (Also fails in a browser.)
http://aasnova.org/feed/
Two problems: 403 and wrong header. (Also fails in a browser. [Download])
https://blog.floydhub.com/rss/
Many feeds contain characters out of the legal range. Try the following:
$string = preg_replace(
    '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
    ' ',
    $string
);
This should eliminate problems like "CData section not finished". (Also fails in a browser.)
Thanks for the examples. At the moment I do not know if we should do something in zend-feed, because it opens the door to many pitfalls or ugly workarounds. I see the benefit for the user but also the maintenance burden. I remain open to suggestions and improvements. |
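For anyone who wants to try the character filter from the comment above before handing a feed to DOMDocument, here is a minimal sketch. The function name `stripIllegalXmlChars` is hypothetical and not part of zend-feed; the pattern is the one quoted in the thread, which keeps only code points in the XML 1.0 legal character range.

```php
<?php
/**
 * Hypothetical helper (not part of zend-feed): replace each run of
 * characters outside the XML 1.0 legal range with a single space.
 */
function stripIllegalXmlChars(string $xml): string
{
    $clean = preg_replace(
        '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
        ' ',
        $xml
    );

    // With the /u modifier, preg_replace() returns null when the subject
    // is not valid UTF-8, so surface that instead of returning null.
    if ($clean === null) {
        throw new InvalidArgumentException(
            'Could not filter input, preg error code ' . preg_last_error()
        );
    }

    return $clean;
}

// Usage: a backspace control character (0x08) is not legal in XML 1.0.
$dom = new DOMDocument();
$dom->loadXML(stripIllegalXmlChars("<title>broken\x08feed</title>"));
```

The filter only handles illegal characters. It does nothing for structural problems such as unclosed tags, and feeds served in an encoding other than UTF-8 would need converting first.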
This repository has been closed and moved to laminas/laminas-feed; a new issue has been opened at laminas/laminas-feed#8. |
See Stack Overflow for details: https://stackoverflow.com/a/9281963/893222
The idea is to handle malformed XML in userland by using the recovery option in libxml: