simplexml - DOMDocument saveHTML is not returning correct HTML standards for "IMG", "INPUT"
I'm a big fan of the PHP library phpQuery as a content parser (because it is quite like jQuery, while using the PHP DOMDocument to extract markup), but I've noticed a bug with specific elements that use the quick closing event, such as <img />, instead of a separate closing tag, such as <div></div>.
I've noticed that this bug occurs in DOMDocument as well as in phpQuery.
I've written a simple class, phpcontentdocument, to dump a simple HTML document.
    require_once "../phpquery_lib/phpquery.php";
    require_once "phpcontentdocument.class.php";
    $sample_document = new phpcontentdocument('sample document');
    $sample_document->addelement('text element', "<span class='text_element'>this sample text</span>");
    $sample_document->addelement('image element', "<img src='png_file.png' alt='png_file' id='png_file' />");
    $sample_document_string = $sample_document->get_string();
The results I expect are ...
    <!doctype html>
    <html>
    <head>
    <title>sample document</title>
    <meta http-equiv="content-type" content="text/html;charset=utf-8" />
    <body>
    <span class='text_element'>this sample text</span>
    <img src='png_file.png' alt='png_file' id='png_file' />
    </body>
    </html>
But when recalling the document using saveHTML()
    // The DOMDocument constructor takes the version first, then the encoding.
    $php_query_document = new DOMDocument('1.0', 'utf-8');
    // Property names are case-sensitive in PHP.
    $php_query_document->formatOutput = true;
    $php_query_document->preserveWhiteSpace = true;
    $php_query_document->loadHTML($sample_document_string);
    $php_query_document_string = $php_query_document->saveHTML();
    echo $php_query_document_string;
it returns ...
    <!doctype html>
    <html>
    <head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8">
    <title>sample document</title>
    </head>
    <body>
    <span class="text_element">this sample text</span>
    <img src="png_file.png" alt="png_file" id="png_file">
    </body>
    </html>
The main problem I have with this shows up when I use SimpleXMLElement on the element img#png_file (for example), using the content parser and passing <img src="png_file.png" alt="png_file" id="png_file"> as the argument:
    $simple_doc = new SimpleXMLElement((string) $php_query_document->find('img#png_file'));
I get the following warnings and exceptions, even though the original markup would work with SimpleXMLElement:
    Warning: SimpleXMLElement::__construct(): Entity: line 1: parser error : Premature end of data in tag img line 1 in f:\xampp\htdocs\test_code\phpquery_test_items\index.php on line 17
    Warning: SimpleXMLElement::__construct(): <img src="png_file.png" alt="png_file" id="png_file"> in f:\xampp\htdocs\test_code\phpquery_test_items\index.php on line 17
    Warning: SimpleXMLElement::__construct(): ^ in f:\xampp\htdocs\test_code\phpquery_test_items\index.php on line 17
    Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML' in f:\xampp\htdocs\test_code\phpquery_test_items\index.php:17
    Stack trace:
    #0 f:\xampp\htdocs\test_code\phpquery_test_items\index.php(17): SimpleXMLElement->__construct('<img src="png_f...')
    #1 {main}
    thrown in f:\xampp\htdocs\test_code\phpquery_test_items\index.php on line 17
This is due to the element having no closing event.
TL;DR: Warning: SimpleXMLElement::__construct(): Entity: line 1: parser error : Premature end of data in tag img line 1
How can I fix this? I have a few ideas, but preferably:
- I want a solution where I can use a regex (where I know the element type) in order to replace /> with <{element_type}/>, and vice versa.
- The DOMDocument class's saveHTML() is fixed (maybe with a class that extends DOMDocument in order to inherit the other functionality).
If you use DOMDocument::saveXML() instead of DOMDocument::saveHTML(), you'll get valid XML.
If necessary, you could strip the XML declaration line <?xml version="1.0" encoding="utf-8" standalone="yes"?>.
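To make the difference between the two serializers concrete, here is a minimal sketch. It assumes a hypothetical sample document modelled on the one in the question and uses plain DOMDocument rather than phpQuery. Passing a node to saveXML() serializes only that node, which also leaves out the XML declaration without any string stripping.

```php
<?php
// Hypothetical sample markup, modelled on the document from the question.
$html = '<!DOCTYPE html><html><head><title>sample document</title></head>'
      . '<body><img src="png_file.png" alt="png_file" id="png_file"></body></html>';

$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadHTML($html);

$img = $doc->getElementsByTagName('img')->item(0);

// HTML serialization: <img> is a void element, so no closing slash is written.
$as_html = $doc->saveHTML($img);

// XML serialization: an empty element is written in self-closed form.
$as_xml = $doc->saveXML($img);

// Serializing from the root element omits the <?xml ...?> declaration.
$whole = $doc->saveXML($doc->documentElement);

echo $as_html, PHP_EOL;
echo $as_xml, PHP_EOL;
```

The variable names ($as_html, $as_xml, $whole) are only illustrative; the point is that the same node yields well-formed XML when saved with saveXML().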
I just realized that you want the find() method to return proper XML. Therefore I'm not sure the above-mentioned suggestion is all that helpful, if it means you have to alter the class that implements that method.
Perhaps you could do something a little convoluted like:
    $node = $php_query_document->find('img#png_file');
    // ownerDocument is a case-sensitive property name.
    $simple_doc = new SimpleXMLElement( $node->ownerDocument->saveXML( $node ) );
This presupposes that $node is an implementation of DOMNode, which I suspect it is. What it does is ask $node->ownerDocument (the DOMDocument that contains the node) to save only that specific node as XML.
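As a self-contained illustration of that idea, the sketch below uses a plain DOMDocument lookup to stand in for whatever find() returns, since whether phpQuery hands back a DOMNode directly is an assumption here. The markup is a hypothetical sample based on the question.

```php
<?php
$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadHTML(
    '<!DOCTYPE html><html><body>'
    . '<img src="png_file.png" alt="png_file" id="png_file">'
    . '</body></html>'
);

// Stand-in for the node that find('img#png_file') is assumed to provide.
$node = $doc->getElementsByTagName('img')->item(0);

// Ask the owning document to serialize just this node as XML,
// then hand the well-formed string to SimpleXMLElement.
$simple_doc = new SimpleXMLElement($node->ownerDocument->saveXML($node));

echo $simple_doc->getName(), PHP_EOL; // element name
echo $simple_doc['id'], PHP_EOL;      // value of the id attribute
```

Because the string passed to the constructor is produced by saveXML(), the parser no longer sees an unclosed img tag.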
Another possibility (which I would not recommend) is to let SimpleXML be lenient when parsing, by passing the following libxml options to its constructor:
    // libxml constants are case-sensitive and must be written in upper case.
    $simple_doc = new SimpleXMLElement(
        (string) $php_query_document->find('img#png_file'),
        LIBXML_NOERROR | LIBXML_ERR_NONE | LIBXML_ERR_FATAL
    );
This suppresses libxml errors while parsing the content. libxml is the underlying XML parser used by SimpleXML and DOMDocument (amongst others).
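If the goal is to see what the parser objects to rather than to hide it, one option is to collect the libxml errors and inspect them. This is only a sketch of that alternative, not part of the suggestion above; it uses the standard libxml_use_internal_errors() and libxml_get_errors() functions, and the variable names are illustrative.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

// The HTML-style serialization from the question: unclosed, so not well-formed XML.
$markup = '<img src="png_file.png" alt="png_file" id="png_file">';

$caught = false;
$errors = [];

try {
    new SimpleXMLElement($markup);
} catch (Exception $e) {
    // The constructor throws when the string cannot be parsed as XML.
    $caught = true;
    $errors = libxml_get_errors();
    libxml_clear_errors();
}

// Print each collected parser message with the line it refers to.
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ')', PHP_EOL;
}

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);
```

This keeps the failure visible to the calling code while avoiding the warnings shown in the question.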