simplexml - DOMDocument saveHTML is not returning correct HTML Standards for "IMG", "INPUT"


I'm a big fan of the PHP library phpQuery as a content parser (because it is quite like jQuery, while using PHP's DOMDocument to extract markup), but I've noticed a bug with specific elements that use the quick closing form, such as <img />, instead of a separate closing tag, such as <div></div>.

I've noticed that this bug occurs in DOMDocument as well as in phpQuery.

I've written a simple class, phpcontentdocument, to dump a simple HTML document.

    require_once "../phpquery_lib/phpquery.php";
    require_once "phpcontentdocument.class.php";

    $sample_document = new phpcontentdocument('sample document');
    $sample_document->addelement('text element', "<span class='text_element'>this sample text</span>");
    $sample_document->addelement('image element', "<img src='png_file.png' alt='png_file' id='png_file' />");

    $sample_document_string = $sample_document->get_string();

The result I expect is ...

    <!doctype html>
    <html>
    <head>
    <title>sample document</title>
    <meta http-equiv="content-type" content="text/html;charset=utf-8" />
    <body>
    <span class='text_element'>this sample text</span>
    <img src='png_file.png' alt='png_file' id='png_file' />
    </body>
    </html>

But when I read the document back using saveHTML()

    // DOMDocument's constructor takes the version first, then the encoding.
    $php_query_document = new DOMDocument('1.0', 'utf-8');
    $php_query_document->formatOutput = true;
    $php_query_document->preserveWhiteSpace = true;
    $php_query_document->loadHTML($sample_document_string);

    $php_query_document_string = $php_query_document->saveHTML();

    echo $php_query_document_string;

It returns ...

    <!doctype html>
    <html>
    <head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8">
    <title>sample document</title>
    </head>
    <body>
    <span class="text_element">this sample text</span>
    <img src="png_file.png" alt="png_file" id="png_file">
    </body>
    </html>

The main problem I have with this appears when I use SimpleXMLElement on an element, img#png_file for example.

Using my content parser, this passes <img src="png_file.png" alt="png_file" id="png_file"> as the argument:

    $simple_doc = new SimpleXMLElement((string) $php_query_document->find('img#png_file'));

I get the following warnings and exception, even though my original markup would work with SimpleXMLElement.

    Warning: SimpleXMLElement::__construct(): Entity: line 1: parser error : Premature end of data in tag img line 1 in f:\xampp\htdocs\test_code\phpquery_test_items\index.php on line 17

    Warning: SimpleXMLElement::__construct(): <img src="png_file.png" alt="png_file" id="png_file"> in f:\xampp\htdocs\test_code\phpquery_test_items\index.php on line 17

    Warning: SimpleXMLElement::__construct(): ^ in f:\xampp\htdocs\test_code\phpquery_test_items\index.php on line 17

    Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML' in f:\xampp\htdocs\test_code\phpquery_test_items\index.php:17
    Stack trace:
    #0 f:\xampp\htdocs\test_code\phpquery_test_items\index.php(17): SimpleXMLElement->__construct('<img src="png_f...')
    #1 {main}
      thrown in f:\xampp\htdocs\test_code\phpquery_test_items\index.php on line 17

This is due to the element having no closing tag.

TL;DR: Warning: SimpleXMLElement::__construct(): Entity: line 1: parser error : Premature end of data in tag img line 1
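The failure can be reproduced without phpQuery at all. The sketch below is only an illustration: parses_as_xml() is a hypothetical helper name, and it assumes PHP 7 or later for the type declarations. It feeds SimpleXMLElement the same <img> tag once in HTML style and once in XML style.

```php
<?php
// The same tag, first as saveHTML() emits it, then with a self-closing slash.
$html_style = '<img src="png_file.png" alt="png_file" id="png_file">';
$xml_style  = '<img src="png_file.png" alt="png_file" id="png_file" />';

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: reports whether a string is accepted by SimpleXMLElement.
function parses_as_xml(string $markup): bool
{
    try {
        new SimpleXMLElement($markup);
        return true;
    } catch (Exception $e) {
        // SimpleXMLElement throws when the string is not well-formed XML.
        return false;
    } finally {
        // Discard the collected errors so they do not pile up between calls.
        libxml_clear_errors();
    }
}

var_dump(parses_as_xml($html_style)); // bool(false)
var_dump(parses_as_xml($xml_style));  // bool(true)
```

The unclosed tag is valid HTML but not well-formed XML, which is why only the second string is accepted.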

How can I fix this? I have a few ideas, but preferably:

  • I want a solution where I can use a regex (where I know the element type) in order to replace /> with <{element_type}/>, and vice versa.
  • Or the DOMDocument class's saveHTML could be fixed (maybe with a class that extends DOMDocument in order to inherit the other functionality).

If you use DOMDocument::saveXML() instead of DOMDocument::saveHTML(), you'll get valid XML.

If necessary, you could then strip the XML declaration line <?xml version="1.0" encoding="utf-8" standalone="yes"?>.
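As a rough sketch of the difference, assuming a small stand-in document rather than the asker's phpcontentdocument output, the following serializes the same DOM both ways. Instead of stripping the declaration with string functions, it passes the root element to saveXML(), which serializes only that subtree and so emits no declaration.

```php
<?php
// Stand-in for the sample document (an assumption, not the asker's real output).
$html = '<!DOCTYPE html><html><head><title>sample document</title></head>'
      . '<body><img src="png_file.png" alt="png_file" id="png_file"></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// HTML serialization: void elements such as <img> are left unclosed.
$as_html = $doc->saveHTML();

// XML serialization: starts with an XML declaration, and <img> becomes <img .../>.
$as_xml = $doc->saveXML();

// Serializing just the root element avoids the declaration (and the doctype).
$without_declaration = $doc->saveXML($doc->documentElement);

// The XML form is accepted by SimpleXMLElement.
$simple_doc = new SimpleXMLElement($without_declaration);
echo $simple_doc->body->img['src'], PHP_EOL; // png_file.png
```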


I just realized that you want the find() method to return proper XML. Therefore I'm not sure my above-mentioned suggestion is all that helpful, if it means you have to alter the class that implements that method.

Perhaps you could do something a little convoluted like:

    $node = $php_query_document->find('img#png_file');
    $simple_doc = new SimpleXMLElement(
        $node->ownerDocument->saveXML($node)
    );

This presupposes that $node is some implementation of DOMNode, which I suspect it is. What it does is ask $node->ownerDocument (the DOMDocument that contains the node) to save only that specific node as XML.
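Here is a self-contained sketch of the same idea using only the DOM extension. Since a plain DOMDocument has no find() method, DOMXPath stands in for the phpQuery selector here; that substitution is an assumption made for illustration.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body><img src="png_file.png" alt="png_file" id="png_file"></body></html>'
);

// Locate the element with XPath (stand-in for find('img#png_file')).
$xpath = new DOMXPath($doc);
$node  = $xpath->query('//img[@id="png_file"]')->item(0);

// Ask the owning document to serialize only this node, as XML.
$node_xml   = $node->ownerDocument->saveXML($node);
$simple_doc = new SimpleXMLElement($node_xml);

echo $node_xml, PHP_EOL;          // <img src="png_file.png" alt="png_file" id="png_file"/>
echo $simple_doc['src'], PHP_EOL; // png_file.png
```

When the node really is a DOMNode, simplexml_import_dom($node) may be an option as well, since it converts the node directly and skips the string round trip.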


Another possibility (which I do not really recommend) is to let SimpleXML be lenient when parsing, by passing the following libxml options to the constructor:

    $simple_doc = new SimpleXMLElement(
        (string) $php_query_document->find('img#png_file'),
        LIBXML_NOERROR | LIBXML_ERR_NONE | LIBXML_ERR_FATAL
    );

This suppresses libxml errors while parsing the content. libxml is the underlying XML parser, used by SimpleXML and DOMDocument (amongst others).
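If the goal is mainly to avoid the warnings rather than to hide the problem, a different technique from the one above is to collect libxml's errors and inspect them. This is a minimal sketch using libxml_use_internal_errors() and libxml_get_errors(); the variable names are illustrative.

```php
<?php
$markup = '<img src="png_file.png" alt="png_file" id="png_file">';

// Buffer libxml errors instead of turning them into PHP warnings.
$previous = libxml_use_internal_errors(true);

try {
    $simple_doc = new SimpleXMLElement($markup);
} catch (Exception $e) {
    // The exception is still thrown; only the warnings are buffered.
    $simple_doc = null;
}

// Fetch the buffered errors, then restore the previous error handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Each entry is a LibXMLError with level, code, line, column and message.
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

This keeps the parser strict, so malformed input is still rejected, while the caller decides what to do with the reported errors.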

