java - How to keep link title attribute with jsoup?


When using Jsoup.clean(), jsoup turns the title attribute of an HTML link from:

<a href="" title="test &lt;br /&gt;">test</a> 

into:

<a href="" title="test <br />">test</a> 

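This is the demo application:

import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist; // newer jsoup releases name this class Safelist

// Allow only <a> tags with href and title attributes
Whitelist whitelist = new Whitelist();
whitelist.addTags("a");
whitelist.addAttributes("a", "href", "title");

String input = "<a href=\"\" title=\"test &lt;br /&gt;\">test</a>";
System.out.println("input: " + input);
String output = Jsoup.clean(input, whitelist);
System.out.println("output: " + output);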

which prints:

input: <a href="" title="test &lt;br /&gt;">test</a>
output: <a href="" title="test <br />">test</a>

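I tried adding OutputSettings with an escape mode:

import org.jsoup.nodes.Document.OutputSettings;
import org.jsoup.nodes.Entities.EscapeMode;

OutputSettings outputSettings = new OutputSettings();
outputSettings.escapeMode(EscapeMode.xhtml);

EscapeMode.base and EscapeMode.extended have no effect. EscapeMode.xhtml prints the following: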

input: <a href="" title="test &lt;br /&gt;">test</a>
output: <a href="" title="test &lt;br />">test</a>

Any idea how to make jsoup not manipulate the title attribute?

This is a known issue/behavior: https://github.com/jhy/jsoup/issues/684 (marked "won't fix" by the jsoup team).

There's no bug here.

When serializing (i.e., in your example, when you're printing out the XML/HTML), only the few characters that must be escaped are escaped. Why is > not escaped as &gt;? Because it's inside a quoted attribute, there's no ambiguity about whether it's closing a tag, so it doesn't need to be escaped.
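
To illustrate the rule, here is a minimal sketch. It is not jsoup's actual implementation; the class and method names are hypothetical, and it assumes the attribute is always written in double quotes. Inside a double-quoted attribute value only & and " can be misread by a parser, so only those two are escaped, and < and > pass through unchanged:

```java
final class AttributeEscapeSketch {

    // Hypothetical helper: escapes a decoded value for use inside a
    // double-quoted HTML attribute. Only '&' and '"' are ambiguous there.
    static String escapeAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '"':
                    sb.append("&quot;");
                    break;
                default:
                    // '<' and '>' cannot end the attribute or the tag
                    // while inside the quotes, so they are left as-is.
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The decoded value of title="test &lt;br /&gt;"
        String decodedTitle = "test <br />";
        System.out.println(
            "<a href=\"\" title=\"" + escapeAttribute(decodedTitle) + "\">test</a>");
    }
}
```

An HTML parser reading either title="test &lt;br /&gt;" or title="test <br />" ends up with the same attribute value, test <br />, so the two serializations are equivalent for anything that parses the output.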

