java - How to keep link title attribute with jsoup?
Using Jsoup.clean(), jsoup turns the title attribute of an HTML link from:
    <a href="" title="test &lt;br /&gt;">test</a>
into:
    <a href="" title="test <br />">test</a>
This is the demo application:
    import org.jsoup.Jsoup;
    import org.jsoup.safety.Whitelist;

    // Allow only <a> tags with href and title attributes.
    Whitelist whitelist = new Whitelist();
    whitelist.addTags("a");
    whitelist.addAttributes("a", "href", "title");

    String input = "<a href=\"\" title=\"test &lt;br /&gt;\">test</a>";
    System.out.println("input: " + input);

    String output = Jsoup.clean(input, whitelist);
    System.out.println("output: " + output);
Which prints:
    input: <a href="" title="test &lt;br /&gt;">test</a>
    output: <a href="" title="test <br />">test</a>
I tried adding OutputSettings with an EscapeMode:
    OutputSettings outputSettings = new OutputSettings();
    outputSettings.escapeMode(EscapeMode.xhtml);
EscapeMode.base and EscapeMode.extended have no effect. EscapeMode.xhtml prints the following:
    input: <a href="" title="test &lt;br /&gt;">test</a>
    output: <a href="" title="test &lt;br />">test</a>
Any idea how to stop jsoup from manipulating the title attribute?
This is a known issue/behavior: https://github.com/jhy/jsoup/issues/684 (marked "won't fix" by the jsoup team).
There's no bug here.
When serializing (i.e. in your example, when you're printing out the XML/HTML), only the few characters that need escaping are escaped. Why is > not escaped to &gt;? Because it's in a quoted attribute, there's no ambiguity about whether it closes a tag, so it doesn't get escaped.
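To make that rule concrete, here is a minimal sketch in plain Java with no jsoup dependency. It is not jsoup's code: the class `AttrEscapeSketch`, the method `escapeAttribute`, and its `xhtml` flag are hypothetical names invented for illustration, and the idea that only an XHTML-style mode escapes `<` inside an attribute is an assumption based on the behaviour described in the question. The point is that the serializer starts from the decoded attribute value and escapes only what would be ambiguous inside double quotes.

```java
public class AttrEscapeSketch {

    /**
     * Hypothetical helper (not part of jsoup): escapes a decoded attribute
     * value so it can be written between double quotes.
     */
    public static String escapeAttribute(String value, boolean xhtml) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':
                    // Always escaped, otherwise it could start an entity.
                    out.append("&amp;");
                    break;
                case '"':
                    // Always escaped, otherwise it would end the attribute.
                    out.append("&quot;");
                    break;
                case '<':
                    // Assumption: escaped only in the XHTML-style mode.
                    out.append(xhtml ? "&lt;" : "<");
                    break;
                default:
                    // '>' lands here: inside quotes it cannot close the tag.
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // What a parser holds in memory after decoding the entities.
        String decoded = "test <br />";
        System.out.println("title=\"" + escapeAttribute(decoded, false) + "\"");
        System.out.println("title=\"" + escapeAttribute(decoded, true) + "\"");
    }
}
```

In both modes a browser or parser reading the result back decodes the title to the same string, `test <br />`, so the attribute's value is preserved even though its source text differs from the input.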