C# Regex: replace everything not in specific UTF-8 character set ranges (whitelist)
I'm trying to strip non-printable characters and keep only a specific Latin character set: http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=0x
My regex looks like this:
var output = Regex.Replace(input, @"[^\u0020-\u007e]|[^\u00a0-\u00ff]", string.Empty);
I have a problem with the line separator (U+2028) specifically, but I want to exclude control characters as well, so I wanted a whitelist rather than a blacklist.
I'm trying to include U+0020 (space) through U+007E (tilde) or U+00A0 (no-break space) through U+00FF (Latin small letter y with diaeresis).
I've got the negation wrong on the sets, but I can't figure out how to solve it. Any ideas?
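To make the negation problem concrete, here is a minimal sketch of what the alternation of two negated classes does. The class and method names (NegationDemo, StripWithAlternation) are made up for illustration. Every character lies outside at least one of the two ranges, so one of the alternatives always matches and everything gets removed:

```csharp
using System.Text.RegularExpressions;

public static class NegationDemo
{
    // Hypothetical helper wrapping the pattern from the question:
    // two negated classes joined by alternation.
    public static string StripWithAlternation(string input) =>
        Regex.Replace(input, @"[^\u0020-\u007e]|[^\u00a0-\u00ff]", string.Empty);
}
```

For example, 'a' (U+0061) is outside \u00a0-\u00ff, so the second alternative matches it, and 'é' (U+00E9) is outside \u0020-\u007e, so the first alternative matches it. NegationDemo.StripWithAlternation("a\u00e9\u2028") therefore returns an empty string.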
Update
The following appears to work:
var input = "</span><span>
</span><span>";
var output = Regex.Replace(input, @"[^\u0020-\u007e\u00a0-\u00ff]", string.Empty);
// gives: </span><span> </span><span>
Working example: http://rextester.com/yciwtn86420
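As a rough sketch, the single negated class can be wrapped in a small reusable helper. The names LatinWhitelist, Disallowed and Clean are my own, not from any library, and the example input below assumes the break between the spans is a U+2028 line separator, written as an escape so it is visible:

```csharp
using System.Text.RegularExpressions;

public static class LatinWhitelist
{
    // One negated class listing both allowed ranges: it matches any
    // character that is in neither U+0020..U+007E nor U+00A0..U+00FF.
    private static readonly Regex Disallowed =
        new Regex(@"[^\u0020-\u007e\u00a0-\u00ff]", RegexOptions.Compiled);

    // Removes every character outside the whitelist (hypothetical helper).
    public static string Clean(string input) =>
        Disallowed.Replace(input, string.Empty);
}
```

With this, LatinWhitelist.Clean("</span><span>\u2028</span><span>") returns "</span><span></span><span>". Tabs, newlines and other control characters are dropped too, since they fall outside both ranges, while characters such as 'é' (U+00E9) are kept.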