Error while decoding HTML and Farsi from hex encoding in Python
I have a string that appears to be hex-encoded, like this:
data = \xd8\xa7\xdb\x8c \xd9\x84\xda\x86\xdb\x8c<br/> \xd8\xa7\xda\xaf\xd8\xb1\xda\x86\xd9\x87 \xd8\xa7\xd9\x82\xd8\xaf\xd8\xa7\xd9\x85\xd8\xa7\xd8\xaa
It contains Persian text and HTML elements.
Converting it with ddcode.com gives meaningful results (though I'm not sure the string is really hex), but when I try to decode it in Python I get errors.
Using codecs: codecs.decode(data,'hex',errors='ignore')
I get:
AssertionError                            Traceback (most recent call last)
<ipython-input-124-5246163fba41> in <module>()
----> 1 codecs.decode(data,'hex',errors='ignore')

AssertionError: decoding with 'hex' codec failed (AssertionError: )
Using binascii: binascii.unhexlify(data)
I get:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-126-fbe8c6445b8a> in <module>()
      1 import binascii
----> 2 binascii.unhexlify(data)

ValueError: string argument should contain only ASCII characters
Any suggestions? Is the string in hex? If some parts of the string are not hex, how can I ignore them during decoding?
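Is the string in hex?
No. It is UTF-8 encoded bytes that were decoded with the wrong codec, so each byte shows up as a separate character. Encoding the string back to Latin-1 recovers the original bytes, which can then be decoded as UTF-8: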
>>> 'data \xd8\xa7\xdb\x8c \xd9\x84\xda\x86\xdb\x8c<br/> \xd8\xa7\xda\xaf\xd8\xb1\xda\x86\xd9\x87 \xd8\xa7\xd9\x82\xd8\xaf\xd8\xa7\xd9\x85\xd8\xa7\xd8\xaa'.encode('latin-1').decode('utf-8')
'data ای لچی<br/> اگرچه اقدامات'
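The same round trip can be wrapped in a small helper. This is only a sketch: the function names fix_mojibake and unescape_literal are made up for illustration, and the second one assumes a case the question does not confirm, namely that the data arrives as text containing literal backslash sequences (a backslash, an x, and two hex digits) rather than as already-interpreted characters.

```python
def fix_mojibake(text: str) -> str:
    # Latin-1 maps code points 0-255 one-to-one onto bytes, so encoding
    # with it recovers the raw bytes; those bytes are really UTF-8.
    return text.encode('latin-1').decode('utf-8')


def unescape_literal(text: str) -> str:
    # Assumption: `text` holds literal backslash escapes, e.g. read from
    # a file or log. First interpret the escapes, then repair as above.
    interpreted = text.encode('latin-1').decode('unicode_escape')
    return fix_mojibake(interpreted)


if __name__ == '__main__':
    # Escapes already interpreted by Python (the situation in the answer).
    print(fix_mojibake('\xd8\xa7\xdb\x8c<br/>'))
    # Raw string: the backslashes are literal characters here.
    print(unescape_literal(r'\xd8\xa7\xdb\x8c<br/>'))
```

Both calls leave plain ASCII such as the <br/> tag untouched, because ASCII characters encode to the same bytes in Latin-1 and UTF-8. If the input contains characters outside the Latin-1 range, encode('latin-1') raises UnicodeEncodeError, which is a sign that the text was not mis-decoded in this particular way.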