Error while decoding HTML and Farsi from hex encoding in Python


I have a string that looks like it is hex-encoded:

data = '\xd8\xa7\xdb\x8c \xd9\x84\xda\x86\xdb\x8c<br/> \xd8\xa7\xda\xaf\xd8\xb1\xda\x86\xd9\x87 \xd8\xa7\xd9\x82\xd8\xaf\xd8\xa7\xd9\x85\xd8\xa7\xd8\xaa'

It contains Persian text and HTML elements.

Converting it with ddcode.com gives meaningful results (although I'm not sure the string is really hex), but when I try to decode it in Python I get errors.

Using codecs: codecs.decode(data, 'hex', errors='ignore')

I get:

AssertionError                            Traceback (most recent call last)
<ipython-input-124-5246163fba41> in <module>()
----> 1 codecs.decode(data,'hex',errors='ignore')

AssertionError: decoding with 'hex' codec failed (AssertionError: )

Using binascii: binascii.unhexlify(data)

I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-126-fbe8c6445b8a> in <module>()
      1 import binascii
----> 2 binascii.unhexlify(data)

ValueError: string argument should contain only ASCII characters

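Any suggestions? Is the string in hex? If parts of the string are not hex, how can I ignore them during decoding?

Is the string in hex?

No, it's bytes that were decoded with the wrong encoding. The \xNN escapes are just how Python displays the individual characters. The text is UTF-8, but it was read as Latin-1, so each byte became a separate character. Encoding the string back to Latin-1 recovers the original bytes, which can then be decoded as UTF-8: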

>>> 'data  \xd8\xa7\xdb\x8c \xd9\x84\xda\x86\xdb\x8c<br/> \xd8\xa7\xda\xaf\xd8\xb1\xda\x86\xd9\x87 \xd8\xa7\xd9\x82\xd8\xaf\xd8\xa7\xd9\x85\xd8\xa7\xd8\xaa'.encode('latin-1').decode('utf-8')
'data  ای لچی<br/> اگرچه اقدامات'
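The same round trip can be wrapped in a small helper. This is only a sketch, assuming Python 3 and assuming the input really is UTF-8 text that was misread as Latin-1. The name repair_mojibake is made up for illustration and is not part of any library. The last line shows, for contrast, what genuinely hex-encoded input would look like: nothing but hex digits.

```python
def repair_mojibake(text):
    """Undo UTF-8 bytes that were mistakenly decoded as Latin-1.

    Hypothetical helper: assumes the underlying data is valid UTF-8.
    """
    if isinstance(text, bytes):
        # Already raw bytes, so a plain UTF-8 decode is enough.
        return text.decode('utf-8')
    # Latin-1 maps code points 0-255 one-to-one onto byte values,
    # so encoding recovers the original bytes before the UTF-8 decode.
    return text.encode('latin-1').decode('utf-8')


data = ('\xd8\xa7\xdb\x8c \xd9\x84\xda\x86\xdb\x8c<br/> '
        '\xd8\xa7\xda\xaf\xd8\xb1\xda\x86\xd9\x87 '
        '\xd8\xa7\xd9\x82\xd8\xaf\xd8\xa7\xd9\x85\xd8\xa7\xd8\xaa')
fixed = repair_mojibake(data)

# A real hex string contains only hex digits and decodes like this:
from_hex = bytes.fromhex('d8a7db8c').decode('utf-8')
```

The HTML tag survives unchanged because ASCII characters have the same byte values in Latin-1 and UTF-8. If the string contains characters above U+00FF, the encode step raises UnicodeEncodeError, which is a sign that the text was not misread as Latin-1 in the first place.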
