python - Parse text file from content-type=application/zip and base64 encoding in AWS SES


On Amazon SES, I have a rule to save incoming emails to S3 buckets. Amazon saves these in MIME format.

These emails have a .txt attachment, which shows up in the MIME file as Content-Type=text/plain, Content-Disposition=attachment ... .txt, and Content-Transfer-Encoding=quoted-printable or base64.

I am able to parse these fine using Python.

I have a problem decoding the content of the .txt file attachment when it is compressed (i.e., Content-Type: application/zip); it behaves as if the encoding wasn't base64.

My code:

    import base64
    s = unicode(base64.b64decode(attachment_content), "utf-8")

throws the error:

    Traceback (most recent call last):
      File "<input>", line 796, in <module>
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xcf in position 10: invalid continuation byte

Below are the first few lines of the "base64" string in attachment_content, which btw has length 53683 + "==" at the end, and I thought the length of a base64 string should be a multiple of 4 (??). Maybe the decoding is failing because the compression is changing attachment_content and I need some other operation before/after decoding it? I have no idea.

    uesdbbqaaaaiam9ah0otgkpwx5oaadmtagajaaaax2noyxqudhh0tl3bjirjkix23sd+g0u3ioxu
    rewgu8c1l2ag8lkd0v2zwajm3kluc6hubu5ufezm3nyjl6+n4t4ry8eodwcsmyqxbrblgmq+7cp5
    qpbj5gdyn0cri6jqfxwv7hlyszursijv1g6qoni5cmqyet6dpp9cncat6yvp5yvz6xfje7cp8p/k
    1sbl8xfju0osvuvr2q3tonfvwjxrknwzfeuk2vrlu978s19mrvnmrhneov51sozlgutmlynfp0nd
    ...

I have also tried using "latin-1", but I get gibberish.

The problem was that, after the base64 conversion, I was dealing with a zipped file in the format "PK \x03 \x04 \x3c \xa \x0c ...", and I needed to unzip it before transforming it to UTF-8 unicode.
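As a quick way to confirm that diagnosis, the decoded bytes can be checked for the zip signature before any text decoding is attempted. The following is a minimal sketch, assuming Python 3; the in-memory archive and the name sample.txt are made up for illustration and stand in for the real attachment. It also touches on the length question: MIME wraps base64 in lines, so the raw payload length includes line breaks, and only the length without whitespace has to be a multiple of 4.

```python
import base64
import io
import zipfile

# Build a small zip in memory as a stand-in for the real attachment
# (hypothetical data, not the file from the question).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.txt", "hello from the attachment\n" * 20)
raw_zip = buf.getvalue()

# MIME-style base64: wrapped into lines of at most 76 characters.
payload = base64.encodebytes(raw_zip).decode("ascii")

# The line breaks count towards len(payload); without them the
# length is a multiple of 4.
stripped_len = len("".join(payload.split()))

# b64decode ignores the embedded newlines by default.
decoded = base64.b64decode(payload)

# A zip archive with at least one member starts with the local file
# header signature "PK\x03\x04".
starts_with_pk = decoded[:4] == b"PK\x03\x04"
is_zip = zipfile.is_zipfile(io.BytesIO(decoded))

print(stripped_len % 4, starts_with_pk, is_zip)
```

If the signature check succeeds, the bytes are compressed data rather than text, which is why decoding them directly as UTF-8 or latin-1 either fails or produces gibberish.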

This code worked for me:

    # Python 2 (uses unicode and cStringIO)
    import base64
    import email
    import zipfile
    from cStringIO import StringIO

    # parse results from the email
    received_email = email.message_from_string(email_text)
    for part in received_email.walk():
        c_type = part.get_content_type()
        c_enco = part.get('Content-Transfer-Encoding')

        attachment_content = part.get_payload()

        if c_enco == 'base64':
            decoded_file = base64.b64decode(attachment_content)
            print("file decoded from base64")

            if c_type == "application/zip":
                # the decoded bytes are a zip archive: read its first member
                zfp = zipfile.ZipFile(StringIO(decoded_file), "r")
                unzipped_list = zfp.open(zfp.namelist()[0]).readlines()
                decoded_file = "".join(unzipped_list)
                print('and un-zipped')

            # only decode to unicode once decoded_file has been set
            result = unicode(decoded_file, "utf-8")
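For readers on Python 3, where unicode and cStringIO no longer exist, the same steps might look roughly like the sketch below. It is an illustration under assumptions, not the code that was run against SES: the function name extract_text_attachments and the sample message are hypothetical, and it relies on get_payload(decode=True) to undo base64 or quoted-printable instead of calling base64 by hand.

```python
import email
import io
import zipfile
from email.message import EmailMessage


def extract_text_attachments(raw_email_bytes, encoding="utf-8"):
    """Return {filename: text} for plain and zipped text attachments.

    Hypothetical helper: the name and return shape are illustrative.
    """
    message = email.message_from_bytes(raw_email_bytes)
    texts = {}
    for part in message.walk():
        if part.is_multipart() or part.get_content_disposition() != "attachment":
            continue
        # decode=True reverses the Content-Transfer-Encoding and returns bytes.
        data = part.get_payload(decode=True)
        if data is None:
            continue
        looks_zipped = (part.get_content_type() == "application/zip"
                        or zipfile.is_zipfile(io.BytesIO(data)))
        if looks_zipped:
            # Unzip first, then decode each member to text.
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for name in archive.namelist():
                    texts[name] = archive.read(name).decode(encoding)
        else:
            texts[part.get_filename()] = data.decode(encoding)
    return texts


# Sample message built only for this demonstration.
msg = EmailMessage()
msg["Subject"] = "report"
msg.set_content("see attachments")

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("chat.txt", "caf\u00e9 line one\nline two\n")
msg.add_attachment(zip_buf.getvalue(), maintype="application",
                   subtype="zip", filename="chat.zip")
msg.add_attachment("plain text body\n", filename="note.txt")

result = extract_text_attachments(msg.as_bytes())
print(sorted(result))
```

Unlike the Python 2 version, this reads every member of the archive rather than only the first one, and it keys the result by the member name inside the zip rather than by the attachment filename.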
