csv - Parse a list-looking string containing dict elements in Python
I want to parse the list-looking string below (I call it a string because its type is str) into dict elements:
"[{""isin"": ""us51817r1068"", ""name"": ""latam airlines group sa""}, {""isin"": ""cl0000000423"", ""name"": ""latam airlines group sa""}, {""isin"": null, ""name"": ""latam airlines group sa""}, {""isin"": ""brlatmbdr001"", ""name"": ""latam airlines group sa""}]"
I used literal_eval from the ast package to convert it to a list and then parse it, but I encounter a "ValueError: malformed string" error.
The code is below:
company_list = ast.literal_eval(line[18])
print company_list
for i in company_list:
    #print type(i)
    print i["isin"]
Here line[18] is the string above.
Alternatively, how can I ignore such a list-looking string if it contains a null value, which this one does?
PS: 18 is the index of the CSV column I want to read.
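OK, I'm going to start off by saying: wow, this was way harder than I thought it was going to be!
So there are 2 problems with the string:
- When Python prints the string, the double quotes are gone. The parser reads the doubled quotes as adjacent string literals and joins them, so the quotes have to be added back in.
- The null type doesn't exist in Python, so it needs to be changed to None.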
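Both problems can be reproduced in a few lines. This is only a minimal sketch, written for Python 3; the names pasted, quoted and failed are made up for the illustration and are not part of the original code.

```python
import ast

# Adjacent string literals are concatenated, so the doubled quotes vanish
# when the CSV text is pasted directly into Python source.
pasted = "{""isin"": null}"
print(pasted)

# Even with the quotes intact, literal_eval rejects the bare name null.
quoted = '{"isin": null}'
try:
    ast.literal_eval(quoted)
    failed = False
except ValueError as exc:
    failed = True
    print("literal_eval failed:", exc)
```

The first print shows the string without any quotes, and the second shows the ValueError that literal_eval raises for null.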
So here's the code:
import re
import ast

data_in = "[{""isin"": ""us51817r1068"", ""name"": ""latam airlines group sa""}, {""isin"": ""cl0000000423"", ""name"": ""latam airlines group sa""}, {""isin"": null, ""name"": ""latam airlines group sa""}, {""isin"": ""brlatmbdr001"", ""name"": ""latam airlines group sa""}]"

# Make a copy for modification.
formatted_data = data_in

# Tracks how far positions have shifted from adding and removing characters.
offset = 0

# Finds the keys and values.
p = re.compile(r"[\{\:,]([\w\s\d]{2,})")
for m in p.finditer(data_in):
    # Counts the number of characters removed via strip().
    strip_val = len(m.group(1)) - len(m.group(1).strip())

    # Adds in the quotes around a single match.
    formatted_data = formatted_data[:m.start(1)+offset] + "\"" + m.group(1).strip() + "\"" + formatted_data[m.end(1)+offset:]

    # The offset grows by 2 (the two added quotes), minus any whitespace removed.
    offset += 2 - strip_val

company_list = ast.literal_eval(formatted_data)

# Finds 'null' values and replaces them with None.
# items() and print() with a single argument work in both Python 2 and 3.
for item in company_list:
    for k, v in item.items():
        if v == 'null':
            item[k] = None

print(company_list)
It was written in Python 3 and I changed the bits I remembered for Python 2, so there might be small errors.
The result is a list of dict objects:
[{'isin': 'us51817r1068', 'name': 'latam airlines group sa'}, {'isin': None, 'name': 'latam airlines group sa'}, {'isin': 'cl0000000423', 'name': 'latam airlines group sa'}, {'isin': 'brlatmbdr001', 'name': 'latam airlines group sa'}]
For more info on the regex used, see here.
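A different technique from the regex approach, offered only as a sketch: if the file is read with the csv module, the doubled quotes are turned back into single quotes, and the cell then looks like JSON, where null is a valid value. Under that assumption json.loads can parse it directly. The sample data, the column index 1 and the helper parse_cell are hypothetical; in the question the column is line[18]. Written for Python 3.

```python
import csv
import io
import json

# Hypothetical one-row CSV; the cell uses CSV-style doubled quotes.
raw = (
    'id,companies\n'
    '1,"[{""isin"": ""us51817r1068"", ""name"": ""latam airlines group sa""}, '
    '{""isin"": null, ""name"": ""latam airlines group sa""}]"\n'
)

def parse_cell(cell):
    """Return the parsed list, or None if the cell is not valid JSON."""
    try:
        return json.loads(cell)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None

isins = []
reader = csv.reader(io.StringIO(raw))
next(reader)  # skip the header row
for row in reader:
    company_list = parse_cell(row[1])
    if company_list is None:
        continue  # ignore cells that cannot be parsed
    for company in company_list:
        isins.append(company["isin"])

print(isins)
```

JSON null comes back as None, so no replacement pass is needed, and cells that fail to parse are skipped rather than raising.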