python - Loop through URLs, apply BeautifulSoup, save files named after tuple elements


This post contains several problems, so thank you for taking a look.

I have a dataframe containing the columns 'url', 'cik', and 'date'. The urls are 10-Ks from the EDGAR website; the ciks are unique IDs for every filing entity, in case you were wondering. A portion of the dataframe can be found in the csv here.

I want to loop through each url, apply BeautifulSoup, and save each result as a unique text file named after the cik and date.

My code so far:

    import urllib
    from bs4 import BeautifulSoup
    import pandas as pd
    import numpy
    import os

    # x is a dataframe including the columns 'url', 'cik' and 'date'
    # convert x to tuples
    subset = x[['url', 'cik', 'date']]
    tuples = [tuple(x) for x in subset.values]

    os.chdir("c:/10k/python")

    # goal: loop through each url, run bs,
    # write a .txt named after the matching cik and date elements
    for index, url in enumerate(tuples):
        fp = urllib.request.urlopen(tuples)
        test = fp.read()
        soup = BeautifulSoup(test, "lxml")
        output = soup.get_text()
        file = open("url%s.txt", "w", encoding='utf-8')
        file.close()
        file.write(output)

A couple of issues:

When I try to write the loop over the dataframe, I get the following error:

    TypeError: 'Series' objects are mutable, thus they cannot be hashed

I believe the answer here is to convert to tuples, which I did; that makes the rows immutable. However, I am not sure how to refer to the different elements of the tuple when writing the loop (see the sketch below).
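A minimal sketch of the unpacking, assuming tuples is the list of (url, cik, date) triples built above (the sample values here are made up):

    # hypothetical stand-in for the real list built from the dataframe
    tuples = [("https://www.sec.gov/example.htm", "0000320193", "20170101")]

    # each item is a (url, cik, date) triple, so the loop header
    # can unpack it directly into three separate names
    for url, cik, date in tuples:
        print(url, cik, date)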

As the next step, I attempted to use enumerate to loop through the tuples, but I am getting the following error from this:

    AttributeError: 'list' object has no attribute 'timeout'

I believe this means the loop is trying to read the list of tuples in its entirety rather than each element, but I am not sure, and I cannot find an answer on the forums. (See the sketch below.)
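That reading seems right: urlopen(tuples) hands the whole list to urllib, which assumes any non-string argument is a Request object and fails when it touches the missing timeout attribute. A minimal sketch of the corrected call, assuming tuples holds the (url, cik, date) triples from the earlier sketch; each iteration passes one url string:

    import urllib.request

    # pass one url string per iteration instead of the whole list;
    # unpacking the triple also keeps cik and date handy for later
    for index, (url, cik, date) in enumerate(tuples):
        fp = urllib.request.urlopen(url)
        test = fp.read()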

Finally, I am not sure how to reference the elements of the tuple when writing each file to .txt. Right now I have url%s, giving url1, url2, etc., rather than names built from the cik and date. A combined sketch follows.
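Putting the pieces together, here is a minimal sketch of the whole loop under the assumptions above. The "{}_{}.txt" naming pattern and the use of a with block (so the file is written before it is closed) are my own choices rather than anything from the original post; x is assumed to be the dataframe described at the top.

    import os
    import urllib.request
    from bs4 import BeautifulSoup

    # x is assumed to be the dataframe with 'url', 'cik' and 'date' columns
    subset = x[['url', 'cik', 'date']]
    tuples = [tuple(row) for row in subset.values]

    os.chdir("c:/10k/python")

    for url, cik, date in tuples:
        # fetch one filing at a time; urlopen takes a single url string
        fp = urllib.request.urlopen(url)
        soup = BeautifulSoup(fp.read(), "lxml")
        output = soup.get_text()
        # name the file after the cik and date elements of the tuple
        filename = "{}_{}.txt".format(cik, date)
        # the with block writes first and closes the file automatically,
        # avoiding the close-before-write bug in the original loop
        with open(filename, "w", encoding="utf-8") as f:
            f.write(output)

One caveat: if the date column is formatted with characters such as '/', those would need to be replaced before they can appear in a Windows filename.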

