python - Loop through URLs, apply BeautifulSoup, save files named after tuple elements


This post contains several problems, so thank you for taking a look.

I have a dataframe containing the columns 'url', 'cik', and 'date'. The urls are 10-Ks from the EDGAR website; the ciks are unique IDs for every filing entity, in case you were wondering. A portion of the dataframe can be found in the csv here.

I want to loop through each url, apply BeautifulSoup, and save each result as a unique text file named after the cik and date.

My code so far:

    import urllib
    from bs4 import BeautifulSoup
    import pandas as pd
    import numpy
    import os

    # x is a dataframe including the columns 'url', 'cik' and 'date'
    # convert x to tuples
    subset = x[['url', 'cik', 'date']]
    tuples = [tuple(x) for x in subset.values]

    os.chdir("c:/10k/python")

    # goal: loop through each url, run bs,
    # write a .txt named after the matching cik and date elements
    for index, url in enumerate(tuples):
        fp = urllib.request.urlopen(tuples)
        test = fp.read()
        soup = BeautifulSoup(test, "lxml")
        output = soup.get_text()
        file = open("url%s.txt", "w", encoding='utf-8')
        file.close()
        file.write(output)

A couple of issues:

When I try to write the loop over the dataframe, I get the following error:

    TypeError: 'Series' objects are mutable, thus they cannot be hashed

I believe the answer here is to convert to tuples, which I did; that makes the rows immutable. However, I am not sure how to refer to the different elements of the tuple when writing the loop (see the sketch below).
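A minimal sketch of the unpacking, assuming tuples is the list of (url, cik, date) triples built above (the sample values here are made up):

    # hypothetical stand-in for the real list built from the dataframe
    tuples = [("https://www.sec.gov/example.htm", "0000320193", "20170101")]

    # each item is a (url, cik, date) triple, so the loop header
    # can unpack it directly into three separate names
    for url, cik, date in tuples:
        print(url, cik, date)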

As the next step, I attempted to use enumerate to loop through the tuples, but I am getting the following error from this:

    AttributeError: 'list' object has no attribute 'timeout'

I believe this means the loop is trying to read the list of tuples in its entirety rather than each element, but I am not sure, and I cannot find an answer on the forums. (See the sketch below.)
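That reading seems right: urlopen(tuples) hands the whole list to urllib, which assumes any non-string argument is a Request object and fails when it touches the missing timeout attribute. A minimal sketch of the corrected call, assuming tuples holds the (url, cik, date) triples from the earlier sketch; each iteration passes one url string:

    import urllib.request

    # pass one url string per iteration instead of the whole list;
    # unpacking the triple also keeps cik and date handy for later
    for index, (url, cik, date) in enumerate(tuples):
        fp = urllib.request.urlopen(url)
        test = fp.read()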

Finally, I am not sure how to reference the elements of the tuple when writing each file to .txt. Right now I have url%s, giving url1, url2, etc., rather than names built from the cik and date. A combined sketch follows.
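Putting the pieces together, here is a minimal sketch of the whole loop under the assumptions above. The "{}_{}.txt" naming pattern and the use of a with block (so the file is written before it is closed) are my own choices rather than anything from the original post; x is assumed to be the dataframe described at the top.

    import os
    import urllib.request
    from bs4 import BeautifulSoup

    # x is assumed to be the dataframe with 'url', 'cik' and 'date' columns
    subset = x[['url', 'cik', 'date']]
    tuples = [tuple(row) for row in subset.values]

    os.chdir("c:/10k/python")

    for url, cik, date in tuples:
        # fetch one filing at a time; urlopen takes a single url string
        fp = urllib.request.urlopen(url)
        soup = BeautifulSoup(fp.read(), "lxml")
        output = soup.get_text()
        # name the file after the cik and date elements of the tuple
        filename = "{}_{}.txt".format(cik, date)
        # the with block writes first and closes the file automatically,
        # avoiding the close-before-write bug in the original loop
        with open(filename, "w", encoding="utf-8") as f:
            f.write(output)

One caveat: if the date column is formatted with characters such as '/', those would need to be replaced before they can appear in a Windows filename.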

