multithreading - Are locks needed for multithreaded Python scraping?
I have a list of zipcodes and want to pull business listings for each using the Yelp Fusion API. Each zipcode requires at least one API call (often more), so I want to track my API usage against the daily limit of 25,000. I have defined each zipcode as an instance of a user-defined Locale class. The Locale class has a class variable, Locale.pulls, which acts as a global counter for the number of pulls.
I want to multithread this using the multiprocessing module, but I'm not sure whether I need locks, and if so, how to use them. My concern is race conditions: I need to be sure each thread sees the current number of pulls, as tracked by the Locale.pulls class variable in the pseudocode below.
    import multiprocessing.dummy as mt

    class Locale:
        pulls = 0
        max_pulls = 20000

        def __init__(self, x, y):
            # initialize the instance with the arguments needed to complete the API call
            pass

        def pull(self):
            if Locale.pulls > Locale.max_pulls:
                return None
            # make the request, store the returned data, and increment the counter
            self.data = self.call_yelp()
            Locale.pulls += 1

    def main():
        # zipcodes below is a list of the arguments needed to initialize each Locale object
        pool = mt.Pool(len(zipcodes) // 100)  # let each thread work on ~100 zipcodes
        data = pool.map(Locale, zipcodes)
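For what it's worth, `Locale.pulls += 1` is a read-modify-write, so two threads can read the same value and lose an increment; a `threading.Lock` around the check-and-increment closes that window. A minimal sketch of how that might look (the `call_yelp` body and the `fetch` wrapper here are placeholders, not the real Yelp Fusion call):

```python
import threading
import multiprocessing.dummy as mt  # thread pool with the multiprocessing API

class Locale:
    pulls = 0
    max_pulls = 20000
    _lock = threading.Lock()  # guards the shared class-level counter

    def __init__(self, zipcode):
        self.zipcode = zipcode
        self.data = None

    def pull(self):
        # Reserve a slot under the lock so the limit check and the
        # increment happen atomically across threads.
        with Locale._lock:
            if Locale.pulls >= Locale.max_pulls:
                return None
            Locale.pulls += 1
        # Do the slow network call outside the lock so threads
        # only serialize on the counter, not on I/O.
        self.data = self.call_yelp()
        return self.data

    def call_yelp(self):
        # Placeholder for the real Yelp Fusion request (hypothetical).
        return {"zip": self.zipcode}

def fetch(zipcode):
    loc = Locale(zipcode)
    loc.pull()
    return loc

zipcodes = ["10001", "10002", "10003"]
pool = mt.Pool(4)
locales = pool.map(fetch, zipcodes)
pool.close()
pool.join()
```

Note that `pool.map` needs a callable that actually performs the work; mapping the class constructor alone, as in the pseudocode, would build the objects without ever calling `pull()`.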
A simple solution is to check that len(zipcodes) < max_pulls before running map().
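A minimal sketch of that pre-check, assuming each zipcode costs exactly one API call (the `fetch` function and the zipcode list are placeholders):

```python
import multiprocessing.dummy as mt

MAX_PULLS = 20000  # daily API budget

def fetch(zipcode):
    # Placeholder for the real per-zipcode API call (hypothetical).
    return {"zip": zipcode}

zipcodes = ["10001", "10002", "10003"]

# If every zipcode costs one call, a single up-front check replaces
# any per-call locking: the pool can never exceed the budget.
if len(zipcodes) < MAX_PULLS:
    pool = mt.Pool(4)
    data = pool.map(fetch, zipcodes)
    pool.close()
    pool.join()
else:
    data = None  # trim the list, or wait for the next day's quota
```

The caveat is the question's own "at least 1 API call (maybe more)": if a zipcode can trigger a variable number of calls, the up-front check is only a lower bound and a locked counter (or a conservative per-zipcode estimate) is still needed.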