python - How to divide up a webscraping workload
I have a massive webscraping project (part 1 is scraping 300k+ separate data entries from a website), and I may have more of these in the future, so scraping one data entry at a time won't suffice. I have been using Selenium to enter data into a JS-driven site, and BeautifulSoup to parse the results. I have looked at Selenium Grid, but I don't believe it will accomplish what I want, because I'm not trying to have every instance perform the same function.

I would like to take the ~300k separate data entries and split them up for searching, for example, 8+ at a time.

Is my only option at this point (in Python) to set up several VMs and execute the script on each? The current time to finish the scrape is about 30 hours.
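You may not need separate VMs just to parallelize: since Selenium work is mostly I/O-bound (waiting on the browser), a thread pool on one machine, with one WebDriver per worker, can split the 300k entries into 8 batches. Below is a minimal sketch of that partitioning; `scrape_batch` is a hypothetical stand-in for the real Selenium + BeautifulSoup logic, and `chunked` is a helper I'm introducing here, not something from the original question.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split a list into n_chunks contiguous, roughly equal slices."""
    k, r = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = k + (1 if i < r else 0)  # first r chunks get one extra item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def scrape_batch(entries):
    # Hypothetical placeholder: in the real script each worker would
    # create its own WebDriver here, submit each entry through the JS
    # form, and parse the response with BeautifulSoup.
    return [f"result-for-{e}" for e in entries]


def run(all_entries, workers=8):
    """Fan the entries out across `workers` parallel scraper batches."""
    batches = chunked(all_entries, workers)
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_result in pool.map(scrape_batch, batches):
            results.extend(batch_result)
    return results
```

With 8 workers and roughly linear speedup, a 30-hour sequential scrape could drop to around 4 hours, though the site's rate limits may become the real bottleneck before CPU or bandwidth does. If the per-batch work were CPU-heavy rather than browser-bound, `ProcessPoolExecutor` would be the drop-in alternative.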