python - How to divide up a webscraping workload
I have a massive webscraping project (part 1 is scraping 300k+ separate data entries from a website), and I may have more of these in the future, so scraping one data entry at a time won't suffice. I have been using Selenium to enter data into a JS-driven site, and BeautifulSoup to parse the results. I have looked at Selenium Grid, but I don't believe it will accomplish what I want, because I'm not trying to have every instance perform the same function.

I would like to take the ~300k separate data entries and split them up for searching, for example, 8+ at a time.

Is my only option at this point (in Python) to set up several VMs and execute the script on each? The current time to finish the scrape is about 30 hours.
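You may not need separate VMs just to parallelize: since Selenium work is mostly I/O-bound (waiting on the browser), a thread pool on one machine, with one WebDriver per worker, can split the 300k entries into 8 batches. Below is a minimal sketch of that partitioning; `scrape_batch` is a hypothetical stand-in for the real Selenium + BeautifulSoup logic, and `chunked` is a helper I'm introducing here, not something from the original question.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split a list into n_chunks contiguous, roughly equal slices."""
    k, r = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = k + (1 if i < r else 0)  # first r chunks get one extra item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def scrape_batch(entries):
    # Hypothetical placeholder: in the real script each worker would
    # create its own WebDriver here, submit each entry through the JS
    # form, and parse the response with BeautifulSoup.
    return [f"result-for-{e}" for e in entries]


def run(all_entries, workers=8):
    """Fan the entries out across `workers` parallel scraper batches."""
    batches = chunked(all_entries, workers)
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_result in pool.map(scrape_batch, batches):
            results.extend(batch_result)
    return results
```

With 8 workers and roughly linear speedup, a 30-hour sequential scrape could drop to around 4 hours, though the site's rate limits may become the real bottleneck before CPU or bandwidth does. If the per-batch work were CPU-heavy rather than browser-bound, `ProcessPoolExecutor` would be the drop-in alternative.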