python - Scrapy fails to crawl recursively when two rules are set
I've written a script in Scrapy to crawl a website recursively, but for some reason it isn't able to. I've tested the XPaths in Sublime and they work perfectly, so at this point I can't figure out what I've done wrong.
"items.py" includes:
import scrapy

class CraigpItem(scrapy.Item):
    name = scrapy.Field()
    grading = scrapy.Field()
    address = scrapy.Field()
    phone = scrapy.Field()
    website = scrapy.Field()
The spider, named "craigsp.py", includes:
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class CraigspSpider(CrawlSpider):
    name = "craigsp"
    allowed_domains = ["craigperler.com"]
    start_urls = ['https://www.americangemsociety.org/en/find-a-jeweler']
    rules = [
        Rule(LinkExtractor(restrict_xpaths='//area')),
        Rule(LinkExtractor(restrict_xpaths='//a[@class="jeweler__link"]'), callback='parse_items'),
    ]

    def parse_items(self, response):
        page = response.xpath('//div[@class="page__content"]')
        for titles in page:
            aa = titles.xpath('.//h1[@class="page__heading"]/text()').extract()
            bb = titles.xpath('.//p[@class="appraiser__grading"]/strong/text()').extract()
            cc = titles.xpath('.//p[@class="appraiser__hours"]/text()').extract()
            dd = titles.xpath('.//p[@class="appraiser__phone"]/text()').extract()
            ee = titles.xpath('.//p[@class="appraiser__website"]/a[@class="appraiser__link"]/@href').extract()
            yield {'name': aa, 'grading': bb, 'address': cc, 'phone': dd, 'website': ee}
The command I'm running is:
scrapy crawl craigsp -o items.csv
I hope someone can lead me in the right direction.
Filtered offsite request

This error means that a URL queued by Scrapy did not pass the allowed_domains setting.
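Roughly, the offsite check behaves like the following sketch (my own simplified illustration of the matching rule, not Scrapy's actual OffsiteMiddleware code):

from urllib.parse import urlparse

allowed_domains = ["craigperler.com"]

def is_offsite(url):
    # A request is kept only if its host is an allowed domain
    # or a subdomain of one; everything else is filtered.
    host = urlparse(url).hostname or ""
    return not any(host == d or host.endswith("." + d) for d in allowed_domains)

print(is_offsite("https://www.craigperler.com/about"))       # False - kept
print(is_offsite("https://www.americangemsociety.org/en/"))  # True - filtered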
You have:
allowed_domains = ["craigperler.com"]
but your spider is trying to crawl http://www.americangemsociety.org. You either need to add that domain to the allowed_domains list or get rid of the setting entirely.
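A minimal sketch of the first option (assuming you want to keep the existing domain and also allow the site your start URL actually lives on):

# Add the domain of the start URL; subdomains such as "www."
# are matched automatically, so the bare domain is enough.
allowed_domains = ["craigperler.com", "americangemsociety.org"]

With that change, the requests produced by both rules should no longer be filtered out as offsite.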