python - Scrapy fails to crawl recursively when two rules are set


I've written a script in Scrapy to crawl a website recursively, but for some reason it isn't able to. I've tested the XPaths in Sublime and they work perfectly, so at this point I can't work out what I've done wrong. I've tried to upload what I can see in the console.

"items.py" includes:

import scrapy


class CraigpItem(scrapy.Item):
    name = scrapy.Field()
    grading = scrapy.Field()
    address = scrapy.Field()
    phone = scrapy.Field()
    website = scrapy.Field()

The spider, named "craigsp.py", includes:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class CraigspSpider(CrawlSpider):
    name = "craigsp"
    allowed_domains = ["craigperler.com"]
    start_urls = ['https://www.americangemsociety.org/en/find-a-jeweler']
    rules = [
        Rule(LinkExtractor(restrict_xpaths='//area')),
        Rule(LinkExtractor(restrict_xpaths='//a[@class="jeweler__link"]'), callback='parse_items'),
    ]

    def parse_items(self, response):
        page = response.xpath('//div[@class="page__content"]')
        for titles in page:
            aa = titles.xpath('.//h1[@class="page__heading"]/text()').extract()
            bb = titles.xpath('.//p[@class="appraiser__grading"]/strong/text()').extract()
            cc = titles.xpath('.//p[@class="appraiser__hours"]/text()').extract()
            dd = titles.xpath('.//p[@class="appraiser__phone"]/text()').extract()
            ee = titles.xpath('.//p[@class="appraiser__website"]/a[@class="appraiser__link"]/@href').extract()
            yield {'name': aa, 'grading': bb, 'address': cc, 'phone': dd, 'website': ee}

The command I'm running is:

scrapy crawl craigsp -o items.csv 

I hope someone can point me in the right direction.

Filtered offsite request

This message means that a URL queued by Scrapy does not pass the allowed_domains setting.

You have:

allowed_domains = ["craigperler.com"] 

and your spider is trying to crawl http://www.americangemsociety.org. You either need to add that domain to the allowed_domains list or get rid of the setting entirely.
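
To illustrate why those links get dropped, here is a minimal sketch of the kind of host check that allowed_domains implies. It is only an approximation written with the standard library, not Scrapy's own code, and the helper name is_offsite is hypothetical. The assumption is that a host passes when it equals an allowed domain or is a subdomain of one, and that an empty list allows everything.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Return True if the URL's host falls outside every allowed domain.

    Hypothetical helper approximating an offsite check: a host passes when
    it equals an allowed domain or is a subdomain of one. An empty list is
    treated as "no restriction".
    """
    if not allowed_domains:
        return False
    host = (urlparse(url).hostname or "").lower()
    return not any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


start_url = "https://www.americangemsociety.org/en/find-a-jeweler"

# Setting from the question: the site being crawled counts as offsite.
print(is_offsite(start_url, ["craigperler.com"]))

# Corrected setting: the site being crawled is allowed.
print(is_offsite(start_url, ["americangemsociety.org"]))

# Setting removed entirely: nothing is treated as offsite.
print(is_offsite(start_url, []))
```

In the spider itself, the corresponding change would be allowed_domains = ["americangemsociety.org"], or deleting that line.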

