Tag: web-scraping

Connection refused error 61: Scrapy + Splash Docker

I am having trouble scraping a JavaScript-driven website. I am using scrapy-splash, with Splash running in Docker, to render the JavaScript to HTML.

    import scrapy
    from scrapy_splash import SplashRequest


    class MySpider(scrapy.Spider):
        name = 'spd'
        start_urls = ['http://example.com']

        def start_requests(self):
            # Send every start URL through Splash so the page is rendered first.
            for url in self.start_urls:
                yield SplashRequest(url, self.parse,
                                    endpoint='render.html',
                                    args={'wait': 0.5})

        def parse(self, response):
            # 'xpath' is a placeholder for the real expressions.
            for href in response.xpath('xpath'):
                yield {'info': href.xpath('xpath')}

This is my terminal output:

    2017-05-30 13:20:51 [scrapy.core.engine] INFO: Spider opened
    2017-05-30 13:20:51 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at […]
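The error 61 in the title is a connection-refused error: the TCP connection attempt was rejected, which usually means nothing is listening at the address being contacted. One way to narrow this down is to test the Splash address outside of Scrapy. The snippet below is only a minimal sketch using the standard library. The helper name `can_connect` is hypothetical, and the host and port are assumptions: 8050 is the port Splash listens on by default, but the host depends on how Docker is set up on the machine and on what `SPLASH_URL` is set to in the project settings.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing accepts the connection or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed values: replace them with the host and port from SPLASH_URL.
    host, port = "localhost", 8050
    print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

If this prints `False` for the address Scrapy is configured to use, the problem is likely in the Docker port mapping or the host address rather than in the spider code.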