Splash in Docker with Scrapy not working

I want to scrape a website that relies on JavaScript. Since Scrapy cannot execute JavaScript, I am using Splash to render the pages for Scrapy, via the scrapy-splash plugin.

I installed Splash with Docker on Ubuntu 16.04 using these commands:

```
$ sudo docker pull scrapinghub/splash
$ sudo docker run -p 8050:8050 scrapinghub/splash
```

The Splash container is running and everything seems fine, but when Scrapy sends requests to it, Splash throws this error:

```
2017-07-20 03:03:23+0000 [-] Log opened.
2017-07-20 03:03:23.870491 [-] Splash version: 3.0
2017-07-20 03:03:24.007457 [-] Qt 5.9.1, PyQt 5.9, WebKit 602.1, sip 4.19.3, Twisted 16.1.1, Lua 5.2
2017-07-20 03:03:24.007614 [-] Python 3.5.2 (default, Nov 17 2016, 17:05:23) [GCC 5.4.0 20160609]
2017-07-20 03:03:24.007746 [-] Open files limit: 65536
2017-07-20 03:03:24.007879 [-] Can't bump open files limit
2017-07-20 03:03:24.291391 [-] Xvfb is started: ['Xvfb', ':911054901', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
2017-07-20 03:03:43.425858 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2017-07-20 03:04:09.534239 [-] verbosity=1
2017-07-20 03:04:09.534387 [-] slots=50
2017-07-20 03:04:09.534499 [-] argument_cache_max_entries=500
2017-07-20 03:04:09.534974 [-] Web UI: enabled, Lua: enabled (sandbox: enabled)
2017-07-20 03:04:09.535774 [-] Site starting on 8050
2017-07-20 03:04:09.535904 [-] Starting factory <twisted.web.server.Site object at 0x7f0e78e18d30>
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
process 1: D-Bus library appears to be incorrectly set up; failed to read machine uuid: UUID file '/etc/machine-id' should contain a hex string of length 32, not length 0, with no other text
See the manual page for dbus-uuidgen to correct this issue.
qt.network.ssl: QSslSocket: cannot resolve SSLv2_client_method
qt.network.ssl: QSslSocket: cannot resolve SSLv2_server_method
```

I think this part may be the problem:

```
qt.network.ssl: QSslSocket: cannot resolve SSLv2_client_method
qt.network.ssl: QSslSocket: cannot resolve SSLv2_server_method
```

The website is also an HTTPS site.
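To check whether the container can render these HTTPS pages on its own, independent of Scrapy, one option is to call Splash's `render.html` HTTP endpoint directly. This is a minimal sketch using only the Python standard library; `SPLASH_URL` assumes the `-p 8050:8050` port mapping from the `docker run` command, and the helper names `build_render_url` and `check_render` are hypothetical. The `url`, `timeout` and `wait` query parameters are the ones documented for `render.html`.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: Splash is published on localhost:8050, as in `docker run -p 8050:8050`.
SPLASH_URL = "http://localhost:8050"


def build_render_url(target_url, timeout=30, wait=0.5, splash_url=SPLASH_URL):
    """Build a render.html URL asking Splash to load and render `target_url`."""
    query = urllib.parse.urlencode({"url": target_url, "timeout": timeout, "wait": wait})
    return f"{splash_url}/render.html?{query}"


def check_render(target_url, timeout=30, wait=0.5):
    """Ask Splash to render `target_url` and return a (status, detail) pair."""
    request_url = build_render_url(target_url, timeout, wait)
    try:
        # Give the HTTP client slightly longer than Splash's own render timeout.
        with urllib.request.urlopen(request_url, timeout=timeout + 10) as resp:
            return resp.status, f"{len(resp.read())} bytes of rendered HTML"
    except urllib.error.HTTPError as exc:
        # Splash answered with an error status, e.g. 504 when the render timed out.
        return exc.code, exc.reason
    except urllib.error.URLError as exc:
        # Splash was not reachable at all (container down, wrong port, ...).
        return None, str(exc.reason)
```

Calling, for example, `check_render("https://www.raymourflanigan.com/willoughby-sofa-200326456.aspx", timeout=60)` from a Python shell would show whether Splash alone returns 200 for one of the product pages or the same 504 that Scrapy receives.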

On the Scrapy side, I have integrated scrapy-splash like this:

```python
from scrapy_splash import SplashRequest
```
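Importing `SplashRequest` is only one part of the setup; the scrapy-splash README also asks for settings along the following lines in `settings.py`. This is a sketch of the documented values, with `SPLASH_URL` assumed to match the port published by the `docker run` command. The middleware order and `DUPEFILTER_CLASS` shown in the Scrapy debug output suggest the project already has equivalent entries.

```python
# settings.py (sketch of the configuration documented in the scrapy-splash README)

# Assumption: Splash listens on the port published by `docker run -p 8050:8050`.
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    # Moved after the Splash middlewares so Splash responses are decompressed correctly.
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```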

and I am making requests like this:

```python
yield SplashRequest(link, meta={'item': item}, callback=self.parse_data)
```

instead of

```python
yield scrapy.Request(link, meta={'item': item}, callback=self.parse_data)
```

But Splash still does not process the requests.

What am I doing wrong here? Is something wrong with Ubuntu?

SCRAPY DEBUG OUTPUT

```
crawl sofaspider -o out.csv
2017-07-20 13:03:40 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: sofa)
2017-07-20 13:03:40 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'sofa.spiders', 'FEED_URI': 'out.csv', 'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter', 'SPIDER_MODULES': ['sofa.spiders'], 'BOT_NAME': 'sofa', 'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36', 'FEED_FORMAT': 'csv'}
2017-07-20 13:03:40 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2017-07-20 13:03:40 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy_splash.SplashCookiesMiddleware',
 'scrapy_splash.SplashMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-07-20 13:03:40 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy_splash.SplashDeduplicateArgsMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-07-20 13:03:40 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-07-20 13:03:40 [scrapy.core.engine] INFO: Spider opened
2017-07-20 13:03:40 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-07-20 13:03:40 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-07-20 13:03:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.raymourflanigan.com/Sofas.aspx> (referer: None)
2017-07-20 13:04:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/willoughby-sofa-200326456.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/union-square-sofa-200223105.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/castin-microfiber-sofa-200278403.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/toby-microfiber-leather-look-reclining-sofa-200217215.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/bryant-II-leather-power-reclining-sofa-217282538.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/crosby-sofa-with-chaise-200235097.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/anastasia-sofa-200209167.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/stylus-power-reclining-sofa-202239352.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:40 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2017-07-20 13:04:47 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/cordelia-sofa-200211201.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:47 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/ellington-leather-power-reclining-sofa-202291427.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:47 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/delano-power-reclining-sofa-200212520.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:47 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/quincey-power-reclining-sofa-200215627.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:47 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/corliss-sofa-200331104.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:47 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/skye-microfiber-power-reclining-sofa-200320074.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:47 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/mckinley-sofa-200211302.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
2017-07-20 13:04:47 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.raymourflanigan.com/diana-sofa-200345115.aspx via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
```