Scrapy | A Fast and Powerful Scraping and Web Crawling Framework
An open source and collaborative framework for extracting the data you need from websites.
In a fast, simple, yet extensible way.
Maintained by Zyte and many other contributors
Install the latest version of Scrapy
Scrapy 2.12.0
pip install scrapy
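A quick way to confirm the installation is to check the version from Python itself; this is a minimal sketch, and the printed version will match whatever release pip installed:

import scrapy
print(scrapy.__version__)  # e.g. "2.12.0" if the latest release was installed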
Terminal

pip install scrapy
cat > myspider.py <<EOF
import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://www.zyte.com/blog/']

    def parse(self, response):
        for title in response.css('.oxy-post-title'):
            yield {'title': title.css('::text').get()}

        for next_page in response.css('a.next'):
            yield response.follow(next_page, self.parse)
EOF
scrapy runspider myspider.py
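The same BlogSpider can also be driven from a plain Python script instead of the command line. The sketch below is one possible setup, assuming myspider.py from the block above is on the import path; it uses Scrapy's CrawlerProcess and the FEEDS setting to export items (the output filename is just an example):

# run_blogspider.py - run the spider programmatically
from scrapy.crawler import CrawlerProcess
from myspider import BlogSpider  # the spider defined above

process = CrawlerProcess(settings={
    # export every scraped item as JSON Lines; the filename is an example
    'FEEDS': {'titles.jsonl': {'format': 'jsonlines'}},
})
process.crawl(BlogSpider)
process.start()  # blocks until the crawl finishes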
Build and run your web spiders
Terminal

pip install shub
shub login
Insert your Zyte Scrapy Cloud API Key: <API_KEY>

# Deploy the spider to Zyte Scrapy Cloud
shub deploy

# Schedule the spider for execution
shub schedule blogspider
Spider blogspider scheduled, watch it running here:
https://app.zyte.com/p/26731/job/1/8

# Retrieve the scraped data
shub items 26731/1/8
{"title": "Improved Frontera: Web Crawling at Scale with Python 3 Support"}
{"title": "How to Crawl the Web Politely with Scrapy"}
...
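Beyond the shub command line, scraped items can also be fetched programmatically. This is a minimal sketch assuming the python-scrapinghub client (pip install scrapinghub) and the job ID from the session above:

# fetch_items.py - read items from a finished Scrapy Cloud job
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient('<API_KEY>')  # the same key used for shub login
job = client.get_job('26731/1/8')        # project/spider/job from the example
for item in job.items.iter():
    print(item['title'])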
Fast and powerful
Write the rules to extract the data and let Scrapy do the rest.
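Those rules are just CSS or XPath selectors attached to a spider. A minimal sketch, using the public quotes.toscrape.com demo site (the selectors are illustrative and separate from the blog example above):

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['https://quotes.toscrape.com/']

    def parse(self, response):
        # each rule is a selector; Scrapy handles requests, scheduling and retries
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.xpath(".//small[@class='author']/text()").get(),
            }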
Easily extensible
Extensible by design: plug in new functionality easily without having to touch the core.
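For example, dropping duplicate items takes only a small item pipeline enabled through settings. A minimal sketch (the class and module names are hypothetical):

# pipelines.py
from scrapy.exceptions import DropItem

class DedupTitlesPipeline:
    def __init__(self):
        self.seen = set()

    def process_item(self, item, spider):
        title = item.get('title')
        if title in self.seen:
            raise DropItem(f"duplicate title: {title!r}")
        self.seen.add(title)
        return item

# settings.py - activate the pipeline without touching Scrapy's core
# ITEM_PIPELINES = {'myproject.pipelines.DedupTitlesPipeline': 300}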
Portable, Python
Written in Python, Scrapy runs on Linux, Windows, macOS and BSD.
Healthy community
- 43,100 stars, 9,600 forks and 1,800 watchers on GitHub
- 5,500 followers on Twitter
- 18,000 questions on StackOverflow