Spider
Website | Guides | API Docs | Chat
A web crawler and scraper, building blocks for data curation workloads.
- Concurrent
- Streaming
- Decentralization
- Headless Chrome Rendering
- HTTP Proxies
- Cron Jobs
- Subscriptions (see the sketch after this list)
- Smart Mode
- Anti-Bot Mitigation
- Privacy and Efficiency through Ad, Analytics, and Custom Tiered Network Blocking
- Blacklisting, Whitelisting, and Budgeting Depth
- Dynamic AI Prompt Scripting for Headless Rendering with Step Caching
- CSS/XPath Scraping with spider_utils
- HTML to markdown, text, and other transformations with spider_transformations
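Streaming and subscriptions let you receive pages as they are processed instead of waiting for the whole crawl to finish. The sketch below is a rough illustration, assuming the crate's `subscribe` API (behind the default `sync` feature); the URL, channel capacity, and exact types are placeholders and may differ by version.

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://choosealicense.com");

    // Subscribe before crawling; pages are broadcast as they are fetched.
    let mut rx = website.subscribe(16).unwrap();

    let handle = tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("streamed: {}", page.get_url());
        }
    });

    website.crawl().await;
    drop(website); // closes the channel so the receiver task ends
    let _ = handle.await;
}
```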
Getting Started
The simplest way to get started is to use the Spider Cloud hosted service. See the spider or spider_cli directory for a local installation. Bindings are also available for Node.js (spider-nodejs) and Python (spider-py).
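For local use as a Rust crate, a minimal crawl looks roughly like the sketch below. It assumes the spider crate with its re-exported tokio runtime; the URL is a placeholder and return types may vary between versions.

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Crawl a site, then print every link that was discovered.
    let mut website = Website::new("https://choosealicense.com");
    website.crawl().await;

    for link in website.get_links() {
        println!("- {:?}", link.as_ref());
    }
}
```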
Benchmarks
See BENCHMARKS.
Examples
See EXAMPLES.
License
This project is licensed under the MIT license.
Contributing
See CONTRIBUTING.