<div align="center"> <!-- title --> <!--lint ignore no-dead-urls-->Awesome PHP Scrapers, Spiders and Crawlers
<!-- subtitle -->A collection of scrapers, spiders, crawlers, and related tools.
<!-- image --> <a href="https://github.com/spekulatius/awesome-php-scrapers-and-crawlers" target="_blank" rel="noopener noreferrer"> <img src="logo.png" /> </a> <!-- description -->A curated list of open-source projects in the PHP crawling and scraping space (scrapers, crawlers, spiders, and tools), along with how-to guides, articles, etc.
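</div> <!-- TOC -->Contents
- Crawlers
- Spiders
- Scrapers
- Tools and Related Libraries
- Detection
- HTML Handling: Serialization, Sanitization, etc.
- Contributing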
<!-- CONTENT -->Crawlers
- spatie/crawler - An easy-to-use, powerful crawler implemented in PHP. Can execute JavaScript. A toolkit is available for those keen to use the full power of the Spatie crawler.
- crawlzone/crawlzone - Crawlzone is a fast asynchronous crawling framework.
- zrashwani/arachnid - SEO-focused crawler to collect link information, etc.
- nadar/crawler - A website crawler implementation written in PHP. Highly extensible, indexes PDFs, and is very memory-efficient.
Spiders
- mvdbos/PHP-Spider - A configurable and extensible PHP web spider. Various examples are available.
Scrapers
- spekulatius/PHPScraper - A simple way to scrape and crawl the web from PHP.
- roach-php/core - A complete PHP web-scraping toolkit inspired by Scrapy. Laravel adapter available.
Tools and Related Libraries
- spatie/robots-txt - Determine whether a page may be crawled, based on robots.txt, robots meta tags, and robots headers.
- symfony/dom-crawler - The DomCrawler component eases DOM navigation for HTML and XML documents.
- symfony/panther - A browser testing and web crawling library for PHP and Symfony.
Detection
- JayBizzle/Crawler-Detect - CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent.
- donatj/PhpUserAgent - A lightning-fast, minimalist PHP user agent string parser.
- niespodd/browser-fingerprinting - Analysis of bot protection systems, with available countermeasures.
HTML Handling: Serialization, Sanitization, etc.
- Masterminds/html5-php - An HTML5 parser and serializer for PHP.
- symfony/html-sanitizer - Provides an object-oriented API to sanitize untrusted HTML input for safe insertion into a document's DOM.
Contributing
Contributions of any kind are welcome; just follow the guidelines!