Home

Awesome

spider-py

The spider project ported to Python.

Getting Started

  1. pip install spider_rs
import asyncio

from spider_rs import Website

async def main():
    website = Website("https://choosealicense.com")
    website.crawl()
    print(website.get_links())

asyncio.run(main())

View the examples to learn more.

Development

Install maturin pipx install maturin and python.

  1. maturin develop

Benchmarks

View the benchmarks to see a breakdown between libs and platforms.

Test url: https://espn.com

librariespagesspeed
spider(rust): crawl150,3871m
spider(nodejs): crawl150,387153s
spider(python): crawl150,387186s
scrapy(python): crawl49,5981h
crawlee(nodejs): crawl18,77930m

The benches above were ran on a mac m1, spider on linux arm machines performs about 2-10x faster.

Issues

Please submit a Github issue for any issues found.