Norconex Crawlers

Norconex web and filesystem crawlers are full-featured crawlers (or spiders) that can manipulate and store collected data in a repository of your choice (e.g., a search engine). They are flexible, powerful, easy to extend, and portable. They can be run from the command line with file-based configuration on any OS, or embedded into Java applications using well-documented APIs.

Visit the website for binary downloads and documentation: https://opensource.norconex.com/crawlers/

Are you on the right branch?

This branch holds version 4 code, which is still in development.

For the latest stable release of Norconex Web Crawler, use the version 3 branch.

UPCOMING: Crawler V4 Stack

As of Feb 24, 2024, the default main branch holds code for the upcoming version 4 crawler stack. It is now a monorepo containing all Norconex crawler-related projects previously maintained in separate repositories. All projects in this monorepo will now be released simultaneously and share the same version number.

Until v4 is officially released, this branch should not be considered stable.

Projects

Folder                            Artifact Id
crawler/core/                     nx-crawler-core
crawler/fs/                       nx-crawler-fs
crawler/web/                      nx-crawler-web
importer/                         nx-importer
committer/amazoncloudsearch/      nx-committer-amazoncloudsearch
committer/apachekafka/            nx-committer-apachekafka
committer/azurecognitivesearch/   nx-committer-azurecognitivesearch
committer/core/                   nx-committer-core
committer/idol/                   nx-committer-idol
committer/elasticsearch/          nx-committer-elasticsearch
committer/neo4j/                  nx-committer-neo4j
committer/solr/                   nx-committer-solr
committer/sql/                    nx-committer-sql

All projects in this repository share the same Maven group id:

com.norconex.crawler
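
As an illustration, a dependency on the web crawler module could be declared in a project's pom.xml roughly as follows. The group id and artifact id are taken from the project list above. The version property `norconex.crawler.version` is a hypothetical placeholder: version 4 is not yet released, and its published coordinates are not stated here.

```xml
<!-- Sketch only: the version below is a placeholder property,
     not an actual release number. Define it to match the release
     you intend to use. -->
<dependency>
  <groupId>com.norconex.crawler</groupId>
  <artifactId>nx-crawler-web</artifactId>
  <version>${norconex.crawler.version}</version>
</dependency>
```

Because all projects share one version number, a single version property of this kind could plausibly be reused for every Norconex artifact a project depends on.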