<div align="center"> <p> <a href="#"><img src="images\image.png" width="300" height="300" alt="overview image" /></a> </p> </div> <p align="center"> <b>Alibaba-CLI-Scraper</b> </p> <p align="center"> 🛒-💻- 🕸 </p>
<p align="center"> <b> Create your own Alibaba dataset and interact with it in plain English. </b> </p> <div align="center">

PyPI - Version PyPI - Downloads GitHub Release Date

</div> <div align="center">

PyPI - Python Version GitHub License

</div> <div align="center">

Codacy Badge pre-commit Codacy Badge

</div> <div align="center">

https://github.com/user-attachments/assets/10f2ff2b-eef7-4b61-b317-89cf22252f1e

</div>

Run commands in one click.

No need to RTFM (I mean this README) again: run the text mode to learn the purpose of each command and its related options and parameters.

Shout-out to Trogon, an amazing tool that easily turns your Click CLI application into a powerful TUI application.

<div align="center">

https://github.com/user-attachments/assets/fbab45ff-b46c-4021-b481-5d74eadc1813

</div>

Chat with your scraped data in plain English to generate and visualize plots, and make decisions based on real-time data.

<div align="center">

https://github.com/user-attachments/assets/5beb4efb-f9b0-4dad-b0d6-a9771db6c61c

</div>

About

Alibaba-CLI-Scraper is a Python CLI tool designed to scrape, save, and interact in plain English with data from Alibaba.com. Based on keywords provided by the user, product data and the related supplier data are extracted and saved in a local database (SQLite or MySQL), ready to be analysed and even visualized through a powerful AI agent powered by data-horse. It is also designed to be user-friendly, with fairly simple, easy-to-use commands for navigating all of its features.

Features:

What important information will be retrieved from the Alibaba website?

Fields related to Suppliers:

`id`: int

`name`: str

`verification_mode`: str

`sopi_level`: int

`country_name`: str

`years_as_gold_supplier`: int

`supplier_service_score`: float

Fields related to Products:

`id`: int

`name`: str

`alibaba_guranteed`: bool

`certifications`: str

`minimum_to_order`: int

`ordered_or_sold`: int

`supplier_id`: int

`min_price`: float

`max_price`: float

`product_score`: float

`review_count`: float

`review_score`: float

`shipping_time_score`: float

`is_full_promotion`: bool

`is_customizable`: bool

`is_instant_order`: bool

`trade_product`: bool
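To make the two field lists above easier to picture, here is a minimal sketch that models them as Python dataclasses. The class names `Supplier` and `Product` are hypothetical, and the assumption that `supplier_id` points at a supplier's `id` is inferred from the join described in this README; the tool's real models may look different.

```python
from dataclasses import dataclass, fields


@dataclass
class Supplier:
    id: int
    name: str
    verification_mode: str
    sopi_level: int
    country_name: str
    years_as_gold_supplier: int
    supplier_service_score: float


@dataclass
class Product:
    id: int
    name: str
    alibaba_guranteed: bool  # spelling kept exactly as in the field list
    certifications: str
    minimum_to_order: int
    ordered_or_sold: int
    supplier_id: int  # assumed to reference Supplier.id
    min_price: float
    max_price: float
    product_score: float
    review_count: float
    review_score: float
    shipping_time_score: float
    is_full_promotion: bool
    is_customizable: bool
    is_instant_order: bool
    trade_product: bool


# Illustrative values only.
supplier = Supplier(
    id=4,
    name="example supplier co., ltd.",
    verification_mode="verified",
    sopi_level=0,
    country_name="chine",
    years_as_gold_supplier=1,
    supplier_service_score=0.0,
)
supplier_field_names = [f.name for f in fields(Supplier)]
product_field_names = [f.name for f in fields(Product)]
print(supplier_field_names)
print(product_field_names)
```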

Sample of CSV output

When you run the command to export your SQLite file as a CSV, a FULL OUTER JOIN is performed to combine all the fields of both tables. Below is a sample of the results matching the keywords agricultural machinery.

| id | name | alibaba_guranteed | minimum_to_order | supplier_id | alibaba_guranteed | certifications | ordered_or_sold | product_score | review_count | review_score | shipping_time_score | is_full_promotion | is_customizable | is_instant_order | trade_product | min_price | max_price | name | verification_mode | sopi_level | country_name | years_as_gold_supplier | supplier_service_score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | mesh knitting weaving machine produce sunscreen net agricultural shade net anti net | 1 | 1 | 1 | 1 | | 0 | 5.0 | 1.0 | 5.0 | 5.0 | 1 | 1 | 1 | 1 | 9997.0 | 18979.0 | qingdao shanzhong imp and exp ltd. | unverified | 0 | chine | 9 | 5.0 |
| 2 | chinese small farm rotary tiller 12hp 15hp 20hp two wheel mini hand tractor walk behind tractors | 1 | 1 | 2 | 1 | | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 | 1 | 1 | 1 | 455.0 | 455.0 | shandong guoyoule agricultural machinery co., ltd. | unverified | 0 | chine | 1 | 0.0 |
| 3 | small multifunctional flexible 130l orchard remote control garden crawler agriculture robot sprayer | 1 | 1 | 3 | 1 | | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 | 1 | 1 | 1 | 2350.0 | 4620.0 | shandong my agricultural facilities co., ltd. | unverified | 0 | chine | 1 | 0.0 |
| 4 | 5hp/7hp/12hp rotary electric start agricultural farming walking tractor power tiller weeder cultivators | 1 | 1 | 4 | 1 | | 2 | 0.0 | 0.0 | 0.0 | 0.0 | 1 | 1 | 1 | 1 | 244.0 | 371.0 | shandong jinlong lutai international trade co., ltd. | verified | 0 | chine | 1 | 0.0 |
| 5 | free shipping 3.5 ton mini excavator 1 ton 2 ton kubota engine digger excavator mini pelle chinese cheap small excavator machine | 1 | 1 | 5 | 1 | CE | 95 | 4.6 | 25.0 | 4.6 | 4.6 | 1 | 1 | 1 | 1 | 988.0 | 1235.0 | shandong qilu industrial co., ltd. | unverified | 5 | chine | 4 | 4.6 |
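For readers curious how a full outer join of two such tables can end up in a CSV, here is a small self-contained sketch using Python's built-in `sqlite3` and `csv` modules. It is not the tool's actual export code: the table names `products` and `suppliers`, the reduced column set, and the sample rows are all assumptions made for illustration. The join is emulated with `LEFT JOIN` plus `UNION ALL`, since native `FULL OUTER JOIN` support only arrived in SQLite 3.39.

```python
import csv
import io
import sqlite3

# In-memory database with two tiny, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT, country_name TEXT);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY, name TEXT, supplier_id INTEGER, min_price REAL
    );
    INSERT INTO suppliers VALUES (1, 'supplier a', 'chine'), (2, 'supplier b', 'chine');
    INSERT INTO products VALUES
        (1, 'mini tractor', 1, 455.0),
        (2, 'product without supplier', NULL, 10.0);
    """
)

# Full outer join emulated in two parts:
#   1. every product with its supplier, if it has one (LEFT JOIN)
#   2. every supplier that no product refers to
query = """
    SELECT p.id AS product_id, p.name AS product_name, p.min_price AS min_price,
           s.name AS supplier_name, s.country_name AS country_name
    FROM products AS p
    LEFT JOIN suppliers AS s ON s.id = p.supplier_id
    UNION ALL
    SELECT NULL, NULL, NULL, s.name, s.country_name
    FROM suppliers AS s
    WHERE NOT EXISTS (SELECT 1 FROM products AS p WHERE p.supplier_id = s.id)
"""
cursor = conn.execute(query)
header = [column[0] for column in cursor.description]
rows = cursor.fetchall()
conn.close()

# Write the joined rows as CSV (to a string here; a file works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Rows that have no match on the other side keep empty cells in the CSV, which is why some columns in an export can be blank.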

Prerequisites

Installation

It's recommended to use pipx instead of pip for end-user applications written in Python. pipx installs the package in an isolated environment, exposes its CLI entry points, and makes them available everywhere on your system. This avoids dependency conflicts and allows a clean uninstall. Let's install aba-cli-scrapper using pipx:

    pipx install aba-cli-scrapper

If you'd like to use pip instead, just replace pipx with pip. As usual, you'll need to create and activate a virtual environment before using aba-cli-scrapper to avoid dependency conflicts. Let's install aba-cli-scrapper using pip:

    pip install aba-cli-scrapper

Using the CLI Interface

Need help? Run any command followed by --help for detailed information about its usage and options. For example, aba-run --help will show you all available subcommands and how to use them.

<div align="center"> <p> <a href="#"><img src="images\aba-run-help-image.png" width="700" height="340" alt="aba-run help image" /></a> </p> </div>

Warnings:


Important Information

Available Sub-Commands

scraper and syphoon-scraper Sub-Commands
<div align="center"> <p> <a href="#"><img src="images\syphoon-scraper-help-options.png" width="700" height="340" alt="aba-run help image" /></a> </p> </div>

Depending on which proxy provider you prefer, choose between these two sub-commands.

How do I set my API key?
    aba-run set-api-key your_proxy_provider_name

Replace your_proxy_provider_name with syphoon or brightdata, depending on your choice. A prompt will then appear asking you to enter your API key.

Both of the above sub-commands have the same options, but I will use the syphoon-scraper sub-command as an example.

    aba-run syphoon-scraper "electric bikes" -hf "bike_results" -pr 15 --sync-api

And voilà!

The bike_results directory (the name you provided) has now been created and should contain all the HTML files from alibaba.com matching your keywords.
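If you want a quick sanity check that the scrape produced something, a few lines of Python are enough. This is only a sketch: the helper name `list_html_pages` is made up, and it assumes the saved pages use a `.html` extension, which this README does not state. The demo builds a throwaway folder standing in for bike_results so it runs anywhere.

```python
import tempfile
from pathlib import Path


def list_html_pages(results_dir):
    """Return the sorted .html files found directly inside results_dir."""
    folder = Path(results_dir)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} does not exist; run the scraper first")
    return sorted(folder.glob("*.html"))


# Demo on a temporary folder that stands in for bike_results.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "bike_results"
    demo_dir.mkdir()
    for page in range(1, 4):
        # Hypothetical file names; the real ones may differ.
        (demo_dir / f"page_{page}.html").write_text("<html></html>")
    page_names = [path.name for path in list_html_pages(demo_dir)]

print(f"{len(page_names)} HTML files found: {page_names}")
```

Point `list_html_pages` at your own results folder to see what was actually saved.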

db-init Sub-Command
<div align="center"> <p> <a href="#"><img src="images\db-init-help-options.png" width="700" height="340" alt="aba-run help image" /></a> </p> </div>

NB: --host and --port default to localhost and 3306 respectively. Also, when you initialize your database with the MySQL engine for the first time, you must set the --user, --password and --db-name arguments. This creates a db_credentials.json file in your current directory containing your credentials, so you won't have to set them again next time. When the time comes to update your database, you will only need to set the import field.

MySQL Use case:

aba-run db-init mysql -u "mysql_username" -pw "mysql_password" -db "alibaba_products"

SQLite Use case:

aba-run db-init sqlite --sqlite-file alibaba_data

The db-init subcommand uses the SQLite engine by default, so if you plan to use SQLite you can run it as below:

SQLite Use case V2:

aba-run db-init -f alibaba_data

As soon as your database has been initialized, you can update it with the scraped data.


db-update Sub-Command
<div align="center"> <p> <a href="#"><img src="images\db-update-help.png" width="700" height="340" alt="aba-run help image" /></a> </p> </div>

This command takes two required arguments and two optional arguments:

MySQL Use case:

The command below assumes that your database credentials are already stored in the db_credentials.json file, so the required parameters can be filled in automatically. If they are not, it will raise an error.

    aba-run db-update mysql --kw-results bike_results/

NB: What if you want to change something while updating the database? Suppose you have run another scraping command and want to save that data under a different database name, without updating the credentials file or retyping all the parameters. Simply run:

    aba-run db-update mysql --kw-results another_keyword_folder_result/ --db-name "another_database_name"
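The credential reuse and the one-off --db-name override described above can be pictured with the following sketch. It is an illustration of the idea, not the tool's code: the helper names and the JSON key names (`user`, `password`, `db_name`, `host`, `port`) are assumptions, since this README does not show the layout of db_credentials.json. Only the defaults localhost and 3306 come from the text.

```python
import json
import tempfile
from pathlib import Path

# Defaults stated in this README for --host and --port.
DEFAULTS = {"host": "localhost", "port": 3306}


def save_credentials(path, user, password, db_name):
    """Store the credentials once (hypothetical file layout)."""
    creds = {**DEFAULTS, "user": user, "password": password, "db_name": db_name}
    path.write_text(json.dumps(creds, indent=2))
    return creds


def load_credentials(path, **overrides):
    """Read stored credentials; explicit overrides win for this run only."""
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; initialize the database first")
    creds = json.loads(path.read_text())
    creds.update({key: value for key, value in overrides.items() if value is not None})
    return creds


with tempfile.TemporaryDirectory() as tmp:
    cred_file = Path(tmp) / "db_credentials.json"
    save_credentials(cred_file, "mysql_username", "mysql_password", "alibaba_products")

    # Same stored credentials, different database name for this run.
    creds = load_credentials(cred_file, db_name="another_database_name")
    # The file itself is left untouched by the override.
    stored = json.loads(cred_file.read_text())

print(creds["host"], creds["port"], creds["db_name"])
print(stored["db_name"])
```

Keeping overrides out of the saved file means the next run without --db-name falls back to the database you originally configured.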

SQLite Use case:

aba-run db-update sqlite --kw-results bike_results/ --filename alibaba_data

export-as-csv Sub-Command
<div align="center"> <p> <a href="#"><img src="images\export-as-csv-help.png" width="700" height="340" alt="aba-run help image" /></a> </p> </div>

This command takes one required argument and one optional argument:

ai-agent Sub-Command
<div align="center"> <p> <a href="#"><img src="images\ai-agent-help.png" width="700" height="340" alt="aba-run help image" /></a> </p> </div>

The purpose of this command is to let you interact with your scraped data in plain English.

This command takes one required argument and one optional argument:

Contributions Welcome!

I believe in the power of open source! If you'd like to contribute to this project, feel free to fork the repository, make your changes, and submit a pull request. I'm always open to new ideas and improvements.

License

This project is licensed under the MIT License.