Home

----
Version: 1.0.1
Website: http://119.23.223.90:8000
Source: https://github.com/JustForFunnnn/webspider
Keywords: Python3, Tornado, Celery, Requests

Introduction

This project crawls job and company data from job-seeking websites, then cleans, models, and converts the data and stores it in a database. It then uses ECharts and Bootstrap to build a front-end page that displays IT job statistics, showing the latest requirements and trends in the IT job market.

Demo

Enter a keyword you are interested in, such as "Python", into the search box and click the search button. The statistics for that keyword will then be displayed.

Charts are also available for each keyword, for example for "Python":

[Figure: Python charts example]

Quick Start

This tutorial is based on Linux (Ubuntu). For other systems, please use the corresponding commands.

# clone the repository and enter it
git clone git@github.com:JustForFunnnn/webspider.git
cd webspider

# install Redis
sudo apt-get install redis-server

# run Redis in the background
nohup redis-server &

# install Python3
sudo apt-get install python3

# install MySQL
sudo apt-get install mysql-server

# start MySQL
sudo service mysql start

# create the database (run this statement inside the MySQL client)
CREATE DATABASE `spider` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

The tables still need to be created: copy the table definition SQL from tests/schema.sql and run it in MySQL.
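
Instead of copying and pasting the SQL by hand, the schema file can also be fed to the MySQL client directly. The snippet below is a minimal sketch, not part of the original instructions. It assumes that it is run from the repository root, that the database is the `spider` database created above, and that a MySQL account named `root` with a password is used. The variable names `SCHEMA_FILE` and `DB_NAME` are made up for illustration; adjust the account to match your setup.

```shell
# Assumptions (not from the original README): run from the repository root,
# and the MySQL account "root" may access the `spider` database.
SCHEMA_FILE="tests/schema.sql"
DB_NAME="spider"

if [ ! -f "$SCHEMA_FILE" ]; then
    echo "Cannot find $SCHEMA_FILE; run this from the repository root." >&2
elif ! command -v mysql >/dev/null 2>&1; then
    echo "The mysql client is not installed or not on PATH." >&2
else
    # -p prompts for the password; the schema file is read from stdin
    mysql -u root -p "$DB_NAME" < "$SCHEMA_FILE"
fi
```

On a fresh Ubuntu install the root account may authenticate through the system user instead of a password, in which case `sudo mysql spider < tests/schema.sql` is the likely equivalent.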

# build the project; after a successful build, executables are generated under env/bin
make
# run the unit tests
make test
# run the flake8 code style check
make flake8
# start the web service
env/bin/web
# run task scheduler/dispatcher
env/bin/celery_beat
# run celery worker for job data
env/bin/celery_lg_jobs_data_worker
# run celery worker for job count
env/bin/celery_lg_jobs_count_worker
# start crawling job counts immediately
env/bin/crawl_lg_jobs_count
# start crawling job data immediately
env/bin/crawl_lg_data
# start celery monitoring
env/bin/celery_flower
# clean the existing build result
make clean