wordpress-madara-scraper

A bash script for scraping image-focused WordPress sites built on the Madara theme into JSON.

Requirements

Installation

install -m 755 wordpress-madara-scraper /usr/bin

JSON format

Here's an example of a comic.

Structure

Two sets of options define what will be downloaded: those that download metadata and those that download images.

Metadata

Metadata is downloaded by -p, -c, -l, --full-comic and --full-pages. Files created by these options are named with the MD5 hash of their URLs.

-p takes a LINK argument and outputs a list of URLs to comics. It can be used to list all comics on a website, in a category, or by an artist.

-c takes a FILE argument, reads comic URLs from it, and saves each comic as a JSON file.

-l takes a FILE argument, reads chapter URLs from it, and saves each chapter's list of image URLs to a file.

--full-comic takes a LINK argument and downloads the comic and its chapters. The chapters are stored in a directory named after the comic's file, with a '_' character appended.

--full-pages takes a LINK argument and downloads every comic listed on its pages using --full-comic.

Example structure created by --full-pages:

0001c692d6cadaa3c692412bc0ac51fe
0001c692d6cadaa3c692412bc0ac51fe_/
    02c8e3f630d0cd48f13515f65a91fe3e
    0ba18e4d9db640693a8584b01983b451
    0df4a828f07137e21f585aa29375b223
008216d512f75bcb86e2a08c4df7ae8c
008216d512f75bcb86e2a08c4df7ae8c_/
    091bf018a3e41cb974c20be4901ba89a
    4e35d40ad644114a17e2995b30aa52fb
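The layout above can be reproduced by hand to find the files belonging to a given comic. This is a minimal sketch, assuming the script hashes each URL exactly as given, without a trailing newline (if it hashes `echo` output instead, every hash changes). The functions `url_hash` and `comic_paths` are hypothetical helpers, not part of wordpress-madara-scraper, and `md5sum` is from GNU coreutils (macOS ships `md5 -q` instead).

```shell
#!/bin/sh
# Hypothetical helpers (not part of wordpress-madara-scraper) that rebuild
# the metadata layout: <md5 of URL> and <md5 of URL>_/ for its chapters.

# Print the MD5 hex digest of a URL.
# ASSUMPTION: the URL is hashed without a trailing newline.
url_hash() {
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

# Print the comic's JSON file and its chapter directory under DIR ($1)
# for the comic URL ($2), one path per line.
comic_paths() {
    dir=$1
    name=$(url_hash "$2")
    printf '%s\n' "$dir/$name" "$dir/${name}_/"
}
```

For example, `comic_paths . 'https://nightcomic.com/manga/versatile-mage/'` prints where --full-comic would be expected to place that comic, if the assumption about hashing holds.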

Images

These options are meant only for reading the comics and are a practical simplification of the Metadata options. Files created by them are named after the comic or chapter title, with every '/' character replaced by '|'.
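Because '/' is the path separator on Unix-like systems, a title containing it cannot be used as a file name directly. The translation described above can be sketched with `tr`; `safe_name` is a hypothetical helper, and the script may implement this differently internally.

```shell
#!/bin/sh
# Hypothetical helper: turn a comic or chapter title into a file name by
# replacing every '/' with '|', as described for the Images options.
safe_name() {
    printf '%s\n' "$1" | tr '/' '|'
}
```

This can help locate a downloaded comic from its title, e.g. `ls "$(safe_name 'Some/Title')_"`.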

--download-chapter takes a LINK argument and downloads the images of the chapter.

--download-comic takes a LINK argument and downloads the comic, its chapters and their images.

--download-pages takes a LINK argument and downloads every comic listed on its pages using --download-comic.

Example structure created by --download-pages:

+99 Wooden stick manhwa
+99 Wooden stick manhwa_/
    Chapter 1/
        ch_0_1.jpg
        ch_0_2.jpg
        ch_0_3.jpg
    Chapter 89.5/
        45.webp
        46.webp
My School Life Pretending To Be a Worthless Person
My School Life Pretending To Be a Worthless Person_/
    Chapter 1/
        ch_0_1.jpg
        ch_0_2.jpg
        ch_0_3.jpg
    Chapter 59/
        13.webp
        14.webp
        15.webp
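When reading the output, a plain `ls` sorts "Chapter 10" before "Chapter 9". A small sketch that lists a comic's chapter directories in numeric order with GNU `sort -V` (version sort); `list_chapters` is a hypothetical helper, and `sort -V` may be missing from non-GNU `sort` implementations.

```shell
#!/bin/sh
# Hypothetical helper: list the chapter directories of a comic downloaded
# with --download-comic in reading order, using version sort so that the
# numbers in "Chapter N" are compared numerically.
list_chapters() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue   # the glob matched nothing
        basename "$d"
    done | sort -V
}
```

For example, `list_chapters '+99 Wooden stick manhwa_'` would print "Chapter 1" before "Chapter 89.5".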

Tested sites

https://manhwatop.com/
https://www.nightcomic.com/
https://shibamanga.com/
https://topmanhua.com/

Usage

wordpress-madara-scraper [OPTIONS]...

Download the images of a chapter, a comic, a genre, or the whole site

wordpress-madara-scraper --download-chapter 'https://manhwatop.com/manga/love-hug/chapter-233/'
wordpress-madara-scraper --download-comic 'https://manhwatop.com/manga/love-hug/'
wordpress-madara-scraper --download-pages 'https://manhwatop.com/manga-genre/magical-genre/'
wordpress-madara-scraper --download-pages 'https://manhwatop.com/'

Download the metadata of a comic and of all comics on a page

wordpress-madara-scraper --full-comic 'https://nightcomic.com/manga/versatile-mage/'
wordpress-madara-scraper --full-pages 'https://nightcomic.com/new/'

Download links to comics into FILE

wordpress-madara-scraper -p 'https://www.topmanhua.com' > FILE

Download the comics linked in FILE into DIR using 4 threads; this creates JSON files named with the MD5 hash of their links

wordpress-madara-scraper -d DIR -t 4 -c FILE
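Since each JSON file is named after its link, an interrupted -c run could be resumed by passing only the links that have no file yet. A minimal sketch, assuming FILE holds one URL per line and that each URL is hashed without a trailing newline; `missing_comics` is a hypothetical helper, and `md5sum` is from GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: print the comic links in FILE ($2) that have no JSON
# file in DIR ($1) yet.
# ASSUMPTIONS: one URL per line; files are named with the MD5 of the URL
# hashed without a trailing newline.
missing_comics() {
    dir=$1
    file=$2
    # "|| [ -n "$url" ]" also handles a last line without a newline.
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue   # skip blank lines
        name=$(printf '%s' "$url" | md5sum | cut -d ' ' -f 1)
        [ -e "$dir/$name" ] || printf '%s\n' "$url"
    done < "$file"
}
```

For example, `missing_comics DIR FILE > REST` followed by `wordpress-madara-scraper -d DIR -t 4 -c REST` would fetch only the comics not downloaded yet.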

Download the image links of the chapters in FILE into files named with the MD5 hash of their links

wordpress-madara-scraper -l FILE

Get some help

wordpress-madara-scraper -h