image_crawler v1.0.4

This is an image crawler written in pure shell script. Given a keyword, it collects the matching images (more precisely, the image URLs) from the web. It mimics a human search on Google Images or Baidu Images with the keywords you provide, then parses and records the image URLs that Google or Baidu returns. Once you have the image URLs, you can download the actual images with any scripting language you like. The tool can also be used to compare search quality and relevance between Google Images and Baidu Images.
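As a minimal sketch of that download step, assuming the recorded results are available as a plain text file with one image URL per line: the helper names `url_to_filename` and `download_images` below are hypothetical and not part of image_crawler, and the sketch assumes `curl` is installed.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a text file with one image URL per line.
# url_to_filename and download_images are hypothetical helpers,
# not functions provided by image_crawler.

# Build a local file name from a running index and the URL's extension.
url_to_filename() {
    local index="$1"
    local url="$2"
    local path="${url%%[?#]*}"   # drop any query string or fragment
    local base="${path##*/}"     # keep the last path component
    local ext="${base##*.}"      # text after the last dot, if any
    # Fall back to "jpg" when there is no dot or the extension looks odd.
    case "$ext" in
        "$base"|""|*[!A-Za-z0-9]*) ext="jpg" ;;
    esac
    printf '%05d.%s\n' "$index" "$ext"
}

# Download every URL listed in LIST into OUTDIR, one file per URL.
download_images() {
    local list="$1"
    local outdir="$2"
    local index=0
    local url
    mkdir -p "$outdir"
    while IFS= read -r url; do
        [ -z "$url" ] && continue   # skip blank lines
        index=$((index + 1))
        # -f: fail on HTTP errors, -s: quiet, -L: follow redirects
        curl -fsL --max-time 30 \
            -o "$outdir/$(url_to_filename "$index" "$url")" "$url" \
            || echo "failed: $url" >&2
    done < "$list"
}
```

With a URL list saved as, say, `urls.txt`, calling `download_images urls.txt images` would store the files as `images/00001.jpg`, `images/00002.png`, and so on. Numbering the files avoids name clashes when several URLs end in the same file name.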

What's NEW?

TODO:

How to use it?

Performance

I tested this script with 10 keywords (the ones in query_list.txt), crawling 300 results per keyword, first with Google and then with Baidu. The timings were as follows:

[unix14 ~/imagecrawler]$ time ./image_crawler.sh google 300
real 0m5.766s
user 0m2.425s
sys 0m2.254s

[unix14 ~/imagecrawler]$ time ./image_crawler.sh baidu 300
real 0m11.419s
user 0m1.254s
sys 0m1.044s

The result is not bad, and in the future I plan to make the script more concurrent.
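One way such concurrency could be approached, offered as a sketch rather than the planned design, is to process several keywords at once with `xargs -P` (available in both GNU and BSD xargs). The function `run_parallel` is hypothetical, and the sketch assumes a keyword file with one keyword per line.

```shell
#!/usr/bin/env bash
# Sketch only: run_parallel is a hypothetical helper, not part of image_crawler.
# Usage: run_parallel MAX_JOBS FILE COMMAND [ARGS...]
# Runs "COMMAND ARGS... LINE" once per non-empty line of FILE,
# with at most MAX_JOBS commands running at the same time.
run_parallel() {
    local max_jobs="$1"
    local file="$2"
    shift 2
    # Drop blank lines, then NUL-separate the lines so that a keyword
    # containing spaces is passed to the command as a single argument.
    grep -v '^[[:space:]]*$' "$file" \
        | tr '\n' '\0' \
        | xargs -0 -n 1 -P "$max_jobs" "$@"
}
```

For example, `run_parallel 4 query_list.txt echo crawling` prints one line per keyword, running up to four commands at a time; `echo crawling` stands in for whatever command fetches the results for a single keyword. Output order is not guaranteed, since the jobs finish independently.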

Note:

It works on any platform that provides bash, egrep, awk, python, and either wget or curl, for example Ubuntu or macOS.
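A quick way to check these requirements before running the crawler might look like the sketch below. The functions `missing_tools` and `pick_downloader` are hypothetical helpers, not part of image_crawler, and preferring wget over curl is an arbitrary choice made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of image_crawler.

# Print each named command that cannot be found on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Print the first available downloader; return non-zero if neither exists.
pick_downloader() {
    local tool
    for tool in wget curl; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}
```

Running `missing_tools bash egrep awk python` lists any tool that still needs to be installed, and `pick_downloader` reports whether wget or curl will be used. On systems that ship only `python3`, `python` will show up as missing.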