Library Dataset: Image-text matching for large-scale book collections

The Library Dataset consists of 285 high-resolution images of bookshelves containing a total of 7,536 books. We also provide two book catalogues for matching: the true library inventory (closed-set scenario) and a large-scale catalogue (open-set scenario).

One Shelf Example | Two Shelves Example

Dataset

Target Lists

The Large-Scale Catalogue must be downloaded from here.

Annotations

The main annotations for this dataset are the books that appear in each image. They are stored in the following file:

Annotations - data/annotations/library_books_annotations.csv
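
As a minimal sketch of how the annotation file might be read, the snippet below groups its rows by image using only the Python standard library. The file path comes from this README, but the CSV schema is not documented here, so `image_column` is a hypothetical parameter: check the file's header first and pass whichever column identifies the image.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Path as given in this README.
ANNOTATIONS_PATH = Path("data/annotations/library_books_annotations.csv")


def load_books_per_image(csv_path, image_column):
    """Group annotation rows by the value of `image_column`.

    `image_column` is an assumption, not a documented column name:
    inspect the CSV header and pass the column that identifies the image.
    """
    books_per_image = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        if image_column not in header:
            raise KeyError(f"{image_column!r} not found in header {header}")
        for row in reader:
            # Each row is a dict mapping column name to cell value.
            books_per_image[row[image_column]].append(row)
    return dict(books_per_image)


if __name__ == "__main__" and ANNOTATIONS_PATH.exists():
    # Print the header so the right column can be chosen.
    with open(ANNOTATIONS_PATH, newline="", encoding="utf-8") as handle:
        print(next(csv.reader(handle)))
```

Once the header is known, `load_books_per_image(ANNOTATIONS_PATH, "<image column>")` returns a dictionary mapping each image identifier to the list of its annotation rows.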

Images

There are two sets of images: original and split. The split images were created by cutting the original images into smaller pieces so that they fit within the OCR API size limits. Both sets of images can be downloaded from the following links:

Unzip the images in data/images/.
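
To illustrate the idea behind the split images, here is a minimal sketch that computes crop boxes so that no piece exceeds a given side length. It is not the procedure used to produce the released split set: the README does not state which OCR API was used, what its limits are, or how the cuts were placed, so `max_side` and the simple grid layout are assumptions.

```python
def split_boxes(width, height, max_side):
    """Return crop boxes that cover a width x height image.

    Each box is (left, upper, right, lower) in pixels and is at most
    `max_side` pixels wide and tall. `max_side` is a hypothetical limit;
    set it according to the OCR service you actually use.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")
    boxes = []
    for upper in range(0, height, max_side):
        for left in range(0, width, max_side):
            # Clamp the far edges so the last row and column stay in bounds.
            right = min(left + max_side, width)
            lower = min(upper + max_side, height)
            boxes.append((left, upper, right, lower))
    return boxes


if __name__ == "__main__":
    for box in split_boxes(5000, 3000, 2000):
        print(box)
```

The boxes use the same (left, upper, right, lower) convention as Pillow's `Image.crop`, so each one can be passed to it directly if Pillow is used to do the actual cutting.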

Demos

Check out the demos at demo/:

Book Identification Demo

Book Search Demo