Blacklight IIIF Search
A plugin that provides IIIF Content Search functionality for Blacklight-based applications.
IIIF Content Search is an API specification for searching the full text of a resource that is described by a IIIF Presentation API manifest.
When installed, this plugin provides an endpoint in your Blacklight app that will return a JSON response conforming to IIIF Content Search API v. 1.0.
By integrating the URL for this service into your IIIF Presentation manifests, clients/viewers that support the IIIF Content Search API (such as the Universal Viewer) will be able to provide functionality for searching within a resource and displaying results.
Prerequisites
This plugin is currently highly opinionated towards Solr as the back-end search index.
This plugin assumes:
- You have a working Blacklight application
- You have items with full text (e.g. scanned books, newspapers, etc.) in your Solr index
- The text for these items is indexed in Solr
- Each page has its own Solr record, with the corresponding text in a discrete field
  - The text field must be indexed
  - If search term highlighting is desired, the text field must be indexed and stored
- The relationship between page records and their parent book/volume/issue/etc records is indexed in Solr
Blacklight/Solr Version Compatibility:
blacklight_iiif_search version | works with Blacklight | works with Solr |
---|---|---|
2.0 | ~> 7.0 | 7.* |
1.0 | >= 6.3.0 to < 7.* | 7.* |
Installation
Add Blacklight IIIF Search to your Gemfile:
gem 'blacklight_iiif_search'
Run the install generator, which will copy over some initial templates, routes, and configuration:
$ rails generate blacklight_iiif_search:install
The generator:
- Adds some configuration settings to app/controllers/catalog_controller.rb
- Adds the IiifSearchBuilder class to app/models
- Adds routing to config/routes.rb
- Injects some configuration into solr/conf/schema.xml and solr/conf/solrconfig.xml to support contextual autocomplete (to skip the Solr changes, run the install command with the --skip-solr flag)
After install, you'll probably need to adjust the iiif_search settings in CatalogController:
Config option | Description |
---|---|
full_text_field | The Solr field where the OCR text is indexed/stored. |
object_relation_field | The Solr field where the parent/child relationship is stored. |
supported_params | An array of IIIF Content Search query parameters supported by the search service. (Note: motivation, date, and user are not currently supported.) |
autocomplete_handler | The value of the @name attribute for the Solr <requestHandler name="/#{autocomplete_handler}"> in solrconfig.xml that handles autocomplete suggestions. |
suggester_name | The value of the <str name="name">#{suggester_name}</str> element for the Solr <searchComponent> in solrconfig.xml that handles autocomplete suggestions. |
See below for additional customization options.
Basic Usage
The search service will be available at:
http://host:port/catalog/:id/iiif_search
There is a solr_document_iiif_search route helper that can be called to construct a path or URL to the search service in your app. For example:
solr_document_iiif_search_url('abcd1234', {q: 'blacklight'})
This would return:
http://host:port/catalog/abcd1234/iiif_search?q=blacklight
The autocomplete service will be available at:
http://host:port/catalog/:id/iiif_suggest
There is a solr_document_iiif_suggest route helper that can be called to construct a path or URL to the autocomplete service in your app. For example:
solr_document_iiif_suggest_url('abcd1234', {q: 'blacklight'})
This would return:
http://host:port/catalog/abcd1234/iiif_suggest?q=blacklight
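Outside of a Rails view or controller (for example, in a script that checks the endpoints), the same URLs can be assembled with Ruby's standard library. The following is a minimal sketch; the helper name iiif_service_url and the localhost base URL are assumptions made for illustration and are not part of the plugin.

```ruby
require 'uri'

# Hypothetical helper (not part of the plugin): builds the URL of the
# search or autocomplete service for a document, following the
# /catalog/:id/iiif_search and /catalog/:id/iiif_suggest patterns above.
# Assumes document_id is already URL-safe.
def iiif_service_url(base_url, document_id, service, params = {})
  uri = URI.join(base_url, "/catalog/#{document_id}/#{service}")
  # Only append a query string when there are parameters to send.
  uri.query = URI.encode_www_form(params) unless params.empty?
  uri.to_s
end

puts iiif_service_url('http://localhost:3000', 'abcd1234', 'iiif_search', q: 'blacklight')
puts iiif_service_url('http://localhost:3000', 'abcd1234', 'iiif_suggest', q: 'black')
```

Inside the Rails app itself, the route helpers shown above remain the simpler choice, since they pick up the host and mount point from the application.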
Implementation
In order to successfully deploy this plugin, you'll most likely need to customize a few things to match how your Solr index and/or repository are set up.
Parent/child relationship
The plugin needs to construct Solr query parameters such that only records that represent children/members (e.g. pages) of the parent work are returned. The out-of-the-box default is:
{is_page_of_ssi: 'parent_id'}
Here parent_id is the identifier of the parent object. This assumes that each page record has an indexed is_page_of_ssi field that indicates its parent.
To customize the construction of the parent/child object relationship Solr parameters (beyond the name of the field, which can be set in the CatalogController config), create a local copy of the BlacklightIiifSearch::IiifSearchBehavior module in app/models/concerns/blacklight_iiif_search/iiif_search_behavior.rb and override the #object_relation_solr_params method.
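The shape of such an override can be sketched in plain Ruby. This is a self-contained illustration, not the plugin's actual module: the module and class names, the parent_id and relation_field accessors, and the is_member_of_ssim and type_ssi fields are all hypothetical stand-ins. Only the method name #object_relation_solr_params and the default {is_page_of_ssi: 'parent_id'} shape come from the description above.

```ruby
# Stand-in for the default behavior described above: one field that
# points from the page record to its parent. The real module's internals
# may differ; `relation_field` and `parent_id` are hypothetical accessors.
module DefaultRelationSketch
  def object_relation_solr_params
    { relation_field => parent_id }
  end
end

# A local override: match the parent through a different (hypothetical)
# field and additionally restrict results to page records.
module CustomRelationSketch
  def object_relation_solr_params
    { 'is_member_of_ssim' => parent_id, 'type_ssi' => 'page' }
  end
end

# Minimal host object so the sketch can run on its own.
SearchSketch = Struct.new(:parent_id, :relation_field) do
  include DefaultRelationSketch
end

# Including the custom module in a subclass places it ahead of the
# default in the method lookup chain, so the override wins.
class CustomSearchSketch < SearchSketch
  include CustomRelationSketch
end

p SearchSketch.new('abcd1234', :is_page_of_ssi).object_relation_solr_params
p CustomSearchSketch.new('abcd1234', :is_page_of_ssi).object_relation_solr_params
```

In a real app the override would live in the local copy of the behavior module rather than in a subclass; the subclass here only keeps the example runnable in isolation.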
Default search settings
An IiifSearchBuilder class will be available in your app's app/models directory, and can be customized as needed, especially with regard to Solr's highlighting settings.
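As a rough idea of the kind of highlighting settings involved, the sketch below merges standard Solr highlighting parameters (hl, hl.fl, hl.snippets, hl.fragsize) into a parameter hash. The helper name with_highlighting, the field name page_text_tsim, and the default values are assumptions for illustration; this is not the code of the generated IiifSearchBuilder class.

```ruby
# Hypothetical helper (not the generated IiifSearchBuilder): returns a
# copy of the given Solr params with highlighting enabled on the
# full-text field. The parameter names are standard Solr highlighting
# parameters; the default values are arbitrary.
def with_highlighting(solr_params, full_text_field, snippets: 10, fragsize: 100)
  solr_params.merge(
    'hl' => true,                # turn highlighting on
    'hl.fl' => full_text_field,  # field to generate snippets from
    'hl.snippets' => snippets,   # max snippets per document
    'hl.fragsize' => fragsize    # approximate snippet length in characters
  )
end

p with_highlighting({ 'q' => 'sugar' }, 'page_text_tsim', snippets: 5)
```

Highlighting only works if the field is stored as well as indexed, as noted in the prerequisites.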
URI constructors
As part of the JSON response, the plugin needs to construct URIs for IIIF Annotation objects representing the search hits, and IIIF Canvas objects corresponding to the pages of the item.
To customize these URIs (including the addition of word/image coordinates to facilitate hit highlighting in a viewer), create a local copy of the BlacklightIiifSearch::IiifSearchAnnotationBehavior module in app/models/concerns/blacklight_iiif_search/iiif_search_annotation_behavior.rb and override the appropriate methods.
Important notes:
- The base URI returned by #canvas_uri_for_annotation must match the @id value of the corresponding Canvas in your IIIF manifest.
- The URI returned by #canvas_uri_for_annotation must additionally end with the #xywh=Integer,Integer,Integer,Integer syntax in order to work with the Universal Viewer.
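The second requirement can be illustrated with a small helper that appends the region fragment to a Canvas URI. This is a sketch only: the helper name canvas_uri_with_region and the example Canvas URI are hypothetical, and where the coordinates come from (for example, OCR word positions) depends on your own data.

```ruby
# Hypothetical helper: appends an #xywh= fragment to a Canvas URI.
# The base URI must be the same value used as the Canvas @id in the
# manifest; the coordinates are coerced to integers.
def canvas_uri_with_region(canvas_uri, x, y, width, height)
  coords = [x, y, width, height].map { |n| Integer(n) }
  "#{canvas_uri}#xywh=#{coords.join(',')}"
end

puts canvas_uri_with_region('https://example.org/iiif/abcd1234/canvas/p1', 10, 20, 300, 40)
```

If no coordinates are available for a hit, a fragment covering a default region still satisfies the syntax, although the viewer cannot then highlight the matched word precisely.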
Linking to the Search service from your IIIF manifest
In order for a viewer application to be aware of the search service, you need to include the following in your IIIF manifest:
"service": {
"@context": "http://iiif.io/api/search/0/context.json",
"@id": "http://host:port/catalog/:id/iiif_search",
"profile": "http://iiif.io/api/search/0/search",
"label": "Search within this item"
}
The value of @id should be replaced with the link to the search service for the item. The text of label can be whatever you want.
To make a viewer aware of the autocomplete service, include the following in your IIIF manifest:
"service": {
"@context": "http://iiif.io/api/search/0/context.json",
"@id": "http://host:port/catalog/:id/iiif_search",
"profile": "http://iiif.io/api/search/0/search",
"label": "Search within this item",
"service": {
"@id": "http://example.org/services/identifier/autocomplete",
"profile": "http://iiif.io/api/search/0/autocomplete"
}
}
Important note: Although the current version (as of June 2018) of the Content Search API is http://iiif.io/api/search/1.0, the Universal Viewer will NOT automatically recognize the search service unless the @context and profile URIs use http://iiif.io/api/search/0/ as the base.
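If your manifests are generated in Ruby, the service block can be built as a hash and serialized with the rest of the manifest. The sketch below assumes a hypothetical helper named search_service_block and uses the localhost URLs from the test app as placeholders; it uses the http://iiif.io/api/search/0/ base described in the note above.

```ruby
require 'json'

# Base URI that the Universal Viewer expects for @context and profile.
SEARCH_API_BASE = 'http://iiif.io/api/search/0'

# Hypothetical helper: builds the "service" entry for a manifest.
# When suggest_url is given, the autocomplete service is nested inside
# the search service, mirroring the JSON shown above.
def search_service_block(search_url, suggest_url = nil, label: 'Search within this item')
  service = {
    '@context' => "#{SEARCH_API_BASE}/context.json",
    '@id' => search_url,
    'profile' => "#{SEARCH_API_BASE}/search",
    'label' => label
  }
  if suggest_url
    service['service'] = {
      '@id' => suggest_url,
      'profile' => "#{SEARCH_API_BASE}/autocomplete"
    }
  end
  { 'service' => service }
end

puts JSON.pretty_generate(
  search_service_block(
    'http://localhost:3000/catalog/abcd1234/iiif_search',
    'http://localhost:3000/catalog/abcd1234/iiif_suggest'
  )
)
```

The returned hash can be merged into the manifest hash before it is rendered as JSON.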
Configuring Solr for contextual autocomplete
Solr >=5.4 provides the ability to do contextual autocomplete queries that can be filtered/limited by a contextField configured in the autocomplete <searchComponent> in solrconfig.xml.
For IIIF Content Search autocomplete behavior, we want to limit the suggestions to terms that appear in pages that are children of the parent object. The contextField should be the same as the object_relation_field defined in the CatalogController configuration.
This is best set up as a separate <searchComponent> from any existing autocomplete/suggest functionality that may already be defined in your Solr configuration. The install generator will create a new <searchComponent> in solrconfig.xml and several field definitions in the schema.xml file to support the autocomplete behavior. You may need to customize these settings for your implementation.
You also need to add the tokenizing-suggest-v1.0.1.jar library to your Solr install's contrib directory. This library is needed so that Solr will return single terms for autocomplete queries, rather than the entire full text field.
Note: It's often helpful to test Solr directly to make sure autocomplete is working properly. This can be done with a request like:
http://host:port/solr/[core_name]/iiif_suggest?wt=json&suggest.cfq=[parent_identifier]&q=[query_term]
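The test URL can also be assembled programmatically, which avoids mistakes when the parent identifier or query term contains characters that need escaping. This is a minimal sketch: the helper name solr_suggest_url, the localhost:8983 base, and the blacklight-core core name are assumptions, and the iiif_suggest handler name is taken from the URL above.

```ruby
require 'uri'

# Hypothetical helper: builds a direct Solr request to the autocomplete
# handler, limited to one parent object via suggest.cfq.
def solr_suggest_url(solr_base, core_name, parent_identifier, query_term, handler: 'iiif_suggest')
  query = URI.encode_www_form(
    'wt' => 'json',                     # ask Solr for a JSON response
    'suggest.cfq' => parent_identifier, # context filter: the parent object
    'q' => query_term                   # the partial term to complete
  )
  "#{solr_base}/#{core_name}/#{handler}?#{query}"
end

puts solr_suggest_url('http://localhost:8983/solr', 'blacklight-core', '7s75dn48d', 'be')
```

If you changed autocomplete_handler in the CatalogController config, pass the same value as the handler argument.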
Test Drive
After cloning the repository and running bundle install:
1. Generate the test application at .internal_test_app:
$ rake engine_cart:generate
2. Start up Solr (run from a new terminal window):
$ solr_wrapper
This will throw an error, since the Solr config will look for a library that doesn't exist yet.
3. Copy the tokenizing-suggest-v1.0.1.jar library to Solr's contrib directory:
$ cp ./lib/generators/blacklight_iiif_search/templates/solr/lib/tokenizing-suggest-v1.0.1.jar /path/to/solr/contrib
4. Start up Solr again (run from the same terminal window):
$ solr_wrapper
5. Index sample documents into Solr (run from ./.internal_test_app):
$ RAILS_ENV=test rake blacklight_iiif_search:index:seed
6. Start up the Rails server (run from ./.internal_test_app):
$ rails s
7. In a browser, go to http://127.0.0.1:3000. You should see the default Blacklight home page.
8. Test a sample search: http://127.0.0.1:3000/catalog/7s75dn48d/iiif_search?q=sugar
9. Test a sample autocomplete request: http://127.0.0.1:3000/catalog/7s75dn48d/iiif_suggest?q=be
To see how search snippets work, change the value of the full_text_field config to alternative_title_tsim in ./.internal_test_app/app/controllers/catalog_controller.rb, and restart the Rails server.
Development
After cloning the repository and running bundle install, run rake ci from the project's root directory, which will:
- Generate the test application at .internal_test_app
- Run Blacklight and BlacklightIiifSearch generators
- Start Solr and index the sample Solr docs from spec/fixtures
  - (Note: The Solr config is created by Blacklight's installer, and is generated into .internal_test_app/solr/conf.)
- Run all specs
Credits
This project was developed as part of the Newspapers in Samvera grant. Thanks to the Institute of Museum and Library Services for their support.
Inspiration for this code was drawn from Stanford University Digital Library's content_search and NCSU Libraries' ocracoke.
Special thanks to Chris Beer and Stanford University Digital Library for the use of the tokenizing-suggest-v1.0.1.jar library.