:warning: This repository has been archived, as the inventaire server itself now takes care of keeping the Elasticsearch entities and Wikidata indexes updated.
# Entities Search Engine
Scripts and microservice to feed an ElasticSearch instance with Wikidata and Inventaire entities (see the entities map), and keep those up-to-date, to answer questions like "give me all the humans with a name starting with xxx" in a super snappy way, typically for the needs of an autocomplete field.
For the Wikidata-only version, see the archived `wikidata-subset-search-engine` branch.
## Summary

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Setup
See the setup documentation.
### Dependencies

See the setup documentation to install the dependencies (a quick sanity check follows the list):

- NodeJs >= v6.4
- ElasticSearch (this repo was developed targeting ElasticSearch v2.4, but it should work with newer versions with some minimal changes)
- Nginx
- Let's Encrypt
- already installed on any good *nix system: curl, gzip
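Once everything is installed, a minimal sanity check could look like this (assuming Elasticsearch runs locally on its default port 9200):

```sh
# Check the Node version (should be >= v6.4)
node --version
# Check that Elasticsearch is up and see which version it runs
# (assumes the default port 9200 on localhost)
curl http://localhost:9200
```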
## Start server
See the Wikidata and Inventaire per-entity import documentation.
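For local development, this typically boils down to installing the dependencies and starting the Node server. The sketch below assumes a standard npm `start` script; check `package.json` for the actual entry point:

```sh
# Install the project dependencies
npm install
# Start the microservice (assumes a "start" script is defined in package.json)
npm start
```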
## Data imports

### from scratch

#### add

##### Wikidata entities

There are 3 ways to import Wikidata entities data into your ElasticSearch instance.
##### Inventaire entities
#### update
To update any entity, simply re-add it, typically by posting its URI (e.g. `wd:Q180736` for a Wikidata entity, or `inv:9cf5fbb9affab552cd4fb77712970141` for an Inventaire one) to the server.
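As a rough sketch of what that POST could look like with curl (the port and the parameter name below are assumptions, not the documented interface; check the server setup for the actual values):

```sh
# Hypothetical example: post an entity URI to the locally running service
# so that it gets (re-)fetched and (re-)indexed.
# Port 3213 and the "uris" form parameter are placeholders/assumptions.
curl -X POST http://localhost:3213 -d 'uris=wd:Q180736'
```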
#### remove
To un-index entities that were mistakenly added, pass the path of a results JSON file, expected to contain an array of ids. All the documents matching those ids will be deleted:
```sh
index=wikidata
type=humans
ids_json_array=./queries/results/mistakenly_added_wikidata_humans_ids.json
npm run delete-from-results $index $type $ids_json_array

index=entities-prod
type=works
ids_json_array=./queries/results/mistakenly_added_inventaire_works_ids.json
npm run delete-from-results $index $type $ids_json_array
```
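For illustration, the ids file is expected to be a plain JSON array; the ids below are made up, and the actual values must match the `_id`s of the documents in the target index:

```sh
# Hypothetical content of the ids file: a plain JSON array of document ids
cat ./queries/results/mistakenly_added_wikidata_humans_ids.json
# ["Q123456", "Q234567"]
```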
### importing dumps
You can import dumps from the inventaire.io production Elasticsearch instance:
```sh
# Download the Wikidata dump
wget -c https://dumps.inventaire.io/wd/elasticsearch/wikidata_data.json.gz
gzip -d wikidata_data.json.gz
# elasticdump should have been installed when running `npm install`
# --limit: increase the batch size
./node_modules/.bin/elasticdump --input=./wikidata_data.json --output=http://localhost:9200/wikidata --limit 2000

# Same for Inventaire
wget -c https://dumps.inventaire.io/inv/elasticsearch/entities_data.json.gz
gzip -d entities_data.json.gz
./node_modules/.bin/elasticdump --input=./entities_data.json --output=http://localhost:9200/entities --limit 2000
```
## Query ElasticSearch
curl "http://localhost:9200/wikidata/humans/_search?q=Victor%20Hugo"
## Donate
We are developing and maintaining tools to work with Wikidata from NodeJS, the browser, or simply the command line, with quality and ease of use at heart. Any donation will be interpreted as a "please keep going, your work is very much needed and awesome. PS: love". Donate
## See Also
- wikidata-sdk: a javascript tool suite to query and work with wikidata data, heavily used by wikidata-cli
- wikidata-edit: Edit Wikidata from NodeJS
- wikidata-cli: The command-line interface to Wikidata
- wikidata-filter: A command-line tool to filter a Wikidata dump by claim
- wikidata-taxonomy: Command-line tool to extract taxonomies from Wikidata
- Other Wikidata external tools
## You may also like
Do you know inventaire.io? It's a web app to share books with your friends, built on top of Wikidata! And it's libre software too.
## License
AGPL-3.0