nashi (nasḫī)

Some bits of JavaScript to transcribe scanned pages using PageXML. Both LTR and RTL languages are supported. Try it! But wait, there's more: download now and get a complete web app written in Python/Flask that handles import and export of your scanned pages to and from LAREX for semi-automatic layout analysis, does the line segmentation for you (via kraken), and saves your precious PageXML in a database. All you've got to do is follow the instructions below and help me implement all the missing features... OCR training and recognition are currently not included because of our web host's limited capacity.
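For readers new to PageXML, here is a minimal sketch of what a transcribed line looks like and how it could be read with Python's standard library. It is not how nashi itself processes pages; the sample document, the `read_lines` helper and the dictionary layout are made up for illustration, and the sketch assumes a PageXML version that stores polygons in the `points` attribute of `Coords`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample page: one region with two lines,
# the second of which has not been transcribed yet.
SAMPLE_PAGE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="0001.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="100,100 900,100 900,300 100,300"/>
      <TextLine id="r1_l1">
        <Coords points="100,100 900,100 900,200 100,200"/>
        <TextEquiv><Unicode>بسم الله</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1_l2">
        <Coords points="100,200 900,200 900,300 100,300"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any PageXML schema version matches."""
    return tag.rsplit("}", 1)[-1]


def read_lines(xml_text):
    """Return id, polygon and transcription for every TextLine element."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        points, text = None, ""
        for child in elem:
            name = local_name(child.tag)
            if name == "Coords":
                # "x1,y1 x2,y2 ..." becomes [(x1, y1), (x2, y2), ...]
                points = [
                    tuple(int(v) for v in pair.split(","))
                    for pair in child.get("points", "").split()
                ]
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append({"id": elem.get("id"), "points": points, "text": text})
    return lines


for line in read_lines(SAMPLE_PAGE):
    print(line["id"], repr(line["text"]))
```

Lines without a `TextEquiv` element come back with an empty string, which is a convenient way to spot what still needs transcribing.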

Instructions for nashi.html

The interface

Keyboard shortcuts in the text input area

Global keyboard shortcuts

Edit mode

Instructions for the server

Install nashi from PyPI:

pip install nashi

Create a config.py file. A minimal example:

BOOKS_DIR = "/home/username/books/"
LAREX_DIR = "/home/username/larex_books/"

Set the database URL as an environment variable, for example for MySQL:

export DATABASE_URL="mysql+pymysql://user:pw@localhost/mydb?charset=utf8"

Create the database tables and a first user from a Python prompt:

from nashi import user_datastore
from nashi.database import db_session, init_db
init_db()
user_datastore.create_user(email="me@myserver.de.vu", password="secret")
db_session.commit()

Start the Celery worker:

export NASHI_SETTINGS=/home/user/path/to/config.py
celery -A nashi.celery worker --loglevel=info

Run the app:

export FLASK_APP=nashi
export NASHI_SETTINGS=/home/user/path/to/config.py
flask run
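Before starting the worker and the app, it can help to check that the environment matches the steps above. The following is a rough pre-flight sketch using only the standard library; `load_settings` and `check_setup` are hypothetical helpers, not part of nashi, and the assumption that both directories must already exist is mine.

```python
import os
import runpy
from urllib.parse import urlsplit

# Settings named in the minimal config.py example above.
REQUIRED_DIRS = ("BOOKS_DIR", "LAREX_DIR")


def load_settings(path):
    """Execute a config.py-style file and keep only its UPPERCASE names."""
    namespace = runpy.run_path(path)
    return {key: value for key, value in namespace.items() if key.isupper()}


def check_setup(environ):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    settings_path = environ.get("NASHI_SETTINGS")
    if not settings_path:
        problems.append("NASHI_SETTINGS is not set")
    elif not os.path.isfile(settings_path):
        problems.append(f"settings file not found: {settings_path}")
    else:
        settings = load_settings(settings_path)
        for key in REQUIRED_DIRS:
            value = settings.get(key)
            if not value:
                problems.append(f"{key} is missing from the settings file")
            elif not os.path.isdir(value):
                # Assumption: the directories should exist before first use.
                problems.append(f"{key} is not an existing directory: {value}")
    # DATABASE_URL is optional here; only its rough shape is checked.
    db_url = environ.get("DATABASE_URL")
    if db_url and not urlsplit(db_url).scheme:
        problems.append("DATABASE_URL has no scheme such as mysql+pymysql://")
    return problems


if __name__ == "__main__":
    for problem in check_setup(os.environ):
        print("problem:", problem)
```

Run it in the same shell where the variables were exported. It only inspects the settings file and the URL's shape and does not connect to the database or to Celery.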

Planned features