DOR Services App
This Ruby application provides a REST and GraphQL API for DOR Services. An OAS 3.0 spec (openapi.yml) documents the API; you can browse the generated documentation at http://sul-dlss.github.io/dor-services-app/
Authentication
To generate an authentication token, run the following on the production server:
RAILS_ENV=production bin/rails generate_token
This will use the HMAC secret to sign the token. It will ask you to submit a value for "Account". This should be the name of the calling service, or a username if this is to be used by a specific individual. This value is used for traceability of errors and can be seen in the "Context" section of a Honeybadger error. For example:
{"invoked_by" => "workflow-service"}
GraphQL
DSA exposes a limited GraphQL API at the /graphql endpoint. The API is implemented using graphql-ruby. The purpose of the API is to allow retrieving only the parts of cocina objects that are needed, in particular, to avoid retrieving very large structural metadata.
It is limited in that:
- It only supports querying, not mutations.
- Only the first level of attributes (description, structural, etc.) are expressed in the GraphQL schema; the contents of each of these attributes are just typed as JSON.
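For example, a client can fetch just the description of an object and skip structural entirely. A sketch (the cocinaObject field and externalIdentifier argument are assumptions; check GraphiQL, noted below, for the actual schema):
query {
  cocinaObject(externalIdentifier: "druid:bc123df4567") {
    description
  }
}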
Developer notes:
- Most GraphQL code is in app/graphql.
- In local development, the GraphiQL browser is available at http://localhost:3000/graphiql.
Developer Notes
DOR Services App is a Rails app.
Background Jobs
Dor Services App uses Sidekiq to process background jobs, which requires Redis. You can either install Redis locally, if running services locally, or run it via docker-compose. To spin up Sidekiq, run:
bundle exec sidekiq # use -d option to daemonize/run in the background
See the output of bundle exec sidekiq --help for more information.
Note that the application has a web UI for monitoring Sidekiq activity at /queues.
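A background job itself is ordinary Sidekiq code. A minimal sketch, purely to illustrate the pattern (the class and its behavior are hypothetical, not one of DSA's real jobs):
class ExampleJob
  include Sidekiq::Job

  def perform(druid)
    # Runs asynchronously on the Sidekiq process started above.
    Rails.logger.info("Processing #{druid}")
  end
end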
Running Tests
First, ensure the database container is spun up:
docker compose up db # use -d to daemonize/run in background
And if you haven't yet prepared the test database, run:
RAILS_ENV=test bundle exec rails db:test:prepare
To run the tests:
bundle exec rspec
To run rubocop:
bundle exec rubocop
Console and Development Server
Using Docker
First, you'll need both Docker and docker-compose installed.
Run dor-services-app and its dependencies using:
docker compose up -d
Update Docker image
docker build -t suldlss/dor-services-app:latest .
docker push suldlss/dor-services-app:latest
Without Docker
To spin up a local rails console:
bundle exec rails c
To spin up a local development server:
bundle exec rails s
Set up RabbitMQ
You must set up the durable RabbitMQ queues that bind to the exchange where workflow messages are published.
RAILS_ENV=production bin/rake rabbitmq:setup
This creates this application's queues and binds them to the topics on which workflow messages are published.
RabbitMQ queue workers
In a development environment, you can start Sneakers like this:
WORKERS=CreateEventJob bin/rake sneakers:run
but on the production machines we use systemd to do the same:
sudo /usr/bin/systemctl start sneakers
sudo /usr/bin/systemctl stop sneakers
sudo /usr/bin/systemctl status sneakers
Sneakers is started automatically during a deploy via Capistrano.
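A worker class follows the standard Sneakers pattern: include Sneakers::Worker, declare the queue to consume, and implement work. A sketch (the queue name and message handling are assumptions):
class CreateEventJob
  include Sneakers::Worker
  from_queue 'dsa.create-event' # hypothetical queue name

  def work(msg)
    event = JSON.parse(msg)
    # ... record the event ...
    ack! # acknowledge so RabbitMQ removes the message from the queue
  end
end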
Cron check-ins
Some cron jobs (configured via the whenever gem) are integrated with Honeybadger check-ins. These cron jobs check in with HB (via a curl request to an HB endpoint) whenever they run. If a cron job does not check in as expected, HB will alert.
Cron check-ins are configured in the following locations:
- config/schedule.rb: This specifies which cron jobs check in and what setting keys to use for the check-in key. See this file for more details.
- config/settings.yml: Stubs out a check-in key for each cron job. Since we may not want to have a check-in for all environments, this stub key will be used and produce a null check-in.
- config/settings/production.yml in shared_configs: This contains the actual check-in keys.
- HB notification page: Check-ins are configured per project in HB. To configure a check-in, the cron schedule will be needed, which can be found with bundle exec whenever. After a check-in is created, the check-in key will be available. (If the URL is https://api.honeybadger.io/v1/check_in/rkIdpB then the check-in key is rkIdpB.)
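Concretely, the check-in a cron job performs is just a curl request to the check-in URL; using the example key above:
curl --silent https://api.honeybadger.io/v1/check_in/rkIdpB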
Rolling (Re)Indexer
This keeps the index fresh by reindexing the oldest data. (It runs in this application so that it has efficient access to the cocina for each object.) It is managed as a systemd service. To interact with it from your machine, you can use Capistrano:
$ cap ENV rolling_indexer:status
$ cap ENV rolling_indexer:start
$ cap ENV rolling_indexer:stop
$ cap ENV rolling_indexer:restart
Or, if you're on a server that has the rolling_indexer Capistrano role, use systemd commands:
$ sudo systemctl status rolling-index
$ sudo systemctl start rolling-index
$ sudo systemctl stop rolling-index
$ sudo systemctl restart rolling-index
NOTE 1: The rolling indexer is automatically restarted during deployments.
NOTE 2: The rolling indexer runs only on one node per environment. Conventionally, this is the -a node, but for production, it is dor-services-worker-prod-b.
NOTE 3: The rolling indexer logs to {capistrano_shared_dir}/log/rolling_indexer.log
Robots
DSA hosts robots that perform DSA actions. This replaces the previous pattern, in which a common accessioning robot would invoke a DSA endpoint, which would start a DSA job that performed the action and then updated the workflow status.
Robots are in jobs/robots/*. All DSA robots must be added to Workflow Server Rails' QueueService so that the workflow jobs are handled by DSA robots (instead of normal robots).
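In outline, a DSA robot is a small class whose perform step does the work in-process. A hedged sketch (assuming the LyberCore::Robot base class used by SDR robots; the workflow, step, and class names are illustrative):
module Robots
  class ExampleStep < LyberCore::Robot
    def initialize
      super('accessionWF', 'example-step')
    end

    def perform_work
      # Do the DSA action directly here; no DSA endpoint call is needed.
    end
  end
end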
There must also be a Sidekiq process to handle the DSA robot queues. For example:
:labels:
  - robot
:concurrency: 5
:queues:
  - [accessionWF_default_dsa, 2]
  - accessionWF_low_dsa
Other tools
Running Reports
The cocina-models README describes how to run reports on the sdr-infra VM. This approach has two advantages:
- sdr-infra connects to the DSA database as read-only
- no resource competition with production DSA processing
Generating a list of druids from Solr query
$ bin/generate-druid-list 'is_governed_by_ssim:"info:fedora/druid:rp029yq2361"'
The results are written to druids.txt.
Removing deleted items from a list of druids
$ bin/clean-druid-list -h
Usage: bin/clean-druid-list [options]
    -i, --input FILENAME             File containing list of druids (instead of druids.txt).
    -o, --output FILENAME            File to write list of druids (instead of druids.clean.txt).
    -h, --help                       Displays help.
Solr is used to determine if an item still exists.
Find druids missing from the Solr index
Run the missing druid rake task:
RAILS_ENV=production bundle exec rake missing_druids:unindexed_objects
This produces a missing_druids.txt file in the application root.
Missing druids can be indexed with:
RAILS_ENV=production bundle exec rake missing_druids:index_unindexed_objects
Data migrations / bulk remediations
bin/migrate-cocina provides a framework for data migrations and bulk remediations. It supports optional versioning and publishing of objects after migration.
Usage: bin/migrate-cocina MIGRATION_CLASS [options]
--mode [MODE] Migration mode (dryrun, migrate, verify). Default is dryrun
-p, --processes PROCESSES Number of processes. Default is 4.
-s, --sample SAMPLE Sample size per type, otherwise all objects.
-h, --help Displays help.
The process for performing a migration/remediation is:
- Implement a Migrator (app/services/migrators/). See Migrators::Base and Migrators::Exemplar for the requirements of a Migrator class (a sketch follows this list). Migrators should be unit tested.
- Perform a dry run: bin/migrate-cocina Migrators::Exemplar --mode dryrun and inspect migrate-cocina.csv for any errors. This is a way to change the cocina and validate the new objects without saving the updated cocina or publishing or versioning.
- Perform the migration/remediation: bin/migrate-cocina Migrators::Exemplar --mode migrate and inspect migrate-cocina.csv for any errors.
- Perform verification: bin/migrate-cocina Migrators::Exemplar --mode verify and inspect migrate-cocina.csv for any errors. (An error here means that an object matching .migrate? has been found, which is presumably NOT desired after migration.)
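In shape, a Migrator pairs a predicate with a change. A sketch of what Migrators::Base implies (the ar_cocina_object accessor and the attribute are assumptions; consult Migrators::Exemplar for the real interface):
class Migrators::FixLabels < Migrators::Base
  # True when this ActiveRecord object needs remediation.
  def migrate?
    ar_cocina_object.label.blank?
  end

  # Mutate the object in place; bin/migrate-cocina handles saving,
  # versioning, and publishing according to the chosen mode.
  def migrate
    ar_cocina_object.label = 'Untitled'
  end
end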
Additional notes:
- The dry run and the verification can be performed on sdr-infra. See the existing documentation on setting up db connections.
- The migration/remediation must be performed on the DSA server since it requires a read/write DB connection. (sdr-infra has a read-only DB connection.)
- Migrations are performed on an ActiveRecord object, not a Cocina object. This allows the remediation of invalid items (i.e., items that cannot be instantiated as Cocina objects).
- Migrations can be performed against all items or just a list provided by the Migrator.
- Breaking changes, especially breaking cocina model changes, will require additional steps (e.g., stopping SDR processing); the complete process is to be determined.