perl-DataLoader

DataLoader - automatically batch and cache repeated data loads

Synopsis

use DataLoader;
my $user_loader = DataLoader->new(sub {
   my @user_ids = @_;
   return getUsers(@user_ids);  # a Mojo::Promise
});

# Now fetch your data whenever (asynchronously)
my $data = Mojo::Promise->all(
   $user_loader->load(1),
   $user_loader->load(2),
   $user_loader->load(2),
);

# getUsers is called only once - with (1,2)

Description

DataLoader is a generic utility to be used as part of your application's data fetching layer. It provides a consistent API over various backends and reduces requests to those backends via automatic batching and caching of data.

It is primarily useful for GraphQL APIs, where each resolver independently requests the object(s) it wants; the loader then ensures those requests are batched together and not repeated.

It is a port of the JavaScript version available at https://github.com/graphql/dataloader.

Batching

To get started, create a batch loading function that maps a list of keys (typically strings/integers) to a Mojo::Promise that resolves to a list of values.

my $user_loader = DataLoader->new(\&myBatchGetUsers);

Then load individual values from the loader. All individual loads that occur within a single tick of the event loop will be batched together.

$user_loader->load(1)
    ->then(fun($user) { $user_loader->load($user->invitedById) })
    ->then(fun($invitedBy) { say "User 1 was invited by ", $invitedBy->name });

# Somewhere else in the application
$user_loader->load(2)
    ->then(fun($user) { $user_loader->load($user->lastInvitedId) })
    ->then(fun($lastInvited) { say "User 2 last invited ", $lastInvited->name }); 

A naive application may have issued four round-trips to the backend for the required information, but with DataLoader this application will make at most two.
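The batching idea can be sketched in plain Perl without promises or an event loop. This is a toy illustration only, not DataLoader's implementation: the names fetch_users, enqueue and flush_queue are hypothetical, and an explicit flush stands in for the end of an event-loop tick.

```perl
use strict;
use warnings;

my @batch_calls;   # records the keys passed to each backend call
my %pending;       # keys already queued in the current "tick"
my @queue;         # keys waiting for the next flush, in request order

# Hypothetical backend: one call fetches many users at once.
sub fetch_users {
    my @ids = @_;
    push @batch_calls, [@ids];
    return map { +{ id => $_, name => "user$_" } } @ids;
}

# Queue a key; duplicates within the same tick are collapsed.
sub enqueue {
    my ($id) = @_;
    push @queue, $id unless $pending{$id}++;
    return;
}

# Drain the queue with a single backend call and index the rows by id.
sub flush_queue {
    my @ids = @queue;
    @queue   = ();
    %pending = ();
    return {} unless @ids;
    my %result = map { $_->{id} => $_ } fetch_users(@ids);
    return \%result;
}

enqueue($_) for 1, 2, 2;    # three loads, two distinct keys
my $users = flush_queue();  # one backend call, with keys (1, 2)
```

In the real module the flush happens automatically and each load returns a Mojo::Promise; the sketch only shows why three loads can cost a single round-trip.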

Batch Function

The batch loading function takes a list of keys as input, and returns a Mojo::Promise that resolves to a list of values. The ordering of the values should correspond to the ordering of the keys, with any missing values filled in with undef. For example, suppose the input is (2,9,6,1) and the backend service (e.g. a database) returns:

{ id => 9, name => 'Chicago' }
{ id => 1, name => 'New York' }
{ id => 2, name => 'San Francisco' }

The backend has returned results in a different order than we requested, and omitted a result for key 6, presumably because no value exists for that key.

We need to re-order these results to match the original input (2,9,6,1), and include an undef result for 6:

[
  { id => 2, name => 'San Francisco' },
  { id => 9, name => 'Chicago' },
  undef,
  { id => 1, name => 'New York' },
]
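One way to do this re-ordering is to index the backend rows by id and then look up each requested key in turn. The following is a minimal sketch; order_by_keys is a hypothetical helper, not part of DataLoader, and it assumes each row is a hashref with an id field.

```perl
use strict;
use warnings;

# Hypothetical helper: align backend rows with the requested keys.
# Keys with no matching row come back as undef.
sub order_by_keys {
    my ($keys, $rows) = @_;
    my %by_id = map { $_->{id} => $_ } @$rows;
    return [ map { $by_id{$_} } @$keys ];
}

# Rows as returned by the backend: out of order, and key 6 is missing.
my @rows = (
    { id => 9, name => 'Chicago' },
    { id => 1, name => 'New York' },
    { id => 2, name => 'San Francisco' },
);

my $ordered = order_by_keys([2, 9, 6, 1], \@rows);
```

The lookup does not autovivify missing keys, so the slot for key 6 stays undef and the result has exactly one entry per requested key.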

There are two typical error cases in the batch loading function. The first is an error that invalidates the whole batch: for example, you run a DB query for all input keys and the DB fails to connect. In this case, simply die, and the error will be passed through to all callers that are waiting for values included in this batch. The error is assumed to be transient, and nothing will be cached.

The second case is where part of the batch succeeds and part fails. In this case, use DataLoader->error to create error objects, and mix them in with the successful values:

[
  { id => 2, name => 'San Francisco' },      # this succeeded
  DataLoader->error("no permission"),        # this failed (id 9)
  undef,                                     # this item is missing (id 6)
  { id => 1, name => 'New York' },           # this succeeded
]

Now callers that have called load(9) will get an exception. Callers for id 6 will receive undef, and callers for ids 1 and 2 will get hashrefs of data. Additionally, these errors will be cached (see 'Caching Errors' below).

Caching

DataLoader provides a simple memoization cache for all loads that occur within a single request for your application. Multiple loads for the same value result in only one backend request, and additionally, the same object in memory is returned each time, reducing memory use.

my $user_loader = DataLoader->new(...);
my $promise1a = $user_loader->load(1);
my $promise1b = $user_loader->load(1);
is( refaddr($promise1a), refaddr($promise1b) );   # same object

Caching Per-Request

The suggested way to use DataLoader is to create a new loader when a request (for example GraphQL request) begins, and destroy it once the request ends. This prevents duplicate backend operations and provides a consistent view of data across the request.

Using the same loader for multiple requests is not recommended as it may result in cached data being returned unexpectedly, or sensitive data being leaked to other users who should not be able to view it.

The default cache used by DataLoader is a simple hashref that stores all values for all keys loaded during the lifetime of the request; it is useful when request lifetime is short. If other behaviour is desired, see the cache_hashref constructor parameter.

Clearing Cache

It is sometimes necessary to clear values from the cache, for example after running an SQL UPDATE or similar, to prevent out of date values from being used. This can be done with the clear method.
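The stale-value problem can be shown with a plain hash standing in for the loader's cache. This is a toy sketch, not DataLoader's code: %backend, load_color and clear_color are hypothetical names, and the loads are synchronous rather than promise-based.

```perl
use strict;
use warnings;

my %backend = (1 => 'blue');   # hypothetical stand-in for a database table
my %cache;                     # per-request memoization cache
my $backend_reads = 0;         # counts how often the backend is consulted

# Return the cached value if present, otherwise read and cache it.
sub load_color {
    my ($id) = @_;
    return $cache{$id} if exists $cache{$id};
    $backend_reads++;
    return $cache{$id} = $backend{$id};
}

# Forget one key so that the next load goes back to the backend.
sub clear_color {
    my ($id) = @_;
    delete $cache{$id};
    return;
}

my $before = load_color(1);   # reads the backend and caches 'blue'
$backend{1} = 'red';          # simulated UPDATE
my $stale  = load_color(1);   # still served from the cache
clear_color(1);
my $fresh  = load_color(1);   # reads the backend again
```

With a real loader the equivalent step is calling clear with the affected key right after the write.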

Caching Errors

If the batch load fails (throws an exception or returns a rejected Promise), the requested values will not be cached. However, if the batch function returns a DataLoader::Error instance for individual value(s), those errors will be cached to avoid frequently loading the same error.

If you want to avoid this, you can catch the Promise error and clear the cache immediately afterwards, for example:

$user_loader->load(1)->catch(fun ($error) {
   if ($should_clear_error) {
       $user_loader->clear(1);
   }
   die $error;   # or whatever
});

Priming the Cache

It is also possible to prime the cache with data. For example, if you fetch a user by ID, you could also prime a username-based cache:

$user_by_id->load(1)->then(fun ($user) {
   $user_by_name->prime($user->name, $user);
   ...
});

If your backend query includes additional data, you could cache that too:

for my $tag (@{$user->tags}) {
   $tag_loader->prime($tag->id, $tag->name);
}

If you update a value in the backend, you can update the cache to save queries later:

$user = $user->update(favourite_color => 'red');
$user_by_id->clear($user->id)->prime($user->id, $user);

Using Outside of GraphQL

DataLoader assumes the use of Mojolicious, specifically its promise implementation Mojo::Promise. The Mojo::Reactor::EV backend is recommended (and is automatically used provided you have EV installed) for optimal batching, although other backends will also work.

With the EV backend, DataLoader will work fine with any AnyEvent-based code. See the unit tests of this module for examples.

It would be possible to write a version of DataLoader that depends only on AnyEvent/EV and does not depend on Mojolicious. Let me know if there is interest.

Methods