Home

Awesome

level-map-merge

Like a simple realtime map-reduce, but simpler.

map-merge has two parts, a batch part, and a real-time part. the batch is run (say) every few hours, and the real-time part is run the rest of the time. The realtime map-merge can add things, but only the batch part can delete things.

map-merge is suitable for situations where it's okay to have slightly old or incorrect data, like an inverted index!

The included ./example.js builds a full text index for the npm registry is about ~1.5 minutes.

JSON

one of the performance improvements possible in map-merge is to run a batch and keep the merged data structure in memory, instead of stringifing and parsing. Building an inverted index of all the READMEs in npm takes over 20 minutes parsing JSON, but only 40 seconds keeping it as objects!

Example

Inverted Index (word -> list of documents it appears in)

See also ./example.js

var LevelMapMerge = require('level-map-merge')
var LevelUp       = require('levelup')

var db = LevelUp()
LevelMapMerge(db, {
  map: function (key, val, emit) {
    //the raw data can be anything, so you have to parse it.
    val = JSON.parse(val)
    val.split(/\W+/).forEach(function (e) {
      //but the mapped structure is expected to be js.
      emit(e, [key])
    })
  },
  //merge 'little' into 'big'
  merge: function (big, little, key) {
    //you don't need to use parse here!
    little.forEach(function (e) {
      if(-1 == big.indexOf(e))
        big.push(e)
    })
    //don't use stringify here either!
    return big
  }
}

Limitations

currently, the entire batch view needs to fit into memory. (will be a problem with millions of records, but not for thousands)

Licence

MIT