Status: Archived

This repository has been archived and is no longer maintained.

Flashlight

A pluggable integration with ElasticSearch to provide advanced content searches in Firebase.

This script can:

Getting Started

Check out the recommended security rules in example/seed/security_rules.json. See example/README.md to seed and run an example client app.

If you experience errors like {"error":"IndexMissingException[[firebase] missing]","status":404}, you may need to manually create the index referenced in each path:

curl -X POST http://localhost:9200/firebase

To read more about setting up a Firebase service account and configuring FB_SERVICEACCOUNT, click here.

Client Implementations

Read example/index.html and example/example.js for a client implementation. It works like this:

The body object can be any valid ElasticSearch DSL structure (see Building ElasticSearch Queries).

Deploy to Heroku

Setup Initial Index with Bonsai

After you've deployed to Heroku, you need to create the initial index to prevent an IndexMissingException error from Bonsai. Create an index called "firebase" via curl, using the BONSAI_URL that you copied during Heroku deployment.

Migration

0.2.0 -> 0.3.0

Flashlight now returns the direct output of ElasticSearch instead of only the hits part. This change is required to support aggregations and to include richer information. You must change how you read the response accordingly. Example responses from Flashlight are shown below:

Before, in 0.2.0

{
  "total" : 1000,
  "max_score" : null,
  "hits" : [
    ..
  ]
}

After, in 0.3.0

{
  "took" : 63,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : null,
    "hits" : [
      ..
    ]
  },
  "aggregations" : {
    ..
  }
}
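
When upgrading a client, the main change is where the array of matches lives. The following is a minimal sketch of a helper that accepts either response shape. The name extractHits is hypothetical and not part of Flashlight, and the 0.2.0 shape is assumed to be the object shown in the "Before" example.

```javascript
// Hypothetical helper (not part of Flashlight): normalizes a search response
// from either version into { total, hits, aggregations }.
function extractHits(response) {
   if (!response) {
      return { total: 0, hits: [], aggregations: null };
   }
   // 0.3.0: "hits" is an object that wraps the array of matches.
   // 0.2.0: "hits" is the array itself and "total" sits beside it.
   var isNewShape = Boolean(response.hits) && !Array.isArray(response.hits);
   var container = isNewShape ? response.hits : response;
   return {
      total: container.total || 0,
      hits: container.hits || [],
      aggregations: isNewShape ? (response.aggregations || null) : null
   };
}
```

Client code that reads results through a helper like this does not need to know which Flashlight version produced the response.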

Advanced Topics

Parsing and filtering indexed data

The paths specified in config.js can include the special filter and parse functions to manipulate the contents of the index. For example, if I had a messaging app, but I didn't want to index any system-generated messages, I could add the following filter to my messages path:

filter: function(data) { return data.name !== 'system'; }

Here, data represents the JSON snapshot obtained from the database. If this method does not return true, that record will not be indexed. Note that the filter method is applied before parse.

If I want to remove or alter data getting indexed, that is done using the parse function. For example, assume I wanted to index user records, but remove any private information from the index. I could add a parse function to do this:

parse: function(data) {
   return {
      first_name: data.first_name,
      last_name: data.last_name,
      birthday: new Date(data.birthday_as_number).toISOString()
   };
}
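
To make the ordering concrete, here is a minimal sketch of how a path's filter and parse functions could be applied to a record before it is indexed. The prepareForIndex helper, the usersPath entry, and the sample records are hypothetical; Flashlight's internal implementation may differ.

```javascript
// Hypothetical path entry combining a filter and a parse function.
var usersPath = {
   filter: function(data) { return data.name !== 'system'; },
   parse: function(data) {
      return { first_name: data.first_name, last_name: data.last_name };
   }
};

// Hypothetical helper: returns the document to index, or null when the
// record is filtered out. filter runs first, so parse only sees records
// that passed it. Following the text above, any result other than true
// rejects the record.
function prepareForIndex(pathConfig, data) {
   if (typeof pathConfig.filter === 'function' && pathConfig.filter(data) !== true) {
      return null;
   }
   if (typeof pathConfig.parse === 'function') {
      return pathConfig.parse(data);
   }
   return data;
}
```

With this sketch, a record whose name is 'system' yields null, while any other record is reduced to its first_name and last_name fields, so private fields such as an email address never reach the index.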

Building ElasticSearch Queries

The full ElasticSearch API is supported. Check out this great tutorial on querying ElasticSearch. And be sure to read the ElasticSearch API Reference.

Example: Simple text search

 {
   "q": "foo*"
 }

Example: Paginate

You can control the number of matches (defaults to 10) and initial offset for paginating search results:

 {
   "from" : 0, 
   "size" : 50, 
   "body": {
     "query": {
        "match": {
           "_all": "foo"
        }
     }
   }
 }
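
Since from is an offset in matches rather than a page number, a client usually computes it from the page being requested. A minimal sketch, assuming a hypothetical buildPageQuery helper and zero-based page numbers:

```javascript
// Hypothetical helper: builds the search object for a zero-based page number.
function buildPageQuery(term, page, pageSize) {
   var size = pageSize || 10;              // 10 matches the default noted above
   var pageIndex = Math.max(0, page || 0); // negative or missing pages become 0
   return {
      from: pageIndex * size,              // offset of the first match to return
      size: size,
      body: {
         query: {
            match: { _all: term }
         }
      }
   };
}
```

For example, buildPageQuery('foo', 2, 50) asks for matches 100 through 149.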

Example: Search for multiple tags or categories

 {
   "body": {
     "query": {
       "terms": { "tag": [ "foo", "bar" ] }
     }
   }
 }

Example: Search only specific fields

 {
   "body": {
     "query": {
       "match": {
         "field":  "foo"
       }
     }
   }
 }

Example: Give more weight to specific fields

 {
   "body": {
     "query": {
       "multi_match": {
         "query":  "foo",
         "type":   "most_fields", 
         "fields": [ 
            "important_field^10", // adding ^10 makes this field relatively more important 
            "trivial_field" 
         ]
       }
     }
   }
 }
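
The field^boost strings can be generated from a map of weights rather than written by hand. A minimal sketch, assuming a hypothetical buildWeightedQuery helper that is not part of Flashlight:

```javascript
// Hypothetical helper: turns a { fieldName: boost } map into a multi_match query.
function buildWeightedQuery(term, boosts) {
   var fields = Object.keys(boosts).map(function(field) {
      var boost = boosts[field];
      // A boost of 1 is the default weight, so the ^ suffix is omitted.
      return boost === 1 ? field : field + '^' + boost;
   });
   return {
      body: {
         query: {
            multi_match: { query: term, type: 'most_fields', fields: fields }
         }
      }
   };
}
```

Calling buildWeightedQuery('foo', { important_field: 10, trivial_field: 1 }) produces the same fields list as the example above.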

Helpful sections of the ES docs

Search lite (simple text searches with q), Finding exact values, Sorting and relevance, Partial matching, Wildcards and regexp, Proximity matching, Dealing with human language

Operating at massive scale

Is Flashlight designed to work at millions of requests per second? No. It's designed to be a template for implementing your production services. Some assembly required.

Here are a couple quick optimizations you can make to improve scale:

Use refBuilder to improve indexing efficiency

In config.js, each entry in paths can be assigned a refBuilder function. This can construct a query for determining which records get indexed.

This can be utilized to improve efficiency by preventing all data from being re-indexed any time the Flashlight service is restarted, and generally by preventing a large backlog from being read into memory at once.

For example, if I were indexing chat messages, and they had a timestamp field, I could use the following to never look back more than a day during a server restart:

exports.paths = [
   {
      path  : "chat/messages",
      index : "firebase",
      type  : "message",
      fields: ['message_body', 'tags'],
      refBuilder: function(ref, path) {
         // only index messages from the last 24 hours
         return ref.orderByChild('timestamp').startAt(Date.now() - 24 * 60 * 60 * 1000);
      }
   }
];
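
Because a refBuilder is an ordinary function, its cutoff logic can be exercised without a database connection. A minimal sketch, assuming a hypothetical recentOnly factory and a hypothetical fakeRef stand-in that only records the calls made on it (it is not the Firebase SDK):

```javascript
var ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical factory: returns a refBuilder that only looks back windowMs
// milliseconds on the given child key. `now` is injectable for testing.
function recentOnly(childKey, windowMs, now) {
   return function(ref, path) {
      var current = typeof now === 'function' ? now() : Date.now();
      return ref.orderByChild(childKey).startAt(current - windowMs);
   };
}

// Hypothetical stand-in for a database reference; records each call made on it.
function fakeRef() {
   var calls = [];
   var ref = {
      calls: calls,
      orderByChild: function(key) { calls.push(['orderByChild', key]); return ref; },
      startAt: function(value) { calls.push(['startAt', value]); return ref; }
   };
   return ref;
}

// Fixed clock so the cutoff is predictable.
var builder = recentOnly('timestamp', ONE_DAY_MS, function() { return 1000000000000; });
var result = builder(fakeRef(), 'chat/messages');
// result.calls is [['orderByChild', 'timestamp'], ['startAt', 999913600000]]
```

In config.js, a factory like this would be used as refBuilder: recentOnly('timestamp', ONE_DAY_MS), leaving the clock argument out so that Date.now() is used.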

Loading paths to index from the database instead of config file

Paths to be indexed can be loaded dynamically from the database by providing a path string instead of the paths array. For example, the paths given in config.example.js could be replaced with dynamic_paths and then those paths could be stored in the database, similar to this.

Any updates to the database paths are handled by Flashlight (new paths are indexed when they are added, old paths stop being indexed when they are removed).

Unfortunately, since JSON data stored in Firebase can't contain functions, the filter, parse, and refBuilder options can't be used with this approach.

Support

Submit questions or bugs using the issue tracker.

For Firebase-related questions, try the mailing list.

License

MIT LICENSE Copyright © 2013 Firebase opensource@firebase.com