ErlasticSearch
A Thrift-based Erlang client for ElasticSearch.
It incorporates a connection pool based on poolboy. If/when you get down to productizing, you might want to take a look at the pool's size and max_overflow options.
NOTE : 1.5.0 for ES 0.9.x; 1.6.0 and higher for ES 1.x
Installation
Add this as a rebar dependency to your project.
- Be sure to set up ElasticSearch to support thrift!
  - Install the thrift plugin (available here)
    - Probably something like --> bin/plugin -install elasticsearch/elasticsearch-transport-thrift/1.5.0
  - You'll need to add (at least) the following settings to config.yaml
    - thrift.port: 9500
    - thrift.protocol: 'binary'
  - You might want to set the port to whatever you want instead of 9500. Mind you, in that case you might need to update app.config and/or your connection_options in your application/erlasticsearch setup too.
  - Start ElasticSearch
    - If you plan on running the tests, you probably want to do this.
    - Heck, if you plan on using ElasticSearch, you probably want to do this.
    - If you plan on running the tests, you might want to run it in 'in-memory' mode.
      - Probably something like --> elasticsearch -f -Des.index.storage.type=memory -Des.config=/usr/local/opt/elasticsearch/config/elasticsearch.yml
- Update your environment with the following connection_options (look in app.config for examples)
  - thrift_options (default: [{framed, true}])
  - thrift_host (default: "localhost")
  - thrift_port (default: 9500)
  - binary_response (default: true. When false, this will run jsx:decode on the ES response and send the tuples back to you, instead of one long binary)
  - pools
    - If you are using the default pools, be sure to use the (uncommented) pool settings from app.config (if you use the default pools, you will have to start up ElasticSearch before the application, otherwise Bad Things™ will happen)
- Start a pool
  - erlasticsearch:start_pool(<<"some_unique_name_here">>). or erlasticsearch:start_pool({"localhost", 9500, <<"some_unique_name_here">>}).
  - of course, "localhost" and 9500 should be replaced with your actual thrift_host and thrift_port
  - these values for thrift_host and thrift_port will override any values that you provide in connection_options
- Profit
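The environment settings listed above might be collected in app.config roughly as follows. This is only a sketch: the key names and defaults come from the list above, but the nesting under an `erlasticsearch` application entry is an assumption, and the `pools` entry is left out because its shape is not described here. The app.config shipped with the project is the authoritative layout.

```erlang
%% app.config (sketch; the nesting is an assumption, check the bundled app.config)
[
 {erlasticsearch, [
   {connection_options, [
     {thrift_options, [{framed, true}]},  % documented default
     {thrift_host, "localhost"},          % documented default
     {thrift_port, 9500},                 % should match thrift.port on the ElasticSearch side
     {binary_response, true}              % false => bodies are run through jsx:decode
   ]}
 ]}
].
```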
WARNING
THE TESTS WILL CREATE AND DELETE INDICES IN WHATEVER ELASTICSEARCH INSTANCE YOU POINT THE CLIENT AT
THE TESTS WILL CREATE AND DELETE INDICES IN WHATEVER ELASTICSEARCH INSTANCE YOU POINT THE CLIENT AT
THE TESTS WILL CREATE AND DELETE INDICES IN WHATEVER ELASTICSEARCH INSTANCE YOU POINT THE CLIENT AT
!!!!!!!SERIOUSLY!!!!!!
YOU HAVE BEEN WARNED
TL;DR
- Make sure you have ElasticSearch running.
- You can explicitly start a (new) connection pool, and access ElasticSearch that way, as follows (e.g. if you need distinct pools for distinct Indexes)
  - Start up a pool ---> erlasticsearch:start_pool(<<"some_unique_name_here">>).
  - From that point, use <<"some_unique_name_here">> as ServerRef, e.g.
    (erlasticsearch@paglierino)1 > erlasticsearch:is_index(<<"some_unique_name_here">>, <<"an_index">>).
- You can just pick <<"another_unique_name">> and use it as ServerRef, without starting up a connection pool. In this case, a connection pool will be automagically started, e.g.
  (erlasticsearch@paglierino)1 > erlasticsearch:is_index(<<"another_unique_name">>, <<"an_index">>).
- Any JSON expected by ElasticSearch will need to go in as JSON or jsx encodable proplists, e.g. <<"{\"settings\":{\"number_of_shards\":3}}">>, or [{<<"settings">>, [{<<"number_of_shards">>, 3}]}]
- Output returned by everything is in the form of [tuple()] | {error, Reason}, i.e., either it is a list of tuples, or an error. The tuple list will contain the following
  - {status, Status} <-- This is the REST code (200, 201, 404, etc.)
  - {body, Body} <-- The body of the response from ElasticSearch. More on this next
  - {result, Result} <-- A boolean representing the result for the various boolean methods (is_index, is_doc, etc.)
- The Body of the response from ElasticSearch - when it exists - will be JSON. That said, binary_response in your connection_options is going to determine the form of the response.
  - The default is binary_response = true. In this case, your {body, Body} is just going to contain the entire payload from ElasticSearch as a single binary.
    - e.g. --> {body , <<"{\"ok\":true,\"acknowledged\":true}">>}
  - If you set binary_response = false, {body, Body} will contain the JSON as a decoded tuple-list (basically, what you get by running jsx:decode(Body))
    - e.g. --> {body , [ {<<"ok">> , true} , {<<"acknowledged">> , true} ] }
- The default timeout for workers is 5000ms. If your ES instance is slow (or you are running CT), you can bump this by setting the worker_timeout environment variable (programmatically, or in app.config. Take a look at test/erlasticsearch_SUITE.erl:1122)
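Since every call returns either a tuple list or an error, calling code usually wants one place that unpacks it. The module below is a minimal sketch of that idea; `es_result_sketch` and `summarize/1` are hypothetical names and not part of erlasticsearch. It relies only on the stdlib `proplists` module and on the return shape described above.

```erlang
-module(es_result_sketch).
-export([summarize/1]).

%% Hypothetical helper, not part of erlasticsearch.
%% Takes a return value of the documented shape [tuple()] | {error, Reason}
%% and turns it into {ok, Status, Body} or {error, Reason}.
-spec summarize([tuple()] | {error, term()}) ->
          {ok, term(), term()} | {error, term()}.
summarize({error, Reason}) ->
    {error, Reason};
summarize(Result) when is_list(Result) ->
    %% In this README's examples the status is a binary such as <<"200">>
    %% with binary_response = true, and an integer when it is false.
    Status = proplists:get_value(status, Result),
    %% Some results carry no body (see the is_index example), so default it.
    Body = proplists:get_value(body, Result, undefined),
    {ok, Status, Body}.
```

A caller could then write `es_result_sketch:summarize(erlasticsearch:is_index(ServerRef, IndexName))` and pattern-match on the outcome without caring which keys happen to be present.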
Details
Pool Management
These methods are available to start and stop the thrift pools.
Once the erlasticsearch application has been started, you can use start_pool and stop_pool to start/stop a connection pool. After this, you can use your PoolName as ServerRef in the accessors below.
Alternatively, you can just use PoolName, and a pool will be started up for you. The difference here is that the defaults will get used for the pool's size and max_overflow options.
PoolName can also be used to refer to multiple ElasticSearch instances. This is because in all cases, PoolName is one of
- PoolIdentifier (e.g. <<"some unique name">>)
- {ThriftHost, ThriftPort, PoolIdentifier}, e.g. {"localhost", 9500, <<"some unique name">>} or {undefined, 9500, <<"some unique name">>}
The type spec
-type pool_identifier() :: binary().
-type thrift_host() :: undefined | string().
-type thrift_port() :: undefined | integer().
Function | Parameters | Description |
---|---|---|
start_pool/1 | PoolName | Start a connection pool referenceable as PoolName, with default pool_options and connection_options |
start_pool/2 | PoolName, PoolParameters | Start a connection pool referenceable as PoolName, with custom pool_options and default connection_options |
start_pool/3 | PoolName, PoolParameters, ConnectionParameters | Start a connection pool referenceable as PoolName, with custom pool_options and connection_options |
stop_pool/1 | PoolName | Stop the connection pool referenced as PoolName |
EXAMPLES
Using the client based accessors (note that bar2 has {binary_response, false})
erlasticsearch@pecorino)1> erlasticsearch:start_pool(<<"bar1">>).
{ok,<0.178.0>}
erlasticsearch@pecorino)2> {ok, Pid} = erlasticsearch:start_pool(<<"bar2">>, [{thrift_options, [{framed, false}]}, {binary_response, false}]).
{ok,<0.182.0>}
erlasticsearch@pecorino)4> erlasticsearch:flush(<<"bar1">>).
[{status,<<"200">>},
{body,<<"{\"ok\":true,\"_shards\":{\"total\":0,\"successful\":0,\"failed\":0}}">>}]
erlasticsearch@pecorino)5> erlasticsearch:flush(<<"barbar">>).
{ok,{restResponse,200,undefined,<<"{\"ok\":true,\"_shards\":{\"total\":552,\"successful\":276,\"failed\":0}}">>}}
erlasticsearch@pecorino)8> erlasticsearch:stop_pool(<<"bar1">>).
ok
erlasticsearch@pecorino)9> erlasticsearch:stop_pool(<<"bar2">>).
ok
erlasticsearch@pecorino)9> erlasticsearch:stop_pool(<<"barbar">>).
ok
Note that the pool associated with <<"barbar">> was automatically started up by erlasticsearch above!
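To set the pool's size and max_overflow explicitly rather than relying on the defaults, start_pool/3 can be given both parameter lists. The sketch below assumes that PoolParameters is a poolboy-style proplist and that ConnectionParameters uses the connection_options keys listed earlier; neither shape is spelled out in this README, so treat both as assumptions. The module name, function names, and pool name are hypothetical.

```erlang
-module(es_pool_sketch).
-export([pool_args/0, start_custom_pool/0]).

%% Builds the three arguments for erlasticsearch:start_pool/3.
%% The option shapes are assumptions, not confirmed by the README.
pool_args() ->
    PoolName = {"localhost", 9500, <<"my_pool">>},      % hypothetical pool name
    PoolParameters = [{size, 5}, {max_overflow, 10}],   % poolboy-style options
    ConnectionParameters = [{thrift_options, [{framed, true}]},
                            {binary_response, false}],
    {PoolName, PoolParameters, ConnectionParameters}.

%% Needs the erlasticsearch application started and ElasticSearch reachable.
start_custom_pool() ->
    {PoolName, PoolParameters, ConnectionParameters} = pool_args(),
    erlasticsearch:start_pool(PoolName, PoolParameters, ConnectionParameters).
```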
Index CRUD
These methods are available to perform CRUD activities on Indexes (kinda, sorta, vaguely the equivalent of Databases in ElasticSearch. But you already knew that, right?)
Function | Parameters | Description |
---|---|---|
create_index/2 | ServerRef, IndexName | Creates the Index called IndexName |
create_index/3 | ServerRef, IndexName, Parameters | Creates the Index called IndexName, with additional options as specified here |
delete_index/2 | ServerRef, IndexName | Deletes the Index called IndexName |
is_index/2 | ServerRef, IndexName | Checks if the Index called IndexName exists. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>]) |
is_type/3 | ServerRef, IndexName, TypeName | Checks if the Type called TypeName exists in the index IndexName. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>], as well as a list of types, e.g. [<<"type1">>, <<"type2">>]) |
open_index/2 | ServerRef, IndexName | Opens the Index called IndexName |
close_index/2 | ServerRef, IndexName | Closes the Index called IndexName |
EXAMPLES
(note that bar2 has {binary_response, false})
erlasticsearch@pecorino)3> erlasticsearch:create_index(<<"bar">>, <<"foo2">>).
[{status,<<"200">>},
{body,<<"{\"ok\":true,\"acknowledged\":true}">>}]
erlasticsearch@pecorino)6> erlasticsearch:delete_index(<<"bar">>, <<"foo2">>).
[{status,<<"200">>},
{body,<<"{\"ok\":true,\"acknowledged\":true}">>}]
erlasticsearch@pecorino)8> erlasticsearch:is_index(<<"an_erlasticsearch_pool">>, <<"foo3">>).
[{result,<<"false">>},{status,<<"404">>}]
Document CRUD
These methods are available to perform CRUD activities on actual documents
(Note: ServerRef refers to either PoolName or {ThriftHost, ThriftPort, PoolIdentifier})
Function | Parameters | Description |
---|---|---|
insert_doc/5 | ServerRef, IndexName, Type, Id, Doc | Creates the Doc under IndexName, with type Type, and id Id |
insert_doc/6 | ServerRef, IndexName, Type, Id, Doc, Params | Creates the Doc under IndexName, with type Type, and id Id, and passes the tuple-list Params to ElasticSearch |
is_doc/4 | ServerRef, IndexName, Type, Id | Checks if the Doc under IndexName, with type Type, and id Id exists |
get_doc/4 | ServerRef, IndexName, Type, Id | Gets the Doc under IndexName, with type Type, and id Id |
get_doc/5 | ServerRef, IndexName, Type, Id, Params | Gets the Doc under IndexName, with type Type, and id Id, and passes the tuple-list Params to ElasticSearch |
mget_doc/2 | ServerRef, Doc | Gets documents from the ElasticSearch cluster based on the Index(s), Type(s), and Id(s) in Doc |
mget_doc/3 | ServerRef, IndexName, Doc | Gets documents from the ElasticSearch cluster index IndexName based on the Type(s), and Id(s) in Doc |
mget_doc/4 | ServerRef, IndexName, TypeName, Doc | Gets documents from the ElasticSearch cluster index IndexName, with type TypeName, based on the Id(s) in Doc |
delete_doc/4 | ServerRef, IndexName, Type, Id | Deletes the Doc under IndexName, with type Type, and id Id |
delete_doc/5 | ServerRef, IndexName, Type, Id, Params | Deletes the Doc under IndexName, with type Type, and id Id, and passes the tuple-list Params to ElasticSearch |
bulk/2 | ServerRef, Doc | Bulk insert of all the data in Doc (Index and Type defined in Doc) |
bulk/3 | ServerRef, IndexName, Doc | Bulk insert of all the data in Doc, with the Index defaulting to IndexName (Type defined in Doc) |
bulk/4 | ServerRef, IndexName, TypeName, Doc | Bulk insert of all the data in Doc, with the Index defaulting to IndexName and Type defaulting to TypeName |
count/2 | ServerRef, Doc | Counts the docs in the cluster based on the search in Doc. (note that if Doc is empty, you get a count of all the docs in the cluster) |
count/3 | ServerRef, Doc, Params | Counts the docs in the cluster based on the search in Doc, using Params. Note that either Doc or Params can be empty, but clearly not both :-) |
count/4 | ServerRef, IndexName, Doc, Params | Counts the docs in the cluster based on the search in Doc, associated with the index IndexName, using Params. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>]. This list can also be empty - []) |
count/5 | ServerRef, IndexName, TypeName, Doc, Params | Counts the docs in the cluster based on the search in Doc, associated with the index IndexName, and type TypeName using Params. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>], as well as a list of types, e.g. [<<"type1">>, <<"type2">>]. Each of these lists can also be empty - []) |
delete_by_query/2 | ServerRef, Doc | Deletes the docs in the cluster based on the search in Doc. (note that if Doc is empty, the search presumably matches all the docs in the cluster, so be careful) |
delete_by_query/3 | ServerRef, Doc, Params | Deletes the docs in the cluster based on the search in Doc, using Params. Note that either Doc or Params can be empty, but clearly not both :-) |
delete_by_query/4 | ServerRef, IndexName, Doc, Params | Deletes the docs in the cluster based on the search in Doc, associated with the index IndexName, using Params. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>]. This list can also be empty - []) |
delete_by_query/5 | ServerRef, IndexName, TypeName, Doc, Params | Deletes the docs in the cluster based on the search in Doc, associated with the index IndexName, and type TypeName using Params. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>], as well as a list of types, e.g. [<<"type1">>, <<"type2">>]. Each of these lists can also be empty - []) |
Note:
- For both insert_doc/5 and insert_doc/6, sending in undefined as the Id will result in ElasticSearch generating an Id for the document. This Id will be returned as part of the result...
- Yes, the order of the arguments to mget_doc/[2,3,4] is weird. It's just that ElasticSearch is slightly strange in this one...
EXAMPLES
erlasticsearch@pecorino)4> erlasticsearch:start_pool(<<"bar">>).
{ok,<0.178.0>}
erlasticsearch@pecorino)5> erlasticsearch:start_pool(<<"bar2">>, [{binary_response, false}]).
{ok,<0.182.0>}
erlasticsearch@pecorino)6> erlasticsearch:insert_doc(<<"bar">>, <<"index1">>, <<"type1">>, <<"id1">>, <<"{\"some_key\":\"some_val\"}">>).
[{status,<<"201">>},
{body,<<"{\"ok\":true,\"_index\":\"index1\",\"_type\":\"type1\",\"_id\":\"id1\",\"_version\":1}">>}]
erlasticsearch@pecorino)7> erlasticsearch:insert_doc(<<"bar">>, <<"index2">>, <<"type3">>, <<"id2">>, <<"{\"some_key\":\"some_val\"}">>, [{'_ttl', '1d'}]).
[{status,<<"201">>},
{body,<<"{\"ok\":true,\"_index\":\"index2\",\"_type\":\"type3\",\"_id\":\"id2\",\"_version\":1}">>}]
erlasticsearch@pecorino)8> erlasticsearch:insert_doc(<<"bar2">>, <<"index3">>, <<"type3">>, undefined, <<"{\"some_key\":\"some_val\"}">>).
[{status,201},
{body,[{<<"ok">>,true},
{<<"_index">>,<<"index3">>},
{<<"_type">>,<<"type3">>},
{<<"_id">>,<<"z9M78se6SuKsZ0lYlybAwg">>},
{<<"_version">>,1}]}]
erlasticsearch@pecorino)10> erlasticsearch:get_doc(<<"bar2">>, <<"index1">>, <<"type1">>, <<"id1">>, [{fields, foobar}]).
[{status,200},
{body,[{<<"_index">>,<<"index1">>},
{<<"_type">>,<<"type1">>},
{<<"_id">>,<<"id1">>},
{<<"_version">>,1},
{<<"exists">>,true}]}]
erlasticsearch@pecorino)11> erlasticsearch:get_doc(<<"bar2">>, <<"index1">>, <<"type1">>, <<"id1">>, [{fields, some_key}]).
[{status,200},
{body,[{<<"_index">>,<<"index1">>},
{<<"_type">>,<<"type1">>},
{<<"_id">>,<<"id1">>},
{<<"_version">>,1},
{<<"exists">>,true},
{<<"fields">>,[{<<"some_key">>,<<"some_val">>}]}]}]
Search
API to perform searches against ElasticSearch (this is why you are using ElasticSearch, right?)
Function | Parameters | Description |
---|---|---|
search/4 | PoolName, IndexName, Type, Doc | Searches the index IndexName, with type Type for the JSON query embedded in Doc |
search/5 | PoolName, IndexName, Type, Doc, Params | Searches the index IndexName, with type Type for the JSON query embedded in Doc, and passes the tuple-list Params to ElasticSearch |
EXAMPLES
erlasticsearch@pecorino)2> erlasticsearch:insert_doc({"localhost", 9500, <<"an_erlasticsearch_pool">>}, <<"index1">>, <<"type1">>, <<"id1">>, <<"{\"some_key\":\"some_val\"}">>).
[{status,<<"201">>},
{body, <<"{\"ok\":true,\"_index\":\"index1\",\"_type\":\"type1\",\"_id\":\"id1\",\"_version\":1}">>}]
erlasticsearch@pecorino)3> erlasticsearch:search({"localhost", 9500, <<"an_erlasticsearch_pool">>}, <<"index1">>, <<"type1">>, <<>>, [{q, "some_key:some_val"}]).
[{status,<<"200">>},
{body, <<"{\"took\":1,\"timed_out\":false,\"_shards\":{\"total\":5,\"successful\":5,\"failed\":0},\"hits\":{\"total\":"...>>}]
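The search above passes an empty Doc and puts the query in Params. Since JSON can also be supplied as a jsx encodable proplist, the query can go in Doc instead. The following is a minimal sketch of that approach using an ElasticSearch term query; `es_search_sketch`, `term_query/2`, and `search_term/5` are hypothetical names introduced for illustration.

```erlang
-module(es_search_sketch).
-export([term_query/2, search_term/5]).

%% Builds a jsx encodable proplist equivalent to the JSON
%% {"query": {"term": {Field: Value}}}.
term_query(Field, Value) when is_binary(Field), is_binary(Value) ->
    [{<<"query">>, [{<<"term">>, [{Field, Value}]}]}].

%% Runs the term query through the documented search/4.
%% Needs a running ElasticSearch and the erlasticsearch application.
search_term(ServerRef, IndexName, Type, Field, Value) ->
    erlasticsearch:search(ServerRef, IndexName, Type, term_query(Field, Value)).
```

For instance, `es_search_sketch:search_term(<<"bar">>, <<"index1">>, <<"type1">>, <<"some_key">>, <<"some_val">>)` would look for the document inserted in the example above, assuming the index has been refreshed.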
Index Helpers
A bunch of functions that do "things" to indices (flush, refresh, etc.)
(Note: ServerRef refers to either PoolName or {ThriftHost, ThriftPort, PoolIdentifier})
Function | Parameters | Description |
---|---|---|
flush/1 | ServerRef | Flushes all the indices |
flush/2 | ServerRef, IndexName | Flushes the index IndexName. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>]) |
optimize/1 | ServerRef | Optimizes all the indices |
optimize/2 | ServerRef, IndexName | Optimizes the index IndexName. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>]. This list can also be empty - []) |
segments/1 | ServerRef | Provides segment information for all the indices in the cluster |
segments/2 | ServerRef, IndexName | Provides segment information for the index IndexName. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>]. This list can also be empty - []) |
refresh/1 | ServerRef | Refreshes all the indices |
refresh/2 | ServerRef, IndexName | Refreshes the index IndexName. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>]. This list can also be empty - []) |
status/2 | ServerRef, IndexName | Returns the status of index IndexName. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>]. This list can also be empty - []) |
clear_cache/1 | ServerRef | Clears all the caches in the cluster |
clear_cache/2 | ServerRef, IndexName | Clears all the caches associated with the index IndexName. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>]. This list can also be empty - []) |
clear_cache/3 | ServerRef, IndexName, Params | Clears all the caches associated with the index IndexName, using Params. (Note that a list of Indices can also be sent in, e.g. [<<"foo">>, <<"bar">>]. This list can also be empty - []) |
EXAMPLES
erlasticsearch@pecorino)2> erlasticsearch:refresh(<<"bar">>).
[{status,<<"200">>},
{body, <<"{\"ok\":true,\"_shards\":{\"total\":552,\"successful\":276,\"failed\":0}}">>}]
erlasticsearch@pecorino)3> erlasticsearch:refresh(<<"bar">>, <<"index1">>).
[{status,<<"200">>},
{body, <<"{\"ok\":true,\"_shards\":{\"total\":10,\"successful\":5,\"failed\":0}}">>}]
erlasticsearch@pecorino)4> erlasticsearch:refresh(<<"bar">>, [<<"index1">>, <<"index2">>]).
[{status,<<"200">>},
{body, <<"{\"ok\":true,\"_shards\":{\"total\":16,\"successful\":8,\"failed\":0}}">>}]
erlasticsearch@pecorino)5> erlasticsearch:flush(<<"bar">>).
[{status,<<"200">>},
{body, <<"{\"ok\":true,\"_shards\":{\"total\":552,\"successful\":276,\"failed\":0}}">>}]
erlasticsearch@pecorino)6> erlasticsearch:refresh(<<"bar">>, <<"index1">>).
[{status,<<"200">>},
{body, <<"{\"ok\":true,\"_shards\":{\"total\":10,\"successful\":5,\"failed\":0}}">>}]
erlasticsearch@pecorino)7> erlasticsearch:refresh(<<"bar">>, [<<"index1">>, <<"index2">>]).
[{status,<<"200">>},
{body, <<"{\"ok\":true,\"_shards\":{\"total\":16,\"successful\":8,\"failed\":0}}">>}]
Cluster Helpers
A bunch of functions that do "things" to clusters (health, etc.)
(Note: ServerRef refers to either PoolName or {ThriftHost, ThriftPort, PoolIdentifier})
Function | Parameters | Description |
---|---|---|
health/1 | ServerRef | Reports the health of the cluster |
state/1 | ServerRef | Reports the state of the cluster |
state/2 | ServerRef, Params | Reports the state of the cluster, with optional parameters |
nodes_info/1 | ServerRef | Reports the state of all the nodes in the cluster |
nodes_info/2 | ServerRef, NodeName | Reports the state of the node NodeName in the cluster. (Note that a list of Nodes can also be sent in, e.g. [<<"node1">>, <<"node2">>]. This list can also be empty - []) |
nodes_info/3 | ServerRef, NodeName, Params | Reports the state of the node NodeName in the cluster, with optional Params. (Note that a list of Nodes can also be sent in, e.g. [<<"node1">>, <<"node2">>]. This list can also be empty - []) |
nodes_stats/1 | ServerRef | Reports stats on all the nodes in the cluster |
nodes_stats/2 | ServerRef, NodeName | Reports the stats of the node NodeName in the cluster. (Note that a list of Nodes can also be sent in, e.g. [<<"node1">>, <<"node2">>]) |
nodes_stats/3 | ServerRef, NodeName, Params | Reports the stats of the node NodeName in the cluster, with optional Params. (Note that a list of Nodes can also be sent in, e.g. [<<"node1">>, <<"node2">>]. This list can also be empty - []) |
EXAMPLES
erlasticsearch@pecorino)2> erlasticsearch:refresh(<<"bar">>).
{ok,{restResponse,200,undefined, <<"{\"ok\":true,\"_shards\":{\"total\":552,\"successful\":276,\"failed\":0}}">>}}
erlasticsearch@pecorino)3> erlasticsearch:health(<<"bar">>).
[{status,<<"200">>},
{body, <<"{\"cluster_name\":\"elasticsearch_mahesh\",\"status\":\"yellow\",\"timed_out\":false,\"number_of_nodes\""...>>}]
erlasticsearch@pecorino)1> erlasticsearch:stop_client(<<"bar">>).
ok
erlasticsearch@pecorino)4> erlasticsearch:state(<<"bar">>).
[{status,<<"200">>},
{body, <<"{\"cluster_name\":\"elasticsearch_mahesh\",\"master_node\":\"7k3ViuT5SQ67ayWsF1y8hQ\",\"blocks\":{\"ind"...>>}]
erlasticsearch@pecorino)5> erlasticsearch:state(<<"bar">>, [{filter_nodes, true}]).
[{status,<<"200">>},
{body, <<"{\"cluster_name\":\"elasticsearch_mahesh\",\"blocks\":{\"indices\":{\"index1\":{\"4\":{\"description\":\"inde"...>>}]
Credits
Thanks to Paul Oliver for helping with the poolboy implementation
This is not to be confused with erlastic_search by Tristan Sloughter, which is HTTP/REST based, and almost certainly did not involve quite this level of head-thumping associated w/ figuring out how Thrift works…
(Yes, this is a Credit)