Stockdb
This library is a storage for Stock Exchange quotes.
It is an append-only storage with online compression that supports failover, indexing and simple lookups of stored quotes.
It can compress 400 000 daily quotes into 4 MB of disk storage.
Usage
First install and compile it. Include it as a rebar dependency.
It can be used either as an appender or as a reader. The two modes cannot currently be mixed.
Writing database: Appender
Typical workflow when appending data to DB:
{ok, Appender} = stockdb:open_append('NASDAQ.AAPL', "2012-01-15", [{depth, 2}]),
{ok, Appender1} = stockdb:append({md, 1326601810453, [{450.1,100},{449.56,1000}], [{452.43,20},{454.15,40}]}, Appender),
stockdb:close(Appender1).
Now let's explain what is happening.
- Open the appender. The stock name should be a symbol (an atom such as 'NASDAQ.AAPL'), and the date should be either an Erlang date {YYYY,MM,DD} or a string "YYYY-MM-DD".
- Don't forget to specify the proper depth. If you skip it, the default depth is 1 and only the best bid and best ask are saved.
- Also specify the {scale, 1000} option if you want to store quotes with a precision finer than 1 cent. Stockdb stores your prices as integers: round(Price*Scale)
- Now append market data.
- Market data has the following form: {md, UtcMilliseconds, [{L1BidPrice,L1BidSize},{L2BidPrice,L2BidSize}..], [{L1AskPrice,L1AskSize}..]}
- To use the #md{} and #trade{} records, include the header: -include_lib("stockdb/include/stockdb.hrl").
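The effect of the scale option on stored prices can be illustrated with a few lines of plain Erlang. This is a minimal sketch, not Stockdb's own code: the module scale_sketch and both of its functions are hypothetical, and only the formula round(Price*Scale) comes from the description above.

```erlang
-module(scale_sketch).
-export([to_stored/2, from_stored/2]).

%% Converts a price to the integer that would be stored for the given scale,
%% using the formula from the text: round(Price * Scale).
to_stored(Price, Scale) ->
    round(Price * Scale).

%% Hypothetical inverse: turns a stored integer back into a float price.
%% Digits beyond the chosen scale are already lost at this point.
from_stored(Stored, Scale) ->
    Stored / Scale.
```

With a scale of 100 the price 450.1234 is stored as 45012, so the last two digits are lost. With a scale of 1000 it is stored as 450123.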
Now take a look at the db/stock folder. It contains a new file, db/stock/NASDAQ.AAPL-2012-01-15.stock, from which you can read the quotes back.
Reading database
Read whole DB
The simplest way is to read all of a day's events in order to replay them:
{ok, Events} = stockdb:events('NASDAQ.AAPL', "2012-01-15").
Get a candle for the whole day or for a specified time range:
DayCandleEvents = stockdb:events('NASDAQ.AAPL', "2012-08-10", [{filter, candle, [{period, undefined}]}]).
RangeCandleEvents = stockdb:events('NASDAQ.AAPL', "2012-08-10", [{range, {15,0,0}, {16,0,0}}, {filter, candle, [{period, undefined}]}]).
There are also more advanced ways to limit the amount of loaded data.
Iterator
If you need something more complex than just getting all data from the DB for a stock/date pair, you can use iterators. An iterator is a database opened read-only, optionally with filters applied to it. You can read an iterator's events one by one, saving memory by not keeping the extracted data.
Basic iterator is created as follows:
{ok, Iterator} = stockdb:init_reader('NASDAQ.AAPL', {2012, 8, 7}, []).
Here the first argument is the stock, the second is the date, and the third is a list of filters (empty in the basic case).
You can read events one by one using the stockdb:read_event/1 function:
{Event1, Iterator1} = stockdb:read_event(Iterator),
{Event2, Iterator2} = stockdb:read_event(Iterator1).
When there are no more events, the eof event is returned. Make sure your code handles it well!
Also, you can call stockdb:events(Iterator) to get all events from it.
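One way to handle eof is a small recursive fold that stops as soon as it appears. The following is a minimal sketch under assumptions: the module read_loop and the function fold_events/4 are hypothetical, and the end of data is assumed to arrive as {eof, Iterator}, mirroring the {Event, Iterator} shape shown above. The read function is passed in as an argument, so the loop does not depend on Stockdb itself.

```erlang
-module(read_loop).
-export([fold_events/4]).

%% Folds Fun over every event produced by ReadFun until eof is reached.
%% ReadFun is expected to behave like stockdb:read_event/1:
%% it takes an iterator and returns {Event, NextIterator}.
%% Assumption: at the end of data it returns {eof, NextIterator}.
fold_events(ReadFun, Fun, Acc, Iterator) ->
    case ReadFun(Iterator) of
        {eof, _Iterator1} ->
            %% No more events: return what has been accumulated so far.
            Acc;
        {Event, Iterator1} ->
            fold_events(ReadFun, Fun, Fun(Event, Acc), Iterator1)
    end.
```

For example, read_loop:fold_events(fun stockdb:read_event/1, fun(_Event, N) -> N + 1 end, 0, Iterator) would count the events without keeping them in memory.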
Iterator filters
Iterator filters may currently be {range, Start, End} or {filter, FilterFun, FilterState0}.
FilterFun may be a function name from the module stockdb_filters (currently only candle) or any function of arity 2. A filter must accept events from the previous filter (or #md{} and #trade{} records from the DB), as well as eof, which signals the end of the underlying source.
The return value is a tuple with the list of emitted events in the first position and the next state in the second.
For example, this simple function drops every second event:
FilterFun = fun
(eof, _State) -> {[], eof};
(Event, true) -> {[Event], false};
(_Event, _Other) -> {[], true}
end.
We can see that it works:
25> length(stockdb:events(Iterator)).
20703
27> {ok, FIterator} = stockdb:init_reader('NASDAQ.AAPL', "2012-08-07", [{filter, FilterFun, false}]),
27> length(stockdb:events(FIterator)).
10351
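A filter can also use the eof call to flush state it has been holding back. The sketch below is a hypothetical example, not part of stockdb_filters: it swallows every event, remembers only the most recent one, and emits it when eof arrives. It assumes that events emitted in response to eof are passed downstream like any others.

```erlang
-module(last_filter).
-export([last_event/2]).

%% State is the most recent event seen so far, or undefined before the first one.
%% On eof with no events seen there is nothing to emit.
last_event(eof, undefined) ->
    {[], undefined};
%% On eof the remembered event is finally emitted.
last_event(eof, Last) ->
    {[Last], undefined};
%% Any other event replaces the remembered one and emits nothing yet.
last_event(Event, _Previous) ->
    {[], Event}.
```

Under these assumptions it would be plugged in as [{filter, fun last_filter:last_event/2, undefined}].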
To use a predefined filter, just specify its name:
28> {ok, CIterator} = stockdb:init_reader('NASDAQ.AAPL', "2012-08-07", [{filter, candle, [{period, 120000}]}]),
28> length(stockdb:events(CIterator)).
2242
The StockDB index is optimized for fast timestamp seeking, so you can use the {range, Start, End} pseudo-filter. Start and End (if defined) are both millisecond timestamps or Erlang-style {HH, MM, SS} tuples (tuples work only over a DB source, not over another iterator). undefined for Start or End means the very beginning or the very end respectively. Example:
31> {ok, RIterator} = stockdb:init_reader('NASDAQ.AAPL', "2012-08-07", [{range, {14,0,0}, {15,0,0}}]),
31> length(stockdb:events(RIterator)).
5139
49> {ok, HIterator} = stockdb:init_reader('NASDAQ.AAPL', "2012-08-07", [{range, undefined, 1344348900451}]),
49> length(stockdb:events(HIterator)).
1954
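Millisecond bounds for the range pseudo-filter can be computed with the standard calendar module. This is a minimal sketch, assuming the timestamps are UTC milliseconds since the Unix epoch (as the UtcMilliseconds field of the md tuple suggests). The module range_sketch and the function utc_ms/2 are hypothetical names.

```erlang
-module(range_sketch).
-export([utc_ms/2]).

%% Converts a UTC date {YYYY,MM,DD} and time {HH,MM,SS} to milliseconds
%% since 1970-01-01 00:00:00 UTC.
utc_ms(Date, Time) ->
    %% calendar counts seconds from year 0, so subtract the Unix epoch offset.
    Epoch = calendar:datetime_to_gregorian_seconds({{1970, 1, 1}, {0, 0, 0}}),
    Seconds = calendar:datetime_to_gregorian_seconds({Date, Time}) - Epoch,
    Seconds * 1000.
```

For instance, range_sketch:utc_ms({2012,8,7}, {14,15,0}) + 451 evaluates to 1344348900451, the End bound used in the example above.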
Of course, you may specify multiple filters:
32> {ok, RFCIterator} = stockdb:init_reader('NASDAQ.AAPL', "2012-08-07", [{range, {14,0,0}, {15,0,0}}, {filter, FilterFun, false}, {filter, candle, [{period, 120000}]}]),
32> length(stockdb:events(RFCIterator)).
372
Also, iterators may cascade:
35> {ok, RIterator_F} = stockdb:init_reader(RIterator, [{filter, FilterFun, false}]),
35> {ok, RIterator_F_C} = stockdb:init_reader(RIterator_F, [{filter, candle, [{period, 120000}]}]),
35> length(stockdb:events(RIterator_F_C)).
372
36> stockdb:events(RIterator_F_C) == stockdb:events(RFCIterator).
true
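The equivalence between a filter list and cascaded iterators can be mimicked on plain lists. The sketch below is not how Stockdb is implemented (it works on in-memory lists rather than a stream). The module cascade_sketch and its functions are hypothetical, and each filter is assumed to receive eof exactly once after the last event.

```erlang
-module(cascade_sketch).
-export([run_pipeline/2]).

%% Runs one {FilterFun, State0} pair over a list of events,
%% then feeds eof so the filter can flush anything it still holds.
run_filter({FilterFun, State0}, Events) ->
    {RevChunks, State1} =
        lists:foldl(
          fun(Event, {Acc, State}) ->
                  {Emitted, NextState} = FilterFun(Event, State),
                  {[Emitted | Acc], NextState}
          end,
          {[], State0},
          Events),
    {Flushed, _FinalState} = FilterFun(eof, State1),
    lists:append(lists:reverse([Flushed | RevChunks])).

%% Applies the filters left to right:
%% the output of each filter is the input of the next one.
run_pipeline(Filters, Events) ->
    lists:foldl(fun run_filter/2, Events, Filters).
```

Running run_pipeline([F1, F2], Events) gives the same list as run_pipeline([F2], run_pipeline([F1], Events)), which is the list-based analogue of the comparison shown above.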
Self-sufficient read-only state
The function stockdb:init_reader/3 currently accesses the file directly. If you have a distributed setup, it will fail. Stockdb is able to bypass this by using the lower-level stockdb:open_read/2.
open_read/2 returns an in-memory read-only database state with the full buffer loaded and the file descriptor closed. In fact, stockdb:init_reader/3 first opens the DB using stockdb:open_read/2 and then calls stockdb:init_reader/2 on it. So does stockdb:events/2. You can do the same:
42> {ok, S} = stockdb:open_read('NASDAQ.AAPL', {2012, 8,7}),
42> {ok, Iterator} = stockdb:init_reader(S, []),
42> stockdb:events(S) == stockdb:events(Iterator).
true
Note that the events read from the state S and from the Iterator built on it match exactly. stockdb:init_reader(S, []) can be called when the original file is unavailable, which helps minimize network load when the DB content is needed on another node.
Querying existing data
There are simple functions which let you know what data you have.
- To list all stocks having any data in the database, use stockdb:stocks()
- To list the dates on which some stock has any data, use stockdb:dates(Stock)
- To get the date intersection between multiple stocks, use stockdb:common_dates([Stock1, Stock2, ...])
- To get some information about a file, a stockdb instance or a stock/date pair, use stockdb:info(Stockdb), stockdb:info(Filename), stockdb:info(Stock, Date) or stockdb:info(Stock, Date, [Key1, Key2, ...]). Key can be one of path, stock, date, version, scale, depth, chunk_size, presence. The return value is a tuple list. Presence is {ChunkCount, [ChunkNumber1, ChunkNumber2, ...]}, representing some internal report.