Home

Awesome

Learning Bloom filter

Just one of the things I'm learning. https://github.com/hchiam/learning

Lets you check "definitely never seen before" or "maybe seen before", with very efficient space and time complexity.

Where b = bits and h = hash function(s):

Which is basically O(1) since both b and h are constants. https://brilliant.org/wiki/bloom-filter/#time-and-space-complexity

https://www.youtube.com/watch?v=-SuTGoFYjZs -> Bloom filters are good for checking if something hasn't been seen before or isn't in a list (like usernames or a URL reject list).

https://en.wikipedia.org/wiki/Bloom_filter#Algorithm_description

https://cs.stackexchange.com/questions/92618/in-a-bloom-filter-why-does-the-optimal-number-of-hashes-increase-with-the-num -> use more hash functions to get more "bits" in each "fingerprint" but balance it with competing factors in order to decrease chances of false positives.

https://github.com/Callidon/bloom-filters

git clone https://github.com/hchiam/learning-bloom-filter.git && cd learning-bloom-filter && yarn

or

git clone https://github.com/hchiam/learning-bloom-filter.git && cd learning-bloom-filter && npm install

and then you can run the example file:

node index.js

https://www.youtube.com/watch?v=-SuTGoFYjZs -> optimal Bloom filter specs calculation: node optimal-bloom-filter-specs-calculation.js

/**
 * n = items we plan to put into the filter
 * p = acceptable probability of collisions
 * m = optimal array size m
 * k = number of hash functions needed for optimal filling
 * m = Math.round(Math.abs(n * Math.log(p)) / Math.log(2) ** 2);
 * k = Math.round((m / n) * Math.log(2));
 * note: Math.log() in JavaScript is ln()
 */

// example:
const n = 300000;
const p = 0.01;
const m = Math.round(Math.abs(n * Math.log(p)) / Math.log(2) ** 2);
const k = Math.round((m / n) * Math.log(2));
console.log(`${m} array size needed`);
console.log(`${k} hash function needed`);