Bloom Filters Benchmarking
Bloom filter is a probabilistic data structure that is similar to hash tables and allow the trade-off between false positive rate of querying data and the storage space reduction.
Dillinger P.C., Manolios P., Bloom Filters in Probabilistic Verification. In: Hu A.J., Martin A.K. (eds) Formal Methods in Computer-Aided Design. FMCAD 2004. Lecture Notes in Computer Science, vol 3312. Springer, Berlin, Heidelberg
Lim H, Lee J, Byun H, Yim C. Ternary Bloom Filter Replacing Counting Bloom Filter. IEEE Communications Letters. 2017; 21: 278–281.
Denis Kleyko, Abbas Rahimi, Evgeny Osipov: Autoscaling Bloom Filter: Controlling Trade-off Between True and False Positives. CoRR abs/1705.03934, 2017
C. E. Rothenberg, C. A. B. Macapuna, F. L. Verdi, and M. F. Magalhaes, “The deletable bloom filter: a new member of the bloom family”, IEEE Communications Letters, vol. 14, no. 6, pp. 557–559, 2010
/Experiments
Code of experiment held for comparison between different filters + Results of the experiment./Filters Implementation
Classes of implementation of filters./URLShortener_App
Django Application for URL Shortening
One of the applications that can use Bloom Filters is the URL Shortening Applications. The process is to make a call to the server, which generates a fresh URL and sends it back. A bloom filter can be used to tell if this URL has already been generated earlier, and keep generating new ones till it returns false. As the filter is in memory, this tends to be cheaper than querying a database.
We have implemented a Web Application for URL shortening. You can try it in both cases of using Standard Bloom Filter or directly accessing the database and measure the time difference. Our application shows that we get faster performance using Std. Bloom Filter by around 14.6 times which will get much more by filling the database more with URLs.
Our URL Shortener Can be found here: http://35.229.23.124/
As our application is related to URL shortening:
The plot below shows logarithm of the number of false positives from each filter type Vs the filter size.
The plots below shows logarithm of the number of false positives and false negatives from each filter type Vs number of elements deleted from filters.
Ternary Filter has many indeterminable events which leads to least FPR, and FNR.
Standard Filter has almost similar FPR like other filters and it doesn’t support deletion. So, no False negatives.
AutoScaling Filter has the largest FNR but by deleting more elements its FRP decreases. This can be tuned by having different decision threshold, and binarization values.
Counting Filter has no FNR, and also least FPR by deleting more elements from the filter. However, the filter size is the largest.
Deletable Filter has no false negatives and almost similar FPR as Standard Filter. The allowed deletions are smaller than other filters but it has less size than Counting Filter.