BloomFilters-Benchmarking

Bloom filter is a probabilistic data structure that is similar to hash tables and allow the trade-oﬀ between false positive rate of querying data and the storage space reduction.

Filters Included in Benchmark:

1. Standard Bloom Filter

2. Counting Bloom Filter:

Dillinger P.C., Manolios P., Bloom Filters in Probabilistic Verification. In: Hu A.J., Martin A.K. (eds) Formal Methods in Computer-Aided Design. FMCAD 2004. Lecture Notes in Computer Science, vol 3312. Springer, Berlin, Heidelberg

3. Ternary Bloom Filter:

Lim H, Lee J, Byun H, Yim C. Ternary Bloom Filter Replacing Counting Bloom Filter. IEEE Communications Letters. 2017; 21: 278–281.

4. Auto-Scaling Bloom Filter:

Denis Kleyko, Abbas Rahimi, Evgeny Osipov: Autoscaling Bloom Filter: Controlling Trade-off Between True and False Positives. CoRR abs/1705.03934, 2017

5. Deletable Bloom Filter:

C. E. Rothenberg, C. A. B. Macapuna, F. L. Verdi, and M. F. Magalhaes, “The deletable bloom filter: a new member of the bloom family”, IEEE Communications Letters, vol. 14, no. 6, pp. 557–559, 2010

Repository Contents:

/Experiments
Code of experiment held for comparison between different filters + Results of the experiment.

/Filters Implementation
Classes of implementation of filters.

/URLShortener_App
Django Application for URL Shortening

URL-Shortening Application:

One of the applications that can use Bloom Filters is the URL Shortening Applications. The process is to make a call to the server, which generates a fresh URL and sends it back. A bloom filter can be used to tell if this URL has already been generated earlier, and keep generating new ones till it returns false. As the filter is in memory, this tends to be cheaper than querying a database.

We have implemented a Web Application for URL shortening. You can try it in both cases of using Standard Bloom Filter or directly accessing the database and measure the time difference. Our application shows that we get faster performance using Std. Bloom Filter by around 14.6 times which will get much more by filling the database more with URLs.
Our URL Shortener Can be found here: http://35.229.23.124/

Experiment:

As our application is related to URL shortening:

a dataset of 15k URLs were collected.
Unique Random Codes are generated for each URL.
Each code is converted to a unique integer using ASCII values of characters in code.
12.5k codes are going to be stored in each filter and 5k codes are going to be queried in the filter where 50% of them only are already stored in the filter to count false negatives while the other 50% are used to count false positives.

Results:

The plot below shows logarithm of the number of false positives from each filter type Vs the filter size.

The plots below shows logarithm of the number of false positives and false negatives from each filter type Vs number of elements deleted from filters.

Ternary Filter has many indeterminable events which leads to least FPR, and FNR.
Standard Filter has almost similar FPR like other filters and it doesn’t support deletion. So, no False negatives.
AutoScaling Filter has the largest FNR but by deleting more elements its FRP decreases. This can be tuned by having different decision threshold, and binarization values.
Counting Filter has no FNR, and also least FPR by deleting more elements from the filter. However, the filter size is the largest.
Deletable Filter has no false negatives and almost similar FPR as Standard Filter. The allowed deletions are smaller than other filters but it has less size than Counting Filter.