Wed 21 Jun 2006
Bloom Filters
Posted by dkaz under Programming
One of the more interesting data structures I’ve run into lately is the Bloom Filter. Bloom filters are very good at performing large membership queries very quickly. Some important features of the Bloom Filters were nicely summarized by lectures at the Wash. U. of St.Louis:
- A bloom filter computes k hash functions on an input
- Each resulting hash value is then queried to determine membership
- If each hash function test for membership results in a match, a potential match is reported.
- A bloom filter will never give a false negative, but it can return a false positive.
- To analyze a bloom filter the following parameters are used:
- n = number of strings to be stored
- k = number of hash functions
- m = the size of the bit-array (memory)
- The false positive probability is f = (1/2)^k
- The optimal value of hash functions is k = (m/n)ln2

June 22nd, 2006 at 6:28 pm
We used Bloom filters to implement a feature which disallowed the selection of dictionary words for passwords. It worked very well for this as it uses much less memory than storing 20,000 words (even compressed) and is very fast. A few false positives don’t matter.