One of the more interesting data structures I’ve run into lately is the Bloom Filter. Bloom filters are very good at performing large membership queries very quickly. Some important features of the Bloom Filters were nicely summarized by lectures at the Wash. U. of St.Louis:

  • A bloom filter computes k hash functions on an input
  • Each resulting hash value is then queried to determine membership
  • If each hash function test for membership results in a match, a potential match is reported.
  • A bloom filter will never give a false negative, but it can return a false positive.
  • To analyze a bloom filter the following parameters are used:
    • n = number of strings to be stored
    • k = number of hash functions
    • m = the size of the bit-array (memory)
  • The false positive probability is f = (1/2)^k
  • The optimal value of hash functions is k = (m/n)ln2