Skizze - A probabilistic data-structures service and storage (Alpha)

At my day job we deal with a lot of incoming data for our product, which requires us to be able to calculate histograms and other statistics on the data-stream as fast as possible. One of the best tools for this is Redis, which will give you 100% accuracy in O(1) (except for its HyperLogLog implementation which is a probabilistic data-structure). All in all Redis does a great job. The problem with Redis for me personally is that, when using it for 100 of millions of counters, I could…

Keep reading

Counting flows (Semi-evaluation of CMS, CML and PMC)

Assume we have a stream of events coming in one at a time, and we need to count the frequency of the different types of events in the stream. In other words: We are receiving fruits one at a time in no given order, and at any given time we need to be able to answer how many of a specific fruit did we receive. The most naive implementation is a dictionary in the form of <string, int>, and is most accurate and suitable for streams with limited…

Keep reading