Big Data

Skizze: Behind the Scenes of Alpha 2

Based on the feedback we got for our initial alpha release, we worked on improving Skizze and moving the project forward. To recap, Skizze is a sketch data store for dealing with problems around counting and sketching using probabilistic data-structures. My longtime hacking buddy Neil Patel, who is also Xamarin Insights Technical Lead and Architect, blogged about the latest release and provided some background on why Skizze exists and how to get started. This second alpha focuses mainly on improving the development and operations experience. It is an…


Skizze progress and REPL

Over the last 3 weeks, based on feedback, we continued fleshing out the concepts and the code behind [Skizze](https://github.com/skizzehq/skizze). [Neil Patel](https://medium.com/@njpatel/) suggested the following: So I've been thinking about the server API. I think we want to choose one thing and do it as well as possible, instead of having six ways to talk to the server. I think that helps to keep things sane and simple overall. Thinking about usage, I can only really imagine Skizze in an environment like…


Skizze - A probabilistic data-structures service and storage (Alpha)

At my day job we deal with a lot of incoming data for our product, which requires us to calculate histograms and other statistics on the data stream as fast as possible. One of the best tools for this is Redis, which will give you 100% accuracy in O(1) (except for its HyperLogLog implementation, which is a probabilistic data-structure). All in all, Redis does a great job. The problem with Redis for me personally is that, when using it for hundreds of millions of counters, I could…


Counting flows (Semi-evaluation of CMS, CML and PMC)

Assume we have a stream of events coming in one at a time, and we need to count the frequency of the different types of events in the stream. In other words: we are receiving fruits one at a time in no given order, and at any given time we need to be able to answer how many of a specific fruit we have received. The most naive implementation is a dictionary of the form <string, int>, which is the most accurate and is suitable for streams with limited…
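
The naive dictionary approach described in this excerpt can be sketched in a few lines of Python. This is only an illustration, not code from the post: the function name `count_events` and the sample fruit list are made up for the example.

```python
from collections import Counter
from typing import Iterable


def count_events(stream: Iterable[str]) -> Counter:
    """Exact frequency count: one dictionary entry per distinct event type.

    Hypothetical helper for illustration; not part of Skizze.
    """
    counts: Counter = Counter()
    for event in stream:  # events arrive one at a time, in no given order
        counts[event] += 1
    return counts


# Made-up sample stream of fruits.
fruits = ["apple", "banana", "apple", "cherry", "banana", "apple"]
counts = count_events(fruits)

print(counts["apple"])   # how many apples have we received so far?
print(counts["durian"])  # a fruit that never appeared counts as zero
```

The counts are exact, but the dictionary holds one entry per distinct event type, so memory grows with the number of distinct types in the stream.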
