Revision history
Revision n. 1

Dec 21 '11 at 13:24

Daniel Peck
21114

Count Sketch and Hashing Functions

Hello everyone, first question here,

I'm doing some work on stream processing within our services, beginning with something simple like which hostnames were connected to within the last timeslice, but also which hostnames are furthest outside their normal range, similar to what Twitter, Google, etc. do with trending topics.

I've done some research into algorithms for doing this, and it seems that Count Sketch would be a solid approach. However, after reading through the paper that introduced it a few times, I'm still not quite understanding a few things: specifically the hash functions (both the ones for distributing items to multiple buckets and the one referred to as 's' in the paper, which hashes objects to {-1, +1}), and how one would go about selecting appropriate ones.

Pseudocode examples would be very much appreciated. Thank you.

Revision n. 2

Dec 23 '11 at 18:22

Daniel Peck
21114

Count Sketch and Hashing Functions

Hello everyone, first question here,

I'm doing some work on stream processing within our services, beginning with something simple like which hostnames were connected to within the last timeslice, but also which hostnames are furthest outside their normal range, similar to what Twitter, Google, etc. do with trending topics.

I've done some research into algorithms for doing this, and it seems that Count Sketch would be a solid approach. However, after reading through the paper that introduced it a few times, I'm still not quite understanding a few things: specifically the hash functions (both the ones for distributing items to multiple buckets and the one referred to as 's' in the paper, which hashes objects to {-1, +1}), and how one would go about selecting appropriate ones.

Specific hash examples, and why they'd work well in this situation, would be very much appreciated. Thank you.
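
To make the question concrete, here is a minimal Python sketch of the structure as I currently understand it, so that answers can point at the exact pieces. Everything named here (CountSketch, key_to_int, P, the depth/width defaults) is my own placeholder, not something from the paper. The hash family ((a*x + b) mod p) mod width is a common pairwise-independent choice that I am assuming is acceptable, and taking the low bit of a second such hash for 's' is only approximately unbiased; whether these are good choices is exactly what I'm unsure about.

```python
import hashlib
import random
import statistics

# Assumed modulus: the Mersenne prime 2^61 - 1 (placeholder choice).
P = (1 << 61) - 1


def key_to_int(key: str) -> int:
    """Map a string (e.g. a hostname) to a stable integer in [0, P)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % P


class CountSketch:
    """depth rows of width counters; each row has its own bucket and sign hash."""

    def __init__(self, depth: int = 5, width: int = 256, seed: int = 0):
        rng = random.Random(seed)
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]
        # Random (a, b) per row for the bucket hash h_i and for the sign hash s_i.
        self.bucket_params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(depth)]
        self.sign_params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(depth)]

    def _bucket(self, row: int, x: int) -> int:
        a, b = self.bucket_params[row]
        return ((a * x + b) % P) % self.width

    def _sign(self, row: int, x: int) -> int:
        # 's' in the paper: maps an item to -1 or +1 (here via the low bit).
        a, b = self.sign_params[row]
        return 1 if ((a * x + b) % P) & 1 else -1

    def update(self, key: str, count: int = 1) -> None:
        x = key_to_int(key)
        for row in range(self.depth):
            self.table[row][self._bucket(row, x)] += self._sign(row, x) * count

    def estimate(self, key: str):
        # Median over rows of the sign-corrected counter for this key.
        x = key_to_int(key)
        return statistics.median(
            self._sign(row, x) * self.table[row][self._bucket(row, x)]
            for row in range(self.depth)
        )
```

The intended usage would be to call update(hostname) once per observed connection in a timeslice and estimate(hostname) to read back an approximate count. With many distinct hostnames the estimate is only approximate, since colliding items add signed noise to each row.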

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.