|
Hi all, is there a method for determining the cut-off point in a power-law distribution? I am doing a tagging study and only want to keep the top "x" tags, so how do I determine "x"? Can I just use an arbitrary number (e.g. the top 20%), or set a threshold such as count(tag) > 10? |
|
Power laws are heavy-tailed (that is, a lot of their probability mass lies in infrequent items), so there is no natural cut-off point (unlike, say, a normal distribution, where the tail is very light and can safely be ignored). You're better off defining a downstream performance metric and choosing the cutoff that optimizes it. |
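
To make the "optimize a downstream metric" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the tag counts are made up, and `toy_metric` (share of tag occurrences covered, minus a small cost per tag kept) is only a stand-in for whatever your real task measures, such as classifier accuracy or retrieval quality.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def top_tags(counts: Counter, x: int) -> List[str]:
    """Return the x most frequent tags, most frequent first."""
    return [tag for tag, _ in counts.most_common(x)]


def choose_cutoff(
    counts: Counter,
    candidates: Iterable[int],
    metric: Callable[[List[str]], float],
) -> Tuple[int, float]:
    """Try each candidate x and keep the one with the highest metric score.

    On ties, the first (smallest, if candidates are ascending) x wins.
    """
    best_x, best_score = -1, float("-inf")
    for x in candidates:
        score = metric(top_tags(counts, x))
        if score > best_score:
            best_x, best_score = x, score
    return best_x, best_score


# Made-up tag counts, roughly heavy-tailed (hypothetical data).
counts = Counter(
    {"python": 50, "java": 30, "c": 10, "rust": 5, "go": 3, "ocaml": 1, "zig": 1}
)
total = sum(counts.values())


def toy_metric(kept: List[str]) -> float:
    """Hypothetical stand-in for a real downstream metric:
    fraction of tag occurrences covered, minus a cost of 0.06 per tag kept."""
    coverage = sum(counts[tag] for tag in kept) / total
    return coverage - 0.06 * len(kept)


best_x, best_score = choose_cutoff(counts, range(1, len(counts) + 1), toy_metric)
print(best_x, top_tags(counts, best_x))
```

With a real metric you would replace `toy_metric` with a function that runs your downstream task on the kept tags and returns its score; the sweep itself stays the same. A fixed rule like "top 20%" or "count > 10" is then just one candidate among those you compare.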