I have a large set of long discrete probability distributions (500+ entries each). I have a computationally intensive task that requires calculating the KL divergence between millions of these distributions, and storing them in full would take several gigabytes of memory.

Is there a good way to calculate the KL divergence between truncated distributions?
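For reference, the KL divergence between discrete distributions P and Q over the same n outcomes is:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i}
```

Terms with p_i = 0 contribute nothing, but any term with p_i > 0 and q_i = 0 is infinite. So simply treating truncated entries as zeros makes the divergence infinite whenever one distribution keeps an index that the other dropped. This is why the missing mass has to be filled in somehow before the divergence is computed.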

Here's an example with a truncation threshold of .2. The full distributions

[.5, .3, .1, .05, .05]

[.1, .01, .09, .7, .1]

truncate to the "sparse distributions" (index:probability, with 1-based indices)

1:.5 2:.3

4:.7
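A minimal Python sketch of this truncation step. It assumes that entries strictly below the threshold are dropped (the example doesn't say what happens to a value exactly equal to .2) and uses 1-based indices to match the notation above. `truncate` is a hypothetical helper name.

```python
def truncate(dist, threshold):
    """Keep only entries >= threshold, as a {1-based index: probability} dict.

    Assumption: values exactly equal to the threshold are kept.
    """
    return {i + 1: p for i, p in enumerate(dist) if p >= threshold}


sparse1 = truncate([.5, .3, .1, .05, .05], .2)  # {1: 0.5, 2: 0.3}
sparse2 = truncate([.1, .01, .09, .7, .1], .2)  # {4: 0.7}
print(sparse1, sparse2)
```

For millions of distributions, parallel index/value arrays (e.g. NumPy) would likely be more compact than Python dicts. Dicts are used here only for readability.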

One option is to replace the missing values with the average of all truncated values. The reconstructed first distribution would then be [.5, .3, .067, .067, .067].
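A sketch of this reconstruction followed by the KL computation. It assumes that every original distribution sums to 1 and that its length n is known. Because the truncated values are no longer stored, their average is recovered as the leftover mass (1 minus the sum of the kept entries) divided by the number of missing entries. For the first example that is (1 - .8) / 3 ≈ .067. `reconstruct` and `kl_divergence` are hypothetical helper names.

```python
import math


def reconstruct(sparse, n):
    """Rebuild a dense length-n distribution from a {1-based index: prob} dict.

    Assumes the original summed to 1: the leftover mass is spread evenly
    over the missing indices (i.e. each gets the average truncated value).
    """
    missing = n - len(sparse)
    fill = (1.0 - sum(sparse.values())) / missing if missing else 0.0
    return [sparse.get(i + 1, fill) for i in range(n)]


def kl_divergence(p, q):
    """D_KL(P || Q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = reconstruct({1: .5, 2: .3}, 5)
q = reconstruct({4: .7}, 5)
print([round(x, 3) for x in p])  # [0.5, 0.3, 0.067, 0.067, 0.067]
print(kl_divergence(p, q))
```

As long as some mass was truncated, every missing entry gets a strictly positive fill value, so no q_i is zero and the divergence stays finite. If a distribution keeps all of its mass, the fill is 0, and a zero q_i under a positive p_i raises `ZeroDivisionError` here rather than returning infinity.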

Is there a more appropriate way to do this?

asked Jan 03 '11 at 19:15

Clay Woolam

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.