|
I'm writing a formula for a simulation where I want to draw a random age. What's the best probability distribution to use? Here's my pseudocode: function RandomAge(life_expectancy_at_birth, infant_mortality_rate) { return age } A related question: is there a website with a comprehensive list of phenomena and the best corresponding distribution? (e.g.: Height -- Normal, Time until your next phone call -- Exponential, etc.?) |
|
Look up the actuarial tables. An exponential model is not a good fit: humans mostly die of old age, rather than with an equal probability in every unit of time. That'd be terrifying. The Gompertz-Makeham distribution is closer to what you want. |
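As a rough illustration of the Gompertz-Makeham suggestion, here is a minimal Python sketch that draws an age at death from the hazard h(x) = lam + alpha * exp(beta * x). The function name, the parameter names, and the example values are assumptions chosen only for illustration; they are not fitted to any actuarial table. Note also that this models age at death, not the age of a randomly chosen living person, and it does not model infant mortality.

```python
import math
import random

def sample_gompertz_makeham(lam, alpha, beta, rng=random):
    """Draw one age at death with hazard h(x) = lam + alpha * exp(beta * x)."""
    # Makeham term: age-independent risk, i.e. an exponential waiting time.
    t_background = rng.expovariate(lam) if lam > 0 else math.inf
    # Gompertz term: invert S(x) = exp(-(alpha / beta) * (exp(beta * x) - 1)).
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    t_aging = math.log(1.0 - (beta / alpha) * math.log(u)) / beta
    # Whichever cause strikes first determines the age at death.
    return min(t_background, t_aging)

# Hypothetical parameter values, for illustration only.
EXAMPLE_PARAMS = dict(lam=0.0005, alpha=0.00003, beta=0.09)

if __name__ == "__main__":
    print([round(sample_gompertz_makeham(**EXAMPLE_PARAMS), 1) for _ in range(5)])
```

Taking the minimum of the two waiting times works because the hazards of independent competing risks add, which is exactly the Gompertz-Makeham form.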
|
Assuming calls arrive as a Poisson process, the time until the next phone call follows an exponential distribution, the time until the k-th call follows a Gamma distribution, and the number of phone calls in a day follows a Poisson distribution. You could look up a website listing the common probability distributions and, for each one, see what it's used for. The distribution of ages in the US doesn't look like any "nice" distribution; as Alexandre says, if you want realistic sampling, you could take the actual numbers and sample from that histogram. For instance, taking numbers from http://www.censusscope.org/us/chart_age.html, here's how you could generate age data obeying that distribution in Python |
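A minimal sketch of that histogram-sampling approach follows. The bin counts are made-up placeholders standing in for the censusscope numbers, which are not reproduced here, and the names `BIN_WIDTH`, `bin_counts`, and `random_age` are hypothetical.

```python
import random

BIN_WIDTH = 5  # each bin covers 5 years of age

# Hypothetical relative counts per 5-year bin, keyed by the bin's lowest age.
# These are illustrative placeholders, NOT the actual censusscope figures.
bin_counts = {
    0: 19, 5: 20, 10: 20, 15: 20, 20: 19, 25: 19, 30: 20, 35: 22,
    40: 22, 45: 20, 50: 17, 55: 13, 60: 11, 65: 9, 70: 9, 75: 7,
    80: 5, 85: 4,  # the open-ended "85+" bin is treated as 85-89 here
}

def random_age(rng=random):
    """Pick a bin in proportion to its count, then an age uniformly inside it."""
    lows = list(bin_counts)
    weights = list(bin_counts.values())
    low = rng.choices(lows, weights=weights, k=1)[0]
    return low + rng.randrange(BIN_WIDTH)

if __name__ == "__main__":
    print([random_age() for _ in range(10)])
```

Swapping in real counts only requires replacing the dictionary; the weights do not need to be normalized, since `random.choices` handles that.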
|
I guess it depends on a lot of things. Even at first glance, the male/female ratio, recent wars, etc., change the shape of this distribution a lot. See the Wikipedia page on population pyramids for some examples. This page also has more detailed data on US ages, and from glancing at it I don't see an easy fit from an exponential family. You might, however, get away with one of these simplifications:
None of these seems to approach the distribution from real data. If you can get real data, you could bin it (say, into groups of five years), sample a bin in proportion to its share of the population, and then draw the age uniformly from the ages in that bin. You could do that with the US data I linked to earlier. Also, beware of these usual distributions. Height is better modeled as a mixture of Gaussians (at least one for males and one for females, maybe more if your population is ethnically diverse), and exponentials for phone calls ignore the fact that your phone rings a lot more at certain times of the day, etc. It's best to choose as fine-grained a distribution as you can without making your model computationally expensive or too hard to specify. |
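To make the mixture-of-Gaussians remark about height concrete, here is a small Python sketch with two components. The weights, means, and standard deviations are rough illustrative guesses, not values fitted to any dataset, and `COMPONENTS` and `random_height` are hypothetical names.

```python
import random

# (mixture weight, mean in cm, standard deviation in cm)
# Illustrative values only; not fitted to real measurements.
COMPONENTS = [
    (0.5, 176.0, 7.0),  # hypothetical "male" component
    (0.5, 162.0, 6.5),  # hypothetical "female" component
]

def random_height(rng=random):
    """Pick a component by its weight, then draw from that component's Gaussian."""
    weights = [w for w, _, _ in COMPONENTS]
    _, mean, sd = rng.choices(COMPONENTS, weights=weights, k=1)[0]
    return rng.gauss(mean, sd)

if __name__ == "__main__":
    print([round(random_height(), 1) for _ in range(5)])
```

Adding more components is just a matter of appending tuples, which is one way to trade a finer-grained model against the extra parameters you then have to specify.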