I'm writing a formula for a simulation where I want to draw a random age. What's the best probability distribution to use?

Here's my pseudocode:

function RandomAge(life_expectancy_at_birth, infant_mortality_rate) { return age }

A related question: is there a website with a comprehensive list of phenomena and the best corresponding distribution (e.g., height -- normal, time until your next phone call -- exponential, etc.)?

asked Aug 05 '10 at 14:30

Breck Yunits

edited Dec 03 '10 at 07:09

Alexandre Passos ♦


3 Answers:

Look up the actuarial tables. The exponential model is not the best fit: humans generally die of old age rather than with an equal probability in every unit of time. That'd be terrifying. The Gompertz-Makeham distribution is closer to what you want.
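To make the Gompertz-Makeham suggestion concrete, here is a minimal sketch that draws an age at death from a hazard of the form lam + alpha * exp(beta * age). Everything named here is an assumption for illustration: the function names, the parameter values (picked by hand, not fitted to any actuarial table), and the 130-year cap. Note that it samples age at death, which is not the same thing as the age of a randomly chosen living person.

```python
import math
import random

def gompertz_makeham_survival(age, lam, alpha, beta):
    """Probability of surviving past `age` under hazard lam + alpha * exp(beta * age)."""
    return math.exp(-lam * age - (alpha / beta) * (math.exp(beta * age) - 1.0))

def sample_age_at_death(lam=0.0005, alpha=0.00005, beta=0.09, max_age=130.0, rng=random):
    """Inverse-transform sampling: find the age where survival equals a uniform draw.

    The default parameters are illustrative placeholders, not fitted values.
    """
    u = rng.random()  # target survival probability
    lo, hi = 0.0, max_age
    for _ in range(60):  # bisection; survival is strictly decreasing in age
        mid = (lo + hi) / 2.0
        if gompertz_makeham_survival(mid, lam, alpha, beta) > u:
            lo = mid  # still too likely to be alive, so the answer is older
        else:
            hi = mid
    return (lo + hi) / 2.0

ages_at_death = [sample_age_at_death() for _ in range(5)]
```

In practice you would fit lam, alpha and beta to a life table for the population you care about, or sample from the table directly.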

answered Aug 06 '10 at 01:36

dogy

The time until the next phone call follows an exponential distribution, the time until the k-th call follows a gamma distribution, and the number of phone calls in a day follows a Poisson distribution. You could look up a website listing common probability distributions and, for each one, see what it's used for.
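These three distributions are all views of the same model, a Poisson process with a constant call rate. A small sketch, assuming a hypothetical rate of 0.5 calls per hour (the function name and the numbers are made up for illustration): the gaps between calls are exponential, the waiting time for the k-th call is a sum of k such gaps (gamma), and the number of calls in a day is Poisson with mean rate * hours.

```python
import random

def simulate_call_times(rate_per_hour, hours, rng=random):
    """Return the call times (in hours) of a constant-rate Poisson process on [0, hours)."""
    times = []
    t = rng.expovariate(rate_per_hour)  # exponential gap before the first call
    while t < hours:
        times.append(t)
        t += rng.expovariate(rate_per_hour)  # exponential gap to the next call
    return times

calls = simulate_call_times(rate_per_hour=0.5, hours=24)
gaps = [later - earlier for earlier, later in zip(calls, calls[1:])]  # exponential
calls_today = len(calls)  # Poisson with mean 0.5 * 24 = 12
```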

The distribution of ages in the US doesn't look like any "nice" distribution. As Alexandre says, if you want realistic sampling, you could take the actual numbers and sample from that histogram. For instance, taking numbers from http://www.censusscope.org/us/chart_age.html, you could generate age data obeying that distribution in Python.
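A minimal sketch of that histogram-sampling idea, assuming the data has already been reduced to age bins with relative weights. The bins and weights below are placeholders I made up so the example runs; they are not the census figures, so substitute the real numbers before using it. The name random_age is also hypothetical.

```python
import random

# Hypothetical (low, high, weight) bins. The weights are placeholders,
# NOT real census figures; replace them with the actual data.
AGE_BINS = [
    (0, 17, 25.0),
    (18, 34, 24.0),
    (35, 54, 29.0),
    (55, 74, 16.0),
    (75, 99, 6.0),
]

def random_age(bins=AGE_BINS, rng=random):
    """Pick a bin in proportion to its weight, then an integer age uniformly inside it."""
    weights = [weight for _, _, weight in bins]
    low, high, _ = rng.choices(bins, weights=weights, k=1)[0]
    return rng.randint(low, high)  # randint includes both endpoints

ages = [random_age() for _ in range(10)]
```

Narrower bins (five years, or single years if the data has them) make the uniform-within-bin step less of an approximation.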

answered Aug 08 '10 at 12:56

Yaroslav Bulatov

edited Aug 08 '10 at 16:26

I guess it depends on a lot of things. At first glance, the male/female ratio, recent wars, etc., change the shape of this distribution a lot. See the Wikipedia page on population pyramids for some examples. This page also has more detailed data on US ages, and glancing at it, I don't see an easy fit from an exponential family. You might, however, get away with one of these simplifications:

  • Use a uniform distribution on [1, life expectancy]
  • Use an exponential distribution with mean = life expectancy
  • Use a gamma distribution and fit its parameters so that it centers around a reasonable value
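For reference, the three simplifications above can be sketched with Python's standard random module. The function names and the gamma shape value are assumptions for illustration, and I am reading the exponential option as "mean equal to life expectancy".

```python
import random

def age_uniform(life_expectancy, rng=random):
    """Uniform on [1, life_expectancy]."""
    return rng.uniform(1, life_expectancy)

def age_exponential(life_expectancy, rng=random):
    """Exponential with mean life_expectancy (expovariate takes the rate, 1 / mean)."""
    return rng.expovariate(1.0 / life_expectancy)

def age_gamma(mean_age, shape=4.0, rng=random):
    """Gamma with the given mean; shape=4.0 is an arbitrary illustrative choice.

    gammavariate(alpha, beta) has mean alpha * beta, so the scale is mean_age / shape.
    """
    return rng.gammavariate(shape, mean_age / shape)

samples = [age_uniform(78), age_exponential(78), age_gamma(38)]
```

The exponential and gamma versions are unbounded above, so you may want to reject or clip draws beyond a plausible maximum age.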

None of these seems to come close to the distribution of real data. If you can get real data, you could bin it (for example, in groups of five years), sample a bin in proportion to its share of the population, and then draw the age uniformly from the ages in that bin. You could do that with the US data I linked to earlier.

Also, beware of these usual distributions. Height is better modeled as a mixture of Gaussians (at least one for male and one for female, maybe more if you have a lot of ethnic variety), and exponentials for phone calls ignore the fact that your phone rings a lot more at certain times of the day. It's best to choose as fine-grained a distribution as you can without compromising your model computationally or making it too hard to specify.
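As an illustration of the mixture idea for height, here is a minimal sketch with two Gaussian components. The weights, means and standard deviations are placeholder values chosen only to make the example run, not measured statistics, and random_height is a hypothetical name.

```python
import random

# Hypothetical (weight, mean_cm, std_cm) components; placeholder values,
# not measured statistics. Add more components for a finer-grained model.
HEIGHT_COMPONENTS = [
    (0.5, 162.0, 7.0),
    (0.5, 176.0, 7.5),
]

def random_height(components=HEIGHT_COMPONENTS, rng=random):
    """Pick a component in proportion to its weight, then draw from its Gaussian."""
    weights = [weight for weight, _, _ in components]
    _, mean, std = rng.choices(components, weights=weights, k=1)[0]
    return rng.gauss(mean, std)

heights = [random_height() for _ in range(10)]
```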

answered Aug 05 '10 at 21:15

Alexandre Passos ♦

edited Aug 05 '10 at 21:19


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.