|
I have data from a couple of hundred independent experiments. Each record contains the time (how long the experiment took) and the outcome, positive or negative. 10% of the outcomes are positive. I'm trying to find the time range in which the probability of a positive outcome is highest. The range must be longer than a given value, or must contain more than a given number of experiments. I tried applying the C4.5 algorithm to this problem, but I can't find a way to impose a lower limit on either the length of the range or the number of experiments in it. C4.5 allows limiting the number of events in each leaf, but I care only about the positive range, not the others. I'm trying to invent something myself, but I'm sure solutions already exist. Could you please point me to them? |
Perhaps look into regularized Poisson regression, available in the
glmnet package in R. It's used for modelling the time until a class label or categorical outcome.
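
With only a couple of hundred experiments, the constraints from the question can also be checked by brute force. The Python sketch below is not the glmnet approach and not an established algorithm, just one possible baseline: it sorts the experiments by time, scans every contiguous range, and keeps the one with the highest share of positive outcomes. The function name `best_time_range` and the parameters `min_count` and `min_length` are made up for illustration, and the sketch assumes both limits are enforced together (leave one at its default to disable it).

```python
def best_time_range(times, outcomes, min_count=1, min_length=0.0):
    """Return (start, end, positive_rate, n_experiments) for the best range.

    Hypothetical helper: scans all contiguous ranges of the time-sorted
    experiments and returns None if no range satisfies the limits.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    t = [times[i] for i in order]
    y = [1 if outcomes[i] else 0 for i in order]

    # prefix[k] = number of positive outcomes among the first k experiments
    prefix = [0]
    for value in y:
        prefix.append(prefix[-1] + value)

    n = len(t)
    best = None
    best_key = None
    for i in range(n):
        # A range must not start in the middle of a group of tied times.
        if i > 0 and t[i] == t[i - 1]:
            continue
        for j in range(i, n):
            # ...and must not end in the middle of one either.
            if j < n - 1 and t[j] == t[j + 1]:
                continue
            count = j - i + 1
            length = t[j] - t[i]
            if count < min_count or length < min_length:
                continue
            rate = (prefix[j + 1] - prefix[i]) / count
            # Prefer a higher rate; break ties in favour of more experiments.
            key = (rate, count)
            if best_key is None or key > best_key:
                best_key = key
                best = (t[i], t[j], rate, count)
    return best


if __name__ == "__main__":
    # Toy data, invented for this example only.
    toy_times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    toy_outcomes = [0, 0, 0, 1, 1, 0, 1, 0, 0, 0]
    print(best_time_range(toy_times, toy_outcomes, min_count=3))
```

The scan is quadratic in the number of experiments, which is fine at this size. Because the range is chosen as the maximum over many candidates, its observed rate is optimistically biased, so it should be confirmed on held-out data or with a permutation test.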