I have data from a couple hundred independent experiments. Each record contains the time the experiment took and its outcome, either positive or negative. 10% of the outcomes are positive.

I'm trying to find the time range for which the probability of a positive outcome is highest. The range should be longer than a given value, or should contain more than a given number of experiments.
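
To make the problem concrete, here is a minimal brute-force sketch in Python, not an established algorithm and not a replacement for C4.5. It sorts the experiments by time and scans every contiguous window, keeping the one with the highest fraction of positives. The function name `best_time_range` and the parameters `min_count` and `min_length` are made up for illustration. The sketch assumes both limits are enforced together and are inclusive (set one to its default to disable it), and that a range always includes every experiment sharing one of its endpoint times.

```python
from itertools import accumulate


def best_time_range(times, outcomes, min_count=1, min_length=0.0):
    """Scan all windows of time-sorted experiments and return the one with
    the highest fraction of positive outcomes.

    Returns (rate, t_start, t_end, count), or None if no window qualifies.
    Both limits are inclusive lower bounds (an assumption of this sketch).
    """
    # Sort experiments by time, keeping each outcome with its time.
    data = sorted(zip(times, outcomes))
    ts = [t for t, _ in data]
    # prefix[k] = number of positives among the first k sorted experiments.
    prefix = [0] + list(accumulate(int(bool(y)) for _, y in data))

    best = None
    n = len(ts)
    for i in range(n):
        # A range must not split experiments that share the same time.
        if i > 0 and ts[i] == ts[i - 1]:
            continue
        for j in range(i, n):
            if j < n - 1 and ts[j] == ts[j + 1]:
                continue
            count = j - i + 1
            length = ts[j] - ts[i]
            if count < min_count or length < min_length:
                continue
            rate = (prefix[j + 1] - prefix[i]) / count
            # Strict comparison keeps the earliest window on ties.
            if best is None or rate > best[0]:
                best = (rate, ts[i], ts[j], count)
    return best


if __name__ == "__main__":
    # Toy data, invented for the example: 10 experiments, 3 positives.
    times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    outcomes = [0, 0, 0, 1, 1, 0, 1, 0, 0, 0]
    print(best_time_range(times, outcomes, min_count=3))
```

With a couple hundred experiments the quadratic scan is only a few tens of thousands of windows, so it runs instantly. Note that with a 10% base rate and a small `min_count`, the winning window can easily be a chance cluster, so the estimated rate should be treated with caution.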

I've tried applying the C4.5 algorithm to this problem, but I can't find a way to impose a lower limit on either the length of the time range or the number of experiments in the range found. C4.5 allows limiting the number of events in each leaf, but I care only about the positive range, not the others.

I'm trying to invent something myself, but I'm sure some solutions already exist. Could you please point me to them?

asked Feb 07 '14 at 08:33


Szymon Sobczak

Perhaps look into regularized Poisson regression as part of the glmnet package in R. It's used for modelling time until a class label/categorical outcome.

(Feb 09 '14 at 02:47) Jeremiah M

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.