I have a set of examples, which are each annotated with feature data. The examples and features describe the settings of an experiment in an arbitrary domain (e.g. number-of-switches, number-of-days-performed, number-of-participants, etc.). Certain features are fixed (i.e. static), while others I can manually set (i.e. variable) in future experiments. Each example also has a "reward" feature, which is a continuous number bounded between 0 and 1, indicating the success of the experiment as determined by an expert.

Based on this example set, and given a set of static features for a future experiment, how would I determine the optimal value to use for a specific variable so as to maximise the reward?

Also, does this process have a formal name? I've done some research, and this sounds similar to regression analysis, but I'm still not sure if it's the same thing.

asked Apr 05 '11 at 15:14


Cerin


One Answer:

This is indeed a variant of regression. In machine learning circles this is a multivariate regression problem: given the fixed features, predict the values for the variable features that maximize the expected reward. The usual approach is to learn a function of all the features (both variable and fixed) that predicts the reward well, and then, at the time of the experiment, to fix the fixed features and maximize the predicted reward over the variable features. This last step is called "prediction" or "inference".
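To make the "fix the fixed features, then maximize over the variable ones" step concrete, here is a minimal sketch. It assumes you already have some learned reward predictor; `toy_model`, the feature names `participants` and `days`, and the candidate range are all made up for illustration and stand in for whatever model and features you actually have.

```python
def best_variable_value(predict_reward, static_features, candidates):
    """Return (value, predicted reward) for the candidate that scores highest."""
    best = max(candidates, key=lambda v: predict_reward(static_features, v))
    return best, predict_reward(static_features, best)


def toy_model(static_features, days):
    """Hypothetical stand-in for a learned model: reward peaks when the number
    of days is twice the number of participants, and is clipped to [0, 1]."""
    target = 2 * static_features["participants"]
    return max(0.0, 1.0 - 0.01 * (days - target) ** 2)


# Static features are given; only `days` is ours to choose.
static = {"participants": 5}
best_days, best_reward = best_variable_value(toy_model, static, range(1, 21))
print(best_days, best_reward)
```

Any predictor with the same call shape could be dropped in for `toy_model`; the search itself is just an argmax over the values you are willing to try.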

To solve this problem, first look towards making some simplifying assumptions. For example, are the reward values conditional on each variable feature independently (in which case you can learn a predictor for each variable feature as a subproblem) or does the contribution to the reward of a variable feature depend on the values of the other features (in which case this problem is referred to as "structured learning")?

I suggest you try a simple approach first: learn a regularized linear regression function of all the features (for example, using a ridge regression algorithm) and then, for new examples, individually choose the value of each variable feature that maximizes the predicted reward. Bear in mind that this only works out-of-the-box if the variable features are discrete or have bounded values; if not, you can discretize them as in a histogram (divide the value space into bins of equal size and use a discrete feature for each bin).
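A rough sketch of that recipe, in plain Python so it runs without any libraries: closed-form ridge regression on one static feature plus a one-hot encoding of a binned variable feature. The data is synthetic, and the names (`ridge_fit`, `one_hot_bin`), the range [0, 10], the 5 bins and `lam=0.1` are all assumptions for illustration. In practice you would probably use a library implementation of ridge regression instead of the hand-rolled solver.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(X, y, lam=1.0):
    """Ridge regression weights: solve (X^T X + lam * I) w = X^T y."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def one_hot_bin(value, low, high, n_bins):
    """Histogram-style encoding: one indicator feature per equal-size bin."""
    width = (high - low) / n_bins
    idx = min(int((value - low) / width), n_bins - 1)
    return [1.0 if k == idx else 0.0 for k in range(n_bins)]


# Synthetic examples: static feature s, variable feature v in [0, 10].
LOW, HIGH, N_BINS = 0.0, 10.0, 5
X, y = [], []
for s in (0.0, 1.0):
    for k in range(10):
        v = 0.5 + k
        X.append([s] + one_hot_bin(v, LOW, HIGH, N_BINS))
        y.append(0.1 * s + max(0.0, 0.8 - 0.04 * (v - 6.5) ** 2))

weights = ridge_fit(X, y, lam=0.1)
bin_weights = weights[1:]  # weights[0] belongs to the static feature
best_bin = max(range(N_BINS), key=lambda k: bin_weights[k])
best_value = LOW + (best_bin + 0.5) * (HIGH - LOW) / N_BINS  # bin midpoint
print(best_bin, best_value)
```

Two things to notice. With a raw (unbinned) variable feature a linear model is monotone in that feature, so its maximizer always sits at one end of the allowed range, which is why bounds or bins are needed. And because the model is additive, the best bin does not depend on the static features at all; if you expect it to, you need interaction terms or a richer model.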

If this doesn't give satisfactory results, you can move to a more complex model in many different ways: using kernels, determining a dependence structure for your variable features and using a conditional random field, using a neural network, etc. It is always important, though, to first check whether the simpler techniques work.

answered Apr 05 '11 at 15:35


Alexandre Passos ♦


Just to expand on what Alexandre said, selecting each variable individually to give you the highest reward will only work if all your variables are independent, i.e. there are no interactions between them. There is a whole field I came across while I was in manufacturing about how to design experiments to predict the maximum reward with multiple variables. I don't see why the techniques couldn't be applied here. Wikipedia gives a good intro to the subject: http://en.wikipedia.org/wiki/Design_of_Experiments
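A tiny made-up example of how interactions break the one-variable-at-a-time choice. The reward table below is invented purely for illustration; `a` and `b` are two hypothetical two-level variable features. Picking each variable by its average reward lands on a worse setting than searching all combinations jointly (in design-of-experiments terms, a full factorial over the levels).

```python
from itertools import product
from statistics import mean

# Hypothetical predicted rewards for every (a, b) combination.
reward_table = {(0, 0): 0.5, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.1}
levels = [0, 1]


def predict_reward(a, b):
    return reward_table[(a, b)]


# One variable at a time: pick the level with the best average reward,
# averaging over the other variable.
best_a = max(levels, key=lambda a: mean(predict_reward(a, b) for b in levels))
best_b = max(levels, key=lambda b: mean(predict_reward(a, b) for a in levels))
independent_choice = (best_a, best_b)

# Joint search over every combination of levels.
joint_choice = max(product(levels, levels), key=lambda ab: predict_reward(*ab))

print(independent_choice, predict_reward(*independent_choice))
print(joint_choice, predict_reward(*joint_choice))
```

The joint search grows exponentially with the number of variables, which is part of what the experimental design literature tries to address.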

(Apr 05 '11 at 15:55) Scott Frye

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.