Suppose I have an algorithm that runs on a dataset but requires some parameter values to be specified by the user before running it. At the end of execution we can evaluate the algorithm's performance and see whether the chosen parameter values were good or not. There is no way to know a priori what the best values are, unless we perform many executions and evaluations until we find the parameter values that give the best results. In such a case, might it be useful to use reinforcement learning to automatically adapt the parameter values during execution? How? What might the states, actions and rewards be?

asked Feb 02 '12 at 06:55

shn

edited Feb 02 '12 at 13:17


I'm not familiar with reinforcement learning, but this problem was tackled in a different way in this paper: Algorithms for Hyper-Parameter Optimization

(Feb 02 '12 at 07:24) Sander Dieleman

I have been toying with exactly the same idea (there must be a reason for that!). I realized that it somehow didn't quite make sense, but never so clearly as now. Thanks :-)

(Apr 15 '12 at 08:05) maxy

2 Answers:

Adding to Sander Dieleman's comment, there is a reason why reinforcement learning as-is is not a good model for hyperparameter optimization. All reinforcement learning algorithms deal with situations where an agent's actions change the state of the world, and the agent is rewarded according to these changes in state. In the hyperparameter optimization setting there is no state of the world: the only thing that changes when you evaluate at a certain point is your knowledge of the function's value at that point. For this reason the actual task is maximizing an unknown function from stochastic observations within a given budget, a task which is explored in the paper Sander mentioned.
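For instance, here is a minimal sketch of that budgeted black-box view, using plain random search; `evaluate` is a hypothetical stand-in for running the algorithm once with a candidate parameter setting and measuring its performance:

```python
import random

def evaluate(params):
    """Toy stand-in for one run of the algorithm: in practice this would
    run the real algorithm with these parameters and score the result."""
    noise = random.gauss(0, 0.05)  # stochastic observation of the objective
    return -(params["x"] - 0.3) ** 2 + noise

def random_search(budget=50):
    best_params, best_score = None, float("-inf")
    for _ in range(budget):                       # fixed evaluation budget
        params = {"x": random.uniform(0.0, 1.0)}  # sample a candidate setting
        score = evaluate(params)                  # one noisy observation
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

print(random_search())
```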

answered Feb 02 '12 at 13:37

Alexandre Passos ♦

Let's suppose that the problem is a clustering algorithm which depends on some parameters. If you consider, for example, that an action is changing (increasing/decreasing) the parameter values, you can consider that the state of the world is the parameter values associated with some cluster representatives. After performing an action, the reward could be obtained from a clustering evaluation measure. Is that not possible? A minimal sketch of this framing is below.
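Here is a toy sketch of those states, actions and rewards, where `clustering_score` is a hypothetical stand-in for running the clustering algorithm and applying an evaluation measure (e.g., silhouette):

```python
import random

def clustering_score(params):
    """Hypothetical stand-in for one clustering run plus evaluation."""
    return -(params["k"] - 5) ** 2 + random.gauss(0, 0.1)

state = {"k": 2}    # state: the current parameter values
actions = [+1, -1]  # actions: increase/decrease a parameter

for step in range(30):
    action = random.choice(actions)   # a real agent would learn this choice
    state["k"] = max(1, state["k"] + action)
    reward = clustering_score(state)  # reward: the clustering evaluation measure
    print(step, state, round(reward, 3))
```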

(Feb 02 '12 at 14:36) shn

That is possible, but it adds unnecessary complexity to the problem. What if your state is a given setting of the parameters, actions change the state, and rewards are the scores of states? If so, there is really no reason to do tricky reinforcement learning things like credit assignment and whatnot: you can change any parameter you want whenever you want, you observe rewards immediately, and there is no reason to visit the same state multiple times or do temporal difference or really any other reinforcement learning technique apart from function approximation.
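As a rough illustration of that "function approximation only" view: fit a surrogate model to the (parameter, score) pairs observed so far, then query the surrogate anywhere to guess which setting to try next. This is a minimal sketch, assuming a hypothetical `score_of` stand-in for one run of the algorithm:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def score_of(param):
    """Hypothetical stand-in for running the algorithm once with `param`."""
    return -(param - 0.3) ** 2 + np.random.normal(0, 0.05)

# Observe rewards immediately at a handful of settings.
observed_params = np.random.uniform(0, 1, size=10)
observed_scores = np.array([score_of(p) for p in observed_params])

# Fit a surrogate (the function approximation part) to the observations.
surrogate = GaussianProcessRegressor()
surrogate.fit(observed_params.reshape(-1, 1), observed_scores)

# Query the surrogate anywhere; no credit assignment or temporal
# difference is needed, since each reward arrives immediately.
candidates = np.linspace(0, 1, 200).reshape(-1, 1)
predicted = surrogate.predict(candidates)
print("next setting to try:", float(candidates[np.argmax(predicted)][0]))
```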

(Feb 02 '12 at 14:38) Alexandre Passos ♦

@AlexandrePassos Can you point me to how to use any simple function approximation in order to automatically adapt the parameter values of the algorithm?

(Feb 07 '12 at 10:22) shn

The paper Sander cited gives a really good example and discussion of this.

(Feb 07 '12 at 10:24) Alexandre Passos ♦

Here is a related paper on basis parameter adaptation (you can extend it to other parameter adaptation, though it takes effort): http://people.cs.umass.edu/~mahadeva/papers/aaai2013-mdba.pdf

answered Apr 10 '14 at 22:10

BO LIU
