Let's say I have two functions, A and B, which do exactly the same thing but may have different costs depending on the input data. At any time I can choose to call A or B, and I want to minimize the total cost; so if I have 10K data elements, I will make 10K calls, each either to A or to B. It's not possible to predict the cost of a function, so I have to make decisions based on previous calls.

What I have tried is to keep a history of recent calls and, at each call, select the function with the best recent cost. I also need some form of exploration to find better functions, so every N calls I switch to a random function.

In the picture below, the y axis is the cost and the x axis is the call number. You can see that most of the time B achieves the best cost, but after some time A becomes better. The yellow line is the heuristic I described above; the small spikes are caused by the exploration.

The problems with my heuristic are that the exploration step degrades the cost (most of the time the other function is not better), and that it is slow to react to cost changes. Although I can tweak it to get better results in some scenarios, I am looking for a more general method. So, which methods could be applied to this? Right now I am looking at Tabu search and other metaheuristics, but I'm not sure they are suited for this.
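To make the heuristic concrete, here is a minimal Python sketch of it. The sliding-window size, the exploration period, and the convention that each function returns a `(cost, result)` pair are my own illustrative assumptions, not from the post:

```python
import random
from collections import deque

def make_chooser(funcs, window=50, explore_every=20):
    """Sketch of the heuristic described above: keep a sliding window of
    recent costs per function, normally call the one with the lowest
    recent mean cost, and every `explore_every` calls pick a random
    function instead (forced exploration)."""
    history = {f: deque(maxlen=window) for f in funcs}
    calls = 0

    def choose(data):
        nonlocal calls
        calls += 1
        # Explore on schedule, or while some function has no history yet.
        if calls % explore_every == 0 or any(not h for h in history.values()):
            f = random.choice(funcs)
        else:
            # Exploit: lowest mean cost over the recent window.
            f = min(funcs, key=lambda g: sum(history[g]) / len(history[g]))
        cost, result = f(data)  # assumed convention: f returns (cost, result)
        history[f].append(cost)
        return result

    return choose
```

This reproduces both weaknesses the question mentions: the periodic random call pays the cost of the worse function, and a cost shift only shows up after the window has partly refilled.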
What you have is essentially a two-armed bandit scenario, and the algorithm you're using is called epsilon-greedy. There are other, more efficient algorithms for this.

An optimal strategy which is simple to implement, called UCB (some theoretical slides about it here), is to first try each function a few times at random, then estimate the mean and the standard deviation of each function's runtime. At each point in time, choose a function based on mean + alpha * std, where alpha is a parameter you tune to decide how much you should explore. After a while this will converge on always using the best function.

There's a simple extension for the case where you have some features of your input that help predict each function's runtime. Then, instead of keeping a mean and a standard deviation, you fit a linear regression from the features to the runtime and use some measure of its variance to decide what to do. This is called LinUCB.

Thank you. This looks promising.
(Mar 29 '12 at 04:54)
aenos
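The mean/std rule from the answer above could be sketched like this. Everything here is illustrative: the warm-up count and alpha are placeholder values, and since we are minimizing cost the optimistic score is written as mean - alpha * std (pick the minimum), the mirror image of the mean + alpha * std form used for rewards:

```python
import math
import random

class UCBChooser:
    """Sketch of the mean/std bandit rule, adapted for cost minimization:
    after a short warm-up, always call the function with the lowest
    optimistic cost estimate mean - alpha * std."""

    def __init__(self, funcs, alpha=1.0, warmup=3):
        self.funcs = funcs
        self.alpha = alpha          # exploration/exploitation trade-off
        self.warmup = warmup        # initial tries per function
        self.costs = {f: [] for f in funcs}

    def _score(self, f):
        c = self.costs[f]
        mean = sum(c) / len(c)
        var = sum((x - mean) ** 2 for x in c) / len(c)
        return mean - self.alpha * math.sqrt(var)  # optimistic lower bound

    def __call__(self, data):
        untried = [f for f in self.funcs if len(self.costs[f]) < self.warmup]
        f = random.choice(untried) if untried else min(self.funcs, key=self._score)
        cost, result = f(data)      # assumed convention: f returns (cost, result)
        self.costs[f].append(cost)
        return result
```

Note that this keeps the full cost history, so like any stationary bandit rule it will be slow to notice the late shift visible in the plot; a sliding window over `self.costs` is one common fix.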
From the provided plot it seems that the cost distributions are highly non-stationary, so the bandit setting is too limited?
(Apr 09 '12 at 12:40)
schaul
If the non-stationarity can be captured with a feature-based representation, then something akin to LinUCB should do well. If it can't, I find it hard to think of an online method that would behave adequately.
(Apr 09 '12 at 12:53)
Alexandre Passos ♦
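As a hedged illustration of the feature-based approach mentioned above, here is a LinUCB-style sketch adapted for cost minimization. The feature vector (e.g. machine load, input size) is supplied by the caller, and all parameter names and values are placeholders, not from the thread:

```python
import numpy as np

class LinUCB:
    """Per-function ridge regression from input features x to observed
    cost, with a confidence bonus alpha * sqrt(x^T A^-1 x). The function
    with the lowest optimistic cost estimate is chosen."""

    def __init__(self, funcs, dim, alpha=1.0):
        self.funcs = funcs
        self.alpha = alpha
        # A: ridge-regularized Gram matrix; b: feature-weighted cost sums.
        self.A = {f: np.eye(dim) for f in funcs}
        self.b = {f: np.zeros(dim) for f in funcs}

    def choose(self, x):
        def lcb(f):
            A_inv = np.linalg.inv(self.A[f])
            theta = A_inv @ self.b[f]        # ridge-regression weights
            # Optimistic (low) cost estimate for this function on input x.
            return theta @ x - self.alpha * np.sqrt(x @ A_inv @ x)
        return min(self.funcs, key=lcb)

    def update(self, f, x, cost):
        self.A[f] += np.outer(x, x)
        self.b[f] += cost * x
```

Because the prediction depends on the current features rather than on a global average, a cost shift that is visible in the features (higher load, larger inputs) is picked up immediately instead of after a window of bad calls.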
I'm not familiar with feature-based representations. Does it mean that I need to be able to tell (predict) when the reward distribution changes? For example, in my plot, towards the end, the A machine suddenly changes behavior. I am probably not able to do this.
(Apr 11 '12 at 05:48)
aenos
You don't necessarily need to predict the change yourself, but can you observe at least some characteristics that might act as a proxy for the speed of a machine? For example, if you know its load, swap space, time since last failure, time to respond to the last request, etc., you might be able to predict something. Otherwise this looks completely random, and then no strategy will do very well at it (except for redundancy).
(Apr 11 '12 at 06:07)
Alexandre Passos ♦
In my case the reward could be influenced by swap space for example, but it could also be influenced by the input data. I was actually trying to avoid measuring specific characteristics because there could be unknown factors affecting the reward and it would be nice if it just magically worked.
(Apr 11 '12 at 06:20)
aenos
