What is the difference between a "stochastic world" and a "partially observable world", and how does it change the problem that an RL agent has to solve?
I have read "Reinforcement Learning: An Introduction" by Sutton and Barto. In the question about textbooks, Noel Welsh comments on it:
I have noticed that many examples use deterministic environments, but I understood that almost all methods also work for stochastic environments. Does it really make a difference to an agent's learning strategy (or its function approximator) if the environment is not fully observable? I have recently started reading about the concept of causality; is it perhaps connected to that?
asked Apr 15 '12 at 07:45
In a stochastic world there is a transition probability distribution that determines which state the agent moves to next, given the current state and the action it takes. For example, in GridWorld the robot takes an action that is supposed to move it to a neighbouring square, but with some probability the action fails and it lands in a random square. In general, MDPs have stochastic transitions but are fully observable: the agent always knows exactly which state it is in.
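Here is a minimal sketch of such a "slippery" GridWorld transition, just to make the idea concrete (the grid size, `slip_prob`, and the slip behaviour are illustrative assumptions, not from a specific textbook example):

```python
import random

# Illustrative action set for a 2D grid.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def step(state, action, width=5, height=5, slip_prob=0.2):
    """Sample the next state from P(s' | s, a).

    With probability slip_prob (an assumed value) the intended move fails
    and the agent moves in a uniformly random direction instead.
    """
    if random.random() < slip_prob:
        dx, dy = random.choice(list(ACTIONS.values()))  # slip
    else:
        dx, dy = ACTIONS[action]
    x, y = state
    # Clamp to the grid. Note the returned state is still fully observable:
    # the agent always knows exactly where it ended up.
    return (min(max(x + dx, 0), width - 1), min(max(y + dy, 0), height - 1))

print(step((2, 2), "right"))  # usually (3, 2), sometimes a random neighbour
```

The point is that the randomness lives entirely in the transition function; once the step is taken, the agent observes the resulting state exactly.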
In a partially observable setting, the current state is unknown and the best we can do is obtain information about it indirectly, through an observation model (POMDPs) or a test (PSRs). This adds an extra level of complexity to the planning problem, because we have to choose the best action according to our belief state (a probability distribution over all possible states). We must also take our uncertainty about future states into account when planning; for example, we might not be able to execute plans that depend on knowing which state we will be in in the future.
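To see what "belief state" means in practice, here is a sketch of the standard Bayesian belief update for a POMDP, b'(s') ∝ O(o | s') · Σ_s T(s' | s, a) · b(s). The dictionary layout and the toy tiger-style example below are my own illustrative choices:

```python
def update_belief(belief, action, obs, T, O):
    """Update belief b over states after taking `action` and observing `obs`.

    T[s][a][s2] = P(s2 | s, a)  (transition model)
    O[s2][o]    = P(o | s2)     (observation model)
    """
    new_belief = {}
    for s2 in belief:
        # Predict: probability of reaching s2 under the current belief.
        predicted = sum(belief[s] * T[s][action][s2] for s in belief)
        # Correct: weight by how likely the observation is in s2.
        new_belief[s2] = O[s2][obs] * predicted
    total = sum(new_belief.values())
    # Normalise so the belief remains a probability distribution.
    return {s: p / total for s, p in new_belief.items()}

# Toy example: a tiger behind the "left" or "right" door, a "listen" action
# that leaves the state unchanged, and a noisy observation of the tiger.
T = {"left":  {"listen": {"left": 1.0, "right": 0.0}},
     "right": {"listen": {"left": 0.0, "right": 1.0}}}
O = {"left":  {"hear-left": 0.85, "hear-right": 0.15},
     "right": {"hear-left": 0.15, "hear-right": 0.85}}
b = update_belief({"left": 0.5, "right": 0.5}, "listen", "hear-left", T, O)
print(b)  # belief shifts towards "left": {'left': 0.85, 'right': 0.15}
```

The agent then has to plan over these belief states rather than over the underlying states, which is exactly the extra level of complexity described above.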
answered Apr 26 '12 at 22:05