Deep Q Learning

기원

알파고를 개발한 DeepMind의 "Playing Atari with Deep Reinforcement Learning"이라는 논문에서 처음 소개됨

개념

Deep Q-Networks
Convolutional networks를 이용하여 learning을 하는 방법

The model is a convolutional neural network, whose input is raw pixels and whose output is a value function estimating future rewards.

즉 raw pixel들을 input으로해서 CNN을 function approximator로 이용하여 value function을 output으로 내고있다. 그 value function은 future reward를 추정하는데 이용되는 재료가 된다.

RL and DL

The most successful approaches are trained directly from the raw inputs, using lightweight updates based on stochastic gradient descent.

RL 분야에서 agent가 high-dimensional한 vision 및 language data들을 직접 다루는 것이 어려웠고, 그나마 agent가 다룰 수 있게 hand-crafted features를 이용하는 것이 전부였다. 현실 문제를 풀기 위해서는 반드시 high dimensional data를 다루는 것이 반드시 해야할 과제였다. DL분야에서 RNN, CNN, RBM 등 high dimensinal data를 다루는 방법이 나오고 RL에 이 방법론을 적용하고자 하는 시도를 한다. 두 분야에서는 data inpuut이나 분포 등에서 차이가 존재하기 때문에, RL에 DL을 적용할 때 문제가 발생한다.

ISSUE

reinforcement learning presents several challenges from a deep learning perspective. Firstly, most successful deep learning applications to date have required large amounts of handlabelled training data.

성공적인 딥러닝은 handlabelled training data를 요구하는데, RL은  reward를 통해 학습이 이루어지고, 그 reward도 sparse하고 noisy 심지어는 delay되어 주어진다.

Another issue is that most deep learning algorithms assume the data samples to be independent, while in reinforcement learning one typically encounters sequences of highly correlated states.

현재 state의 상태에 따라 다음 state 결정되기 때문에, state 간 correlation이 크다

https://sumniya.tistory.com/18