2021-04-16-Fri-Study
Apr 16, 2021
»
study
RL
DQN
Stabilizes learning about Q-functions through experience replay and frozen target networks.
It operates in a discrete space.
DPG
Policy is modeled to make deterministic decisions.
When a policy requires a single action, it computes the gradient of the policy function.
DDPG
It is a model-free off-policy actor-critic algorithm that combines DPG with DQN.
Although DQN originally operates in a discrete space, DDPG utilizes an actor-critic framework.
Using deterministic policy, the effect has been extended to a continuous space.
Noise can be added for better exploration.
Soft update is applied to actor network and critical network respectively.
DP4G
Several improvements are applied to reflect the distrubutional concept in the DDPG.
1.Distrubtional Critic
2. N-step returns
3. Multiple Distributed Parallel Actors
4. Prioritized Experienced Replay
MADDPG
So that multiple agents can process tasks with only local information
Expanding the DDPG into an environment of cooperation with each other.
Reinforce Algirutgm
Get data for one episode as a policy,
Update the parameters with that data,
Use the updated policy to get the experience of the next episode and
Also, iterative work to reinforce with that data.
*Proceed until parameters do not change or performance improvement does not occur.
Actor-Critic
Q Actor Critic
An actor-critic methodology that learns policy network and value network together
critic is updated in the direction of learning the value of the current policy function
The actor sees the result through the evaluation of Q and strengthens or weakens it.
Advantage Actor-Critic
Compared to the Q Actor Critic, the Actor-Critic has an advantage over the variability of the gradient estimate.
By reducing it, it helps learning efficiency.
However, it requires 3 pairs of Neural Nets.
TD Actor Critic
Needs one pair less than the Advantage Actor Critic (Q)