2021-04-09-Fri-Study
Apr 9, 2021
»
study
GEO Diff
C
# include <sys/socket.h>
int setsockopt(int socket, int level, int option_name, const void *option_value, socklen_t option_len)
Function to change the attribute value of the created socket
int getsockname(int socket, struct socketaddress *address, socketlength * addresslength)
Socket descriptor gets own address allocated.
------------
RL
Evaluating the value when you don’t know the MDP
We don't know the reward function and the probability of the transition.
1.Monte-Carlo
Since we are dealing with small MDPs, we create a table lookup method,
that is, create a table and record the values in the table.
It proceeds by updating the value little by little.
When you take action, you get a reward
A. Table initialization
N(s) = total number of visits to s, and each visit adds 1
V(s) = Record the total number of returns experienced in the state
B. Gaining experience
s0,a0,r0,s1,a1,r1,s2,a2,r2....sT
C. Update table
Update on all status
D. Value calculation
Have plenty of experience and table updates
Approximate value of each state
= V(s) / N(s) = sum of returns / number of visits