2021-04-09-Fri-Study

» study

GEO Diff

C

#include <sys/socket.h>
int setsockopt(int socket, int level, int option_name, const void *option_value, socklen_t option_len);
    Sets the value of an option on an already created socket.
    Returns 0 on success and -1 on failure (errno is set).
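
    A minimal sketch of a typical call, assuming the goal is to turn on
    SO_REUSEADDR at the SOL_SOCKET level. The helper name enable_reuseaddr
    is made up for this note and is not part of any library.

```c
#include <sys/socket.h>

/* Hypothetical helper: enable SO_REUSEADDR on an existing socket.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_reuseaddr(int fd)
{
    int on = 1; /* a non-zero int switches a boolean option on */
    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
}
```

    option_value points at the new value and option_len is its size, so a
    boolean option is normally passed as an int together with sizeof(int).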

int getsockname(int socket, struct sockaddr *address, socklen_t *address_len);
    Retrieves the local address that the socket is currently bound to.
    address_len is in/out: pass the buffer size in, get the address size back.
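
    A common use is finding out which port the kernel picked after binding
    to port 0. The sketch below assumes IPv4 on the loopback interface; the
    helper name ephemeral_port is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to 127.0.0.1 with port 0 (the
 * kernel chooses a free port), then ask getsockname() which port it got.
 * Returns the port in host byte order, or -1 on any failure. */
int ephemeral_port(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0); /* 0 = let the kernel choose */

    struct sockaddr_in bound;
    socklen_t len = sizeof bound; /* in: buffer size, out: address size */
    int port = -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(fd, (struct sockaddr *)&bound, &len) == 0)
        port = ntohs(bound.sin_port);

    close(fd);
    return port;
}
```

    The third argument has to be initialized to the buffer size before the
    call; leaving it uninitialized is a frequent source of truncated or
    garbage addresses.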

------------

RL

Evaluating the value when you don’t know the MDP

The reward function and the transition probabilities are unknown.
1. Monte-Carlo
    Since we are dealing with small MDPs, we use a table-lookup method:
         create a table and record a value for each state in it.
    The values are updated little by little as experience accumulates.
    
    Each time an action is taken, a reward is received.

    A. Table initialization
        N(s) = number of visits to s; each visit adds 1
        V(s) = running sum of the returns observed from s (a sum, not a count)
    B. Gaining experience
        s0, a0, r0, s1, a1, r1, s2, a2, r2, ..., sT
    C. Update table
        Update N(s) and V(s) for every state visited in the episode
    D. Value calculation
        After enough episodes and table updates,
            the approximate value of each state is
                V(s) / N(s) = sum of returns / number of visits
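
    Steps A to D can be sketched in C as below. This is only an
    illustration under assumptions not in the notes above: a 3-state MDP,
    every-visit counting, a discount factor gamma, and episodes that are
    already recorded as arrays. The names McTable, mc_update and mc_value
    are made up; the field sum plays the role of the V(s) running total.

```c
#include <stddef.h>

#define NUM_STATES 3 /* assumed size of the small MDP */

/* Step A: the lookup table, zero-initialized by the caller. */
typedef struct {
    int    n[NUM_STATES];   /* N(s): number of visits to s */
    double sum[NUM_STATES]; /* running sum of returns observed from s */
} McTable;

/* Steps B and C: fold one finished episode into the table.
 * states[t] and rewards[t] hold s_t and r_t for t = 0..len-1;
 * the terminal state sT is not stored. */
void mc_update(McTable *tab, const int *states, const double *rewards,
               size_t len, double gamma)
{
    double g = 0.0; /* return G_t, accumulated from the end backwards */
    for (size_t t = len; t-- > 0; ) {
        g = rewards[t] + gamma * g;
        tab->n[states[t]] += 1;
        tab->sum[states[t]] += g;
    }
}

/* Step D: estimated value = sum of returns / number of visits. */
double mc_value(const McTable *tab, int s)
{
    return tab->n[s] > 0 ? tab->sum[s] / tab->n[s] : 0.0;
}
```

    Example: with gamma = 1, feeding the episode (states 0,1,2; rewards
    1,1,1) and then the episode (states 0,2; rewards 0,1) gives returns 3
    and 1 from state 0, so mc_value for state 0 is (3 + 1) / 2 = 2.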