2021-03-29-Mon-Study

» study

RL

What is Reinforcement Learning

There are 3 kinds of maching learning
    1. Supervised Learning
        includes answer

    2. Reinforcement Learning
        fights itself

    3. Unsupervised Learning
        neither both 1 nor 2
        ex) Clustering

Reinforcement Learning

1. Easy but abstract version
    progress through tiral and error

2. Difficult but more accurate
    A learning process that corrects behavior through trial and error 
        to maximize cumulative rewards in sequential decision-making problems

Reward

1. Scalar, Not Vector
2. Does not include "HOW"
3. Could be Sparce and Delay

Agent and Environment

agent_environment

seems like continuous for "Time"
but it is descrete for sequential decision making process -> Time Step

Power of RL

1. Parallelism
2. The Charm of Self-Learning

MarKov

s = state
P = Probability
R = reward
γ = Attenuation factor
a = action
π = Policy
V = Value
G = Return
E[] = Expected value function
P[] = Probability value function
q[] = action value function

Markov Process

Markov_process

MP = (S, P)
Pss' : Transition Probability Matrix
Pss' = P[S(t+1) = s' | s(t) = s]
*The sum of the arrows extending out of one state is 1.

But why is this process called the Markov process?
    The reason whyy is because of Markov Propery
    -> The future is determined solely by the present.
    
    *Non-Markov State 
    ex) Consider the condition of the driver who is driving
            Let's say you take a picture at a specific point in 
            time and you only need to drive with that picture.
            -> You can't make decision whether to break or accelerate with just this picture

        But it can be a little markov state by offering 10 photos from the last 10 seconds, 
            even though it is not completely Markov

Markov Reward Process

Markov_Reward_Process

MRP = (S,P,R,γ)

Why γ is needed?
    γ can be prevented from having an infinite value, can reflect a person's preference, 
        and can reflect uncertainty about the future.

R = E[Rt | St = s]
Episode = S0,R0,S1,R1,S2,R2....,St,Rt
G = Rt+1 + γ*Rt+2 + γ^(2)*Rt+3 + ...

Strictly speaking, reinforcement learning is incorrect to say that its purpose is to learn to maximize rewards. 
    **Reinforcement learning is learning to maximize returns, not rewards.

Episode "Sampling"
    Since each episode is visited in a different state, sampling is necessary. 
        A methodology that infers a certain value through sampling such as the monte-carlo approach is used.

Markov Decision Process

markov_decision_process

MDP = (S,A,P,R,γ)    
A: action
P: P^(a)ss' = p[St+1 = s' | St = s, At = a]
R: R^(a)s = E[Rt+1 | St = s, At = a]

For Continous Decision Making, we need Decision. 
    Therefore here comes "agent" who influences state

Policy is needed to make reward max by selecting action

π(a|s) = p[At = a|St = s] : Probability of choosing action a in state s

Vπ(s) = Eπ[Gt|St = s]

qπ(s,a) = Eπ[G|St=s, At = a]

Solving MDP
Prediction
    The problem of evaluating the value of each state given π
Control
    The problem of finding the optimal policy π*
        When following π*, Value function becomes "optimal Value function"

c


file function --> include <stdio.h> for c, <cstdio> for cpp    
    FILE* fopen (const char* fileName, const char* fileMode)

        FILE *(NAME) = fopen(("file name"),(mode))
            mode -> r, w, a, r+, w+, a+ 
                r: read mode
                w: write mode
                a: append mode
                r+ read, write, if no file = error
                w+ read, write, if no file = make file
                a+ read, write, if no file = make file
    fclose(const char* fileName)
    
    int fputc(int c, FILE *stream); 
        save one letter into steram, if save success -> return c, if not -> return EOF

    int fgetc(FILE *stream);  
        read one letter from pointed steram, if read success -> 
        return letter, if not(reached to the EOF) -> return EOF
    
            ex)

            while(EOF != (ch = fgetc(ptr_file))) 
            {
                fputc(ch, stdout);
            }
    
    char *fgets(char * restrict s, int n, FILE * restrict stream)
        
        This function reads a string from the specified stream.

        The first argument of this function is the address where the read string will be stored, 
            and the third argument is a pointer to a FILE structure variable to determine the stream.

        Reads one or fewer characters than the maximum number of input characters 
            passed in as the second argument, or until the end of the file is reached.

        This function returns the address where the read string is stored 
            if the read is successful, and NULL if the end of the file is reached or the read fails.

    int fputs(const char * restrict s, FILE * restrict stream);  
        
        This is a function that prints (saves) a string to the specified stream.
        
        The first argument of this function is the address of the string to be written, 
            and the second argument is a pointer to a FILE structure variable to determine the stream.

        If the stream passed as an argument is stdout, it prints a string to the monitor, 
            and if it is a file, it saves the string to the file.

        This function returns a non-negative value if the write (save) is successful, 
            and EOF if the save fails.

            ex) 
                int main(void)
            {
                char str[100];

                FILE* ptr_src = fopen("text_readonly.txt", "r");
                FILE* ptr_dst = fopen("text_writeonly.txt", "w");

                while(fgets(str, 100, ptr_src) != NULL) // NULL -> stdlib
                {
                    fputs(str, ptr_dst);                
                }

                return 0;
            }  

    int fscanf(FILE * restrict stream, const char * restrict format, ...);  

        This function reads a string from the specified stream 
            using various format conversion characters.

        The second argument is the format of the string to be read.

        This function returns the number of variables read 
            if the read is successful, and EOF if the read fails.

    int fprintf(FILE * restrict stream, const char * restrict format, ...);  

        This is a function that prints (saves) a string 
            using various format conversion characters in the specified stream.

        The first argument of this function is a pointer to a FILE structure variable to determine the stream, 
            and the second argument is the format of the string to be output.

        This function returns the size of the saved string in bytes 
            if the write (save) is successful, and a negative number if it fails.
        

            ex)
            #include <stdio.h>
            #include <stdlib.h>

            int main(void)
            {
                int scan_num, int_num;
                double double_num;
                char str[100];

                FILE* ptr_file = fopen("text_fscanf.txt", "r");

                while(scan_num = (fscanf(ptr_file, "%d %lf %s", &int_num, &double_num, str)) != EOF)
                {         
                    fprintf(stdout, "%d %f %s\n", int_num, double_num, str);
                }
                return 0;
            }

feof((NAME)): check whether file pointer is the end of the file

sscanf
    int sscanf (const char * buffer ,const char *format,...);  
        Function to read by specifying the format from the buffer
    
    ex)
    
    #pragma warning(disable:4996) // warning about not recommended function use
    #include <stdio.h>
    
    int main (void)
    {
        char buffer[256]="name:Kildong Hong num:12 age:20";
        char name[30];
        int num;
        int age;
    
        //Set the contents of the buffer to the name, number, and age variables
        sscanf(buffer,"name:%s num:%d age:%d",name,&num,&age);
        //Print name, number, and age
        printf("Name:%s Number:%d Age:%d \n",name,num,age);
        return 0;
    }
    return : 
    Name:kildong hong Number:12 Age:20
    
    //*int -> &, char -> none

strtok: string truncation
    char* strtok(char* str, char* delimiters);
    <string.h> for c, <cstring> for cpp

    *strtok scans the target string from left to right. Whenever it finds a delimiter, 
    it replaces the delimiter with a "null" termination character and returns the extracted token. 

    ex)
    #include<stdio.h>
    #include<string.h>    //C++ : <cstring>
    
    int main(void)
    {
        char str[] = "Kim,Park,Lee,Choi,Seo";
    
        char *ptr = strtok(str, ",");    
    
        while (ptr != NULL)
        {
            printf("%s\n", ptr);        
            ptr = strtok(NULL, ",");
        }
    
        return 0;
    }
 
    return:

    Kim
    Park
    Lee
    Choi
    Seo

    "\t":  refers to one tab horizontal space

Linux:

Command

pwd: shows which directory you are working in

There are two directory
    Absolute Directory: all paths from the top-level directory to the target directory
        (*always keep in mind that there is a always a top-level directory (/) at the beginning )

    Relative Directory: move based on where you are currently(*mark as ".")

    cd ../../ means go backward twice from your directory
    cd means home directory
    cd / means root directory
    cd ./ means relative directory

Makefile

Make is an automation tool primarily used by Linux when building programs.

The file that describes the build process with the standard syntax for make is Makefile.

If you save a script named Makefile in the working directory of the program being developed, 
conveniently, you can have make interpret it and automatically perform the build.

*Macro is used to make name for convinient

*.PHONY is used to prevent that "make" misunderstands that it is already ended target.

make

Link rules are grammars that define how to compile and link targets.

Here, target means the (target) file to be created as a result of the build,

The rule for each target consists of a set of targets, dependencies, and instructions.

    -Red Box: This is the target. Targets are separated by a colon after the name. 
    (The colon is the most prominent feature in the 'rule' grammar.)

    -Blue box: This is a dependency. 
        The materials required to make the target are listed separated by spaces.

    -Green box: Bash command to build the target.