2021-03-29-Mon-Study
Mar 29, 2021
»
study
RL
What is Reinforcement Learning
There are 3 kinds of maching learning
1. Supervised Learning
includes answer
2. Reinforcement Learning
fights itself
3. Unsupervised Learning
neither both 1 nor 2
ex) Clustering
Reinforcement Learning
1. Easy but abstract version
progress through tiral and error
2. Difficult but more accurate
A learning process that corrects behavior through trial and error
to maximize cumulative rewards in sequential decision-making problems
Reward
1. Scalar, Not Vector
2. Does not include "HOW"
3. Could be Sparce and Delay
Agent and Environment
seems like continuous for "Time"
but it is descrete for sequential decision making process -> Time Step
Power of RL
1. Parallelism
2. The Charm of Self-Learning
MarKov
s = state
P = Probability
R = reward
γ = Attenuation factor
a = action
π = Policy
V = Value
G = Return
E[] = Expected value function
P[] = Probability value function
q[] = action value function
Markov Process
MP = (S, P)
Pss' : Transition Probability Matrix
Pss' = P[S(t+1) = s' | s(t) = s]
*The sum of the arrows extending out of one state is 1.
But why is this process called the Markov process?
The reason whyy is because of Markov Propery
-> The future is determined solely by the present.
*Non-Markov State
ex) Consider the condition of the driver who is driving
Let's say you take a picture at a specific point in
time and you only need to drive with that picture.
-> You can't make decision whether to break or accelerate with just this picture
But it can be a little markov state by offering 10 photos from the last 10 seconds,
even though it is not completely Markov
Markov Reward Process
MRP = (S,P,R,γ)
Why γ is needed?
γ can be prevented from having an infinite value, can reflect a person's preference,
and can reflect uncertainty about the future.
R = E[Rt | St = s]
Episode = S0,R0,S1,R1,S2,R2....,St,Rt
G = Rt+1 + γ*Rt+2 + γ^(2)*Rt+3 + ...
Strictly speaking, reinforcement learning is incorrect to say that its purpose is to learn to maximize rewards.
**Reinforcement learning is learning to maximize returns, not rewards.
Episode "Sampling"
Since each episode is visited in a different state, sampling is necessary.
A methodology that infers a certain value through sampling such as the monte-carlo approach is used.
Markov Decision Process
MDP = (S,A,P,R,γ)
A: action
P: P^(a)ss' = p[St+1 = s' | St = s, At = a]
R: R^(a)s = E[Rt+1 | St = s, At = a]
For Continous Decision Making, we need Decision.
Therefore here comes "agent" who influences state
Policy is needed to make reward max by selecting action
π(a|s) = p[At = a|St = s] : Probability of choosing action a in state s
Vπ(s) = Eπ[Gt|St = s]
qπ(s,a) = Eπ[G|St=s, At = a]
Solving MDP
Prediction
The problem of evaluating the value of each state given π
Control
The problem of finding the optimal policy π*
When following π*, Value function becomes "optimal Value function"
c
file function --> include <stdio.h> for c, <cstdio> for cpp
FILE* fopen (const char* fileName, const char* fileMode)
FILE *(NAME) = fopen(("file name"),(mode))
mode -> r, w, a, r+, w+, a+
r: read mode
w: write mode
a: append mode
r+ read, write, if no file = error
w+ read, write, if no file = make file
a+ read, write, if no file = make file
fclose(const char* fileName)
int fputc(int c, FILE *stream);
save one letter into steram, if save success -> return c, if not -> return EOF
int fgetc(FILE *stream);
read one letter from pointed steram, if read success ->
return letter, if not(reached to the EOF) -> return EOF
ex)
while(EOF != (ch = fgetc(ptr_file)))
{
fputc(ch, stdout);
}
char *fgets(char * restrict s, int n, FILE * restrict stream)
This function reads a string from the specified stream.
The first argument of this function is the address where the read string will be stored,
and the third argument is a pointer to a FILE structure variable to determine the stream.
Reads one or fewer characters than the maximum number of input characters
passed in as the second argument, or until the end of the file is reached.
This function returns the address where the read string is stored
if the read is successful, and NULL if the end of the file is reached or the read fails.
int fputs(const char * restrict s, FILE * restrict stream);
This is a function that prints (saves) a string to the specified stream.
The first argument of this function is the address of the string to be written,
and the second argument is a pointer to a FILE structure variable to determine the stream.
If the stream passed as an argument is stdout, it prints a string to the monitor,
and if it is a file, it saves the string to the file.
This function returns a non-negative value if the write (save) is successful,
and EOF if the save fails.
ex)
int main(void)
{
char str[100];
FILE* ptr_src = fopen("text_readonly.txt", "r");
FILE* ptr_dst = fopen("text_writeonly.txt", "w");
while(fgets(str, 100, ptr_src) != NULL) // NULL -> stdlib
{
fputs(str, ptr_dst);
}
return 0;
}
int fscanf(FILE * restrict stream, const char * restrict format, ...);
This function reads a string from the specified stream
using various format conversion characters.
The second argument is the format of the string to be read.
This function returns the number of variables read
if the read is successful, and EOF if the read fails.
int fprintf(FILE * restrict stream, const char * restrict format, ...);
This is a function that prints (saves) a string
using various format conversion characters in the specified stream.
The first argument of this function is a pointer to a FILE structure variable to determine the stream,
and the second argument is the format of the string to be output.
This function returns the size of the saved string in bytes
if the write (save) is successful, and a negative number if it fails.
ex)
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int scan_num, int_num;
double double_num;
char str[100];
FILE* ptr_file = fopen("text_fscanf.txt", "r");
while(scan_num = (fscanf(ptr_file, "%d %lf %s", &int_num, &double_num, str)) != EOF)
{
fprintf(stdout, "%d %f %s\n", int_num, double_num, str);
}
return 0;
}
feof((NAME)): check whether file pointer is the end of the file
sscanf
int sscanf (const char * buffer ,const char *format,...);
Function to read by specifying the format from the buffer
ex)
#pragma warning(disable:4996) // warning about not recommended function use
#include <stdio.h>
int main (void)
{
char buffer[256]="name:Kildong Hong num:12 age:20";
char name[30];
int num;
int age;
//Set the contents of the buffer to the name, number, and age variables
sscanf(buffer,"name:%s num:%d age:%d",name,&num,&age);
//Print name, number, and age
printf("Name:%s Number:%d Age:%d \n",name,num,age);
return 0;
}
return :
Name:kildong hong Number:12 Age:20
//*int -> &, char -> none
strtok: string truncation
char* strtok(char* str, char* delimiters);
<string.h> for c, <cstring> for cpp
*strtok scans the target string from left to right. Whenever it finds a delimiter,
it replaces the delimiter with a "null" termination character and returns the extracted token.
ex)
#include<stdio.h>
#include<string.h> //C++ : <cstring>
int main(void)
{
char str[] = "Kim,Park,Lee,Choi,Seo";
char *ptr = strtok(str, ",");
while (ptr != NULL)
{
printf("%s\n", ptr);
ptr = strtok(NULL, ",");
}
return 0;
}
return:
Kim
Park
Lee
Choi
Seo
"\t": refers to one tab horizontal space
Linux:
Command
pwd: shows which directory you are working in
There are two directory
Absolute Directory: all paths from the top-level directory to the target directory
(*always keep in mind that there is a always a top-level directory (/) at the beginning )
Relative Directory: move based on where you are currently(*mark as ".")
cd ../../ means go backward twice from your directory
cd means home directory
cd / means root directory
cd ./ means relative directory
Makefile
Make is an automation tool primarily used by Linux when building programs.
The file that describes the build process with the standard syntax for make is Makefile.
If you save a script named Makefile in the working directory of the program being developed,
conveniently, you can have make interpret it and automatically perform the build.
*Macro is used to make name for convinient
*.PHONY is used to prevent that "make" misunderstands that it is already ended target.
Link rules are grammars that define how to compile and link targets.
Here, target means the (target) file to be created as a result of the build,
The rule for each target consists of a set of targets, dependencies, and instructions.
-Red Box: This is the target. Targets are separated by a colon after the name.
(The colon is the most prominent feature in the 'rule' grammar.)
-Blue box: This is a dependency.
The materials required to make the target are listed separated by spaces.
-Green box: Bash command to build the target.