CS285 | Lecture 1 Introduction and Course Overview
08 Sep 2022
When we are given an image of objects in a bin and want to train a robot arm to decide which coordinate to move to, we can think of two approaches:
- Understand the object's properties (rigid, soft, heavy, etc.) and hand-design a solution.
- Collect tuples of (image, coordinate) and train with supervised learning (a minimal sketch follows below).
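A minimal sketch of the second approach, assuming a hypothetical dataset of (image, grasp-coordinate) pairs; the shapes and the small convnet are placeholders, not the lecture's actual setup:

```python
import torch
import torch.nn as nn

# Hypothetical dataset: images (N, 3, 64, 64) and target grasp coordinates (N, 2).
images = torch.randn(1000, 3, 64, 64)
coords = torch.rand(1000, 2)

# Small convnet that regresses from an image to an (x, y) grasp coordinate.
model = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(2),  # output: predicted (x, y)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    idx = torch.randint(0, len(images), (32,))
    pred = model(images[idx])
    loss = nn.functional.mse_loss(pred, coords[idx])  # plain regression loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```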
RL can instead be trained on data from good, successful behavior rather than ground-truth labels. (In other words, the experience itself tells us which outcomes are actually possible in the real world.)
What is Reinforcement learning?
- Mathematical formalism for learning-based decision making.
- Approach for learning decision making and control from experience.
How is it different from other machine learning topics?
In a standard supervised learning setup, we assume:
- i.i.d. data (previous outputs don't influence future inputs)
- known ground truth outputs in training
RL doesn't make these assumptions:
- Data is not i.i.d.: previous outputs influence future inputs
- The ground truth answer is not known; we only know whether we succeeded or failed → more generally, we know the reward.
The agent makes a decision and the environment responds to it with an observation and a reward. We iterate this process many times, producing sequential decisions. (The pictures on the slide give examples of actions/observations/rewards.) A minimal version of this loop is sketched below.
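A sketch of the agent-environment loop using the Gymnasium API; the environment and the random policy are placeholders for a real task and a learned policy:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")  # placeholder environment
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)  # env responds
    total_reward += reward
    done = terminated or truncated

print("episode return:", total_reward)
env.close()
```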
Why should we care about deep RL?
In short, we use deep learning to handle the flexibility, complexity, and unpredictability of the real world.
In an open-world setting, you need models that generalize efficiently to unseen data.
RL is a mathematical way of thinking about sequential decision making.
Previously, we had to manually design features; with deep learning, features can be learned automatically.
What does end-to-end learning mean for sequential decision making?
- Not end-to-end: the recognition part of the problem is trained with labels, then the control problem is solved separately on top of it, and the process repeats.
- End-to-end: the whole pipeline can be trained directly on final performance (see the sketch after the summary).
Summary: deep models are what allow RL algorithms to solve complex problems end-to-end.
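A sketch of what "end-to-end" means concretely: a single network maps raw pixels straight to an action, so one training signal (e.g., a policy-gradient loss) can reach every layer. The architecture and shapes here are hypothetical:

```python
import torch
import torch.nn as nn

# End-to-end policy: raw pixels in, action logits out.
# There is no hand-designed perception/control boundary; every layer
# is trained on the same final objective.
policy = nn.Sequential(
    nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),  # "perception" layers
    nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(128), nn.ReLU(),             # "control" layers
    nn.Linear(128, 4),                         # logits over 4 discrete actions
)

obs = torch.randn(1, 3, 84, 84)  # a fake image observation
logits = policy(obs)
action = torch.distributions.Categorical(logits=logits).sample()
print("sampled action:", action.item())
```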
- Basic RL assumes the agent can interact with the system and receive a ground-truth reward.
- Learning to predict and using predictions to act → instead of learning the behavior directly, we attempt to learn some representation of how the world works (= model-based RL).
- Imitation learning doesn't learn from reward supervision; it learns by attempting to mimic human (expert) behavior (a behavior-cloning sketch follows this list).
- More than imitation: inferring intentions → infer the goal, then use RL to find a more effective way to achieve it.
- Prediction for real-world control (the domain of model-based RL): don't learn a policy, but learn a model that predicts what will happen next given an observation (a dynamics-model sketch follows this list).
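For the imitation-learning bullet, a behavior-cloning sketch: it reduces to supervised learning on expert (observation, action) pairs. The data, shapes, and network here are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical expert demonstrations: observations and the expert's actions.
expert_obs = torch.randn(5000, 10)          # 10-dim state observations
expert_acts = torch.randint(0, 4, (5000,))  # 4 discrete actions

# Behavior cloning = supervised learning: predict the expert's action.
policy = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    idx = torch.randint(0, len(expert_obs), (64,))
    logits = policy(expert_obs[idx])
    loss = nn.functional.cross_entropy(logits, expert_acts[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```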
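For the model-based bullets, a sketch of learning a dynamics model from (state, action, next state) transitions; all shapes are placeholders. A planner can then score candidate action sequences by rolling them out through this learned model instead of the real environment:

```python
import torch
import torch.nn as nn

# Hypothetical transitions (s, a, s') collected by interacting with the environment.
states = torch.randn(5000, 10)
actions = torch.randn(5000, 2)
next_states = torch.randn(5000, 10)

# The model predicts the next state from the current state and action.
dynamics = nn.Sequential(nn.Linear(12, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(dynamics.parameters(), lr=1e-3)

for step in range(200):
    idx = torch.randint(0, len(states), (64,))
    pred = dynamics(torch.cat([states[idx], actions[idx]], dim=-1))
    loss = nn.functional.mse_loss(pred, next_states[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```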
—
Alan Turing said:
Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain.