The Effect of Discounting Actor-loss in Actor-Critic Algorithm

JORDI YAPUTRA

Basic Information

22.04.1049
006.3
Scientific Work - Undergraduate Thesis (S1) - Reference

Abstract

We present an experimental analysis of the effect of limiting the Temporal Difference (TD) error used to estimate the actor loss in an actor-critic agent. The limitation is applied by scaling the actor's loss value by a constant factor epsilon (ε). We chose four epsilon values, i.e., 0.01, 0.1, 0.5, and 1.0, where 1.0 means no discount at all. In the experiment, we spawn four agents to solve a task that is trivial for humans in a custom lightweight Windows Operating System (OS)-like simulation. Each agent receives the simulation's screen image as input and controls the cursor inside the simulation to reach any rendered red circles. After 50 episodes (50,000 steps in total), all agents achieved roughly the same success rate, with only slight differences. The agent given an epsilon value of 0.01 achieved the highest success rate, slightly higher than the agent trained without discounting (epsilon = 1.0).
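The discounting described above amounts to scaling only the policy-gradient (actor) term of the loss by the epsilon constant while leaving the critic's TD-error loss unchanged. Below is a minimal sketch of such a one-step actor-critic update in PyTorch; the network architecture, environment interface, and hyperparameters are illustrative assumptions and are not taken from the thesis.

import torch
import torch.nn as nn

EPSILON = 0.01  # actor-loss discount factor studied here (0.01, 0.1, 0.5, or 1.0)
GAMMA = 0.99    # assumed reward discount factor (not specified in the abstract)

class ActorCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.actor = nn.Linear(128, n_actions)  # policy logits
        self.critic = nn.Linear(128, 1)         # state-value estimate

    def forward(self, obs: torch.Tensor):
        h = self.shared(obs)
        return self.actor(h), self.critic(h)

def update(model, optimizer, obs, action, reward, next_obs, done):
    # One-step actor-critic update with the actor loss scaled by EPSILON.
    logits, value = model(obs)
    with torch.no_grad():
        _, next_value = model(next_obs)
        target = reward + GAMMA * next_value * (1.0 - done)
    td_error = target - value                     # TD error, used as the advantage
    log_prob = torch.log_softmax(logits, dim=-1)[action]
    actor_loss = -log_prob * td_error.detach()    # policy-gradient (actor) loss
    critic_loss = td_error.pow(2)                 # squared TD error (critic loss)
    loss = EPSILON * actor_loss + critic_loss     # discount only the actor term
    optimizer.zero_grad()
    loss.mean().backward()
    optimizer.step()

# Example usage with random data, assuming a 16-dimensional observation and 4 actions.
model = ActorCritic(obs_dim=16, n_actions=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
update(model, optimizer,
       obs=torch.randn(16), action=2, reward=torch.tensor(1.0),
       next_obs=torch.randn(16), done=torch.tensor(0.0))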

Index Terms—Reinforcement Learning, Actor-Critic, Temporal Difference Learning, Convolutional Neural Network, Artificial Intelligence

Subject

ARTIFICIAL INTELLIGENCE
 

Catalog

The Effect of Discounting Actor-loss in Actor-Critic Algorithm
 
 
Indonesia

Circulation

Rp. 0
Rp. 0
No

Author

JORDI YAPUTRA
Individual
SUYANTO
 

Publisher

Universitas Telkom, S1 Informatika
Bandung
2022
