General Information

Code

22.04.1049

Classification

006.3 - Special Computer Methods - Artificial Intelligence

Type

Scientific Work - Undergraduate Thesis (S1) - Reference

Subject

Artificial Intelligence

Other Information

Abstract

We present an experimental analysis of the effect of limiting the Temporal Difference (TD) error when estimating the actor loss of an actor-critic-based agent. The limit is applied by scaling the actor's loss value by a constant factor, epsilon (ε). We chose four epsilon values for the experiment: 0.01, 0.1, 0.5, and 1.0, where 1.0 means no discount at all. We spawn four agents to solve a task that is trivial for humans in a custom, lightweight simulation resembling the Windows Operating System (OS). Each agent receives the simulation's screen image as input and controls the cursor inside the simulation to reach any rendered red circle. After 50 episodes (50,000 steps in total), the agents achieved about the same success rate, with only slight differences. The agent given an epsilon value of 0.01 achieved the highest success rate, slightly higher than the agent trained without the discount (epsilon = 1.0).

Index Terms—Reinforcement Learning, Actor-Critic, Temporal Difference Learning, Convolutional Neural Network, Artificial Intelligence
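
The scaling described in the abstract can be illustrated with a minimal sketch. It assumes a standard one-step TD error and a policy-gradient actor loss of the form -log π(a|s) · δ. The abstract does not give the exact loss, so the function names, the `gamma` value, and the sample numbers below are hypothetical and are not the thesis implementation.

```python
import math


def td_error(reward, value, next_value, gamma=0.99, done=False):
    """One-step TD error: r + gamma * V(s') - V(s). gamma is an assumed value."""
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def scaled_actor_loss(log_prob, delta, epsilon):
    """Assumed actor loss -log_prob * delta, scaled by the constant epsilon."""
    return epsilon * (-log_prob * delta)


# Hypothetical terminal transition: reward 1.0, critic estimate V(s) = 0.5.
delta = td_error(reward=1.0, value=0.5, next_value=0.0, done=True)

# Log-probability of the chosen action under a hypothetical policy.
log_prob = math.log(0.25)

# The four epsilon values named in the abstract; 1.0 leaves the loss unchanged.
losses = {eps: scaled_actor_loss(log_prob, delta, eps) for eps in (0.01, 0.1, 0.5, 1.0)}

for eps, loss in losses.items():
    print(f"epsilon={eps}: actor loss={loss:.6f}")
```

Under these assumptions, a smaller epsilon shrinks the actor's loss, and therefore its gradient, by the same constant factor, while the critic's update is left untouched.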

Collection & Circulation

1 of 1 copies available

Author

Name JORDI YAPUTRA
Type Individual
Editor SUYANTO
Translator

Publisher

Name Universitas Telkom, S1 Informatika
City Bandung
Year 2022

Circulation

Rental fee IDR 0,00
Daily fine IDR 0,00
Type Non-Circulating