OpenAI Baselines: ACKTR & A2C

Published: Aug 18, 2017 07:00
1 min read
OpenAI News

Analysis

The article announces the release of two new reinforcement learning algorithms, ACKTR and A2C, as part of OpenAI's Baselines. It highlights A2C as a synchronous, deterministic variant of A3C that achieves comparable performance. ACKTR is presented as a more sample-efficient alternative to TRPO and A2C, requiring only slightly more computation per update than A2C.

Reference

A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C) which we’ve found gives equal performance. ACKTR is a more sample-efficient reinforcement learning algorithm than TRPO and A2C, and requires only slightly more computation than A2C per update.
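The synchronous, batched update that distinguishes A2C from A3C can be sketched as follows. This is a minimal illustration, not the Baselines implementation; the function names, the rollout shapes, and the loss coefficients (`vf_coef`, `ent_coef`) are assumptions chosen for clarity:

```python
# Hedged sketch of the core A2C update (not the Baselines code).
# In A2C, N environments step in lockstep; after a fixed rollout length,
# one actor-critic gradient step is taken on the whole batch, unlike A3C,
# where each worker updates the shared parameters asynchronously.
import numpy as np

def n_step_returns(rewards, last_value, dones, gamma=0.99):
    """Discounted n-step returns for one rollout, bootstrapping from last_value.

    rewards: (T,) rewards collected over the rollout
    last_value: critic estimate V(s_T) used to bootstrap the tail
    dones: (T,) episode-termination flags (1.0 ends the bootstrap chain)
    """
    T = len(rewards)
    returns = np.zeros(T)
    running = last_value
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns

def a2c_loss(log_probs, values, returns, entropies,
             vf_coef=0.5, ent_coef=0.01):
    """Combined A2C objective: policy gradient + value loss - entropy bonus."""
    advantages = returns - values                    # A(s,a) = R - V(s)
    policy_loss = -np.mean(log_probs * advantages)   # advantage-weighted PG
    value_loss = np.mean((returns - values) ** 2)    # critic regression target
    entropy = np.mean(entropies)                     # encourages exploration
    return policy_loss + vf_coef * value_loss - ent_coef * entropy
```

Because every environment's rollout arrives at the same time, the gradient is computed over one large, deterministic batch, which is what makes A2C easier to reproduce and to run efficiently on a GPU than its asynchronous counterpart.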