OpenAI Baselines: ACKTR & A2C
Analysis
The article announces the release of two new reinforcement learning algorithm implementations, ACKTR and A2C, in OpenAI's Baselines. It highlights A2C as a synchronous, deterministic variant of A3C that achieves comparable performance, and presents ACKTR as a more sample-efficient alternative to TRPO and A2C whose per-update computational cost is only slightly higher than A2C's.
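A2C's defining difference from A3C is that its parallel workers step in lockstep and contribute to a single batched gradient update, rather than each worker updating a shared model asynchronously. The following minimal numpy sketch is illustrative only, not the OpenAI Baselines implementation; names such as `compute_advantages`, `n_steps`, and `gamma` are assumptions. It shows the kind of batched n-step advantage estimate a synchronous update would use:

```python
import numpy as np

def compute_advantages(rewards, values, last_values, dones, gamma=0.99):
    """n-step returns and advantages for a batch of parallel environments.

    rewards, values, dones: arrays of shape (n_steps, n_envs)
    last_values: value estimates for the state after the final step, (n_envs,)
    """
    n_steps, n_envs = rewards.shape
    returns = np.zeros((n_steps, n_envs))
    running = last_values
    # Bootstrap backwards through time; a done flag cuts the return at
    # episode boundaries so rewards do not leak across episodes.
    for t in reversed(range(n_steps)):
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns, returns - values

# Toy usage: 5 steps across 4 synchronous environments (illustrative data).
rng = np.random.default_rng(0)
rewards = rng.random((5, 4))
values = rng.random((5, 4))
dones = np.zeros((5, 4))
returns, advantages = compute_advantages(rewards, values, rng.random(4), dones)
print(advantages.shape)  # (5, 4): one batched update, no asynchronous workers
```

Because every environment contributes to the same batch, the update is deterministic given the data, which is what the synchronous design buys over A3C's lock-free asynchrony.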
Key Takeaways
- OpenAI released ACKTR and A2C as part of its Baselines repository.
- A2C is a synchronous, deterministic version of A3C with similar performance.
- ACKTR is more sample-efficient than TRPO and A2C, with only slightly higher per-update computational cost than A2C (see the sketch after this list).
Reference
“A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C) which we’ve found gives equal performance. ACKTR is a more sample-efficient reinforcement learning algorithm than TRPO and A2C, and requires only slightly more computation than A2C per update.”