GEM Activations: The Smooth New Functions Outperforming GELU in Transformers and CNNs

🔬 Research | #activations | Analyzed: Apr 24, 2026 04:07
Published: Apr 24, 2026 04:00
1 min read
ArXiv Neural Evo

Analysis

This research proposes the Geometric Monomial (GEM) family of activation functions, which aim to match ReLU-like behavior using purely rational arithmetic while providing smooth gradients, easing the optimization hurdles that non-smooth activations pose in deep architectures. Notably, the SE-GEM and E-GEM variants are reported to surpass the widely used GELU activation on benchmarks including CIFAR-10 with ResNet-56 and GPT-2 training.
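The summary does not reproduce the GEM formula itself, so the sketch below is only an illustration of the drop-in pattern the paper implies: a custom smooth, rational-arithmetic activation module (the SmoothRationalActivation stand-in is hypothetical, not the paper's GEM definition) swapped into a small PyTorch block in place of nn.GELU.

```python
import torch
import torch.nn as nn


class SmoothRationalActivation(nn.Module):
    """Illustrative smooth, ReLU-like activation built from rational arithmetic.

    NOTE: this is NOT the GEM formula from the paper (which is not quoted in
    this summary); it is a hypothetical stand-in with the stated properties
    (no exp/erf, smooth gradient) used to show the drop-in-for-GELU pattern.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Zero for x <= 0, approximately x for large positive x;
        # the rational gate x^2 / (1 + x^2) keeps the joint at 0 smooth.
        return torch.relu(x) * (x * x) / (1.0 + x * x)


def mlp_block(hidden: int = 64) -> nn.Sequential:
    """Tiny MLP block with the custom activation swapped in for nn.GELU()."""
    return nn.Sequential(
        nn.Linear(hidden, 4 * hidden),
        SmoothRationalActivation(),  # was: nn.GELU()
        nn.Linear(4 * hidden, hidden),
    )


if __name__ == "__main__":
    block = mlp_block()
    x = torch.randn(8, 64, requires_grad=True)
    block(x).sum().backward()  # smooth gradients flow end to end
    print(x.grad.shape)        # torch.Size([8, 64])
```

In an actual reproduction, the forward pass above would be replaced with the GEM (or SE-GEM / E-GEM) expression from the original paper before benchmarking against GELU.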
Reference / Citation
"On CIFAR-10 + ResNet-56, SE-GEM surpasses GELU (92.51% vs 92.44%) -- the first GEM-family activation to outperform GELU."
ArXiv Neural Evo, Apr 24, 2026 04:00
* Cited for critical analysis under Article 32.