GEM Activations: The Smooth New Functions Outperforming GELU in Transformers and CNNs
Research | Activations
Published: Apr 24, 2026 • 1 min read • ArXiv Neural EvoAnalysis
This paper proposes the Geometric Monomial (GEM) family of activation functions, which achieve ReLU-like behavior using purely rational arithmetic while keeping gradients smooth, sidestepping the optimization problems that ReLU's kink at zero can cause in deep architectures. Notably, the SE-GEM and E-GEM variants outperform the widely used GELU on benchmarks including CIFAR-10 image classification and GPT-2-scale language modeling.
Key Takeaways
- SE-GEM beats the widely used GELU activation on the CIFAR-10 image classification benchmark (92.51% vs 92.44% with ResNet-56).
- GEM activations also deliver strong NLP results, lowering perplexity on a 124M-parameter GPT-2 model.
- The research uncovers an architecture-dependent tradeoff in the order parameter N: CNNs perform best with N=1, while Transformers prefer N=2 (see the sketch below).
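The paper's exact GEM formula is not reproduced in this summary, so the following is only a loose illustration of the stated properties: an activation built from purely rational arithmetic (powers, absolute value, division) that stays smooth near zero, with an integer order parameter n playing a role loosely analogous to the paper's N. The name `rational_gem_like` and the specific gate form are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def rational_gem_like(x: np.ndarray, n: int = 1) -> np.ndarray:
    """Hypothetical GEM-style activation: x scaled by a rational gate.

    Uses only rational arithmetic (integer powers, abs, division), so it is
    cheap to evaluate and has smooth gradients, unlike ReLU's kink at zero.
    NOTE: an illustrative stand-in, not the paper's GEM definition.
    """
    p = 2 * n - 1  # odd power keeps the gate monotone and sign-correct
    gate = 0.5 * (1.0 + x**p / (1.0 + np.abs(x)**p))  # rational sigmoid in (0, 1)
    return x * gate

x = np.linspace(-4, 4, 9)
print(rational_gem_like(x, n=1))  # smoother, GELU-like gate
print(rational_gem_like(x, n=2))  # sharper, more ReLU-like gate
```

In this toy form, raising n sharpens the gate toward a hard step, so the output approaches ReLU; one plausible reading of the N=1 vs N=2 finding is that different architectures favor different points on that smoothness-sharpness spectrum.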
Reference / Citation
"On CIFAR-10 + ResNet-56, SE-GEM surpasses GELU (92.51% vs 92.44%) -- the first GEM-family activation to outperform GELU."