[Model Release] Genesis-152M-Instruct: Exploring Hybrid Attention + TTT at Small Scale
Analysis
This article announces the release of Genesis-152M-Instruct, a small language model designed for research purposes. It focuses on exploring the interaction of recent architectural innovations like GLA, FoX, TTT, µP, and sparsity within a constrained data environment. The key question addressed is how much architectural design can compensate for limited training data at a 150M parameter scale. The model combines several ICLR 2024-2025 ideas and includes hybrid attention, test-time training, selective activation, and µP-scaled training. While benchmarks are provided, the author emphasizes that this is not a SOTA model but rather an architectural exploration, particularly in comparison to models trained on significantly larger datasets.
Key Takeaways
“How much can architecture compensate for data at ~150M parameters?”