Analysis
Alibaba's Qwen team has made an impressive leap in the open-source model space with the Qwen 3.5 series. This is not just a bigger model; Qwen 3.5 demonstrates an exciting architectural shift that pushes the boundaries of generative AI!
News, research, and updates on MoE, curated automatically by an AI engine.
"我们很高兴推出 ~12 倍更快的混合专家 (MoE) 训练,通过我们新的自定义 Triton 内核和数学优化(无精度损失)实现 >35% 的 VRAM 减少和 ~6 倍更长的上下文。"
"In the past year, leading models from the Chinese community had almost unanimously moved toward Mixture-of-Experts (MoE) architectures..."
"Experimental results across AHD tasks with varying objectives and problem scales show that E2OC consistently outperforms state-of-the-art AHD and other multi-heuristic co-design frameworks, demonstrating strong generalization and sustained optimization capability."
"Hey everyone, I made uncensored versions of the new GLM 4.7 Flash from Z.ai."
"Zhipu AI describes GLM-4.7-Flash as a 30B-A3B MoE model and presents it as the strongest model in the 30B class, designed for lightweight deployment..."
"due to being a hybrid transformer+mamba model, it stays fast as context fills"