
RepetitionCurse: DoS Attacks on MoE LLMs

Published: Dec 30, 2025 05:24
1 min read
ArXiv

Analysis

This paper highlights a critical vulnerability in Mixture-of-Experts (MoE) large language models (LLMs). It demonstrates how adversarial inputs can exploit the routing mechanism, causing severe load imbalance and denial-of-service (DoS) conditions. The work matters because it reveals a practical attack vector that can degrade the performance and availability of deployed MoE models, threatening service-level agreements. The proposed RepetitionCurse method triggers the vulnerability with a simple, black-box approach, which makes it a credible real-world threat.
Reference

Out-of-distribution prompts can manipulate the routing strategy such that all tokens are consistently routed to the same set of top-$k$ experts, which creates computational bottlenecks.
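The quoted failure mode can be illustrated with a toy top-$k$ router. Everything below (the dot-product gating scheme, the dimensions, the random expert vectors) is an illustrative assumption, not the paper's actual attack or any real MoE implementation:

```python
from collections import Counter
import random

random.seed(0)

NUM_EXPERTS, TOP_K, DIM = 8, 2, 16

# Toy router: each expert has a random gating vector; a token is routed
# to the top-k experts by dot-product score.
gates = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def route(token_vec):
    scores = [sum(g * t for g, t in zip(gate, token_vec)) for gate in gates]
    return sorted(range(NUM_EXPERTS), key=lambda e: scores[e], reverse=True)[:TOP_K]

def expert_load(tokens):
    """Count how many tokens each expert must process."""
    load = Counter()
    for tok in tokens:
        load.update(route(tok))
    return load

# Varied prompt: 256 distinct token embeddings spread load across experts.
varied = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(256)]
# Repetitive prompt: the same token 256 times always hits the identical
# top-k experts, concentrating all compute on k of the 8 experts.
repeated = [varied[0]] * 256

print(sorted(expert_load(varied).values(), reverse=True))
print(sorted(expert_load(repeated).values(), reverse=True))  # [256, 256]
```

The second print shows the bottleneck: only `TOP_K` experts receive any work at all, while the rest sit idle, which is the load imbalance the paper exploits.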

Analysis

This paper addresses critical challenges of Large Language Models (LLMs) such as hallucinations and high inference costs. It proposes a framework for learning with multi-expert deferral, where uncertain inputs are routed to more capable experts and simpler queries to smaller models. This approach aims to improve reliability and efficiency. The paper provides theoretical guarantees and introduces new algorithms with empirical validation on benchmark datasets.
Reference

The paper introduces new surrogate losses and proves strong non-asymptotic, hypothesis set-specific consistency guarantees, resolving existing open questions.
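The deferral idea can be sketched as a simple confidence-threshold cascade. The model names, confidence values, and fixed threshold below are toy assumptions; the paper's actual approach learns when to defer via surrogate losses rather than a hand-set rule:

```python
def cascade(query, experts, threshold=0.8):
    """experts: list of (name, predict_fn) ordered cheap -> capable.

    Each predict_fn returns (answer, confidence). Low-confidence answers
    are deferred to the next, more capable expert in the list.
    """
    for name, predict in experts[:-1]:
        answer, confidence = predict(query)
        if confidence >= threshold:
            return name, answer
    # Fall through to the most capable expert unconditionally.
    name, predict = experts[-1]
    return name, predict(query)[0]

# Toy experts: the small model is confident only on short queries.
small = ("small-llm", lambda q: ("short-answer", 0.9 if len(q) < 20 else 0.3))
large = ("large-llm", lambda q: ("long-answer", 0.95))

print(cascade("hi", [small, large]))      # ('small-llm', 'short-answer')
print(cascade("a" * 40, [small, large]))  # ('large-llm', 'long-answer')
```

The efficiency gain comes from the first branch: cheap models answer easy queries, and only uncertain inputs pay the cost of the capable expert.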

Analysis

This paper introduces TEXT, a novel model for Multi-modal Sentiment Analysis (MSA) that leverages explanations from Multi-modal Large Language Models (MLLMs) and incorporates temporal alignment. The key contributions are the use of explanations, a temporal alignment block (combining Mamba and temporal cross-attention), and a text-routed sparse mixture-of-experts with gate fusion. The paper claims state-of-the-art performance across multiple datasets, demonstrating the effectiveness of the proposed approach.
Reference

TEXT achieves the best performance across four datasets among all tested models, including three recently proposed approaches and three MLLMs.
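The text-routed sparse mixture-of-experts with gate fusion can be sketched roughly as follows. All shapes, the dot-product gate, and the scalar "experts" are illustrative assumptions, not TEXT's actual architecture:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def text_routed_moe(text_feat, fused_feat, experts, top_k=2):
    """Gate logits come from the text features alone; the selected
    experts then transform the fused multi-modal features."""
    logits = [sum(t * r for t, r in zip(text_feat, e["route"])) for e in experts]
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in chosen])
    # Gate fusion: softmax-weighted sum of the chosen experts' outputs.
    out = [0.0] * len(fused_feat)
    for w, i in zip(weights, chosen):
        expert_out = [x * experts[i]["scale"] for x in fused_feat]  # toy expert
        out = [o + w * eo for o, eo in zip(out, expert_out)]
    return out, chosen

experts = [
    {"route": [1.0, 0.0], "scale": 2.0},
    {"route": [0.0, 1.0], "scale": -1.0},
    {"route": [-1.0, -1.0], "scale": 0.5},
]
out, chosen = text_routed_moe([3.0, 1.0], [1.0, 2.0], experts)
print(chosen, out)  # chosen == [0, 1]
```

The key design choice this sketch captures is that routing is decided by the text modality only, while the experts operate on the fused representation.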

Research #llm 🏛️ Official · Analyzed: Dec 25, 2025 23:50

Are the recent memory issues in ChatGPT related to re-routing?

Published: Dec 25, 2025 15:19
1 min read
r/OpenAI

Analysis

This post from the OpenAI subreddit highlights a user experiencing memory issues with ChatGPT, specifically after updates 5.1 and 5.2. The user notes that the problem seems to be exacerbated when using the 4o model, particularly during philosophical conversations. The AI appears to get "re-routed," leading to repetitive behavior and a loss of context within the conversation. The user suspects that the memory resets after these re-routes. This anecdotal evidence suggests a potential bug or unintended consequence of recent updates affecting the model's ability to maintain context and coherence over extended conversations. Further investigation and confirmation from OpenAI are needed to determine the root cause and potential solutions.

Reference

"It's as if the memory of the chat resets after the re-route."

Research #LLM 🔬 Research · Analyzed: Jan 10, 2026 13:03

RoBoN: Scaling LLMs at Test Time Through Routing

Published: Dec 5, 2025 08:55
1 min read
ArXiv

Analysis

This ArXiv paper introduces RoBoN, a method for efficiently scaling Large Language Models (LLMs) at test time. The technique routes each input to a selection of LLMs and chooses the best of their outputs, potentially improving both quality and efficiency over best-of-n sampling from a single model.
Reference

The paper presents a method called RoBoN (Routed Online Best-of-n).
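A routed best-of-n loop can be sketched as follows. The round-robin router, toy "models", and length-based scorer are placeholder assumptions, not RoBoN's learned components:

```python
def routed_best_of_n(prompt, models, router, scorer, n=4):
    """For each of the n draws, the router picks which model to sample;
    the scorer then selects the single best output."""
    samples = [router(prompt, models, i)(prompt) for i in range(n)]
    return max(samples, key=scorer)

# Toy components (assumptions): round-robin router, length scorer.
models = [lambda p: p + "!", lambda p: p + "!!", lambda p: p + "!!!"]
router = lambda p, ms, i: ms[i % len(ms)]
scorer = len

print(routed_best_of_n("hi", models, router, scorer, n=4))  # hi!!!
```

The point of the routing step is to spend the n-sample budget across multiple models rather than drawing all n candidates from one, so the final max can pick from a more diverse pool.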