MultiRisk: Controlling AI Behavior with Score Thresholding

Paper #llm 🔬 Research | Analyzed: Jan 3, 2026 08:54

Published: Dec 31, 2025 03:25

Analysis

This paper addresses the problem of controlling the behavior of generative AI systems in real-world applications where several risk dimensions must be managed at once. The proposed method, MultiRisk, is a lightweight approach based on test-time filtering with score thresholds. The paper's contributions are a formalization of the multi-risk control problem, two dynamic programming algorithms (MultiRisk-Base and MultiRisk), and theoretical guarantees for risk control. An evaluation on a Large Language Model alignment task shows that the algorithm achieves risk levels close to the targets.
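To make the idea of test-time filtering with score thresholds concrete, here is a minimal Python sketch. It is not the paper's MultiRisk-Base or MultiRisk algorithm: it swaps in a simple greedy, one-dimension-at-a-time threshold search on a calibration set, and it carries no theoretical guarantee. The names `passes` and `fit_thresholds`, the convention that a higher score means higher risk, and the definition of risk (the fraction of all calibration outputs that are accepted and incur a loss, with rejected outputs assumed to be replaced by a zero-loss fallback) are all assumptions made for illustration.

```python
from typing import List, Sequence


def passes(scores: Sequence[float], thresholds: Sequence[float]) -> bool:
    """Accept an output only if every risk score is at or below its threshold.

    Assumption: higher score means higher estimated risk.
    """
    return all(s <= t for s, t in zip(scores, thresholds))


def fit_thresholds(
    cal_scores: Sequence[Sequence[float]],
    cal_losses: Sequence[Sequence[float]],
    targets: Sequence[float],
) -> List[float]:
    """Greedy sketch (hypothetical, not the paper's dynamic program).

    For each risk dimension in order, pick the most permissive threshold
    whose empirical risk on the calibration set stays within the target,
    given the thresholds already fixed for earlier dimensions.

    Assumed risk definition: sum of losses over accepted outputs divided
    by the total number of calibration outputs (rejected outputs count as
    zero loss). Losses are assumed to be non-negative.
    """
    n = len(cal_scores)
    k = len(targets)
    thresholds = [float("inf")] * k
    for j in range(k):
        # Try candidate thresholds from most to least permissive.
        candidates = sorted({row[j] for row in cal_scores}, reverse=True)
        chosen = float("-inf")  # reject everything if no candidate works
        for t in candidates:
            trial = thresholds[:j] + [t] + thresholds[j + 1:]
            risk_j = sum(
                loss[j]
                for scores, loss in zip(cal_scores, cal_losses)
                if passes(scores, trial)
            ) / n
            if risk_j <= targets[j]:
                chosen = t
                break
        thresholds[j] = chosen
    return thresholds
```

Under this risk definition, tightening a later threshold can only remove accepted outputs, so it never raises the empirical risk of a dimension fixed earlier; this is why the greedy pass stays within every target on the calibration data. At test time, an output would be released only when `passes(scores, thresholds)` is true.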