
Gap-K%: A Novel Approach to Detecting Pretraining Data in Large Language Models

Published: Jan 29, 2026 05:00 · ArXiv ML

Analysis

This paper introduces Gap-K%, a method for detecting whether a given text was part of a large language model's (LLM's) pretraining data. The approach is grounded in the optimization dynamics of pretraining: for each token, it measures the log-probability gap between the model's top-1 prediction and the actual target token, the intuition being that training drives this gap toward zero on text the model has already seen. Aggregating these per-token gaps yields a membership score, and the authors report state-of-the-art detection performance.
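The summary does not spell out how Gap-K% turns per-token gaps into a sequence-level score. The sketch below is a minimal illustration under two assumptions not confirmed here: (1) the per-token statistic is log p(top-1) minus log p(target), and (2) by analogy with the earlier Min-K% Prob method, the score averages that gap over the K% of tokens where it is largest. The function name `gap_k_score` and the default `k=0.2` are illustrative, not the paper's definitions.

```python
# Hedged sketch of a Gap-K%-style membership score (assumptions noted above).
import torch

def gap_k_score(logits: torch.Tensor, targets: torch.Tensor, k: float = 0.2) -> float:
    """Score a candidate text for membership in the pretraining data.

    logits:  [seq_len, vocab_size] next-token logits from the model.
    targets: [seq_len] the actual next tokens of the candidate text.
    k:       fraction of tokens to aggregate over (0.2 = worst 20%).

    Lower scores (small gaps) suggest the model has seen the text before.
    """
    log_probs = torch.log_softmax(logits, dim=-1)            # [seq_len, vocab]
    top1 = log_probs.max(dim=-1).values                      # log p of top-1 token
    target_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    gaps = top1 - target_lp                                  # >= 0 by construction
    n = max(1, int(k * gaps.numel()))
    worst = torch.topk(gaps, n).values                       # K% largest gaps
    return worst.mean().item()
```

In practice such a score would be thresholded (calibrated on known member and non-member texts) to classify a document as seen or unseen; the threshold direction and calibration procedure would follow the paper's evaluation setup.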

Reference / Citation
"In this work, we propose Gap-K%, a novel pretraining data detection method grounded in the optimization dynamics of LLM pretraining."
— ArXiv ML, Jan 29, 2026 05:00
* Cited for critical analysis under Article 32.