Semantic Tree Inference with LLM Embeddings

Published: Dec 29, 2025 13:55
ArXiv

Analysis

This paper introduces a method for uncovering hierarchical semantic relationships in text corpora by applying nested density clustering to Large Language Model (LLM) embeddings. It goes beyond using LLM embeddings only for similarity-based retrieval by providing a way to visualize and understand the global semantic structure of a dataset. Because the clusters emerge from the data itself, semantic categories and subfields can be discovered without relying on predefined categories. Evaluation on multiple datasets (scientific abstracts, 20 Newsgroups, and IMDB) suggests the method is generally applicable and robust.

Reference

The method starts by identifying texts of strong semantic similarity as it searches for dense clusters in LLM embedding space.
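
The idea of searching for dense clusters and then nesting them can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details are not given here. It assumes a DBSCAN-style rule (a point is "core" if at least min_pts points lie within radius eps) applied at increasing radii, and it uses toy 2D points in place of real LLM embeddings. The names density_clusters, nested_tree, eps_levels, and min_pts are hypothetical.

```python
import math


def density_clusters(points, eps, min_pts):
    """Group core points into connected components at radius eps.

    Assumption: a DBSCAN-style core rule, used here for illustration only.
    """
    n = len(points)
    # Neighborhoods include the point itself (distance 0 <= eps).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [i for i in range(n) if len(neighbors[i]) >= min_pts]
    core_set = set(core)

    # Union-find over core points.
    parent = {i: i for i in core}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in core:
        for j in neighbors[i]:
            if j in core_set:
                parent[find(i)] = find(j)

    groups = {}
    for i in core:
        groups.setdefault(find(i), set()).add(i)
    # Sort clusters by their smallest member for a stable order.
    return sorted((frozenset(g) for g in groups.values()), key=min)


def nested_tree(points, eps_levels, min_pts=2):
    """Cluster at increasing radii and link each cluster to its parent.

    Returns (levels, links), where links holds
    (level, child_index, parent_index) tuples.
    """
    levels = [density_clusters(points, eps, min_pts) for eps in sorted(eps_levels)]
    links = []
    for k in range(len(levels) - 1):
        for c_idx, child in enumerate(levels[k]):
            for p_idx, par in enumerate(levels[k + 1]):
                if child <= par:  # child is a subset of its parent cluster
                    links.append((k, c_idx, p_idx))
                    break
    return levels, links


# Toy stand-ins for embeddings: two nearby tight groups and one distant group.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (1.0, 0.0), (1.1, 0.0), (1.0, 0.1),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1),
]
levels, links = nested_tree(points, eps_levels=[0.2, 1.5, 20.0])

for k, clusters in enumerate(levels):
    print(k, [sorted(c) for c in clusters])
```

On this toy data the three tight groups appear at the smallest radius, the two nearby groups merge at the middle radius, and everything joins at the largest, which gives a small tree. With real data, the points would be embedding vectors, and a cosine-based distance might be a more natural choice than the Euclidean distance assumed here.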