Semantic Tree Inference with LLM Embeddings
Analysis
Key Takeaways
- Proposes a nested density clustering approach for inferring hierarchical semantic trees from text corpora.
- Uses LLM embeddings to capture semantic relationships between texts.
- Enables data-driven discovery of semantic categories without a predefined taxonomy.
- Evaluated on scientific abstracts, 20 Newsgroups, and IMDB, with results reported as robust across these datasets.
- Highlights potential applications in scientometrics and the study of topic evolution.
“The method starts by identifying texts of strong semantic similarity as it searches for dense clusters in LLM embedding space.”
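
To make the idea of nested dense clusters concrete, here is a minimal sketch and not the authors' implementation. It assumes a simple DBSCAN-style notion of density: a point is "core" if it has at least `min_neighbors` other points within radius `eps`, and clusters are connected groups of core points. Rerunning this at progressively larger radii gives clusters that nest inside one another, which can be read as the levels of a tree. The function names, the radii, and the toy 2-D points standing in for LLM embeddings are all illustrative assumptions.

```python
import numpy as np


def density_clusters(X, eps, min_neighbors):
    """Label connected groups of core points at radius eps; -1 marks non-core points."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    adjacent = dists <= eps
    # A core point has at least min_neighbors other points within eps (self excluded).
    core = adjacent.sum(axis=1) - 1 >= min_neighbors
    labels = np.full(len(X), -1)
    next_label = 0
    for start in range(len(X)):
        if not core[start] or labels[start] != -1:
            continue
        # Depth-first traversal over core points reachable through eps-links.
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adjacent[i] & core):
                if labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def nested_density_levels(X, eps_levels, min_neighbors=3):
    """One label array per radius, tightest (densest) level first."""
    return [density_clusters(X, eps, min_neighbors) for eps in sorted(eps_levels)]


def count_clusters(labels):
    return len(set(labels[labels >= 0].tolist()))


def is_nested(levels):
    """Check that every cluster at a finer level sits inside exactly one coarser cluster."""
    for fine, coarse in zip(levels, levels[1:]):
        for cluster_id in set(fine[fine >= 0].tolist()):
            parents = set(coarse[fine == cluster_id].tolist())
            if len(parents) != 1 or -1 in parents:
                return False
    return True


def toy_embeddings():
    """Hypothetical stand-in for LLM embeddings: three 3x3 grids of 2-D points.

    Two grids are close together (centers 0 and 1) and one is far away (center 10).
    """
    offsets = np.array([[dx, dy] for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)])
    centers = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
    return np.vstack([center + offsets for center in centers])


X = toy_embeddings()
levels = nested_density_levels(X, eps_levels=[0.15, 1.0, 20.0], min_neighbors=3)
print([count_clusters(labels) for labels in levels])  # [3, 2, 1]
print(is_nested(levels))  # True
```

In this toy setup the tightest radius separates all three groups, the middle radius merges the two nearby groups, and the loosest radius joins everything into a single root. With real embeddings one would more likely use cosine distance and a neighbor index rather than the dense pairwise distance matrix used here, which only suits small inputs.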