S$^3$IT: A Benchmark for Spatially Situated Social Intelligence Test
Analysis
This paper introduces S$^3$IT, a benchmark for evaluating embodied social intelligence in AI agents. It centers on a seat-ordering task in a 3D environment: the agent must arrange seating for LLM-driven NPCs while respecting both social norms and physical constraints. Its main contribution is testing whether an agent can combine social reasoning with physical task execution, a gap in existing evaluation methods. Scenarios are procedurally generated, and the agent must acquire NPC preferences through active dialogue, which makes the benchmark both varied and demanding. The paper reports that current LLMs struggle in this setting, and a comparison against a human baseline underscores the performance gap, pointing to a need for further research on spatial intelligence and social reasoning in embodied agents.
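To make the idea of weighing social norms against physical constraints concrete, here is a minimal toy sketch. It is not the benchmark's environment or scoring method: the guest names, seat layout, constraint lists, and scoring rule are all hypothetical, chosen only to show how a hard physical constraint can filter candidate seatings before a soft social preference ranks them.

```python
from itertools import permutations

# Hypothetical toy scenario; every name and constraint here is invented.
GUESTS = ["Ana", "Ben", "Cy"]
SEATS = [0, 1, 2]                    # positions along one side of a table
ACCESSIBLE_SEATS = {0}               # physical constraint: only seat 0 is accessible
NEEDS_ACCESSIBLE = {"Cy"}            # guests who require an accessible seat
WANTS_ADJACENT = [("Ana", "Cy")]     # social preference: sit next to each other
AVOIDS_ADJACENT = [("Ana", "Ben")]   # social preference: do not sit next to each other


def is_feasible(assignment):
    """Hard physical check: guests who need an accessible seat must have one."""
    return all(assignment[g] in ACCESSIBLE_SEATS for g in NEEDS_ACCESSIBLE)


def social_score(assignment):
    """Soft social score: +1 per satisfied 'wants', -1 per violated 'avoids'."""
    def adjacent(a, b):
        return abs(assignment[a] - assignment[b]) == 1

    score = sum(1 for a, b in WANTS_ADJACENT if adjacent(a, b))
    score -= sum(1 for a, b in AVOIDS_ADJACENT if adjacent(a, b))
    return score


def best_seating():
    """Enumerate all seatings, drop infeasible ones, return the best-scoring one."""
    candidates = (dict(zip(GUESTS, p)) for p in permutations(SEATS))
    feasible = [a for a in candidates if is_feasible(a)]
    return max(feasible, key=social_score)


if __name__ == "__main__":
    seating = best_seating()
    print(seating, social_score(seating))
```

Brute-force enumeration is only workable for a handful of guests. In the benchmark as described, the agent must additionally discover preferences through dialogue and carry out the arrangement in a 3D scene, neither of which this sketch attempts.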
Key Takeaways
- Introduces S$^3$IT, a new benchmark for evaluating embodied social intelligence.
- Focuses on a seat-ordering task requiring consideration of social norms and physical constraints.
- Highlights the limitations of current LLMs in integrating spatial intelligence and social reasoning.
“The integration of embodied agents into human environments demands embodied social intelligence: reasoning over both social norms and physical constraints.”