VOICE: Visual Oracle for Interaction, Conversation, and Explanation
 Donggang Jia
 Alexandra Irger
 Lonni Besançon
 Ondřej Strnad
 Deng Luo
 Johanna Björklund
 Alexandre Kouyoumdjian
 Anders Ynnerman
 Ivan Viola

 DOI: 10.1109/TVCG.2025.3579956
Room: Hall E1
Keywords
Visualization, Data visualization, Oral communication, Biology, Biological system modeling, Three-dimensional displays, Solid modeling, Real-time systems, Prototypes, Interviews
Abstract
We present VOICE, a novel approach to science communication that connects large language models’ conversational capabilities with interactive exploratory visualization. VOICE introduces several innovative technical contributions that drive our conversational visualization framework. Based on the collected design requirements, we introduce a two-layer agent architecture that can perform task assignment, instruction extraction, and coherent content generation. We employ fine-tuning and prompt engineering techniques to tailor agents’ performance to their specific roles and to respond accurately to user queries. Our interactive text-to-visualization method generates a flythrough sequence matching the content explanation. In addition, natural language interaction provides capabilities to navigate and manipulate 3D models in real time. The VOICE framework can receive arbitrary voice commands from the user and respond verbally, tightly coupled with a corresponding visual representation, with low latency and high accuracy. We demonstrate the effectiveness of our approach by implementing a proof-of-concept prototype and applying it to the molecular visualization domain: analyzing three 3D molecular models with multiscale and multi-instance attributes. Finally, we conduct a comprehensive evaluation of the system, including quantitative and qualitative analyses on our collected dataset, along with a detailed public user study and expert interviews. The results confirm that our framework and prototype effectively meet the design requirements and cater to the needs of diverse target users.