IEEE VIS 2025 Content: VOICE: Visual oracle for interaction, conversation, and explanation

Donggang Jia

Alexandra Irger

Lonni Besançon

Ondřej Strnad

Deng Luo

Johanna Björklund

Alexandre Kouyoumdjian

Anders Ynnerman

Ivan Viola


Room: Hall E1

Keywords

Visualization, Data visualization, Oral communication, Biology, Biological system modeling, Three-dimensional displays, Solid modeling, Real-time systems, Prototypes, Interviews

Abstract

We present VOICE, a novel approach to science communication that connects large language models’ conversational capabilities with interactive exploratory visualization. VOICE introduces several innovative technical contributions that drive our conversational visualization framework. Based on the collected design requirements, we introduce a two-layer agent architecture that can perform task assignment, instruction extraction, and coherent content generation. We employ fine-tuning and prompt engineering techniques to tailor agents’ performance to their specific roles and accurately respond to user queries. Our interactive text-to-visualization method generates a flythrough sequence matching the content explanation. In addition, natural language interaction provides capabilities to navigate and manipulate 3D models in real time. The VOICE framework can receive arbitrary voice commands from the user and respond verbally, tightly coupled with a corresponding visual representation, with low latency and high accuracy. We demonstrate the effectiveness of our approach by implementing a proof-of-concept prototype and applying it to the molecular visualization domain: analyzing three 3D molecular models with multiscale and multi-instance attributes. Finally, we conduct a comprehensive evaluation of the system, including quantitative and qualitative analyses on our collected dataset, along with a detailed public user study and expert interviews. The results confirm that our framework and prototype effectively meet the design requirements and cater to the needs of diverse target users.