IEEE VIS 2024 Content: Charting EDA: How Visualizations and Interactions Shape Analysis in Computational Notebooks.

Dylan Wootton - MIT, Cambridge, United States

Amy Rae Fox - MIT, Cambridge, United States

Evan Peck - University of Colorado Boulder, Boulder, United States

Arvind Satyanarayan - MIT, Cambridge, United States

Room: Bayshore V

2024-10-16T17:45:00Z
A diagram illustrating a mixed-methods study of Exploratory Data Analysis (EDA) practices. The left section shows 13 data scientists conducting two EDAs, first with static charts, then with static and interactive charts. Think-aloud utterances and interaction traces are collected from these sessions. The middle section depicts how this data is processed: utterances are coded via content analysis to create observations, which are combined with interaction data to form a comprehensive dataset of EDA sessions. EDA metrics such as revisit rate and hover time are computed from this dataset. The right section demonstrates a formal description of EDA sessions, showing examples of how participants' actions and observations are encoded, including creating visualizations, commenting on distributions, and identifying relationships using various chart types. This systematic approach combines qualitative data collection with quantitative analysis to provide insights into EDA behaviors and strategies.
Keywords

Interaction Design, Methodologies, HumanQual, HumanQuant.

Abstract

Interactive visualizations are powerful tools for Exploratory Data Analysis (EDA), but how do they affect the observations analysts make about their data? We conducted a qualitative experiment with 13 professional data scientists analyzing two datasets with Jupyter notebooks, collecting a rich dataset of interaction traces and think-aloud utterances. By qualitatively coding participant utterances, we introduce a formalism that describes EDA as a sequence of analysis states, where each state consists of either a representation an analyst constructs (e.g., the output of a data frame or an interactive visualization) or an observation the analyst makes (e.g., about missing data or the relationship between variables). By applying our formalism to our dataset, we find that interactive visualizations, on average, lead to earlier and more complex insights about relationships between dataset attributes compared to static visualizations. Moreover, by calculating metrics such as revisit count and representational diversity, we uncover that some representations serve more as "planning aids" during EDA than as tools strictly for hypothesis-answering. We show how these measures help identify other patterns of analysis behavior, such as the "80-20 rule", where a small subset of representations drove the majority of observations. Based on these findings, we offer design guidelines for interactive exploratory analysis tooling and reflect on future directions for studying the role that visualizations play in EDA.
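
The formalism in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the `State` schema, the toy session, and the specific definitions chosen for revisit count (a return to an earlier representation after attending to a different one), representational diversity (the number of distinct representation types), and the share of observations driven by the top 20% of representations. The abstract names these metrics but does not define them.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class State:
    """One analysis state in an EDA session (hypothetical schema)."""
    kind: str    # "representation" or "observation"
    rep_id: str  # representation constructed, or the one an observation refers to
    label: str   # representation type, or observation category


def revisit_count(session: List[State]) -> int:
    """Assumed definition: count returns to a previously seen representation
    after the analyst has attended to a different one."""
    seen, revisits, prev = set(), 0, None
    for state in session:
        if state.rep_id != prev and state.rep_id in seen:
            revisits += 1
        seen.add(state.rep_id)
        prev = state.rep_id
    return revisits


def representational_diversity(session: List[State]) -> int:
    """Assumed definition: number of distinct representation types constructed."""
    return len({s.label for s in session if s.kind == "representation"})


def top_share(session: List[State], fraction: float = 0.2) -> float:
    """Share of observations tied to the most productive `fraction` of representations."""
    counts = Counter(s.rep_id for s in session if s.kind == "observation")
    n_reps = len({s.rep_id for s in session if s.kind == "representation"})
    k = max(1, round(n_reps * fraction))  # always consider at least one representation
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(k))
    return top / total if total else 0.0


# Toy session, invented for illustration only (not data from the study).
session = [
    State("representation", "r1", "dataframe"),
    State("observation", "r1", "missing data"),
    State("representation", "r2", "scatterplot"),
    State("observation", "r2", "relationship"),
    State("observation", "r2", "relationship"),
    State("observation", "r1", "distribution"),   # return to r1
    State("representation", "r3", "histogram"),
    State("observation", "r2", "relationship"),   # return to r2
]

metrics = {
    "revisit_count": revisit_count(session),
    "representational_diversity": representational_diversity(session),
    "top_20pct_share": top_share(session),
}
```

In this toy session the scatterplot `r2` accounts for three of the five observations, while the histogram `r3` prompts none. Under these assumed definitions, a representation that is constructed or revisited without yielding observations is the kind of pattern the abstract's "planning aid" reading might correspond to.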