IEEE VIS 2024 Content: Can GPT-4 Models Detect Misleading Visualizations?

Jason Huang Alexander - University of Massachusetts Amherst, Amherst, United States

Priyal H Nanda - University of Massachusetts Amherst, Amherst, United States

Kai-Cheng Yang - Northeastern University, Boston, United States

Ali Sarvghad - University of Massachusetts Amherst, Amherst, United States

Room: Bayshore VI

2024-10-17T18:12:00Z
Exemplar figure, described by caption below
We evaluated the accuracy of three OpenAI GPT-4 models in detecting misleading visualizations. Our findings suggest that this approach could serve as a valuable complementary method for addressing misleading visualizations.
Keywords

Misleading visualizations, GPT-4, large vision language model, misinformation

Abstract

The proliferation of misleading visualizations online, particularly during critical events like public health crises and elections, poses a significant risk of misinformation. This work investigates the capability of GPT-4 models (4V, 4o, and 4o mini) to detect misleading visualizations. Utilizing a dataset of tweet-visualization pairs with various visual misleaders, we tested these models under four experimental conditions with different levels of guidance. Our results demonstrate that GPT-4 models can detect misleading visualizations with moderate accuracy without prior training (naive zero-shot) and that performance considerably improves by providing the model with the definitions of misleaders (guided zero-shot). Our results indicate that a single prompt engineering technique does not necessarily yield the best results for all types of misleaders. We found that guided few-shot was more effective for reasoning misleaders, while guided zero-shot performed better for design misleaders. This study underscores the feasibility of using large vision-language models to combat misinformation and emphasizes the importance of optimizing prompt engineering to enhance detection accuracy.
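The prompting conditions described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' actual prompts: the function `build_prompt`, the `Example` type, the wording of the instructions, and the two entries in `MISLEADER_DEFINITIONS` are all hypothetical. The abstract names naive zero-shot, guided zero-shot, and guided few-shot; treating the fourth condition as the remaining combination (examples without definitions) is a guess. The visualization image itself would be attached to the model request separately and is not shown here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Example:
    """A labeled tweet used as a few-shot demonstration (hypothetical type)."""
    tweet: str
    label: str  # e.g. "misleading" or "not misleading"


# Placeholder definitions for illustration only; they are not taken from the paper.
MISLEADER_DEFINITIONS = {
    "truncated axis": "The axis does not start at zero, exaggerating differences.",
    "cherry-picking": "Only a favorable subset of the data is shown or discussed.",
}


def build_prompt(tweet: str, guided: bool = False,
                 examples: Sequence[Example] = ()) -> str:
    """Assemble the text part of a prompt for one tweet-visualization pair.

    guided=False, no examples -> naive zero-shot
    guided=True,  no examples -> guided zero-shot
    guided=True,  examples    -> guided few-shot
    guided=False, examples    -> assumed fourth condition (not named in the abstract)
    """
    parts = ["You are shown a tweet and the visualization attached to it."]
    if guided:
        # Guided conditions supply the definitions of misleaders up front.
        parts.append("Definitions of misleaders:")
        for name, definition in MISLEADER_DEFINITIONS.items():
            parts.append(f"- {name}: {definition}")
    # Few-shot conditions add labeled demonstrations before the query.
    for i, ex in enumerate(examples, start=1):
        parts.append(f"Example {i}: tweet {ex.tweet!r} is {ex.label}.")
    parts.append(f"Tweet: {tweet!r}")
    parts.append("Is the visualization misleading? Answer yes or no and explain.")
    return "\n".join(parts)


if __name__ == "__main__":
    demo = [Example("Cases doubled overnight!", "misleading")]
    print(build_prompt("Look at this chart.", guided=True, examples=demo))
```

Keeping the definitions and the examples behind independent switches makes it easy to compare conditions per misleader type, which matters given the finding that no single prompting technique was best for every type.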