2024 VIS Area Curation Committee Executive Summary

Introduction

This report summarizes the findings, recommendations, and process by the VIS Area Curation Committee (ACC) regarding the areas and keywords used for paper submissions to IEEE VIS 2024. It is based on the 2021, 2022, and 2023 ACC committee reports, updated with the 2024 data. According to the Charter, the goal of this committee is to analyze and report how submissions made use of the areas and keywords to describe their contribution. It is important to understand when these descriptors no longer adequately cover the breadth of research presented at VIS.

We use submission and bidding information from VIS 2024 to analyze recent trends following the move to an area model. We also conducted a survey among authors who submitted to VIS about their satisfaction with the area model.

The full data and source code to rebuild this project are available here.

Committee members 2024: Jean-Daniel Fekete (co-chair), Alexander Lex (co-chair), Helwig Hauser, Ingrid Hotz, David Laidlaw, Torsten Möller, Michael Papka, Danielle Szafir, Yingcai Wu.
Committee members 2023: Steven Drucker (chair), Jean-Daniel Fekete, Ingrid Hotz, David Laidlaw, Alexander Lex, Torsten Möller, Michael Papka, Hendrik Strobelt, Shigeo Takahashi.
Committee members 2022: Steven Drucker (chair), Ingrid Hotz, David Laidlaw, Heike Leitte, Torsten Möller, Carlos Scheidegger, Hendrik Strobelt, Shigeo Takahashi, Penny Rheingans.
Committee members 2021: Alex Endert (chair), Steven Drucker (next chair), Issei Fujishiro, Christoph Garth, Heidi Lam, Heike Leitte, Carlos Scheidegger, Hendrik Strobelt, Penny Rheingans.

Last edited: 2024-10-16.

Executive Summary

Overall, the area model appears to be successful.
There is substantial growth in the Application area and we hence recommend to take action to address the load for APCs of the application area.
Acceptance rates have declined this year, caused by lower first-round acceptance rates and more than usual second-round rejects (some of which are likely one-time effects).
Variability of acceptance rates between areas is substantial.

Otherwise, our analysis suggests that submissions are relatively balanced across areas, keywords are (with a small exception) well distributed, and the unified PC appears to provide broad and overlapping coverage.

Recommendations

The main issue that needs addressing for 2025 is the increasing size of the Applications area. The goal of area size is to be approximately 100 submissions, with a lower bound of 50 and an upper bound of 150. After several years of gradual growth, (2021: 100; 2023: 123), the application area has grown to 154 papers in 2024. The Systems & Rendering area is slightly above the lower bound with 51 papers, as is the Data Transformations area with 53 papers.

We see three possible remedies for this uneven workload:

Option 1: Area Split

The most obvious choice would be to split the areas that have grown too large. While this would be sensible for areas with a “natural” fault line, like Theoretical and Empirical, it seems more fraught for Applications. It is unclear how a sensible split of Applications could be executed that is not confusing to authors. One possibility would be to split by application domain (e.g., Applications in AI/ML and all other Applications), but this would lead to ambiguity for a number of papers.

Option 2: Additional APCs for Large Areas

The main motivation for keeping areas to similar sizes is to manage workload for APCs. An alternative approach to a split would be to add additional APCs for large areas (e.g., a third APC for applications). While it would reduce workload slightly, there are several downsides to this approach:

It would necessitate recruiting another senior researcher as an APC, and recruiting tends to be difficult.
It would grow the overall size of the APC while the number of papers is not growing substantially (it is still lower than in the outlier year 2020).
It might make decision-making more complicated.

Option 3: Introducing Secondary Areas and Moving Papers

An alternative approach would be to introduce a secondary area for all paper submissions, and then balance papers after the initial submission. Author feedback shows that a second-choice area would be desirable.

The advantages of this approach:

It could not only be used to alleviate issues in large areas, but also move papers to areas that do not receive enough submissions, creating an overall more balanced load.
It would be a relatively minor change from author’s perspective.

The downsides of this approach:

It adds an additional step to the review process. While papers are currently moved between areas on rare occasion, this would make this a standard part of the process.
Authors could be discontent if their paper isn’t reviewed in their primary area.

The ACC currently favors Option 3 as it seems most promising in balancing areas. However, we recommend eliciting feedback from APCs and OPCs on these options.

Survey Results

Regarding authors satisfaction related to the areas from the IEEE VIS Areas Feedback Survey 2024:

For our survey, we received 103 responses. This is not a particularly high return rate, but it’s a good absolute number.
We received most responses from authors, who submitted to areas 1 (theory…), 2 (applications), and 4 (representations…), and less from areas 3 (systems), 5 (data transformations), and 6 (analytics).
Very few (6) found that they could not find a suitable area, a fair share of the authors (25 of 103), however, state that they could also have submitted to another area.
8 authors, who submitted to applications, would also have submitted to another area.
Area 6 came up most often as a possible alternative area.
At large, it seems that most authors do not see the immediate need to change the areas.
Quite a few of the authors (18 of 103) thought that area 4 needs an improvement.
Also quite a few (12) thought that area 6 should be improved.
When asked regarding merging/splitting areas, “splitting area 2 (applications)” came up most often.
The authors sent a pretty strong signal that indicating a second-choice area would be welcome.

Quantitative Analysis

Code

import itertools

import pandas as pd
import numpy as np

# Import the necessaries libraries
import plotly.offline as pio
import plotly.graph_objs as go
import plotly.express as px
# [jdf] no need to specify the renderer but, for interactive use, init_notebook should be called
# pio.renderers.default = "jupyterlab"
# Set notebook mode to work in offline
# pio.init_notebook_mode()
# pio.init_notebook_mode(connected=True)
width = 750

import sqlite3

#### Data Preparation

# static data – codes -> names etc.
staticdata = dict(
    decision = { 
        'C': 'Confer vs. cond Accept', # relevant for the 2020 and 2021 data have a different meaning
        'A': 'Accept', # for the 2020 data
        'A2': 'Accept', # after the second round, should be 120 in 2022
        'R': 'Reject', # reject after the first round -- should be 322 in 2022
        'R2': 'Reject in round 2', # reject after the second round -- should be 2 in 2022
        'R-2nd': 'Reject in round 2', 
        'DR-S': 'Desk Reject (Scope)', # should be 7 in 2022
        'DR-P': 'Desk Reject (Plagiarism)', # should be 4 in 2022
        'AR-P': 'Admin Reject (Plagiarism)', # should be 1 in 2022
        'DR-F': 'Desk Reject (Format)', # should be 4 in 2022
        'R-Strong': 'Reject Strong', # cannot resubmit to TVCG for a year
        'T': 'Reject TVCG fasttrack', # Explicitly invited to resubmit to TVCG, status in major revision
    },
    FinalDecision = { # Just flatten to Accept and Reject
        'C': 'Accept', 
        'A': 'Accept', # for the 2020 data
        'A2': 'Accept', # after the second round, should be 120 in 2022
        'R': 'Reject', # reject after the first round -- should be 322 in 2022
        'R2': 'Reject', # reject after the second round -- should be 2 in 2022
        'R-2nd': 'Reject', 
        'DR-S': 'Reject', # should be 7 in 2022
        'DR-P': 'Reject', # should be 4 in 2022
        'AR-P': 'Reject', # should be 1 in 2022
        'DR-F': 'Reject', # should be 4 in 2022
        'R-Strong': 'Reject',
        'T': 'Reject',
    },
    area = {
        'T&E': 'Theoretical & Empirical',
        'App': 'Applications',
        'S&R': 'Systems & Rendering',
        'R&I': 'Representations & Interaction',
        'DTr': 'Data Transformations',
        'A&D': 'Analytics & Decisions',
    },
    bid = { 
        0: 'no bid',
        1: 'want',
        2: 'willing',
        3: 'reluctant',
        4: 'conflict'
    },
    stat = {
        'Prim': 'Primary', 
        'Seco': 'Secondary'
    },
    keywords = pd.read_csv("../data/2021/keywords.csv", sep=';'), # 2021 is correct as there was no new keywords file in 2022
    colnames = {
        'confsubid': 'Paper ID',
        'rid': 'Reviewer',
        'decision': 'Decision',
        'area': 'Area',
        'stat': 'Role',
        'bid': 'Bid'
    }
)

dbcon = sqlite3.connect('../data/vis-area-chair.db') #[jdf] assume data is in ..

submissions_raw20 = pd.read_sql_query('SELECT * from submissions WHERE year = 2020', dbcon, 'sid')
submissions_raw21 = pd.read_sql_query('SELECT * from submissions WHERE year = 2021', dbcon, 'sid')
submissions_raw22 = pd.read_sql_query('SELECT * from submissions WHERE year = 2022', dbcon, 'sid')
submissions_raw23 = pd.read_sql_query('SELECT * from submissions WHERE year = 2023', dbcon, 'sid')
submissions_raw24 = pd.read_sql_query('SELECT * from submissions WHERE year = 2024', dbcon, 'sid')
submissions_raw = pd.read_sql_query('SELECT * from submissions', dbcon, 'sid')
#print(submissions_raw24)

submissions = (submissions_raw
    .join(
        pd.read_sql_query('SELECT * from areas', dbcon, 'aid'), 
        on='aid'
    )
    .assign(Keywords = lambda df: (pd
        .read_sql_query('SELECT * FROM submissionkeywords', dbcon, 'sid')
        .loc[df.index]
        .join(
            pd.read_sql_query('SELECT * FROM keywords', dbcon, 'kid'), 
            on='kid'
        )
        .keyword
        .groupby('sid')
            .apply(list)
    ))
    .assign(**{'# Keywords': lambda df: df.Keywords.apply(len)})
    .assign(**{'FinalDecision': lambda df: df['decision']})
    .replace(staticdata)
    .rename(columns = staticdata['colnames'])
    .drop(columns = ['legacy', 'aid'])
#    .set_index('sid')
#    .set_index('Paper ID')
# note -- I changed the index, since 'Paper ID' was not unique for multiple years.
# By not setting the index to 'Paper ID' the index remains with 'sid'.
# However, 'sid' is used as a unique index in the creation of the database anyways.
)

# replace the old 'Paper ID' with a unique identifier, so that the code from 2021 will work
submissions = submissions.rename(columns = {'Paper ID':'Old Paper ID'})
submissions.reset_index(inplace=True)
submissions['Paper ID'] = submissions['sid']
submissions = submissions.set_index('Paper ID')
#submissions colums: (index), sid (unique id), Paper ID (unique), Old Paper ID, Decision, year, Area, Keywords (as a list), # Keywords

all_years = submissions['year'].unique()

#rates_decision computes the acceptance rates (and total number of papers) per year
#rates_decision: (index), Decision, year, count, Percentage
rates_decision = (submissions
    .value_counts(['Decision', 'year'])
    .reset_index()
    # .rename(columns = {0: 'count'})
)
rates_decision['Percentage'] = rates_decision.groupby(['year'])['count'].transform(lambda x: x/x.sum()*100)
rates_decision = rates_decision.round({'Percentage': 1})
#rates_decision computes the acceptance rates (and total number of papers) per year
#rates_decision: (index), Decision, year, count, Percentage
rates_decision_final = (submissions
    .value_counts(['FinalDecision', 'year'])
    .reset_index()
    # .rename(columns = {0: 'count'})
)
rates_decision_final['Percentage'] = rates_decision_final.groupby(['year'])['count'].transform(lambda x: x/x.sum()*100)
rates_decision_final = rates_decision_final.round({'Percentage': 1})
#submissions
#bids_raw: (index), Reviewer ID, sid (unique paper identifier over mult years), match score, bid of the reviewer, role of the reviewer, Paper ID
bids_raw = (pd
    .read_sql_query('SELECT * from reviewerbids', dbcon)
    .merge(submissions_raw['confsubid'], on='sid')
    .replace(staticdata)
    .rename(columns = staticdata['colnames'])
)
#bids_raw

## Renaming Paper ID to Old Paper ID, setting Paper ID to sid, keeping all 3 for now...
bids_raw = bids_raw.rename(columns = {'Paper ID':'Old Paper ID'})
bids_raw['Paper ID'] = bids_raw['sid']
# bids = Reviewer, sid, Bid (how the reviewer bid on this paper)
#      doesn't include review/sid that were not bid for [.query('Bid != "no bid"')]
bids = (bids_raw
    .query('Bid != "no bid"')
# Paper ID is not unique over multiple years!
#    .drop(columns = ['sid'])
#    [['Reviewer','Paper ID', 'Bid']]
    [['Reviewer','sid', 'Paper ID', 'Bid']]
    .reset_index(drop = True)
)

# matchscores becomes a table to reviewer/sid with the match scores
# many of these will be "NaN" since we now have multiple years together.
# we need to check whether the reviewer IDs remain unique across the years!
matchscores = (bids_raw
# Paper ID is not unique over multiple years!
#    [['Reviewer','Paper ID','match']]
    [['Reviewer','sid','Paper ID','match']]
# Paper ID is not unique over multiple years!
#    .set_index(['Reviewer', 'Paper ID'])
    .set_index(['Reviewer', 'Paper ID'])
    .match
    .unstack(level=1)
)

# assignments = Reviewer, sid, Role (primary, secondary)
#      doesn't include review/sid that were not assigned [.query('Role != ""')]
assignments = (bids_raw
    .query('Role != ""')
# Paper ID is not unique over multiple years!
#    [['Reviewer', 'Paper ID', 'Role']]
    [['Reviewer', 'sid', 'Paper ID', 'Role']]
    .reset_index(drop = True)
)

del dbcon

#### Plot Defaults

acc_template = go.layout.Template()

acc_template.layout = dict(
    font = dict( 
        family='Fira Sans',
        color = 'black',
        size = 13
    ),
    title_font_size = 14,
    plot_bgcolor = 'rgba(255,255,255,0)',
    paper_bgcolor = 'rgba(255,255,255,0)',
    margin = dict(pad=10),
    xaxis = dict(
        title = dict( 
            font = dict( family='Fira Sans Medium', size=13 ),
            standoff = 10
        ),
        gridcolor='lightgray',
        gridwidth=1,
        automargin = True,
        fixedrange = True,
    ),
    yaxis = dict(
        title = dict( 
            font = dict( family='Fira Sans Medium', size=13 ),
            standoff = 10,
        ),
        gridcolor='lightgray',
        gridwidth=1,
        automargin = True,
        fixedrange = True,
    ),
    legend=dict(
        title_font_family="Fira Sans Medium",
    ),
    colorway = px.colors.qualitative.T10,
    hovermode = 'closest',
    hoverlabel=dict(
        bgcolor="white",
        bordercolor='lightgray',
        font_color = 'black',
        font_family = 'Fira Sans'
    ),
)

acc_template.data.bar = [dict(
    textposition = 'inside',
    insidetextanchor='middle',
    textfont_size = 12,
)]

px.defaults.template = acc_template

px.defaults.category_orders = {
    'Decision': list(staticdata['decision'].values()),
    'FinalDecision':  list(staticdata['FinalDecision'].values()),
    'Area': list(staticdata['area'].values()),
    'Short Name': staticdata['keywords']['Short Name'].tolist(),
}

config = dict(
    displayModeBar = False,
    scrollZoom = False,
    responsive = False
)

def aspect(ratio):
    return { 'width': width, 'height': int(ratio*width) }

# useful data sub-products

#k_all columns: (index), Paper ID, Old Paper ID, Decision, year, Area, Keywords (as a list), # Keywords, Keyword, Category, Subcategory, Short Name, Description
k_all = (submissions
    .join(submissions['Keywords']
        .explode()
        .rename('Keyword')
    )
    .reset_index(level = 0)
    .merge(staticdata['keywords'], on='Keyword')
)

# (Old) Paper ID is not unique, however, the 'sid' is (which is the current index)
#k_all.reset_index(inplace=True)
#k_all.rename(columns = {'sid':'Paper ID'},inplace = True)
#k_all = k_all.merge(staticdata['keywords'], on='Keyword')
#k_all

#k_total columns: Category, Subcategory, Short Name, Keyword, Description, #Submissions, year
#  counts the total number of submissions per keyword and year
k_total = staticdata['keywords'].merge(
    k_all.value_counts(['Short Name','year'])
         .rename('# Submissions')
         .reset_index(),
#    on = 'Short Name',
    how = 'right'
#    how = 'outer'
)

#k_cnt: how often was a particular keyword used among all submissions within a year????
#k_cnt columns: (index), Short Name, year, c, Category, Subcategory, Keyword, Description
# not clear how k_cnt and k_total differ!
k_cnt = (k_all
    .value_counts(['Short Name','year'], sort=False)
    .rename('c')
    .to_frame()
    .reset_index()
    .merge(staticdata['keywords'], on='Short Name')
)

Submissions

The number of submissions peaked in 2020 at 585 papers, which is likely caused by the pandemic and the one-month extension to the deadline given because of it. The years 2021 and 2022 saw lower numbers of submissions with 442 and 460 respectively. Submissions increased in the year 2023 (539) and 2024 (544), and are now almost back to the peak of 2020.

Code

totals = rates_decision_final.groupby('year')['count'].sum().reset_index()

Code

fig = px.bar(totals,
    y='year',
    x='count', 
    orientation = 'h',
    labels={'count':'Number of Submissions', 'year':'Year'},
    text = 'count',
).update_layout(
    yaxis=dict(autorange="reversed", tickmode='linear'),
    title = 'Submissions Numbers since 2020',
    xaxis_title = 'Number of Submissions',
    **aspect(0.35)
)

fig.show(config=config)

Acceptance Rates

Acceptance rates have fluctuated lightly from 2020-2023 (26.8%, 24.9%, 26.1% and 25.8%) though there was a dip (24.4%) in 2021. For 2024, we see a rather sharp drop off to 22.4%, which is partially caused by a lower first-round acceptance rate (23.2%) and amplified by 3 second-round rejects.

This trend might mean that the reviewers want higher-quality articles, which would be good if the research field was becoming more stable and reached a steady state, or that they become more conservative, which would be detrimental to the development of novel less consensual research directions. We hope the VSC will enquire more deeply and provide guidelines to the OPC and reviewers to steer the conference in the right direction.

Code

fig = px.bar(rates_decision_final,
    x = 'Percentage',
    y = 'year',
    barmode = 'stack',
    orientation = 'h',
    color = 'FinalDecision',
    text = 'Percentage',
    custom_data = ['FinalDecision','count'],
).update_layout(
    yaxis=dict(autorange="reversed", tickmode='linear'),
    title = 'Acceptance Rates since 2020',
    xaxis_title = 'Percentage of Submissions',
    **aspect(0.35)
).update_traces(
    hovertemplate = '%{customdata[1]} submissions in %{y} have decision %{customdata[0]}<extra></extra>',
).show(config=config)

Distributions in Areas

Code

tmp = (submissions
    .value_counts(['Area', 'year'])
    .reset_index()
    .rename(columns = {0: 'count'})
)

data=[]
count=0
recent_years = [2021, 2022, 2023, 2024]
for my_year in recent_years:
    count=count+1
    trace1=go.Bar(
        x=tmp[tmp['year']==my_year]["Area"],
        y=tmp[tmp['year']==my_year]['count'],
        customdata = tmp[tmp['year']==my_year],
        hovertemplate="%{y} papers were submitted in",
        name=f"{my_year}",
        offsetgroup=count,
    )
    data.append(trace1)


fig2 = go.Figure(
    data=data,
    layout=go.Layout(
        title="Comparing # submissions 2021, 2022. 2023, and 2024",
        xaxis_title="Areas",
        template="plotly_white"
    )
)
fig2.show()

Submissions across the (reformulated) areas are relatively stable between 2021 and 2024 with some notable exceptions. Applications has been a large area since the start of the area model (100 submissions in 2021), but has seen growth in 2023 (123 submissions) and especially in 2024 (149). Applications is hence three times as large as the smaller areas Data Transformations and Systems & Rendering, indicating an uneven load for the area paper chairs.

Priority: Intellectual Homogeneity Priority: Balanced Area Size (workload)

Divide Applications Areas along subject lines
Add an administrative division of Application Areas at random
Add additional Area Paper Chairs

The other larger areas, Theoretical & Empirical has seen a slight dip in 2024; while Representations & Interaction has seen a rise from 86 submissions to 108 papers. However, these numbers remain in the desired range of submissions handled by a team of APCs.

Acceptance Rates in Areas

Code

recent_submissions = submissions[submissions['year'] != 2020]
tmptotal = (recent_submissions
    .value_counts(['Area', 'year'])
    .reset_index()
    .rename(columns = {'count': 'total'})
)
tmp = (recent_submissions
    .value_counts(['Area', 'FinalDecision', 'year'])
    .reset_index()
    # .rename(columns = {0: 'count'})
)
tmpfinal = pd.merge(left=tmp, right=tmptotal, on=['Area','year'])
tmpfinal['percentage']= round(tmpfinal['count']/tmpfinal['total'] *1000)/10.0
fig = px.bar(tmpfinal,
    x = 'year',
    y = 'percentage',
    barmode = 'stack',
    orientation = 'v',
    color = 'FinalDecision',
    text = 'percentage',
    custom_data = ['FinalDecision'],
    facet_col='Area',
    category_orders = {"year": [2021,2022, 2023, 2024]},
    facet_col_spacing=0.06, # default is 0.03
    ).update_layout(
        title = 'Submissions by area and year',
        xaxis_title = 'year',
        legend=dict(
            yanchor="top",
            y=1,  # Adjust legends y-position
            xanchor="left",
            x=1.08,  # ... and x-position to avoid overlapping
        ),
        **aspect(0.8)
    ).update_xaxes(type='category').update_traces(
        hovertemplate = '%{y}% of submissions in %{x} have decision %{customdata[0]}<extra></extra>',
    )
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
for i,a in enumerate(fig.layout.annotations):
    if (i%2):
        a.update(yshift=-15)

# Add horizontal line at 75% for each subplot
fig.add_shape(
    type="line",
    x0=0, x1=1,  # from the left to the right of the plot
    y0=75, y1=75,  # at y = 75% on the y-axis
    xref='paper',  # relative to the entire plot width
    yref='y',  # relative to the y-axis
    line=dict(color="Darkgray", width=2),
)
# Add a label next to the line at 75%
fig.add_annotation(
    x=1,  # Position near the end of the plot (right side)
    y=75,  # Position at 75% on the y-axis
    xref='paper',  # Relative to the entire plot width
    yref='y',  # Relative to the y-axis
    text="75% Threshold",  # The label text
    showarrow=False,  # No arrow, just text
    font=dict(size=12, color="Black"),  # Customize the font size and color
    xanchor='left',  # Anchor the text to the left side of the x-position
    yanchor='middle'  # Center the text vertically on the y-position
)
fig.show(config=config)

Acceptance rates have been fairly consistent across areas in 2021, but not so in 2022, 2023, and 2024.

Generally, Theoretical & Empirical seems to have higher acceptance rates than other areas. Analytics & Decisions seems to become substantially more selective every year, accepting only 16.9% of all submissions in 2024. Systems & Rendering fluctuates over time, with 33.3% accepted in 2023 but only 16.7% in 2024. It is notable that Systems & Rendering is one of the smallest areas, hence, these fluctuations are caused by a relatively small number of papers.

Keywords

And frequencies of the use of keywords range from 5 to 120. The keywords with the highest number of occurrences are not very useful for categorizing papers, but they are very meaningful, and differentiation works effectively with accompanying keywords. We believe that having five papers that use a keyword is sufficient to warrant retaining it.

Code

# do a manual histogram to include non-specified keywords
# 
k_total['Submission %'] = k_total.groupby(['year'])['# Submissions'].transform(lambda x: x/x.sum()*100)
k_total['Year'] = k_total['year'].astype(str)  # to get categorical colors
k_year = k_total.pivot(index="year", values="Submission %", columns="Short Name").T

px.scatter(k_total,
    y = 'Short Name',
    x = 'Submission %',  # 'Submission %',
    color = 'Year',
    category_orders={"Year": ["2024", "2023", "2022", "2021", "2020"]}
    # facet_row='year',
    # category_orders={'year':  reversed([2020, 2021, 2022, 2023, 2024])},
).update_traces(
    hovertemplate = "'%{x}' specified in %{y} submissions<extra></extra>",
).update_layout(
    yaxis_tickfont_size = 8,
    yaxis_dtick = 1,
    yaxis_tickmode = 'linear',
    # yaxis_dtick = 50,
    hovermode = 'closest',
    title = 'Frequency of keywords across submissions',
    **aspect(1)
).show(config=config)