Abstract

Background: Zero-sum beliefs, the perception that one group’s gains necessarily come at another group’s expense, are important predictors of political attitudes. However, whether the referent of a zero-sum belief is economic or tied to social identity remains underexplored in relation to political ideology, party affiliation, and voting behavior in contemporary elections.

Method: We examined three dimensions of zero-sum beliefs (general, economic, and social identity). Using Kruskal-Wallis tests on eleven zero-sum belief items, we tested whether endorsement of zero-sum beliefs differed by political party affiliation and racial/ethnic identity across multiple domains. We then examined whether these zero-sum belief patterns predicted self-reported voting for Donald Trump versus Kamala Harris in the 2024 presidential election.

Results: Political party affiliation was a significant predictor of all eight zero-sum social identity beliefs, but of none of the economic or general beliefs. Republican voters and certain racial/ethnic groups showed higher endorsement of zero-sum social identity beliefs. A logistic regression showed that, after controlling for political ideology, a composite of zero-sum social identity beliefs predicted voting behavior in the 2024 presidential election: stronger zero-sum social identity thinking was associated with Trump support and weaker zero-sum social identity thinking with Harris support. Other sociodemographic factors and zero-sum economic thinking were not significant predictors.

Discussion: Zero-sum social identity beliefs may represent a competitive core belief underlying contemporary political party affiliation and candidate preference. These findings affirm prior work showing that zero-sum thinking about economics differs from zero-sum thinking about social identities: agreement with zero-sum economic beliefs was similar across political parties, whereas agreement with zero-sum social identity beliefs differed significantly by party affiliation. To the best of our knowledge, this study is the first to show that zero-sum thinking about social identities predicts voter preference in the 2024 election. Future work should examine how to reduce zero-sum social identity thinking.

Predicting 2024 Presidential Vote Choice with Machine Learning

In [1]:

# Load necessary libraries
library(ggplot2)
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(tidyr)
library(ggrain)

Registered S3 methods overwritten by 'ggpp':
  method                  from   
  heightDetails.titleGrob ggplot2
  widthDetails.titleGrob  ggplot2

library(rmarkdown)
library(readr)
library(dplyr, warn.conflicts = FALSE)
library(haven)
library(rempsyc)

Suggested APA citation: Thériault, R. (2023). rempsyc: Convenience functions for psychology. 
Journal of Open Source Software, 8(87), 5466. https://doi.org/10.21105/joss.05466

library(knitr)
library(broom)
library(ggdist)
library(devtools)

Loading required package: usethis

library(apaTables)
library(ggpubr)
library(psych)


Attaching package: 'psych'

The following objects are masked from 'package:ggplot2':

    %+%, alpha

library(forcats)
library(corrplot)

corrplot 0.95 loaded

In [2]:

select_data <- read.csv("/cloud/project/data/select_data.csv")

Predictive Models (Decision Tree & Random Forest)

Decision Tree and Random Forest Analysis

To further validate these findings and examine the predictive power of our variables using a different analytical approach, we employed a series of machine learning techniques. Our analysis proceeded in three stages:

Stage 1: Initial Decision Tree. We first constructed a simple decision tree to identify the primary predictors of voting for Trump and their splitting thresholds. This provided an interpretable baseline model showing how the algorithm segments voters.
Stage 2: Extended Decision Tree with Cross-Validation. We then grew a more complex decision tree from the same predictors by relaxing the growth controls, which allowed demographic variables to enter the tree, and used cross-validation to determine the optimal model complexity. The best-performing tree was the 2-split model, which had the lowest cross-validated relative error (xerror = 0.32; its resubstitution relative error was 0.24). Despite having access to multiple demographic and ideological variables, this tree needed only two splits, on POLITICALBELIEFS and ZEROSUM_IDENTITY, to classify voters effectively.
Stage 3: Random Forest Analysis. Finally, we fitted a Random Forest ensemble to capture potential non-linear relationships and interactions while providing robust variable importance measures. Consistent with our regression findings, it identified ZEROSUM_IDENTITY and POLITICALBELIEFS as the most important predictors, with substantially higher importance scores than all other variables.

This machine learning approach serves as a complementary check on our regression-based findings: it uses fundamentally different algorithms to examine the same relationships in the same data, providing additional confidence in our substantive conclusions about the predictors of voting behavior.

In [3]:

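# Recode VOTE2024 into a binary outcome:
# 1 = Trump (VOTE2024 == 1), 0 = non-Trump (VOTE2024 == 2), NA for any other response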
select_data <- select_data %>%
  mutate(TRUMPVOTE = case_when(
    VOTE2024 == 1 ~ 1,
    VOTE2024 == 2 ~ 0,
    TRUE ~ NA
  ))

In [4]:

select_data <- select_data %>%
  mutate(
    ZEROSUM_ECONOMIC = (ZEROSUM_2 + ZEROSUM_3)/2,
    ZEROSUM_IDENTITY = (ZEROSUM_4 + ZEROSUM_5 + ZEROSUM_6 + ZEROSUM_7 + ZEROSUM_8 + ZEROSUM_9 + ZEROSUM_10 + ZEROSUM_11)/8
  )
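
The composites above are plain item means. As a minimal sketch on made-up responses (the values are hypothetical; only the column names follow the source), the same arithmetic can be written with `rowMeans()`, which makes it explicit that a respondent missing any item gets a missing composite:

```r
# Toy data: hypothetical responses; only the column names mirror the source
toy <- data.frame(
  ZEROSUM_2 = c(1, 4, 5),
  ZEROSUM_3 = c(3, 4, NA)
)

# Same arithmetic as the pipeline above: mean of the two economic items
toy$ZEROSUM_ECONOMIC <- (toy$ZEROSUM_2 + toy$ZEROSUM_3) / 2

# Equivalent form that scales to many items; na.rm = FALSE keeps the
# "any missing item -> missing composite" behaviour of sum-and-divide
toy$ZEROSUM_ECONOMIC_RM <- unname(
  rowMeans(toy[, c("ZEROSUM_2", "ZEROSUM_3")], na.rm = FALSE)
)

print(toy)
```

Rows with a missing composite are the ones a later model fit would drop or need to impute.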

In [5]:

select_data <- select_data %>%
  mutate(TRUMPVOTE = factor(TRUMPVOTE, levels = c(0, 1)))  # 0 = non-Trump, 1 = Trump

In [6]:

library(rpart)
library(rpart.plot)

dt_model <- rpart(TRUMPVOTE ~ POLITICALBELIEFS + ZEROSUM_ECONOMIC + ZEROSUM_IDENTITY + ZEROSUM_1 + GENDER_MALE +
    RELIGIOUS_YES + RACE_BLACK + RACE_ASIAN + RACE_OTHER + EDUCATION_HIGH + SOCIALSTATUS,
                  data = select_data,
                  method = "class")

rpart.plot(dt_model, extra = 104)

Figure x. Decision Tree Analysis of Voting Predictors.

The only variable the tree uses is ZEROSUM_IDENTITY, suggesting it is the most important predictor in our model.
If a respondent scores below 3.3 on ZEROSUM_IDENTITY, they are much more likely not to vote for Trump (84%).
If they score 3.3 or higher, they are much more likely to vote for Trump (87%).

In [7]:

dt_model <- rpart(
  TRUMPVOTE ~ POLITICALBELIEFS + ZEROSUM_ECONOMIC + ZEROSUM_IDENTITY + ZEROSUM_1 + GENDER_MALE +
    RELIGIOUS_YES + RACE_BLACK + RACE_ASIAN + RACE_OTHER + EDUCATION_HIGH + SOCIALSTATUS,
  data = select_data,
  method = "class",
  control = rpart.control(
    cp = 0.001,         # smaller = deeper tree
    minsplit = 10,      # smaller = allows more splits
    maxdepth = 5        # allow up to 5 levels deep
  )
)

rpart.plot(dt_model, extra = 104)

Figure x. Extended Decision Tree with Demographic Variables.

Grown with relaxed controls (cp = 0.001, minsplit = 10, maxdepth = 5), this expanded decision tree also splits on demographic variables (gender and race) alongside the core predictors. It shows how demographic factors interact with ideological variables to refine predictions, with male respondents and those from “other” racial categories showing higher Trump support within similar ideological profiles.

In [8]:

dt_model <- rpart(
  TRUMPVOTE ~ POLITICALBELIEFS + ZEROSUM_ECONOMIC + ZEROSUM_IDENTITY + ZEROSUM_1 + GENDER_MALE +
    RELIGIOUS_YES + RACE_BLACK + RACE_ASIAN + RACE_OTHER + EDUCATION_HIGH + SOCIALSTATUS,
  data = select_data,
  method = "class",
  control = rpart.control(cp = 0.001)
)

printcp(dt_model)


Classification tree:
rpart(formula = TRUMPVOTE ~ POLITICALBELIEFS + ZEROSUM_ECONOMIC + 
    ZEROSUM_IDENTITY + ZEROSUM_1 + GENDER_MALE + RELIGIOUS_YES + 
    RACE_BLACK + RACE_ASIAN + RACE_OTHER + EDUCATION_HIGH + SOCIALSTATUS, 
    data = select_data, method = "class", control = rpart.control(cp = 0.001))

Variables actually used in tree construction:
[1] POLITICALBELIEFS ZEROSUM_IDENTITY

Root node error: 50/101 = 0.49505

n=101 (21 observations deleted due to missingness)

     CP nsplit rel error xerror     xstd
1 0.720      0      1.00   1.22 0.098302
2 0.040      1      0.28   0.36 0.076921
3 0.001      2      0.24   0.32 0.073390

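The best tree by cross-validation is the 2-split model: it has the lowest cross-validated relative error (xerror = 0.32), with a resubstitution relative error (rel error) of 0.24. Both values are relative to the root node error (0.495), so the cross-validated misclassification rate is about 0.32 × 0.495 ≈ 0.16.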
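
In a `printcp()` table, `rel error` is the resubstitution error and `xerror` is the cross-validated error, both expressed relative to the root node error. The following is a minimal sketch of how a tree could be selected from that table. It uses the `kyphosis` data that ships with rpart as a stand-in for `select_data`; the seed, the stand-in data, and the use of the one-standard-error convention are assumptions for illustration, not the authors' procedure.

```r
library(rpart)

set.seed(123)  # xerror comes from random 10-fold CV, so fix the seed

# Stand-in data: kyphosis ships with rpart; the source uses select_data instead
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis,
             method = "class",
             control = rpart.control(cp = 0.001))

# Same columns as printcp(): CP, nsplit, rel error, xerror, xstd
cp_tab <- fit$cptable

# Row with the lowest cross-validated relative error
best_row <- which.min(cp_tab[, "xerror"])

# 1-SE convention: smallest tree whose xerror is within one xstd of the minimum
threshold  <- cp_tab[best_row, "xerror"] + cp_tab[best_row, "xstd"]
one_se_row <- min(which(cp_tab[, "xerror"] <= threshold))

# Prune back to the chosen complexity parameter
pruned <- prune(fit, cp = cp_tab[one_se_row, "CP"])

# Convert the relative xerror into an absolute misclassification rate
root_error <- min(prop.table(table(kyphosis$Kyphosis)))
abs_xerror <- cp_tab[one_se_row, "xerror"] * root_error
```

Applied by hand to the table printed above, the minimum xerror is 0.32 at two splits, which gives a threshold of 0.32 + 0.073 ≈ 0.39. The one-split tree (xerror = 0.36) falls under that threshold, so the one-standard-error convention would favor the one-split tree, whereas the minimum-xerror criterion favors the two-split tree.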

In [9]:

library(randomForest)

randomForest 4.7-1.2

Type rfNews() to see new features/changes/bug fixes.


Attaching package: 'randomForest'

The following object is masked from 'package:psych':

    outlier

The following object is masked from 'package:dplyr':

    combine

The following object is masked from 'package:ggplot2':

    margin

library(tidyr)

# need to drop NA to get accuracy
select_data <- select_data %>%
  drop_na(TRUMPVOTE, POLITICALBELIEFS, ZEROSUM_ECONOMIC, ZEROSUM_IDENTITY, ZEROSUM_1,
          GENDER_MALE, RELIGIOUS_YES, RACE_BLACK, RACE_ASIAN, RACE_OTHER, EDUCATION_HIGH, SOCIALSTATUS)

# split into training and testing sets
set.seed(123)
train_idx <- sample(seq_len(nrow(select_data)), size = 0.7 * nrow(select_data))
train <- select_data[train_idx, ]
test  <- select_data[-train_idx, ]

# Fit random forest model
rf_model <- randomForest(
  TRUMPVOTE ~ POLITICALBELIEFS + ZEROSUM_ECONOMIC + ZEROSUM_IDENTITY + ZEROSUM_1 + GENDER_MALE +
    RELIGIOUS_YES + RACE_BLACK + RACE_ASIAN + RACE_OTHER + EDUCATION_HIGH + SOCIALSTATUS,
  data = train,
  na.action = na.roughfix,
  ntree = 500
)

# Predict on test set
pred <- predict(rf_model, newdata = test)


# Confusion matrix
table(Predicted = pred, Actual = test$TRUMPVOTE)

         Actual
Predicted  0  1
        0 13  3
        1  2 12

# Accuracy
mean(pred == test$TRUMPVOTE)

[1] 0.8333333

# Variable importance
varImpPlot(rf_model)

Figure x. Variable Importance in Random Forest Model.

Zero-sum identity beliefs and political beliefs emerge as the most important predictors, with Mean Decrease Gini values of roughly 9 to 12, substantially higher than those of the other variables. This ranking is consistent with our regression results in identifying these two variables as the main predictors of voting for Trump, while demographic and other ideological variables play a secondary role.
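
The values drawn by `varImpPlot()` can also be read off as numbers with `importance()`. A minimal sketch, using the built-in `iris` data as a stand-in because the survey data are not reproduced here (the stand-in data and the object name `rf_demo` are assumptions for illustration):

```r
library(randomForest)

set.seed(123)  # random forests are stochastic; fix the seed for repeatability

# Stand-in data: iris replaces the survey data used in the source
rf_demo <- randomForest(Species ~ ., data = iris, ntree = 500)

# importance() returns the numbers that varImpPlot() draws
imp <- importance(rf_demo)

# Sort predictors from most to least important by Mean Decrease Gini
imp_sorted <- imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), ,
                  drop = FALSE]

print(round(imp_sorted, 2))
```

Reporting the sorted table alongside the plot lets readers check statements such as "around 9 to 12" against exact values.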

In [10]:

library(yardstick)


Attaching package: 'yardstick'

The following object is masked from 'package:readr':

    spec

library(ggplot2)
library(dplyr)

# Create data frame for predictions and actual values
conf_df <- data.frame(
  truth = test$TRUMPVOTE,
  prediction = pred
)

# Create confusion matrix object
conf_mat_obj <- conf_mat(conf_df, truth = truth, estimate = prediction)

# Visualize it
autoplot(conf_mat_obj, type = "heatmap") +
  scale_fill_gradient(low = "white", high = "steelblue") +
  labs(title = "Confusion Matrix: Random Forest",
       x = "Actual",       # autoplot() places the truth on the x-axis
       y = "Predicted")    # and the prediction on the y-axis

Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.

Figure x. Random Forest Model Performance.

The confusion matrix shows the random forest model’s predictions on the held-out test data (n = 30). The model achieved an overall accuracy of 83.33% (25 of 30), correctly classifying 13 of 15 non-Trump voters and 12 of 15 Trump voters. It produced three false negatives (Trump voters predicted as non-Trump voters) and two false positives (non-Trump voters predicted as Trump voters), indicating strong but not perfect predictive performance.
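
The summary figures can be recomputed directly from the confusion matrix printed earlier (rows are predicted classes, columns are actual classes). The sketch below hard-codes those four counts. The exact binomial interval at the end is an added illustration of how much uncertainty a 30-case test set carries, not part of the original analysis.

```r
# Counts copied from the printed confusion matrix
# (rows = predicted class, columns = actual class; matrix() fills by column)
cm <- matrix(c(13, 2, 3, 12), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

# Overall accuracy: correct predictions on the diagonal over all cases
accuracy <- sum(diag(cm)) / sum(cm)            # (13 + 12) / 30

# Sensitivity: share of actual Trump voters predicted as Trump voters
sensitivity <- cm["1", "1"] / sum(cm[, "1"])   # 12 / 15

# Specificity: share of actual non-Trump voters predicted as non-Trump voters
specificity <- cm["0", "0"] / sum(cm[, "0"])   # 13 / 15

print(round(c(accuracy = accuracy,
              sensitivity = sensitivity,
              specificity = specificity), 3))

# Exact (Clopper-Pearson) 95% interval for accuracy with 25 correct out of 30
acc_ci <- binom.test(sum(diag(cm)), sum(cm))$conf.int
print(acc_ci)
```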