---
title: Logistic Regression
subtitle: Predicting voter choice in the 2024 presidential election
author:
  - name: Aashia Khan
    corresponding: true
    roles:
      - Conceptualization
      - Investigation
      - Methodology
      - Project administration
      - Resources
      - Supervision
    affiliations: 
      - Binghamton University
  - name: Zihan Hei
    corresponding: false
    orcid: 0009-0007-1617-2666 
    roles:
      - Software
      - Formal analysis
      - Visualization
      - Writing - original draft
    affiliations:
      - Binghamton University
  - name: Jeff John
    corresponding: false
    roles:
      - Methodology
      - Project administration
      - Writing - original draft
    affiliations:
      - Binghamton University
  - name: Shane McCarty
    orcid: 0000-0001-8930-7049
    corresponding: true
    roles:
      - Conceptualization
      - Investigation
      - Methodology
      - Project administration
      - Resources
      - Supervision
      - Writing - review & editing
    affiliations:
      - Binghamton University
      - Promote Care & Prevent Harm
keywords:
  - zero sum beliefs
  - social identities
  - political affiliation
  - racial identity
abstract: |
  *Background:* Zero-sum beliefs—the perception that one group's gains necessarily result in another group's losses—are important predictors of political attitudes. However, the referents for zero-sum beliefs as economic or social identity remain underexplored in relation to political ideology, party affiliation, and voting behavior in contemporary elections.
  
  *Method:* We conducted a comprehensive analysis examining three dimensions of zero-sum beliefs (general, economic, and social identity). Using Kruskal-Wallis tests on eleven zero-sum beliefs, we investigated how political party affiliation and racial/ethnic identity influenced endorsement of zero-sum beliefs across multiple domains. Subsequently, we examined whether these zero-sum belief patterns predicted self-reported voting for Donald Trump versus Kamala Harris in the 2024 presidential election.
  
  *Results:* Political party affiliation was a significant predictor for all eight zero-sum social identity beliefs, but none of the economic or general beliefs. Republican voters and certain racial/ethnic groups demonstrated higher endorsement of zero-sum social identity beliefs. A logistic regression shows that after controlling for political ideology, a composite of zero-sum social identity beliefs explains voting behavior in the 2024 presidential election, with stronger zero-sum social identity thinking associated with Trump support and lower zero-sum social identity beliefs predicting Harris support. Other sociodemographic factors and zero-sum economic thinking were not significant predictors.
  
  *Discussion:* Zero-sum social identity beliefs may represent a competitive core belief underlying contemporary political party affiliation and candidate preference. These findings affirm prior work showing that zero-sum thinking about economics differs from zero-sum thinking about social identities, with similar levels of agreement on zero-sum economic beliefs across political parties but significantly different levels of agreement on zero-sum social identity beliefs by party affiliation. To the best of our knowledge, this study is the first to show that zero-sum thinking about social identities predicts voter preference in the 2024 election. Future work needs to examine how to reduce zero-sum social identity thinking.

plain-language-summary: |
  Many people hold "zero-sum beliefs": the idea that when one group succeeds, another group must fail. This study examined how these beliefs relate to politics and voting in the 2024 presidential election. We found that while people across political parties hold similar zero-sum beliefs about economic issues, Republicans were much more likely than Democrats to hold zero-sum beliefs about social identities (such as undocumented vs. citizens, White vs. Black, LGBTQ vs. Religious). These social identity zero-sum beliefs were strong predictors of self-reported voting behavior: people with stronger zero-sum social identity beliefs were more likely to vote for Donald Trump, while those with weaker zero-sum social identity beliefs were more likely to vote for Kamala Harris. Post-election discourse suggests that the economy, and not social identity, was the primary driver of voters' choices. Yet our results show that zero-sum beliefs about economics did not predict candidate preference. Instead, zero-sum social identity beliefs were the second most important predictor, after political ideology. This suggests that how people think about competition between different social groups is a key psychological factor driving political divisions and voting choices in America today.
date: last-modified
number-sections: true
format:
  html:
    toc: true
    code-fold: true
    code-summary: "Show the code"
    comments:
      hypothesis: true
  pdf:
    number-sections: true
    code-fold: true
    code-summary: "Show the code"
    keep-tex: true
---

## Predicting Vote Choice in the 2024 Presidential Election

### Factor Analysis of Zero-Sum Beliefs

```{r}
#| label: load library
#| echo: false
#| output: false
#| warning: false
#| error: false
#| results: false

# Load necessary libraries
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(ggrain)
library(rmarkdown)
library(readr)
library(haven)
library(rempsyc)
library(knitr)
library(broom)
library(ggdist)
library(devtools)
library(apaTables)
library(ggpubr)
library(psych)
library(forcats)
library(corrplot)
```


```{r}
#| label: import select_data
select_data <- read.csv("/cloud/project/data/select_data.csv")
```

```{r}
#| label: factor analysis

library(psych)
# create data frame of ZEROSUM variables for factor analysis
df.ZEROSUM <- select_data[, c("ZEROSUM_1", "ZEROSUM_2", "ZEROSUM_3", "ZEROSUM_4", 
                                  "ZEROSUM_5", "ZEROSUM_6", "ZEROSUM_7", "ZEROSUM_8", 
                                  "ZEROSUM_9", "ZEROSUM_10", "ZEROSUM_11")]

# Or using dplyr to select variables
# zerosum_vars <- select_data %>% select(ZEROSUM_1:ZEROSUM_11)

# Check the correlation matrix first
cor_matrix <- cor(df.ZEROSUM, use = "complete.obs")
print(cor_matrix)

# Determine number of factors using scree plot and parallel analysis
scree(df.ZEROSUM)
fa.parallel(df.ZEROSUM, fa = "fa")

# Run 2-factor factor analysis (adjust nfactors based on scree plot/parallel analysis)
fa_result <- fa(df.ZEROSUM, 
                nfactors = 2,  # adjust this number based on your analysis
                rotate = "promax",
                fm = "ml")  # maximum likelihood

# View results
print(fa_result)
fa_result$loadings

# Get factor scores
factor_scores <- fa_result$scores

```

The factor analysis, an unsupervised learning technique, supports a two-factor model with promax rotation. The first item (`ZEROSUM_1`) loads equally on both factors and is therefore excluded from the composites. Based on item content, we named the first factor `ZEROSUM_ECONOMIC` and the second factor `ZEROSUM_IDENTITY`, corresponding to the two referents: economic (e.g., wealthy vs. poor) and social identity (e.g., racial minorities vs. White people), respectively.
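
The decision to drop an item that loads on both factors can be made explicit with a simple cross-loading check. The sketch below is illustrative only: the loading values are made up (they are not the loadings estimated here), the helper `flag_cross_loading()` is hypothetical, and the 0.30 and 0.20 thresholds are common rules of thumb rather than values taken from this analysis. It is written for the two-factor case.

```r
# Hypothetical pattern loadings for four items on two factors (made-up values,
# NOT the loadings estimated in this study).
example_loadings <- matrix(
  c(0.41, 0.39,   # item_a: similar loadings on both factors
    0.72, 0.05,   # item_b: loads mainly on factor 1
    0.08, 0.81,   # item_c: loads mainly on factor 2
    0.02, 0.66),  # item_d: loads mainly on factor 2
  ncol = 2, byrow = TRUE,
  dimnames = list(c("item_a", "item_b", "item_c", "item_d"), c("F1", "F2"))
)

# Flag an item as cross-loading when both absolute loadings are salient
# (>= min_loading) and the gap between them is small (< max_gap).
flag_cross_loading <- function(loadings, min_loading = 0.30, max_gap = 0.20) {
  abs_l <- abs(loadings)
  both_salient <- apply(abs_l, 1, min) >= min_loading
  small_gap <- (apply(abs_l, 1, max) - apply(abs_l, 1, min)) < max_gap
  both_salient & small_gap
}

cross_loading_flags <- flag_cross_loading(example_loadings)
print(cross_loading_flags)
```

With a fitted `psych::fa()` object, `unclass(fa_result$loadings)` returns a plain matrix that could be passed to a helper like this one.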

```{r}
#| label: create two composites based on factor analysis

select_data <- select_data %>%
  mutate(
    ZEROSUM_ECONOMIC = (ZEROSUM_2 + ZEROSUM_3)/2,
    ZEROSUM_IDENTITY = (ZEROSUM_4 + ZEROSUM_5 + ZEROSUM_6 + ZEROSUM_7 + ZEROSUM_8 + ZEROSUM_9 + ZEROSUM_10 + ZEROSUM_11)/8
  )
```
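
The composites above are computed by summing items and dividing, so a respondent with any missing item receives `NA` for the whole composite. If that is not the intended behavior, a row mean over the available items is one alternative. A minimal sketch on a made-up three-item data frame (the `item_*` names and values are hypothetical):

```r
# Hypothetical three-item example (made-up values) with one missing response.
toy <- data.frame(
  item_1 = c(2, 5, 7),
  item_2 = c(3, NA, 6),
  item_3 = c(4, 5, 5)
)

# Sum-and-divide: NA whenever any item is missing.
toy$composite_strict <- with(toy, (item_1 + item_2 + item_3) / 3)

# Row mean over available items: tolerates missing responses.
toy$composite_available <- rowMeans(
  toy[, c("item_1", "item_2", "item_3")],
  na.rm = TRUE
)

print(toy)
```

Which behavior is preferable depends on how much item-level missingness is acceptable for a composite; the strict version is the more conservative choice.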

```{r}
#| label: alpha identity
library(psych)
# Alpha for ZEROSUM_IDENTITY (8 items)
alpha_identity <- psych::alpha(select_data[, c("ZEROSUM_4", "ZEROSUM_5", "ZEROSUM_6", 
                                         "ZEROSUM_7", "ZEROSUM_8", "ZEROSUM_9", 
                                         "ZEROSUM_10", "ZEROSUM_11")])
print(alpha_identity)

```

```{r}
#| label: alpha economic

library(psych)

# Alpha for ZEROSUM_ECONOMIC (2 items)
alpha_economic <- psych::alpha(select_data[, c("ZEROSUM_2", "ZEROSUM_3")])
print(alpha_economic)

```
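
For a two-item composite such as `ZEROSUM_ECONOMIC`, the Spearman-Brown coefficient is often reported alongside Cronbach's alpha. The sketch below shows the calculation on simulated responses; the data, the `item_x`/`item_y` names, and the `spearman_brown()` helper are all hypothetical and are not part of this analysis.

```r
set.seed(123)

# Simulated 7-point responses for two correlated items (hypothetical data).
latent <- rnorm(200)
to_likert <- function(x) pmin(7, pmax(1, round(4 + 1.5 * x)))
item_x <- to_likert(latent + rnorm(200, sd = 0.7))
item_y <- to_likert(latent + rnorm(200, sd = 0.7))

# Spearman-Brown coefficient for a two-item scale: 2r / (1 + r),
# where r is the Pearson correlation between the two items.
spearman_brown <- function(x, y) {
  r <- cor(x, y, use = "complete.obs")
  2 * r / (1 + r)
}

sb <- spearman_brown(item_x, item_y)
print(round(sb, 2))
```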

### Logistic Regression (Predicting Voter Choice in 2024)

```{r}
#| label: transform VOTE2024 to TRUMPVOTE

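# Recode VOTE2024 into a binary outcome: 1 when VOTE2024 == 1, 0 when
# VOTE2024 == 2, and NA for any other response
select_data <- select_data %>%
  mutate(TRUMPVOTE = case_when(
    VOTE2024 == 1 ~ 1,
    VOTE2024 == 2 ~ 0,
    TRUE ~ NA_real_
  ))

# Check the recode with a cross-tabulation instead of printing the full data
count(select_data, VOTE2024, TRUMPVOTE)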
```

```{r}
#| label: logistic regression v1 version 1

# Fit the logistic regression model
logregmodel.v1 <- glm(TRUMPVOTE ~ POLITICALBELIEFS + AGE + SOCIALSTATUS + ZEROSUM_IDENTITY + ZEROSUM_ECONOMIC + ZEROSUM_1,
             data = select_data, 
             family = binomial)

# View the results
summary(logregmodel.v1)
```

```{r}
#| label: APA format logistic regression v1 version 1 APA

# Reuse logregmodel.v1 fitted in the previous chunk (no need to refit)

# Tidy model output
logit_table <- broom::tidy(logregmodel.v1, conf.int = TRUE) %>%
  rename(Predictor = term,
         B = estimate,
         SE = std.error,
         z = statistic,
         p = p.value,
         CI_lower = conf.low,
         CI_upper = conf.high)

# Display APA-style table
nice_table(logit_table, 
           title = c("Table 12", "Logistic Regression Predicting Trump Vote"), 
           highlight = 0.05, 
           stars = TRUE)
```
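
The `B` column in the table above is on the log-odds scale, which is hard to read directly. Exponentiating the coefficients gives odds ratios. The following is a minimal sketch on simulated data; the variable names (`ideology`, `zs_identity`, `vote`) and the generating coefficients are hypothetical and are not results from this study.

```r
set.seed(42)

# Simulated data (hypothetical; variable names are placeholders).
n <- 500
sim <- data.frame(
  ideology = sample(1:7, n, replace = TRUE),
  zs_identity = runif(n, min = 1, max = 7)
)
# True model used only to generate the outcome for this illustration.
eta <- -6 + 0.9 * sim$ideology + 0.5 * sim$zs_identity
sim$vote <- rbinom(n, size = 1, prob = plogis(eta))

fit <- glm(vote ~ ideology + zs_identity, data = sim, family = binomial)

# Exponentiate log-odds coefficients and Wald interval bounds to get
# odds ratios with 95% confidence intervals.
odds_ratios <- cbind(
  OR = exp(coef(fit)),
  exp(confint.default(fit))
)
print(round(odds_ratios, 2))
```

An odds ratio above 1 means that a one-unit increase in the predictor multiplies the odds of the outcome by that factor, holding the other predictors constant. `broom::tidy()` also accepts `exponentiate = TRUE`, which returns exponentiated estimates in the tidy format used for the tables here.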

```{r}
#| label: logistic regression v2 version 2

# Fit the logistic regression model
logregmodel.v2 <- glm(TRUMPVOTE ~ POLITICALBELIEFS + ZEROSUM_ECONOMIC + ZEROSUM_IDENTITY + ZEROSUM_1 + GENDER_MALE +
                  RELIGIOUS_YES + RACE_BLACK + RACE_ASIAN + RACE_OTHER + EDUCATION_HIGH + SOCIALSTATUS,
             data = select_data, 
             family = binomial)

# View the results
summary(logregmodel.v2)
```

```{r}
#| label: APA format logistic regression v2 version 2 APA

# Reuse logregmodel.v2 fitted in the previous chunk (no need to refit)

# Tidy model output
logit_table <- broom::tidy(logregmodel.v2, conf.int = TRUE) %>%
  rename(Predictor = term,
         B = estimate,
         SE = std.error,
         z = statistic,
         p = p.value,
         CI_lower = conf.low,
         CI_upper = conf.high)

# Display APA-style table
nice_table(logit_table, 
           title = c("Table 12", "Logistic Regression Predicting Trump Vote"), 
           highlight = 0.05, 
           stars = TRUE)
```
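
One way to check whether a composite adds explanatory power beyond political ideology is a likelihood-ratio test between nested models fit to the same rows. This is a sketch under stated assumptions: the data are simulated, and the names `ideology`, `zs_identity`, and `vote` are placeholders, not the study's variables or results.

```r
set.seed(2024)

# Simulated data (hypothetical variables and coefficients).
n <- 400
sim <- data.frame(
  ideology = sample(1:7, n, replace = TRUE),
  zs_identity = runif(n, min = 1, max = 7)
)
sim$vote <- rbinom(
  n, size = 1,
  prob = plogis(-5 + 0.8 * sim$ideology + 0.4 * sim$zs_identity)
)

# Nested models must be fit to the same rows, so drop incomplete cases first.
sim_complete <- na.omit(sim)

fit_reduced <- glm(vote ~ ideology, data = sim_complete, family = binomial)
fit_full <- glm(vote ~ ideology + zs_identity,
                data = sim_complete, family = binomial)

# Likelihood-ratio (deviance) test: does adding zs_identity improve fit?
lrt <- anova(fit_reduced, fit_full, test = "Chisq")
print(lrt)
```

Note that this comparison only applies to nested models; two models with different, non-overlapping predictor sets would need a different criterion such as AIC.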


```{r}
#| label: logregmodel.v1 coeff

# Create prediction data for one variable (holding others at mean)
pred_data <- with(select_data, 
  data.frame(
    POLITICALBELIEFS = seq(min(POLITICALBELIEFS, na.rm = TRUE), 
                          max(POLITICALBELIEFS, na.rm = TRUE), length.out = 100),
    AGE = mean(AGE, na.rm = TRUE),
    SOCIALSTATUS = mean(SOCIALSTATUS, na.rm = TRUE),
    ZEROSUM_IDENTITY = mean(ZEROSUM_IDENTITY, na.rm = TRUE),
    ZEROSUM_ECONOMIC = mean(ZEROSUM_ECONOMIC, na.rm = TRUE),
    ZEROSUM_1 = mean(ZEROSUM_1, na.rm = TRUE)
  ))

# Get predictions with standard errors
predictions <- predict(logregmodel.v1, pred_data, type = "link", se.fit = TRUE)

# Convert to probabilities and calculate confidence intervals
pred_data$predicted_prob <- plogis(predictions$fit)
pred_data$lower_ci <- plogis(predictions$fit - 1.96 * predictions$se.fit)
pred_data$upper_ci <- plogis(predictions$fit + 1.96 * predictions$se.fit)

# Plot with confidence intervals and proper labels
plot.TRUMPVOTE.POLITICIALBELIEFS <- ggplot(pred_data, aes(x = POLITICALBELIEFS, y = predicted_prob)) +
  geom_ribbon(aes(ymin = lower_ci, ymax = upper_ci), alpha = 0.3, fill = "purple") +
  geom_line(color = "purple", linewidth = 1) +
  scale_x_continuous(
    breaks = 1:7,
    labels = c("Far Left /\nLeftist", "Very Liberal", "Liberal", "Moderate", 
               "Conservative", "Very\nConservative", "Alt-Right /\nFar-Right")
  ) +
  labs(title = "Predicted Probability of Trump Vote by Political Beliefs",
       subtitle = "With 95% Confidence Intervals",
       x = "Political Beliefs", y = "Predicted Probability") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9))

# Print and save the plots
print(plot.TRUMPVOTE.POLITICIALBELIEFS)
ggsave("plot12:TRUMPVOTE.POLITICIALBELIEFS.png", 
       plot = plot.TRUMPVOTE.POLITICIALBELIEFS, 
       width = 10, 
       height = 8, 
       dpi = 300)

```

```{r}
#| label: plot ZEROSUM_IDENTITY

# Create prediction data for ZEROSUM_IDENTITY (holding others at mean)
pred_data_identity <- with(select_data, 
  data.frame(
    ZEROSUM_IDENTITY = seq(min(ZEROSUM_IDENTITY, na.rm = TRUE), 
                          max(ZEROSUM_IDENTITY, na.rm = TRUE), length.out = 100),
    POLITICALBELIEFS = mean(POLITICALBELIEFS, na.rm = TRUE),
    AGE = mean(AGE, na.rm = TRUE),
    SOCIALSTATUS = mean(SOCIALSTATUS, na.rm = TRUE),
    ZEROSUM_ECONOMIC = mean(ZEROSUM_ECONOMIC, na.rm = TRUE),
    ZEROSUM_1 = mean(ZEROSUM_1, na.rm = TRUE)
  ))

# Get predictions with standard errors
predictions_identity <- predict(logregmodel.v1, pred_data_identity, type = "link", se.fit = TRUE)

# Convert to probabilities and calculate confidence intervals
pred_data_identity$predicted_prob <- plogis(predictions_identity$fit)
pred_data_identity$lower_ci <- plogis(predictions_identity$fit - 1.96 * predictions_identity$se.fit)
pred_data_identity$upper_ci <- plogis(predictions_identity$fit + 1.96 * predictions_identity$se.fit)

# Plot with confidence intervals and proper labels
plot.TRUMPVOTE.ZEROSUM_IDENTITY <- ggplot(pred_data_identity, aes(x = ZEROSUM_IDENTITY, y = predicted_prob)) +
  geom_ribbon(aes(ymin = lower_ci, ymax = upper_ci), alpha = 0.3, fill = "red") +
  geom_line(color = "red", linewidth = 1) +
  scale_x_continuous(
    breaks = 1:7,
    labels = c("Strongly\nDisbelieve", "Disbelieve", "Somewhat\nDisbelieve", 
               "Neither\nDisbelieve\nnor Believe", "Somewhat\nBelieve", 
               "Believe", "Strongly\nBelieve")
  ) +
  labs(title = "Predicted Probability of Trump Vote by Zero-Sum IDENTITY Beliefs",
       subtitle = "With 95% Confidence Intervals",
       x = "Zero-Sum IDENTITY Beliefs", y = "Predicted Probability") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9))

# Print and save the plots
print(plot.TRUMPVOTE.ZEROSUM_IDENTITY)
ggsave("plot13:TRUMPVOTE.ZEROSUM_IDENTITY.png", 
       plot = plot.TRUMPVOTE.ZEROSUM_IDENTITY, 
       width = 10, 
       height = 8, 
       dpi = 300)

```

```{r}
#| label: plot ZEROSUM_ECONOMIC

# Create prediction data for ZEROSUM_ECONOMIC (holding others at mean)
pred_data_econ <- with(select_data, 
  data.frame(
    ZEROSUM_ECONOMIC = seq(min(ZEROSUM_ECONOMIC, na.rm = TRUE), 
                          max(ZEROSUM_ECONOMIC, na.rm = TRUE), length.out = 100),
    POLITICALBELIEFS = mean(POLITICALBELIEFS, na.rm = TRUE),
    AGE = mean(AGE, na.rm = TRUE),
    SOCIALSTATUS = mean(SOCIALSTATUS, na.rm = TRUE),
    ZEROSUM_IDENTITY = mean(ZEROSUM_IDENTITY, na.rm = TRUE),
    ZEROSUM_1 = mean(ZEROSUM_1, na.rm = TRUE)
  ))

# Get predictions with standard errors
predictions_econ <- predict(logregmodel.v1, pred_data_econ, type = "link", se.fit = TRUE)

# Convert to probabilities and calculate confidence intervals
pred_data_econ$predicted_prob <- plogis(predictions_econ$fit)
pred_data_econ$lower_ci <- plogis(predictions_econ$fit - 1.96 * predictions_econ$se.fit)
pred_data_econ$upper_ci <- plogis(predictions_econ$fit + 1.96 * predictions_econ$se.fit)

# Plot with confidence intervals and proper labels
plot.TRUMPVOTE.ZEROSUM_ECONOMIC <- ggplot(pred_data_econ, aes(x = ZEROSUM_ECONOMIC, y = predicted_prob)) +
  geom_ribbon(aes(ymin = lower_ci, ymax = upper_ci), alpha = 0.3, fill = "blue") +
  geom_line(color = "blue", linewidth = 1) +
  scale_x_continuous(
    breaks = 1:7,
    labels = c("Strongly\nDisbelieve", "Disbelieve", "Somewhat\nDisbelieve", 
               "Neither\nDisbelieve\nnor Believe", "Somewhat\nBelieve", 
               "Believe", "Strongly\nBelieve")
  ) +
  labs(title = "Predicted Probability of Trump Vote by Zero-Sum Economic Beliefs",
       subtitle = "With 95% Confidence Intervals",
       x = "Zero-Sum Economic Beliefs", y = "Predicted Probability") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9))

# Print and save the plots
print(plot.TRUMPVOTE.ZEROSUM_ECONOMIC)
ggsave("plot13:TRUMPVOTE.ZEROSUM_ECONOMIC.png", 
       plot = plot.TRUMPVOTE.ZEROSUM_ECONOMIC, 
       width = 10, 
       height = 8, 
       dpi = 300)

```
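
The three plotting chunks above repeat the same steps: vary one predictor across its observed range, hold the others at their means, predict on the link scale, and back-transform. Those steps could be factored into a helper. The function below is a hypothetical sketch (the name `predict_over_range()` and the simulated variables are not part of this analysis) and assumes all predictors are numeric.

```r
# Build a prediction grid that varies one focal predictor across its observed
# range and holds every other (numeric) predictor at its mean, then return
# predicted probabilities with 95% Wald intervals computed on the link scale.
predict_over_range <- function(model, data, focal, n_points = 100) {
  predictors <- all.vars(formula(model))[-1]  # drop the outcome
  grid <- as.data.frame(
    lapply(data[predictors], function(x) rep(mean(x, na.rm = TRUE), n_points))
  )
  grid[[focal]] <- seq(min(data[[focal]], na.rm = TRUE),
                       max(data[[focal]], na.rm = TRUE),
                       length.out = n_points)
  pred <- predict(model, newdata = grid, type = "link", se.fit = TRUE)
  grid$predicted_prob <- plogis(pred$fit)
  grid$lower_ci <- plogis(pred$fit - 1.96 * pred$se.fit)
  grid$upper_ci <- plogis(pred$fit + 1.96 * pred$se.fit)
  grid
}

# Usage on simulated data (hypothetical variables, not the study data).
set.seed(1)
sim <- data.frame(x1 = runif(300, min = 1, max = 7), x2 = rnorm(300))
sim$y <- rbinom(300, size = 1, prob = plogis(-2 + 0.5 * sim$x1 + 0.3 * sim$x2))
fit <- glm(y ~ x1 + x2, data = sim, family = binomial)

grid_x1 <- predict_over_range(fit, sim, focal = "x1")
head(grid_x1)
```

Computing the interval on the link scale and then applying `plogis()` keeps the bounds inside the 0 to 1 range, which is the same approach the chunks above take.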

$$
\begin{aligned}
\log\left(\frac{\hat{P}(\mathrm{TRUMPVOTE} = 1)}{1 - \hat{P}(\mathrm{TRUMPVOTE} = 1)}\right)
&= \beta_0 + \beta_1 \cdot \mathrm{POLITICALBELIEFS} + \beta_2 \cdot \mathrm{AGE} \\
&\quad + \beta_3 \cdot \mathrm{SOCIALSTATUS} + \beta_4 \cdot \mathrm{ZEROSUM\_IDENTITY} \\
&\quad + \beta_5 \cdot \mathrm{ZEROSUM\_ECONOMIC} + \beta_6 \cdot \mathrm{ZEROSUM\_1}
\end{aligned}
$$ {#eq-logistic-regression-v1}
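
For reference, the log-odds in the equation above map back to a probability through the inverse logit, which is what `plogis()` computes in the plotting code. Writing $\eta$ for the right-hand side of the equation and $x_j$ for any one predictor (both symbols are introduced here only for convenience):

$$
\hat{P}(\mathrm{TRUMPVOTE} = 1) = \frac{1}{1 + e^{-\eta}},
\qquad
\frac{\mathrm{odds}(x_j + 1)}{\mathrm{odds}(x_j)} = e^{\beta_j}
$$

The second expression says that a one-unit increase in $x_j$ multiplies the odds of a Trump vote by $e^{\beta_j}$, holding the other predictors constant.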


```{r}
#| label: corr

library(corrplot)

# Select the variables for correlation matrix
cor_vars <- select_data %>%
  select(TRUMPVOTE, ZEROSUM_ECONOMIC, ZEROSUM_IDENTITY, ZEROSUM_1:ZEROSUM_11, POLITICALBELIEFS)

# Create correlation matrix (using complete observations)
cor_matrix <- cor(cor_vars, use = "complete.obs")

# Print the correlation matrix
print(cor_matrix)

# Visualize with corrplot
corrplot(cor_matrix, 
         method = "color",
         type = "upper",
         order = "hclust",
         tl.cex = 0.8,
         tl.col = "black",
         tl.srt = 45,
         addCoef.col = "black",
         number.cex = 0.7)

# Alternative visualization with different style
corrplot(cor_matrix, 
         method = "circle",
         type = "full",
         order = "original",
         tl.cex = 0.8,
         tl.col = "black",
         tl.srt = 45,
         col = colorRampPalette(c("blue", "white", "red"))(100))
```