2021 TAMIDS Data Science Competition
Award
Finalist Team.
1st Place Undergraduate Team ($1,500 Prize Award).
Special Award: Best Use of Additional Data ($250 Prize Award).
Team
Songlin Xie, Fang Shu.
Award Webpage
Objective
Build models to uncover how campaign finance affects electoral outcomes.
Identify effective metrics for campaign donations and spending.
Provide suggestions for future campaigns or political parties on where funds should be directed.
Data
The two datasets provided by the Texas A&M Institute of Data Science included detailed county-level presidential election vote information.
Datasets of campaign operating expenditure were retrieved from the Federal Election Commission (FEC) website.
Datasets of presidential voting results were retrieved from CNN.
Visualization Dashboards
The first interactive dashboard was created in a Jupyter Notebook. This interactive map of the United States lets users hover over a state (as in Figure 2) to see the number of Democratic votes, the number of Republican votes, and their relative percentages.
The second interactive dashboard, built with R Shiny, presented Senate election data.
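The hover text of the first dashboard depends on a small per-state calculation. The sketch below shows one way it could be computed. The vote counts, state names, and function names are hypothetical, and "relative percentage" is assumed here to mean each party's share of the two-party vote; the original notebook may define it differently.

```python
# Hypothetical vote counts for illustration only; not real election results.
votes = {
    "State A": {"democrat": 1_200_000, "republican": 1_000_000},
    "State B": {"democrat": 450_000, "republican": 550_000},
}


def vote_shares(counts):
    """Return each party's share of the listed votes, in percent."""
    total = sum(counts.values())
    if total == 0:
        return {party: 0.0 for party in counts}
    return {party: 100.0 * n / total for party, n in counts.items()}


def hover_text(state, counts):
    """Format the text a map tooltip might display for one state."""
    shares = vote_shares(counts)
    lines = [state]
    for party, n in counts.items():
        lines.append(f"{party.title()}: {n:,} votes ({shares[party]:.1f}%)")
    return "\n".join(lines)


for state, counts in votes.items():
    print(hover_text(state, counts))
```

A plotting library would then attach this text to each state's shape on the map.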
Model Selection
Two models were built: one for the 2020 Republican voting results and one for the 2020 Democratic voting results. The response variable was the party's vote percentage in each state; the predictors were previous years’ voting results and spending amounts by category. After data cleaning, the Republican data frame had 51 observations and 50 variables, and the Democratic data frame had 51 observations and 131 variables.
The main issue with fitting a regression directly was that the number of variables was close to or well above the number of observations, which makes the design matrix singular or nearly singular. To avoid singularity and to keep the number of selected variables small, forward stepwise regression was used to build the models. The models were then validated with 10-fold cross-validation.
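The selection procedure described above can be sketched in plain Python. This is a minimal illustration, not the team's code: the data are synthetic, the function names are hypothetical, and the sketch assumes that cross-validated mean squared error is the criterion for adding a variable, which the write-up does not specify. It also assumes the selected columns are not exactly collinear, since the small solver does no singularity handling.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ols(X, y, cols):
    """Least-squares coefficients (intercept first) using only columns `cols`."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    p = len(cols) + 1
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)


def cv_mse(X, y, cols, k=10):
    """Mean squared prediction error over k interleaved folds."""
    n, sq_err = len(y), 0.0
    for fold in range(k):
        test = set(range(fold, n, k))
        train = [i for i in range(n) if i not in test]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train], cols)
        for i in test:
            pred = beta[0] + sum(b * X[i][c] for b, c in zip(beta[1:], cols))
            sq_err += (y[i] - pred) ** 2
    return sq_err / n


def forward_select(X, y, max_predictors, k=10):
    """Greedily add the column that most lowers CV error; stop when none helps."""
    selected, remaining = [], list(range(len(X[0])))
    best = cv_mse(X, y, selected, k)  # intercept-only baseline
    while remaining and len(selected) < max_predictors:
        score, col = min((cv_mse(X, y, selected + [c], k), c) for c in remaining)
        if score >= best:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best


# Synthetic data: 51 observations and 60 candidate predictors, so a full
# regression on all columns would be singular. Only columns 3 and 7 matter.
random.seed(0)
n_obs, n_vars = 51, 60
X = [[random.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_obs)]
y = [50 + 4 * row[3] - 3 * row[7] + random.gauss(0, 0.5) for row in X]

selected, err = forward_select(X, y, max_predictors=5)
print("selected columns:", selected, "CV MSE:", round(err, 3))
```

Because each step fits only a handful of columns, every regression stays well-posed even though the full set of candidates outnumbers the observations.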
Because the goal was to identify effective predictors, 11 predictors were used to fit the reduced polynomial models, even though the forward stepwise method suggested fewer.
Democrat Forward Select Model and Republican Forward Select Model, respectively:
Reduced polynomial models for Democrat and Republican, respectively:
Diagnostic plots for Democrat reduced polynomial model and for Republican reduced polynomial model, respectively:
Coefficient plots for Democrat model and Republican model, respectively:
Other Factors
The voter turnout rate and the voting eligible population
The VEP turnout rate versus the number of new Covid cases in each state in October 2020
Polynomial regression to predict the number of votes from the predictors “Covid cases”, “Men”, “Women”, “Black”, and “White”.
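As a rough illustration of the turnout comparison listed above, the sketch below computes a VEP turnout rate (ballots cast divided by the voting-eligible population) and its Pearson correlation with new Covid cases. All figures, state names, and function names are hypothetical and chosen only to make the example run; they are not the competition data, and the original analysis may have compared the two quantities differently.

```python
from math import sqrt

# Hypothetical figures for illustration only; not the competition data.
# (name, ballots cast, voting-eligible population, new Covid cases in October)
states = [
    ("State A", 3_000_000, 4_000_000, 20_000),
    ("State B", 1_300_000, 2_000_000, 35_000),
    ("State C", 5_600_000, 8_000_000, 50_000),
    ("State D", 600_000, 1_000_000, 12_000),
]


def vep_turnout_rate(ballots, vep):
    """Turnout as a percentage of the voting-eligible population."""
    return 100.0 * ballots / vep


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rates = [vep_turnout_rate(ballots, vep) for _, ballots, vep, _ in states]
cases = [c for _, _, _, c in states]
r = pearson(cases, rates)
print("turnout rates (%):", rates)
print("correlation with new cases:", round(r, 3))
```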
Conclusion
The strongest predictor of the final election results was the previous election results. It is therefore important to focus on states with small vote margins, most of which are referred to as “swing states”.
We found that money plays a role in determining presidential election outcomes, though its role was not significant in Senate elections.
Some categories of expenditure had a significantly higher impact on the final results, such as technology usage, media usage, and interpreting the campaign.
All campaigns spent the most on advertising. Some forms of advertising had positive effects, such as portrait photography and website advertising. However, as the data suggested, many had negative effects, such as cell phone calls and messaging.
Other important positive factors of expenditures include hosting and campaigning, fundraising, labor payments, and interpretation.
The race/ethnicity of voters played a stronger role in the election outcome than the Covid pandemic or voter gender did.
In addition, the popular vote for Democrats was generally higher than that for Republicans in past elections. Therefore, influencing state legislation and diluting votes are also important methods of winning presidential elections.
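The first conclusion, focusing on states with small vote margins, can be made concrete with a short calculation. The sketch below is illustrative only: the vote percentages, the function name, and the 5-point threshold for calling a state "close" are all assumptions and do not come from the analysis.

```python
# Hypothetical previous-election vote percentages (Democrat, Republican).
results = {
    "State A": (49.5, 49.2),
    "State B": (60.1, 38.0),
    "State C": (47.0, 51.5),
}


def close_states(results, threshold=5.0):
    """Return (state, margin) pairs with margin below `threshold`, closest first.

    The margin is the absolute difference between the two parties'
    vote percentages, rounded to one decimal place.
    """
    margins = [
        (state, round(abs(dem - rep), 1)) for state, (dem, rep) in results.items()
    ]
    return sorted((m for m in margins if m[1] < threshold), key=lambda m: m[1])


for state, margin in close_states(results):
    print(f"{state}: margin {margin} points")
```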