Election Blog
Week 10: Model Evaluation
Recap of My Model and Predictions
Prior to the 2024 presidential election, I developed a weighted ensemble model using super learning. The weights for this ensemble were determined from the out-of-sample performance of multiple OLS models incorporating various predictor variables, including lagged vote share, economic indicators, polling data, demographic variables, incumbency consideration, and their combinations.
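As a rough illustration of the final combination step in such an ensemble, the sketch below takes a weighted average of component-model forecasts. It is a simplified sketch, not the code behind the forecast: the function name, the component forecasts, and the weights are all hypothetical, and the step that fits the weights from out-of-sample predictions is not shown.

```python
def ensemble_prediction(predictions, weights):
    """Combine component-model predictions into one weighted forecast."""
    if set(predictions) != set(weights):
        raise ValueError("predictions and weights must name the same models")
    total = sum(weights.values())
    if total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative and not all zero")
    # Normalize so the weights sum to 1, then take the weighted average.
    return sum(predictions[m] * weights[m] / total for m in predictions)


# Hypothetical component forecasts of Democratic two-party vote share (%).
component_forecasts = {"economy": 51.0, "polling": 49.0, "combined": 50.0}
# Hypothetical weights; in super learning these would be fitted from
# out-of-sample predictions rather than chosen by hand.
model_weights = {"economy": 0.25, "polling": 0.5, "combined": 0.25}

forecast = ensemble_prediction(component_forecasts, model_weights)
print(round(forecast, 2))
```

A model with a weight near zero contributes almost nothing to the forecast, so the weights themselves show which component models the ensemble trusts.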
My final forecast predicted that Kamala Harris would win the national two-party popular vote with 56.76% but lose the Electoral College, securing only 226 electoral votes. The model correctly predicted the winner in the battleground states of Arizona, Georgia, Michigan, Nevada, North Carolina, Pennsylvania, and Wisconsin, all of which voted for Donald Trump.
Electoral College Vote Share Evaluation
Below is the Electoral College map showing the election outcome, which matched my model’s predictions.
The following bubble map illustrates county-level results, with each bubble sized by the total number of votes cast in that county, to better visualize the vote distribution. Democratic vote share is concentrated in highly populated urban areas, while Republican vote share covers a broader geographic area, as shown by the widespread red bubbles.
My prediction error for each state is shown in the graph below. Despite these errors, the model still predicted the correct winner of each state’s two-party popular vote.
National Two-Party Popular Vote Share Evaluation
## Bias: -7.638292
Despite predicting the Electoral College outcome correctly, my prediction for the national two-party popular vote share was far off: based on results as of November 18th, I overestimated the Democratic Party’s popular vote share by 7.64 percentage points.
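Assuming the bias is defined as the actual share minus the predicted share (the sign convention is an assumption, chosen because a negative value then corresponds to an overestimate), the forecast of 56.76% and the bias of about 7.64 points imply an actual Democratic two-party share, as of that date, of roughly:

$$\text{actual} \approx \text{predicted} + \text{bias} = 56.76 - 7.64 = 49.12$$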
Therefore, I will focus on why my national two-party popular vote share model was inaccurate. I hypothesize a few reasons for the inaccuracy and propose corresponding changes.
1. Unique circumstance of incumbency
The Harris-Trump matchup presented a unique incumbency scenario: Harris was the sitting vice president, while Trump was a former president. My initial model did not fully account for this dynamic.
Change: Replace the \(IncumbentPresident\) variable with \(PrevAdmin\) to better capture the influence of prior administrations in such scenarios.
2. Polling model shortcomings
My polling model’s equation is $$D\_pv2p = D\_NetLatest538PollAverage + D\_NetMean538PollAverage(30 weeks) + NetLatestJobApproval + NetMeanJobApproval(June-Oct)$$
for Democratic vote share. I then fit the same model separately for Republican vote share and rescale the two predictions so that they sum to 100%. However, predictors like net job approval and polling averages are meaningful only when used to model the incumbent party’s vote share, which is not the case in my model, since it uses Democratic Party and Republican Party vote share as the response variables.
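The rescaling step can be sketched as follows. This is a minimal illustration, not the original code: the function name and the raw predictions are hypothetical.

```python
def rescale_two_party(dem_raw, rep_raw):
    """Rescale two separately predicted vote shares so they sum to 100%."""
    total = dem_raw + rep_raw
    if total <= 0:
        raise ValueError("the raw predictions must sum to a positive number")
    dem = 100.0 * dem_raw / total
    # Defining the Republican share as the remainder guarantees a sum of 100.
    return dem, 100.0 - dem


# Hypothetical raw predictions from the two separate regressions (%).
dem_share, rep_share = rescale_two_party(49.5, 51.5)
print(round(dem_share, 2), round(rep_share, 2))
```

Because the two regressions are fit independently, their raw predictions need not sum to 100, which is why this normalization is required before reporting a two-party share.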
Change: Re-run the polling model with the inclusion of dummy variables for \(PrevAdmin\) and \(IncumbentParty\).
3. Poor predictability of demographics
The demographic model exhibited high variability in out-of-sample MSE, leading to a low weight in the ensemble model.
Change: Remove demographics as a predictor variable to simplify the model and reduce variability.
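To make concrete what an out-of-sample MSE measures, the sketch below holds out one observation at a time, fits a one-predictor OLS line on the rest, and averages the squared prediction errors. Leave-one-out validation, the single predictor, and the toy data are all assumptions made for illustration; they are not taken from the models described here.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_out_mse(xs, ys):
    """Average squared error when each point is predicted from the others."""
    errors = []
    for i in range(len(xs)):
        # Fit on every observation except the i-th one.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_simple_ols(train_x, train_y)
        errors.append((ys[i] - (intercept + slope * xs[i])) ** 2)
    return sum(errors) / len(errors)


# Toy data, invented for illustration: one predictor and a vote share (%).
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
vote_share = [48.0, 50.5, 51.0, 53.5, 54.0]
print(leave_one_out_mse(predictor, vote_share))
```

A model whose held-out errors swing widely from one observation to the next is the kind that would earn a low weight in an ensemble.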
Year | In-Sample Economy MSE | In-Sample Polling MSE | In-Sample Combined MSE | In-Sample Ensemble MSE | Out-of-Sample Economy MSE | Out-of-Sample Polling MSE | Out-of-Sample Combined MSE | Out-of-Sample Ensemble MSE | Economy Weight | Polling Weight | Combined Weight |
---|---|---|---|---|---|---|---|---|---|---|---|
2020 | 4.837059 | 1.9122978 | 0 | 1.5612281 | 2.317541e+04 | 13.6184627 | 65477.451298 | 0.000000 | 0.1391090 | 0.7895167 | 0.0713743 |
2016 | 12.296768 | 2.3444192 | 0 | 3.5292405 | 8.347200e-03 | 4.7241477 | 378.947779 | 0.000000 | 0.4697574 | 0.4789688 | 0.0512738 |
2012 | 12.148008 | 0.5810492 | 0 | 1.4386519 | 4.862322e+00 | 39.7400609 | 23.692840 | 0.000000 | 0.3319988 | 0.2255234 | 0.4424777 |
2008 | 3.710408 | 1.4376414 | 0 | 0.4612257 | 2.049025e+03 | 33.7935739 | 393.869633 | 0.000000 | 0.2239442 | 0.3751625 | 0.4008934 |
2004 | 12.268307 | 2.0717113 | 0 | 8.1153974 | 3.388892e-01 | 5.9350622 | 9.560224 | 0.000000 | 0.8102364 | 0.1754739 | 0.0142897 |
2000 | 12.188676 | 2.4406611 | 0 | 3.7118069 | 1.979935e+00 | 0.0577291 | 378.793680 | 0.000000 | 0.5017074 | 0.4563862 | 0.0419064 |
1996 | 8.265306 | 2.4359492 | 0 | 2.1421979 | 4.938677e+01 | 0.2216226 | 27.600333 | 0.000000 | 0.0410015 | 0.9304671 | 0.0285315 |
1992 | 7.884249 | 2.4281347 | 0 | 2.2868915 | 6.602771e+01 | 0.4192285 | 511.775086 | 0.000000 | 0.0045835 | 0.9693198 | 0.0260966 |
1988 | 11.576491 | 2.1061987 | 0 | 0.0000000 | 9.176773e+00 | 5.2244662 | 2.737042 | 2.737042 | 0.0000000 | 0.0000000 | 1.0000000 |
1984 | 4.109527 | 2.4359492 | 0 | 0.4110531 | 1.407379e+02 | 0.2216226 | 27.600333 | 0.000000 | 0.1941586 | 0.3371914 | 0.4686500 |
1980 | 8.974900 | 0.1700332 | 0 | 1.2043133 | 5.151202e+02 | 87.3821839 | 4381.192756 | 0.000000 | 0.3489344 | 0.4656557 | 0.1854098 |
Year | Party | Predicted Vote Share (%) | Winner |
---|---|---|---|
2024 | Democrat | 49.0203 | FALSE |
2024 | Republican | 50.9797 | TRUE |
## Bias: 0.1039336
Incorporating the changes above, my revised model predicted the national two-party popular vote share with an error of about 0.1 percentage points.
Beyond the changes above, I propose further modifications for future election prediction models.
1. Accounting for voter turnout
Hypothesis: Lower turnout among traditionally Democratic voters, possibly due to dissatisfaction with the administration’s handling of difficult issues such as the Gaza conflict, reduced Harris’s vote share.
Test: Compare turnout rates by demographic group in 2024 with previous elections using voter file data, and assess whether historically Democratic demographics (such as younger voters and minority ethnic groups) saw a decline in turnout.
2. Economic reality versus perception
I will incorporate economic variables that more accurately measure voters’ perception of the economy. People might still be recovering from the COVID-19 economic downturn, so unemployment, GDP, and real disposable personal income (RDPI) figures may not reflect the full extent of how voters perceive the economy.
Hypothesis: My economic predictors (such as unemployment, GDP, S&P 500 volume, and real disposable personal income) failed to capture voters’ subjective perceptions of the economy. For example, while unemployment rates were low, food price inflation or the lingering effects of the COVID-19 pandemic might have weighed more heavily on voters’ decisions; food price inflation reached 10.8% in April 2022, for instance.
Test: Analyze survey data on voters’ perception of economic conditions and correlate these perceptions with vote choices.
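One simple way to run such a test is a Pearson correlation between a perception score and vote choice. The sketch below is only an illustration: the survey responses are invented, and the 1-to-5 perception scale and the 0/1 coding of vote choice are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("each sequence must vary")
    return cov / math.sqrt(var_x * var_y)


# Invented responses: perceived state of the economy (1 = very bad,
# 5 = very good) and whether the respondent voted for the incumbent party.
perception = [1, 2, 2, 3, 4, 4, 5, 5]
voted_incumbent = [0, 0, 1, 0, 1, 1, 1, 1]
print(round(pearson_r(perception, voted_incumbent), 3))
```

A correlation alone would not show that perceptions drive vote choice, since partisanship can shape both, so a fuller analysis would control for party identification.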
I will also expand economic variables:
- Use additional economic predictors, such as food price inflation and median wage growth, to capture the direct impact of economic stressors on voters.
- Extend the time frame for economic data to include the incumbent party’s entire term, not just the election year, as significantly poor economic conditions in 2022 and 2023 may have caused voter dissatisfaction that carried over into 2024.
3. Adjusting for airwar according to latest trends
Hypothesis: Traditional airwar analysis focuses on campaign spending on television advertisements, but modern media channels such as podcasts, social media, and celebrity endorsements may play a significant role in shaping public opinion, especially among tech-savvy younger voters and older voters who spend a lot of time on their devices. For example, Trump’s appearance on Joe Rogan’s podcast, Harris’s endorsement by Taylor Swift, and interactions on platforms like Facebook, Instagram, and TikTok may influence voter sentiment in ways that traditional ad spending metrics do not capture.
Test: Collect data on the following metrics and compare them to the candidates’ vote share in specific demographic groups (such as younger voters) to evaluate their predictive power.
Metrics include:
- Audience size for each candidate’s media appearances (such as podcasts and late-night shows) and for endorsements by public figures
- Engagement metrics such as the number of likes, shares, and comments for content related to each candidate
- Sentiment analysis of audience comments using natural language processing to assess voter sentiment