Week 9: Final Election Prediction

ShuXin Ho

2024/11/03

Election Blog


Model formula: $$pv2p_t = \beta_0 + \beta_1\cdot{Economy_t} + \beta_2\cdot{Polling_t} + \beta_3\cdot{Demographics_t} + \beta_4\cdot{Incumbency_t}$$

\(Economy_t\): As Sides & Vavreck (2013) emphasized, fundamentals, such as the state of the economy, play a stronger role than campaign dynamics in determining election outcomes. This is supported by Achen & Bartels (2017), who argued that voters often vote retrospectively based on the economy.

I ran a random forest model to identify the economic indicators that matter most for vote share. So that the model reflects current ideological divides, I used data from 1980 onward, the beginning of the Reagan era, which marked a significant realignment in American politics.

For my super learning model, I selected the five most predictive economic indicators: unemployment rate, GDP growth rate, S&P 500 volume, consumer sentiment index, and RDP growth in the second quarter of the election year.

\(Polling_t\): Gelman & King (1993) highlighted that polling closer to the election tends to be more accurate, as it reflects voters’ settled and informed preferences. Tien & Lewis-Beck (2017) similarly argued that long-view (historical and theoretical) models align more closely with the actual popular vote result.

In my model, I used the mean of the FiveThirtyEight poll averages over the three months before election day and the final poll average immediately before the election. In the same manner, I used Gallup presidential job approval ratings to calculate the mean net approval since June and the latest net approval rating, informed by Abramowitz’s Time-for-Change Model.
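The windowed polling and approval features described above can be sketched as follows. This is a minimal illustration, not the code behind the post: the function name `window_features`, the window boundaries, and all of the numbers are hypothetical, and net approval is assumed to be approval minus disapproval.

```python
from datetime import date
from statistics import mean

def window_features(records, start, end):
    """Return the mean and the latest value of (date, value) records
    whose date falls between start and end (inclusive)."""
    in_window = sorted((d, v) for d, v in records if start <= d <= end)
    values = [v for _, v in in_window]
    return {"mean": mean(values), "latest": values[-1]}

election_day = date(2024, 11, 5)

# Hypothetical poll averages: (date, Democratic share in %)
polls = [
    (date(2024, 7, 1), 45.0),   # outside the three-month window
    (date(2024, 8, 15), 47.0),
    (date(2024, 9, 20), 48.0),
    (date(2024, 11, 3), 49.0),
]

# Hypothetical approval readings: (date, approve %, disapprove %)
approval = [
    (date(2024, 5, 10), 39, 56),  # before June, excluded
    (date(2024, 6, 15), 38, 58),
    (date(2024, 10, 20), 41, 55),
]
net_approval = [(d, a - dis) for d, a, dis in approval]

poll_feats = window_features(polls, date(2024, 8, 5), election_day)
approval_feats = window_features(net_approval, date(2024, 6, 1), election_day)
```

Each call yields two predictors per election year: a windowed mean and a final reading, matching the two polling and two approval variables described in the text.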

\(Demographics_t\): According to Kim & Zilinsky (2024), partisan identification is the strongest and most stable predictor of vote choice. Consistent with this finding, my random forest model shows that party identification outperforms other demographic variables in predictive power.

Therefore, I include partisan affiliation among registered voters as a demographic predictor in my model.

\(Incumbency_t\): Incumbent presidents typically have an advantage, whether through name recognition, public attention from media coverage, a head start in campaigning without the concern of primary elections, or the power to direct pork-barrel spending. In my model, I do not treat Kamala Harris as an incumbent president (even though she is the current vice president), but I do take into account that she is from the incumbent party.

\(AirWar_t\), \(GroundGame_t\), and \(Shocks_t\): After analyzing these variables in my previous blog posts, I found no significant interactions between these dynamic, volatile campaign elements and the actual election outcome. Therefore, I exclude them from my final prediction model.

I use super learning to build a weighted ensemble in which the weights are determined by the out-of-sample performance of several OLS models, each using a different combination of variables. Table 1 below shows the in-sample and out-of-sample MSEs. The large out-of-sample MSE for the economy model in 2020 is likely due to outliers in the economic indicators caused by COVID-19, whereas the large value in 2008 can be attributed to the global financial crisis.

The variation in weights demonstrates the model’s adaptability based on each election’s specific economic and political context. For example, in 2020, the economic data was weighted less heavily, as COVID-19 impacted economic indicators without drastically affecting vote share. In 1992, Clinton’s campaign famously emphasized, “It’s the economy, stupid,” and economic models might have underemphasized public sentiment about Bush’s perceived lack of connection to economic hardship, something that polling picked up through indicators like approval ratings and direct questions about voter preferences.
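The idea of weighting candidate models by out-of-sample performance can be sketched as follows. The post does not state its exact weighting rule, so this sketch substitutes a simple inverse-MSE rule with leave-one-out validation. The one-predictor OLS models, the function names, and the toy data are all assumptions made for illustration.

```python
from statistics import mean

def fit_ols(xs, ys):
    """Fit y = a + b * x by ordinary least squares; return (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def loo_mse(xs, ys):
    """Leave-one-out out-of-sample MSE of a one-predictor OLS model."""
    errors = []
    for i in range(len(xs)):
        a, b = fit_ols(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return mean(errors)

def inverse_mse_weights(mses):
    """Weights proportional to 1 / MSE, normalized to sum to 1."""
    inverse = {name: 1.0 / m for name, m in mses.items()}
    total = sum(inverse.values())
    return {name: v / total for name, v in inverse.items()}

# Hypothetical toy data: vote share and one predictor per candidate model
vote = [52.0, 46.5, 50.3, 53.7, 48.8, 51.1]
predictors = {
    "economy": [2.1, 3.0, -0.5, 1.0, 2.5, 0.2],
    "polling": [51.0, 47.0, 50.0, 53.0, 49.5, 50.5],
}

mses = {name: loo_mse(xs, vote) for name, xs in predictors.items()}
weights = inverse_mse_weights(mses)

def ensemble_predict(new_x):
    """Weighted average of each model's prediction for a new election."""
    total = 0.0
    for name, xs in predictors.items():
        a, b = fit_ols(xs, vote)
        total += weights[name] * (a + b * new_x[name])
    return total

prediction = ensemble_predict({"economy": 1.5, "polling": 50.8})
```

In this toy data the polling model tracks vote share closely, so it receives most of the weight. A constrained regression of outcomes on the candidate predictions is another common way to obtain such weights and, unlike the inverse-MSE rule, can assign a model a weight of exactly zero.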

Table 1: In-Sample and Out-of-Sample MSEs with Ensemble Weights by Year (National)
| Year | In-Sample Economy MSE | In-Sample Polling MSE | In-Sample Demographics MSE | In-Sample Combined MSE | In-Sample Ensemble MSE | Out-of-Sample Economy MSE | Out-of-Sample Polling MSE | Out-of-Sample Demographics MSE | Out-of-Sample Combined MSE | Out-of-Sample Ensemble MSE | Economy Weight | Polling Weight | Demographics Weight | Combined Weight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020 | 4.837 | 2.8758 | 21.28 | 0 | 5.0185 | 23175.4122 | 0.7734 | 4.2110 | 65497.975 | 0.0000 | 0.0617 | 0.4349 | 0.4644 | 0.0390 |
| 2016 | 12.297 | 2.8880 | 21.53 | 0 | 12.2968 | 0.0083 | 0.4688 | 0.6188 | 378.679 | 0.0083 | 1.0000 | 0.0000 | 0.0000 | 0.0000 |
| 2012 | 12.148 | 1.8683 | 20.99 | 0 | 2.5631 | 4.8623 | 19.6347 | 8.1441 | 23.688 | 0.0000 | 0.2347 | 0.1656 | 0.2160 | 0.3837 |
| 2008 | 3.710 | 1.7232 | 19.31 | 0 | 1.7556 | 2049.0252 | 33.5689 | 31.8964 | 393.809 | 0.0000 | 0.1841 | 0.2782 | 0.2780 | 0.2596 |
| 2004 | 12.268 | 2.7405 | 21.20 | 0 | 4.6760 | 0.3389 | 2.5818 | 5.6484 | 9.561 | 0.0000 | 0.2792 | 0.2241 | 0.3169 | 0.1798 |
| 2000 | 12.189 | 2.9138 | 21.57 | 0 | 5.7391 | 1.9799 | 0.3773 | 0.0064 | 378.679 | 0.0000 | 0.3395 | 0.3189 | 0.3057 | 0.0359 |
| 1996 | 8.265 | 2.7477 | 19.92 | 0 | 2.0482 | 49.3868 | 4.6406 | 24.4830 | 27.598 | 0.0000 | 0.1520 | 0.7434 | 0.0531 | 0.0515 |
| 1992 | 7.884 | 2.9239 | 20.56 | 0 | 2.9239 | 66.0277 | 0.0235 | 14.7096 | 511.620 | 0.0235 | 0.0000 | 1.0000 | 0.0000 | 0.0000 |
| 1988 | 11.576 | 2.5760 | 20.11 | 0 | 0.0000 | 9.1768 | 4.8048 | 21.2632 | 2.737 | 2.7368 | 0.0000 | 0.0000 | 0.0000 | 1.0000 |
| 1984 | 4.109 | 2.8966 | 15.94 | 0 | 0.7683 | 140.7379 | 0.8586 | 78.8061 | 27.595 | 0.0000 | 0.1113 | 0.3089 | 0.1424 | 0.4374 |
| 1980 | 8.975 | 0.9999 | 21.36 | 0 | 4.1401 | 515.1202 | 26.4980 | 3.9181 | 4381.167 | 0.0000 | 0.2166 | 0.3219 | 0.3518 | 0.1098 |

Because the predicted Democratic and Republican vote shares do not add up to 100%, I rescaled them so that they do. Overall, I predict that the Democratic Party will receive 56.76% of the national two-party popular vote, with a 90% prediction interval between 54.55% and 56.5%.

| Year | Party | Predicted Vote Share (%) | Winner |
|---|---|---|---|
| 2024 | Democrat | 56.76 | TRUE |
| 2024 | Republican | 43.24 | FALSE |

Part 2: Electoral College Vote Share

Model formula: $$pv2p_t = \beta_0 + \beta_1\cdot{pv2p_{t-1}} + \beta_2\cdot{pv2p_{t-2}} + \beta_3\cdot{Economy_t} + \beta_4\cdot{Polling_t} + \beta_5\cdot{Incumbency_t}$$

In my state-level model predicting the Democratic Party’s popular vote share, I use a similar set of variables as in my national-level model: economic indicators, polling data, and incumbency status, along with each state’s lagged vote share. Demographics are omitted for the reason given below.

\(pv2p_{t-1}\) and \(pv2p_{t-2}\): I incorporate the Democratic Party’s vote share from the previous two elections in each state to account for the specific political climate and voter sentiment at the state level.
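Building the two lagged vote-share predictors can be sketched as follows. This is a minimal illustration under assumptions: the function name `add_lagged_shares`, the nested-dictionary layout, and the state names and vote shares are all hypothetical.

```python
def add_lagged_shares(history, cycle=4):
    """Turn {state: {year: dem_share}} into rows carrying the vote share
    from the previous two elections (pv2p_lag1, pv2p_lag2).
    Years without both lags available are skipped."""
    rows = []
    for state, by_year in history.items():
        for year in sorted(by_year):
            lag1 = by_year.get(year - cycle)
            lag2 = by_year.get(year - 2 * cycle)
            if lag1 is None or lag2 is None:
                continue
            rows.append({
                "state": state,
                "year": year,
                "pv2p": by_year[year],
                "pv2p_lag1": lag1,
                "pv2p_lag2": lag2,
            })
    return rows

# Hypothetical Democratic two-party vote shares (%)
history = {
    "State A": {2012: 51.0, 2016: 49.5, 2020: 50.2},
    "State B": {2016: 44.0, 2020: 45.1},  # only one lag, so no usable row
}
rows = add_lagged_shares(history)
```

Only state-years with both lags available produce a row, which is one reason a state with a short history can drop out of a model like this.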

\(Economy_t\): Including the \(Economy_t\) variable at the state level is tricky as it’s unclear whether voters prioritize sociotropic concerns (national economic indicators) or individual concerns (state economic indicators). To explore this, I compared the significance of these two types of indicators in predicting vote share using a mixed-effects model. Accounting for each state’s baseline political preference, higher state unemployment rates are associated with a significant decrease in the incumbent party’s vote share. This supports the theory that voters are responsive to local economic conditions. Therefore, I included state unemployment as an economic indicator in my model.

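| Variable | Estimate | Std. Error | DF | t-value | p-value |
|---|---|---|---|---|---|
| (Intercept) | 49.5386 | 5.1652 | 452 | 9.5909 | 0.0000 |
| state_gdp | 0.0336 | 0.2343 | 452 | 0.1432 | 0.8862 |
| state_unemployment | -1.6802 | 0.5862 | 452 | -2.8661 | 0.0043 |
| natl_gdp | -0.2394 | 0.3273 | 452 | -0.7313 | 0.4650 |
| natl_unemployment | 0.2547 | 0.4671 | 452 | 0.5452 | 0.5859 |
| natl_consumer_sentiment | -0.0357 | 0.0465 | 452 | -0.7692 | 0.4422 |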
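The post does not give the exact mixed-effects specification. One plausible form, assuming a random intercept \(u_s\) for each state \(s\) (to capture its baseline political preference) and writing the vote-share outcome as \(y_{s,t}\), is:

$$y_{s,t} = \beta_0 + u_s + \beta_1\cdot{state\_gdp_{s,t}} + \beta_2\cdot{state\_unemployment_{s,t}} + \beta_3\cdot{natl\_gdp_t} + \beta_4\cdot{natl\_unemployment_t} + \beta_5\cdot{natl\_consumer\_sentiment_t} + \varepsilon_{s,t}, \quad u_s \sim N(0, \sigma_u^2)$$

Each t-value in the table is the estimate divided by its standard error. For state unemployment:

$$t = \frac{-1.6802}{0.5862} \approx -2.866$$

This is the only coefficient among the economic indicators with a p-value below 0.05.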

\(Demographics_t\): Because state-level time series data on voters’ party registration and identification are not available, I omit this variable from my state-level analysis.

As in the national model, I use super learning to build a weighted ensemble in which the weights are determined by the out-of-sample performance of several OLS models, each using a different combination of variables.

Table 2: In-Sample and Out-of-Sample MSEs with Ensemble Weights by Year (State)
| Year | In-Sample Lagged Vote MSE | In-Sample Economy MSE | In-Sample Polling MSE | In-Sample Combined MSE | In-Sample Ensemble MSE | Out-of-Sample Lagged Vote MSE | Out-of-Sample Economy MSE | Out-of-Sample Polling MSE | Out-of-Sample Combined MSE | Out-of-Sample Ensemble MSE | Lagged Vote Weight | Economy Weight | Polling Weight | Combined Weight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020 | 37.34 | 62.01 | 6.380 | 1.993 | 3.409 | 9.625 | 132.64 | 7.553 | 734.79 | 2.0564 | 0.0773 | 0.0000 | 0.8536 | 0.0691 |
| 2016 | 36.62 | 57.40 | 6.352 | 1.533 | 3.487 | 16.593 | 70.62 | 6.389 | 27.37 | 6.1389 | 0.1341 | 0.0000 | 0.8659 | 0.0000 |
| 2012 | 36.63 | 57.28 | 5.894 | 2.074 | 2.864 | 17.305 | 190.04 | 17.497 | 5589.23 | 2.2054 | 0.0000 | 0.0000 | 0.9503 | 0.0497 |
| 2008 | 35.13 | 61.54 | 6.490 | 1.889 | 4.790 | 34.722 | 52.33 | 10.115 | 21.48 | 9.7409 | 0.0000 | 0.0857 | 0.9143 | 0.0000 |
| 2004 | 37.21 | 59.71 | 6.846 | 1.951 | 4.529 | 10.919 | 58.25 | 1.541 | 14.39 | 0.9492 | 0.1959 | 0.0000 | 0.8041 | 0.0000 |
| 2000 | 35.08 | 57.67 | 6.562 | 1.795 | 3.912 | 34.448 | 68.55 | 4.673 | 26.02 | 3.6306 | 0.1554 | 0.0000 | 0.8446 | 0.0000 |

As with the national two-party popular vote share, I rescaled the predicted state vote shares so that they sum to 100%. The lower-bound, mean, and upper-bound columns show the original values, which do not add up to 100%.

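| State | Rescaled Predicted Democratic Vote Share (%) | Rescaled Predicted Republican Vote Share (%) | Winner | Lower Bound (D) | Mean (D) | Upper Bound (D) | Lower Bound (R) | Mean (R) | Upper Bound (R) |
|---|---|---|---|---|---|---|---|---|---|
| Arizona | 47.73 | 52.27 | Republican | 47.20 | 47.58 | 47.94 | 51.75 | 52.11 | 52.48 |
| California | 62.42 | 37.58 | Democrat | 61.46 | 61.97 | 62.52 | 36.85 | 37.32 | 37.80 |
| Colorado | 55.13 | 44.87 | Democrat | 54.46 | 54.86 | 55.27 | 44.17 | 44.65 | 45.14 |
| Florida | 45.57 | 54.43 | Republican | 45.00 | 45.39 | 45.78 | 53.89 | 54.22 | 54.57 |
| Georgia | 48.23 | 51.77 | Republican | 47.83 | 48.19 | 48.53 | 51.31 | 51.72 | 52.09 |
| Indiana | 39.78 | 60.22 | Republican | 39.40 | 39.86 | 40.34 | 59.88 | 60.33 | 60.76 |
| Maryland | 64.66 | 35.34 | Democrat | 63.34 | 63.99 | 64.64 | 34.27 | 34.97 | 35.56 |
| Massachusetts | 63.93 | 36.07 | Democrat | 62.70 | 63.29 | 63.91 | 35.11 | 35.71 | 36.25 |
| Michigan | 49.44 | 50.56 | Republican | 49.01 | 49.35 | 49.70 | 50.12 | 50.48 | 50.82 |
| Minnesota | 52.08 | 47.92 | Democrat | 51.60 | 51.96 | 52.32 | 47.38 | 47.81 | 48.22 |
| Missouri | 41.25 | 58.75 | Republican | 40.89 | 41.34 | 41.79 | 58.43 | 58.86 | 59.29 |
| Montana | 39.01 | 60.99 | Republican | 38.54 | 39.04 | 39.57 | 60.59 | 61.03 | 61.49 |
| Nebraska | 39.67 | 60.33 | Republican | 39.32 | 39.83 | 40.32 | 60.07 | 60.57 | 61.02 |
| Nevada | 48.85 | 51.15 | Republican | 48.32 | 48.66 | 49.01 | 50.60 | 50.95 | 51.29 |
| New Hampshire | 51.66 | 48.34 | Democrat | 51.01 | 51.44 | 51.86 | 47.69 | 48.14 | 48.56 |
| New Mexico | 52.84 | 47.16 | Democrat | 52.09 | 52.51 | 52.93 | 46.44 | 46.87 | 47.27 |
| New York | 58.44 | 41.56 | Democrat | 57.56 | 58.00 | 58.44 | 40.84 | 41.24 | 41.65 |
| North Carolina | 48.26 | 51.74 | Republican | 47.88 | 48.25 | 48.64 | 51.36 | 51.73 | 52.09 |
| Ohio | 44.29 | 55.71 | Republican | 43.93 | 44.32 | 44.70 | 55.37 | 55.74 | 56.08 |
| Pennsylvania | 48.88 | 51.12 | Republican | 48.46 | 48.80 | 49.15 | 50.70 | 51.04 | 51.39 |
| South Carolina | 42.35 | 57.65 | Republican | 41.92 | 42.35 | 42.78 | 57.22 | 57.65 | 58.04 |
| Texas | 44.85 | 55.15 | Republican | 44.39 | 44.75 | 45.14 | 54.64 | 55.03 | 55.40 |
| Utah | 36.90 | 63.10 | Republican | 36.40 | 36.96 | 37.52 | 62.75 | 63.21 | 63.66 |
| Virginia | 52.59 | 47.41 | Democrat | 51.95 | 52.33 | 52.69 | 46.77 | 47.17 | 47.61 |
| Washington | 59.31 | 40.69 | Democrat | 58.51 | 59.00 | 59.48 | 39.98 | 40.48 | 40.97 |
| Wisconsin | 49.32 | 50.68 | Republican | 48.95 | 49.31 | 49.67 | 50.30 | 50.67 | 51.00 |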
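The rescaling step can be sketched as a simple normalization. The function name `rescale_two_party` is hypothetical, and proportional rescaling is an assumption, although it is consistent with Arizona's row above, where the inputs are the Mean (D) and Mean (R) values.

```python
def rescale_two_party(dem, rep):
    """Rescale two predicted vote shares proportionally so they sum to 100."""
    total = dem + rep
    return 100 * dem / total, 100 * rep / total

# Arizona's Mean (D) and Mean (R) values from the table above
dem, rep = rescale_two_party(47.58, 52.11)
print(round(dem, 2), round(rep, 2))
```

For Arizona this gives 47.73 and 52.27, matching the rescaled columns. Other rows can differ in the last digit because the table reports rounded means.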

For states with insufficient data, which are therefore not included in this prediction, I assume their electoral votes will go to the party projected by Sabato’s Crystal Ball, as shown in the map below. There are no discrepancies between my predictions and those from Sabato’s Crystal Ball, and I predict that all swing states will vote Republican.

Therefore, my final prediction for the electoral college vote distribution by party is shown in the map below:

Table 3: Total Number of Electors by Party (2024)
| Party | Total Electors |
|---|---|
| Democrat | 226 |
| Republican | 312 |
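Turning state-level vote shares into electoral votes can be sketched as a winner-take-all tally. This sketch covers only seven states that are commonly treated as the 2024 battlegrounds, which is an assumption because the post does not list its swing states. The vote shares are the rescaled Democratic shares from the state table, while the electoral vote counts are 2024 apportionment figures supplied here rather than taken from the post.

```python
from collections import Counter

# state: (rescaled predicted Democratic share in %, electoral votes)
swing_states = {
    "Arizona": (47.73, 11),
    "Georgia": (48.23, 16),
    "Michigan": (49.44, 15),
    "Nevada": (48.85, 6),
    "North Carolina": (48.26, 16),
    "Pennsylvania": (48.88, 19),
    "Wisconsin": (49.32, 10),
}

def tally_electors(states):
    """Award each state's electors to the party with more than 50%
    of the two-party vote (winner-take-all)."""
    totals = Counter()
    for dem_share, electors in states.values():
        winner = "Democrat" if dem_share > 50 else "Republican"
        totals[winner] += electors
    return totals

totals = tally_electors(swing_states)
```

All seven states fall below 50% Democratic, so they contribute 93 electoral votes to the Republican column. The remaining electors in Table 3 come from states not listed in this sketch.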

I predict that the Democratic Party will win the national two-party popular vote with a share of 56.76% compared to 43.24% for the Republican Party. However, despite this popular vote advantage, I anticipate the Democratic Party will lose the electoral college vote, receiving 226 votes compared to the Republican Party’s 312 votes. This would result in the Republican Party winning the Presidency and Vice Presidency.