Election Blog

Week 9: Final Election Prediction

Model formula: $$pv2p_t = \beta_0 + \beta_1\cdot{Economy_t} + \beta_2\cdot{Polling_t} + \beta_3\cdot{Demographics_t} + \beta_4\cdot{Incumbency_t}$$

$Economy_t$: As Sides & Vavreck (2013) emphasized, fundamentals, such as the state of the economy, play a stronger role than campaign dynamics in determining election outcomes. This is supported by Achen & Bartels (2017), who argued that voters often vote retrospectively based on the economy.

I ran a random forest model to identify the most important economic indicators influencing vote share. To capture historical trends while keeping the model relevant to current ideological divides, I used data from 1980 onward: the start of the Reagan era marked a significant realignment in American politics.

For my super learning model, I selected the five most predictive economic indicators: unemployment rate, GDP growth rate, S&P 500 volume, consumer sentiment index, and RDP growth in the second quarter of the election year.

$Polling_t$: Gelman & King (1993) highlighted that polling closer to the election tends to be more accurate, as it reflects voters’ settled and informed preferences. Tien & Lewis-Beck (2017) also argued that long-view (historical and theoretical) models align more closely with the actual popular vote result.

In my model, I used the mean of the FiveThirtyEight poll averages over the three months before Election Day and the final poll average immediately before the election. Informed by Abramowitz’s Time-for-Change Model, I also used Gallup presidential job approval ratings to calculate, in the same manner, the mean net approval since June and the latest net approval rating.
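As a rough illustration of the two approval features, the sketch below computes net approval (approve minus disapprove), its mean since June, and the latest value. The readings are made-up placeholder numbers, not Gallup data, and the names `ratings` and `approval_features` are hypothetical.

```python
from datetime import date
from statistics import mean

# Hypothetical (date, approve %, disapprove %) readings; NOT real Gallup figures.
ratings = [
    (date(2024, 5, 20), 39.0, 56.0),
    (date(2024, 6, 15), 38.0, 58.0),
    (date(2024, 8, 10), 43.0, 53.0),
    (date(2024, 10, 12), 40.0, 55.0),
]

def approval_features(ratings, start):
    """Return (mean net approval since `start`, latest net approval)."""
    # Keep readings on or after the start date, ordered chronologically.
    window = sorted((d, approve - disapprove)
                    for d, approve, disapprove in ratings if d >= start)
    nets = [net for _, net in window]
    return mean(nets), nets[-1]

mean_net, latest_net = approval_features(ratings, start=date(2024, 6, 1))
print(mean_net, latest_net)
```

The May reading is dropped by the date filter, so only the three later readings feed the mean.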

$Demographics_t$: According to Kim & Zilinsky (2024), partisan identification is the strongest and most stable predictor of vote choice. Consistent with this finding, my random forest model reveals that party identification outperforms other demographic variables in predictive power.

Therefore, I include partisan affiliation among registered voters as a demographic predictor in my model.

$Incumbency_t$: Incumbent presidents are typically at an advantage, be it through name recognition, public attention from media coverage, a head start in campaigning without the concern of primary elections, or the power to direct pork-barrel spending. In my model, I do not consider Kamala Harris an incumbent president (even though she is the current vice president), but I take into account that she is from the incumbent party.

$AirWar_t$, $GroundGame_t$, and $Shocks_t$: After analyzing these variables in my previous blog posts, I found no significant relationship between these dynamic, volatile campaign elements and the actual election outcome. Therefore, I exclude them from my final prediction model.

I use super learning to develop a weighted ensemble in which the weights are determined by the out-of-sample performance of several OLS models, each using a different combination of variables. The table below shows the in-sample and out-of-sample MSEs. The large out-of-sample MSE for the economy model in 2020 is likely due to outliers in the economic indicators caused by COVID-19, whereas the one in 2008 can be attributed to the global financial crisis.

The variation in weights demonstrates the model’s adaptability based on each election’s specific economic and political context. For example, in 2020, the economic data was weighted less heavily, as COVID-19 impacted economic indicators without drastically affecting vote share. In 1992, Clinton’s campaign famously emphasized, “It’s the economy, stupid,” and economic models might have underemphasized public sentiment about Bush’s perceived lack of connection to economic hardship, something that polling picked up through indicators like approval ratings and direct questions about voter preferences.
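To make the weighting idea concrete, here is a minimal sketch of how ensemble weights could be chosen from out-of-sample predictions. It is not necessarily the procedure behind Table 1: as an assumption, it does a coarse grid search over non-negative weights that sum to one and keeps the combination with the lowest out-of-sample MSE. All predictions and outcomes are hypothetical placeholders, and `ensemble_mse` and `fit_weights` are illustrative names.

```python
from itertools import product

def ensemble_mse(weights, preds, actual):
    """MSE of the weighted blend of component predictions."""
    errors = []
    for i, y in enumerate(actual):
        blended = sum(w * preds[name][i] for name, w in weights.items())
        errors.append((blended - y) ** 2)
    return sum(errors) / len(errors)

def fit_weights(preds, actual, step=0.05):
    """Grid-search weights on the simplex (non-negative, summing to one)."""
    names = list(preds)
    n_steps = round(1 / step)
    best_weights, best_mse = None, float("inf")
    # Choose grid units for all but the last model; the last gets the remainder.
    for combo in product(range(n_steps + 1), repeat=len(names) - 1):
        if sum(combo) > n_steps:
            continue
        units = combo + (n_steps - sum(combo),)
        weights = {name: u / n_steps for name, u in zip(names, units)}
        mse = ensemble_mse(weights, preds, actual)
        if mse < best_mse:
            best_weights, best_mse = weights, mse
    return best_weights, best_mse

# Hypothetical out-of-sample predictions (%) for three held-out elections.
actual = [52.0, 48.0, 51.0]
preds = {
    "economy": [55.0, 45.0, 53.0],
    "polling": [51.5, 48.5, 51.5],
    "demographics": [53.0, 50.0, 50.0],
}

weights, mse = fit_weights(preds, actual)
print(weights, mse)
```

Because the single-model corners (weight 1.0 on one model) are part of the grid, the blended MSE can never be worse than the best individual model on the same held-out data. That is one way a year can end up with all of its weight on a single component.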

Table 1: In-Sample and Out-of-Sample MSEs with Ensemble Weights by Year (National)
Year	In-Sample Economy MSE	In-Sample Polling MSE	In-Sample Demographics MSE	In-Sample Ensemble MSE	Out-of-Sample Economy MSE	Out-of-Sample Polling MSE	Out-of-Sample Demographics MSE	Out-of-Sample Combined MSE	Out-of-Sample Ensemble MSE	Economy Weight	Polling Weight	Demographics Weight	Combined Weight
2020	4.837	2.8758	21.28	5.0185	23175.4122	0.7734	4.2110	65497.975	0.0000	0.0617	0.4349	0.4644	0.0390
2016	12.297	2.8880	21.53	12.2968	0.0083	0.4688	0.6188	378.679	0.0083	1.0000	0.0000	0.0000	0.0000
2012	12.148	1.8683	20.99	2.5631	4.8623	19.6347	8.1441	23.688	0.0000	0.2347	0.1656	0.2160	0.3837
2008	3.710	1.7232	19.31	1.7556	2049.0252	33.5689	31.8964	393.809	0.0000	0.1841	0.2782	0.2780	0.2596
2004	12.268	2.7405	21.20	4.6760	0.3389	2.5818	5.6484	9.561	0.0000	0.2792	0.2241	0.3169	0.1798
2000	12.189	2.9138	21.57	5.7391	1.9799	0.3773	0.0064	378.679	0.0000	0.3395	0.3189	0.3057	0.0359
1996	8.265	2.7477	19.92	2.0482	49.3868	4.6406	24.4830	27.598	0.0000	0.1520	0.7434	0.0531	0.0515
1992	7.884	2.9239	20.56	2.9239	66.0277	0.0235	14.7096	511.620	0.0235	0.0000	1.0000	0.0000	0.0000
1988	11.576	2.5760	20.11	0.0000	9.1768	4.8048	21.2632	2.737	2.7368	0.0000	0.0000	0.0000	1.0000
1984	4.109	2.8966	15.94	0.7683	140.7379	0.8586	78.8061	27.595	0.0000	0.1113	0.3089	0.1424	0.4374
1980	8.975	0.9999	21.36	4.1401	515.1202	26.4980	3.9181	4381.167	0.0000	0.2166	0.3219	0.3518	0.1098

Because the predicted vote shares for the two parties do not add up to 100%, I rescaled them so that they sum to 100%. Overall, I predict that the Democratic Party will receive 56.76% of the national two-party popular vote, with a 90% prediction interval between 54.55% and 56.5%.

Year	Party	Predicted Vote Share (%)	Winner
2024	Democrat	56.76	TRUE
2024	Republican	43.24	FALSE

Model formula: $$pv2p_t = \beta_0 + \beta_1\cdot{pv2p_{t-1}} + \beta_2\cdot{pv2p_{t-2}} + \beta_3\cdot{Economy_t} + \beta_4\cdot{Polling_t} + \beta_5\cdot{Incumbency_t}$$

In my state-level model predicting the Democratic Party’s popular vote share, I use a set of variables similar to those in my national-level model: economic indicators, polling data, and incumbency status, together with lagged vote share.

$pv2p_{t-1}$ and $pv2p_{t-2}$: I incorporate the Democratic Party’s vote share from the previous two elections in each state to account for the specific political climate and voter sentiment at the state level.
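A small sketch of how the two lagged predictors could be assembled for each state follows. The vote shares are hypothetical placeholders, and the names `history`, `lagged_features`, `pv2p_lag1`, and `pv2p_lag2` are illustrative, not taken from the actual analysis.

```python
ELECTION_GAP = 4  # years between presidential elections

# Hypothetical Democratic two-party vote shares (%), for illustration only.
history = {
    "State A": {2012: 51.0, 2016: 49.5, 2020: 50.5},
    "State B": {2020: 45.0},  # only one prior election available
}

def lagged_features(history, year):
    """Build pv2p_{t-1} and pv2p_{t-2} for each state that has both lags."""
    rows = {}
    for state, shares in history.items():
        lag1 = shares.get(year - ELECTION_GAP)
        lag2 = shares.get(year - 2 * ELECTION_GAP)
        if lag1 is None or lag2 is None:
            continue  # skip states with incomplete history
        rows[state] = {"pv2p_lag1": lag1, "pv2p_lag2": lag2}
    return rows

features_2024 = lagged_features(history, 2024)
print(features_2024)
```

States without two prior results are skipped here. That is one possible convention, chosen only to keep the example short.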

$Economy_t$: Including the $Economy_t$ variable at the state level is tricky as it’s unclear whether voters prioritize sociotropic concerns (national economic indicators) or individual concerns (state economic indicators). To explore this, I compared the significance of these two types of indicators in predicting vote share using a mixed-effects model. Accounting for each state’s baseline political preference, higher state unemployment rates are associated with a significant decrease in the incumbent party’s vote share. This supports the theory that voters are responsive to local economic conditions. Therefore, I included state unemployment as an economic indicator in my model.

Variable	Estimate	Std. Error	DF	t-value	p-value
(Intercept)	49.5386	5.1652	452	9.5909	0.0000
state_gdp	0.0336	0.2343	452	0.1432	0.8862
state_unemployment	-1.6802	0.5862	452	-2.8661	0.0043
natl_gdp	-0.2394	0.3273	452	-0.7313	0.4650
natl_unemployment	0.2547	0.4671	452	0.5452	0.5859
natl_consumer_sentiment	-0.0357	0.0465	452	-0.7692	0.4422
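As a reading aid for the table above, each t-value is the estimate divided by its standard error. The short check below recomputes them from the printed figures. Small gaps against the reported t-values are expected because the table entries are rounded. Reading the unemployment coefficient as roughly a 1.68-point drop in vote share per one-point rise in state unemployment assumes both are measured in percentage points.

```python
# (estimate, standard error, reported t-value) copied from the table above.
coefficients = {
    "(Intercept)": (49.5386, 5.1652, 9.5909),
    "state_gdp": (0.0336, 0.2343, 0.1432),
    "state_unemployment": (-1.6802, 0.5862, -2.8661),
    "natl_gdp": (-0.2394, 0.3273, -0.7313),
    "natl_unemployment": (0.2547, 0.4671, 0.5452),
    "natl_consumer_sentiment": (-0.0357, 0.0465, -0.7692),
}

def t_value(estimate, std_error):
    """t-statistic for a single coefficient."""
    return estimate / std_error

recomputed = {name: t_value(est, se) for name, (est, se, _) in coefficients.items()}

# Largest difference between recomputed and reported t-values (rounding only).
max_gap = max(abs(recomputed[name] - reported)
              for name, (_, _, reported) in coefficients.items())

for name, t in recomputed.items():
    print(f"{name}: t = {t:.3f}")
```

Among the economic predictors, only `state_unemployment` has an absolute t-value above 2, which matches its small p-value in the table.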

$Demographics_t$: Because state-level time-series data on voters’ party registration and identification are lacking, I omit this variable from my state-level analysis.

I use super learning to develop a weighted ensemble in which the weights are determined by the out-of-sample performance of several OLS models, each using a different combination of variables.

Table 2: In-Sample and Out-of-Sample MSEs with Ensemble Weights by Year (State)
Year	In-Sample Lagged Vote MSE	In-Sample Economy MSE	In-Sample Polling MSE	In-Sample Combined MSE	In-Sample Ensemble MSE	Out-of-Sample Lagged Vote MSE	Out-of-Sample Economy MSE	Out-of-Sample Polling MSE	Out-of-Sample Combined MSE	Out-of-Sample Ensemble MSE	Lagged Vote Weight	Economy Weight	Polling Weight	Combined Weight
2020	37.34	62.01	6.380	1.993	3.409	9.625	132.64	7.553	734.79	2.0564	0.0773	0.0000	0.8536	0.0691
2016	36.62	57.40	6.352	1.533	3.487	16.593	70.62	6.389	27.37	6.1389	0.1341	0.0000	0.8659	0.0000
2012	36.63	57.28	5.894	2.074	2.864	17.305	190.04	17.497	5589.23	2.2054	0.0000	0.0000	0.9503	0.0497
2008	35.13	61.54	6.490	1.889	4.790	34.722	52.33	10.115	21.48	9.7409	0.0000	0.0857	0.9143	0.0000
2004	37.21	59.71	6.846	1.951	4.529	10.919	58.25	1.541	14.39	0.9492	0.1959	0.0000	0.8041	0.0000
2000	35.08	57.67	6.562	1.795	3.912	34.448	68.55	4.673	26.02	3.6306	0.1554	0.0000	0.8446	0.0000

Similar to what I did for the national two-party popular vote share, I rescaled the predicted state vote shares so that they sum to 100%. The Lower Bound, Mean, and Upper Bound columns show the original values, which do not add up to 100%.

State	Rescaled Predicted Democratic Vote Share (%)	Rescaled Predicted Republican Vote Share (%)	Winner	Lower Bound (D)	Mean (D)	Upper Bound (D)	Lower Bound (R)	Mean (R)	Upper Bound (R)
Arizona	47.73	52.27	Republican	47.20	47.58	47.94	51.75	52.11	52.48
California	62.42	37.58	Democrat	61.46	61.97	62.52	36.85	37.32	37.80
Colorado	55.13	44.87	Democrat	54.46	54.86	55.27	44.17	44.65	45.14
Florida	45.57	54.43	Republican	45.00	45.39	45.78	53.89	54.22	54.57
Georgia	48.23	51.77	Republican	47.83	48.19	48.53	51.31	51.72	52.09
Indiana	39.78	60.22	Republican	39.40	39.86	40.34	59.88	60.33	60.76
Maryland	64.66	35.34	Democrat	63.34	63.99	64.64	34.27	34.97	35.56
Massachusetts	63.93	36.07	Democrat	62.70	63.29	63.91	35.11	35.71	36.25
Michigan	49.44	50.56	Republican	49.01	49.35	49.70	50.12	50.48	50.82
Minnesota	52.08	47.92	Democrat	51.60	51.96	52.32	47.38	47.81	48.22
Missouri	41.25	58.75	Republican	40.89	41.34	41.79	58.43	58.86	59.29
Montana	39.01	60.99	Republican	38.54	39.04	39.57	60.59	61.03	61.49
Nebraska	39.67	60.33	Republican	39.32	39.83	40.32	60.07	60.57	61.02
Nevada	48.85	51.15	Republican	48.32	48.66	49.01	50.60	50.95	51.29
New Hampshire	51.66	48.34	Democrat	51.01	51.44	51.86	47.69	48.14	48.56
New Mexico	52.84	47.16	Democrat	52.09	52.51	52.93	46.44	46.87	47.27
New York	58.44	41.56	Democrat	57.56	58.00	58.44	40.84	41.24	41.65
North Carolina	48.26	51.74	Republican	47.88	48.25	48.64	51.36	51.73	52.09
Ohio	44.29	55.71	Republican	43.93	44.32	44.70	55.37	55.74	56.08
Pennsylvania	48.88	51.12	Republican	48.46	48.80	49.15	50.70	51.04	51.39
South Carolina	42.35	57.65	Republican	41.92	42.35	42.78	57.22	57.65	58.04
Texas	44.85	55.15	Republican	44.39	44.75	45.14	54.64	55.03	55.40
Utah	36.90	63.10	Republican	36.40	36.96	37.52	62.75	63.21	63.66
Virginia	52.59	47.41	Democrat	51.95	52.33	52.69	46.77	47.17	47.61
Washington	59.31	40.69	Democrat	58.51	59.00	59.48	39.98	40.48	40.97
Wisconsin	49.32	50.68	Republican	48.95	49.31	49.67	50.30	50.67	51.00
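The rescaling step can be sketched with the Arizona row from the table above: each party's mean prediction is divided by the sum of the two means. The function name `rescale_two_party` is illustrative.

```python
def rescale_two_party(dem, rep):
    """Rescale two predicted vote shares so that they sum to 100%."""
    total = dem + rep
    return 100 * dem / total, 100 * rep / total

# Arizona means from the table: Mean (D) = 47.58, Mean (R) = 52.11.
dem, rep = rescale_two_party(47.58, 52.11)
print(round(dem, 2), round(rep, 2))  # 47.73 52.27
```

The output matches the rescaled Arizona shares reported in the table.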

For states with insufficient data, which are therefore not included in this prediction, I assume their electoral votes will go to the party projected by the expert ratings from Sabato’s Crystal Ball, as shown in the map below. There are no discrepancies between my predictions and those from Sabato’s Crystal Ball, and I predict that all swing states will vote Republican.

Therefore, my final prediction for the electoral college vote distribution by party is shown in the map below:

Table 3: Total Number of Electors by Party (2024)
Party	Total Electors
Democrat	226
Republican	312
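A quick arithmetic check of Table 3 shows how much the totals depend on the swing states. The list of seven swing states below is an assumption (the post does not enumerate them). The elector counts are the 2024 apportionment.

```python
TOTAL_ELECTORS = 538
MAJORITY = TOTAL_ELECTORS // 2 + 1  # electors needed to win

# Predicted totals from Table 3.
predicted = {"Democrat": 226, "Republican": 312}

# Assumed swing states with their 2024 elector counts.
SWING_STATE_ELECTORS = {
    "Arizona": 11,
    "Georgia": 16,
    "Michigan": 15,
    "Nevada": 6,
    "North Carolina": 16,
    "Pennsylvania": 19,
    "Wisconsin": 10,
}

swing_total = sum(SWING_STATE_ELECTORS.values())
republican_without_swing = predicted["Republican"] - swing_total
winner = max(predicted, key=predicted.get)

print(f"Majority needed: {MAJORITY}")
print(f"Swing-state electors: {swing_total}")
print(f"Republican total without swing states: {republican_without_swing}")
print(f"Predicted winner: {winner}")
```

Under this assumed list, the Republican total falls below the majority threshold once the swing states are removed, so the predicted outcome rests on those states.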

I predict that the Democratic Party will win the national two-party popular vote with a share of 56.76% compared to 43.24% for the Republican Party. However, despite this popular vote advantage, I anticipate the Democratic Party will lose the electoral college vote, receiving 226 votes compared to the Republican Party’s 312 votes. This would result in the Republican Party winning the Presidency and Vice Presidency.

ShuXin Ho

2024/11/03

Part 1: National Two-Party Popular Vote Share

Part 2: Electoral College Vote Share