Week 5: Who Votes, and How Do They Vote? An Overview of Demographics

ShuXin Ho

2024/10/06

Election Blog

In an increasingly polarized nation, or a “calcified” one, as Lynn Vavreck describes it, demographics have become more predictive of voting patterns.

Who Votes?

Not every citizen is registered to vote, and not every registered voter turns out. The graph below shows voter turnout by state in presidential elections from 1980 to 2020.

Individuals with higher education levels are more likely to vote because education provides information about the democratic process and instills a sense of civic virtue (Wolfinger & Rosenstone, 1980; Shaw & Petrocik, 2020). Older voters, especially retirees, are also more likely to participate due to their interest in preserving government programs like Social Security (Shaw & Petrocik, 2020). Rosenstone & Hansen (1993) also suggest that white and wealthier demographics are more likely to engage in politics and voting generally.

How Do They Vote?

Political scientists widely agree that party affiliation is the most significant predictor of voting behavior. This information is not always available, however, because states differ in their data collection policies and some voters choose to identify as independent. While demographic factors like education level, age, gender, race, and income also play a role, they are generally seen as secondary to party identification.

To explore how well demographic factors predict voting behavior, I built both logistic regression and random forest models using the 2020 American National Election Studies (ANES) dataset. My analysis focuses on core demographics like age, gender, race, education, and income to predict whether someone is likely to vote for the Democratic or Republican candidate in the U.S. presidential election. By comparing the performance of these two models, I aim to identify which method better captures the relationship between voter demographics and vote choice. The logistic regression model offers a more interpretable framework, while the random forest model can capture non-linear interactions between explanatory variables, which may yield greater predictive accuracy.
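Because the predictors are categorical, each one enters the regression as a set of 0/1 indicators measured against a reference level, which is why the table reports rows such as raceHispanic rather than a single race coefficient. The sketch below illustrates that encoding in plain Python. The helper name `dummy_code` is hypothetical, and the race levels are taken from the labels used in this post, with White non-Hispanic as the reference category; it is not the code used for the analysis.

```python
def dummy_code(value, levels, reference):
    """Return {level: 0 or 1} for every level except the reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    # The reference level is dropped, so a respondent in that category
    # is represented by zeros in every remaining indicator.
    return {level: int(value == level) for level in levels if level != reference}


# Race levels as labeled in this post; the reference level is White non-Hispanic.
RACE_LEVELS = [
    "White non-Hispanic",
    "Black non-Hispanic",
    "Hispanic",
    "Other or multiple races, non-Hispanic",
]

hispanic_row = dummy_code("Hispanic", RACE_LEVELS, reference="White non-Hispanic")
reference_row = dummy_code("White non-Hispanic", RACE_LEVELS, reference="White non-Hispanic")
print(hispanic_row)
print(reference_row)
```

Each coefficient in the table is therefore a comparison with the omitted reference level of that variable, not with the population as a whole.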

| Predictor | Estimate | Std. Error | Z value | P-value |
|---|---|---|---|---|
| (Intercept) | 3.0466894 | 0.3879300 | 7.8537100 | 0.0000000 |
| age30-39 | 0.2605495 | 0.1932560 | 1.3482090 | 0.1775912 |
| age40-49 | 0.2814913 | 0.1971868 | 1.4275362 | 0.1534254 |
| age50-64 | 0.4790822 | 0.1804352 | 2.6551486 | 0.0079273 |
| age65-74 | 0.2513214 | 0.2100600 | 1.1964266 | 0.2315301 |
| age75+ | 0.2873645 | 0.2503041 | 1.1480614 | 0.2509432 |
| ageBelow 18 | 0.4689021 | 0.3298902 | 1.4213885 | 0.1552038 |
| genderFemale | -0.0232485 | 0.0968415 | -0.2400670 | 0.8102783 |
| raceBlack non-Hispanic | -1.8433006 | 0.2820778 | -6.5347237 | 0.0000000 |
| raceHispanic | -0.6179115 | 0.1828442 | -3.3794424 | 0.0007263 |
| raceOther or multiple races, non-Hispanic | 0.0305557 | 0.1661450 | 0.1839099 | 0.8540842 |
| educationHigh school | -0.0752261 | 0.3247652 | -0.2316321 | 0.8168238 |
| educationSome college | -0.0811367 | 0.3121902 | -0.2598953 | 0.7949446 |
| educationCollege+ | -0.8579110 | 0.3130217 | -2.7407398 | 0.0061301 |
| income7-33 percentile | -0.1264412 | 0.1715965 | -0.7368519 | 0.4612124 |
| income34-67 percentile | 0.1473429 | 0.1428070 | 1.0317624 | 0.3021834 |
| income68 to 95 percentile | 0.1106322 | 0.1520675 | 0.7275207 | 0.4669070 |
| income96 to 100 percentile | -0.2014069 | 0.2263415 | -0.8898366 | 0.3735536 |
| religionCatholic | -0.2136832 | 0.1241851 | -1.7206827 | 0.0853084 |
| religionJewish | -0.4982795 | 0.3542658 | -1.4065132 | 0.1595718 |
| religionOther | -0.5680956 | 0.1153176 | -4.9263556 | 0.0000008 |
| attend_churchAlmost every week - often | -0.2095727 | 0.1842234 | -1.1376013 | 0.2552870 |
| attend_churchOnce or twice a month | -0.2023907 | 0.2035687 | -0.9942130 | 0.3201192 |
| attend_churchA few times a year - seldom | -0.2808382 | 0.1827159 | -1.5370211 | 0.1242881 |
| attend_churchNever | -0.7653464 | 0.1438191 | -5.3215916 | 0.0000001 |
| work_statusNot employed | -0.1516865 | 0.1792080 | -0.8464268 | 0.3973147 |
| work_statusRetired | 0.1027262 | 0.1553820 | 0.6611201 | 0.5085353 |
| work_statusHomemaker | 0.1850355 | 0.2223238 | 0.8322796 | 0.4052511 |
| work_statusStudent | -1.4388741 | 0.5425110 | -2.6522484 | 0.0079958 |
| party_identificationIndependent | -2.4964501 | 0.1111003 | -22.4702338 | 0.0000000 |
| party_identificationNo preference; none; neither | -2.3768968 | 1.4335381 | -1.6580632 | 0.0973047 |
| party_identificationOther | -2.2416718 | 0.2383706 | -9.4041439 | 0.0000000 |
| party_identificationDemocrat | -5.2693001 | 0.1612323 | -32.6814216 | 0.0000000 |

In the logistic regression model, several predictors show statistically significant relationships with vote choice. For instance, Black non-Hispanic voters have 84.17% lower odds of voting Republican than White non-Hispanic voters, holding other variables constant. Consistent with the literature, people who think of themselves as Democrats have 99.49% lower odds of voting Republican than those who identify as Republicans. Compared to employed individuals, students have 76.28% lower odds of voting Republican; homemakers have 20.33% higher odds, though that estimate is not statistically significant (p = 0.41). Those who never attend church have significantly lower odds of voting Republican than those who attend every week; the intermediate attendance categories point in the same direction but are not significant. Gender and income show no significant relationship with vote choice at the typical 5% level. Most age and education categories do not either, with two exceptions: the 50-64 age group (p = 0.008) and those with a college degree or more (p = 0.006).
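The percentages quoted for individual predictors come from exponentiating the logit coefficients: exp(estimate) is an odds ratio, and subtracting 1 gives the percent change in the odds of voting Republican. Below is a minimal sketch of that arithmetic using coefficients copied from the table. The function names are my own, and the 50% baseline in the last step is a hypothetical value chosen only to show that a change in odds is not the same as a change in probability.

```python
import math


def percent_change_in_odds(coef):
    """Convert a logit coefficient (log-odds) into a percent change in odds."""
    return (math.exp(coef) - 1.0) * 100.0


def probability_from_log_odds(log_odds):
    """Inverse logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Coefficients copied from the regression table in this post.
coefs = {
    "raceBlack non-Hispanic": -1.8433006,
    "work_statusStudent": -1.4388741,
    "work_statusHomemaker": 0.1850355,
    "party_identificationDemocrat": -5.2693001,
}

for name, beta in coefs.items():
    print(f"{name}: {percent_change_in_odds(beta):+.2f}% change in odds")

# Hypothetical baseline: a voter with a 50% chance of voting Republican
# (log-odds of 0). Adding a coefficient shifts the log-odds, not the
# probability directly.
baseline_log_odds = 0.0
shifted = probability_from_log_odds(baseline_log_odds + coefs["raceBlack non-Hispanic"])
print(f"probability moves from 0.50 to {shifted:.2f}")
```

The size of the shift in probability depends on where the baseline sits, which is why the coefficients are best read as changes in odds.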

The logistic model's overall accuracy is 85.79% for in-sample predictions and 86.23% for out-of-sample predictions. Both figures are fairly high, and their similarity suggests the model is not overfitting the training data.
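Accuracy here is the share of respondents whose predicted vote matches their reported vote, computed once on the data used to fit the model and once on held-out data. The sketch below shows one way to compute it from predicted probabilities. The 0.5 cutoff is an assumption on my part, and the numbers are toy values for illustration, not ANES data.

```python
def accuracy(predicted_probs, actual, threshold=0.5):
    """Share of cases where the thresholded prediction equals the outcome."""
    if len(predicted_probs) != len(actual):
        raise ValueError("inputs must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    # A predicted probability at or above the threshold counts as a
    # predicted Republican vote (coded 1); anything below counts as 0.
    hits = sum(int(p >= threshold) == y for p, y in zip(predicted_probs, actual))
    return hits / len(actual)


# Toy values for illustration only.
train_probs = [0.91, 0.12, 0.67, 0.40]
train_votes = [1, 0, 1, 1]
test_probs = [0.80, 0.30]
test_votes = [1, 0]

in_sample = accuracy(train_probs, train_votes)
out_of_sample = accuracy(test_probs, test_votes)
print(f"in-sample: {in_sample:.2%}, out-of-sample: {out_of_sample:.2%}")
```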

The random forest model is consistent with the logistic regression model in showing that party identification, race, and church attendance are important predictors for an individual’s vote choice. However, the model rated age as fairly important and work status as less important, which differs from the results of the logistic regression model.

For the random forest model, the in-sample accuracy is 84.52% and the out-of-sample accuracy is 84.77%, slightly lower than the logistic regression model. This suggests that, on these data, the random forest's ability to model complex interactions between demographic factors does not improve predictions, and it comes at the cost of interpretability.

Narrowing down states of interest using expert predictions

As the map above of expert predictions from Sabato’s Crystal Ball shows, most states are already categorized as likely or safe for one party. For simplicity, I’ll assume that all electoral votes from these states will go to their predicted party. Although Maine and Nebraska use a different system (allocating some of their electoral votes by congressional district), I will assume their split votes cancel each other out, as Maine leans Democratic and Nebraska is solidly Republican.

Therefore, I will focus my state-level demographic analysis on the seven battleground states: Arizona, Georgia, Michigan, Nevada, North Carolina, Pennsylvania, and Wisconsin.
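Under this simplification, the outcome depends only on how the seven battleground states split. The sketch below tallies that arithmetic using each state's 2024 electoral vote count. The baseline of 226 votes in the usage example is a placeholder of my own rather than a figure from this post, and the function names are hypothetical.

```python
# 2024 electoral votes for the seven battleground states named above.
BATTLEGROUND_EV = {
    "Arizona": 11,
    "Georgia": 16,
    "Michigan": 15,
    "Nevada": 6,
    "North Carolina": 16,
    "Pennsylvania": 19,
    "Wisconsin": 10,
}


def electoral_total(base_votes, states_won):
    """Add the battleground states a party wins to its assumed safe baseline."""
    return base_votes + sum(BATTLEGROUND_EV[state] for state in states_won)


def votes_needed(base_votes, majority=270):
    """Electoral votes still required from the battlegrounds to reach a majority."""
    return max(0, majority - base_votes)


at_stake = sum(BATTLEGROUND_EV.values())
print(f"electoral votes at stake: {at_stake}")

# Placeholder baseline for illustration; substitute the totals implied
# by the expert ratings being used.
hypothetical_base = 226
example_total = electoral_total(hypothetical_base, ["Michigan", "Pennsylvania", "Wisconsin"])
print(f"needed: {votes_needed(hypothetical_base)}, example total: {example_total}")
```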

Using Pennsylvania as an example

The following plots display the demographic distribution in Pennsylvania, using its 1% voter file data.

Comparison of demographic data in all battleground states

The following plots compare the demographic distribution across all seven battleground states: Arizona, Georgia, Michigan, Nevada, North Carolina, Pennsylvania, and Wisconsin.

Code developed with the assistance of ChatGPT.