Regression Analysis: Demographics and 2008 Election Results

This article began as a little game I was playing to see if I could derive election results from a regression analysis of basic demographic variables. Plugging back in the relation derived in my strongest one-variable regression (see below for what that variable was), and retrodicting election results at the state level, I underestimated Obama's returns by 19 electoral votes. Oddly, plugging back in the second strongest variable (state-level population density), I overestimated Obama's returns by only 1 electoral vote.
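A minimal Python sketch of the retrodiction step: plug each state's predictor value into a fitted trendline, award the state's electoral votes if the predicted Obama share clears a threshold, and compare the total with the actual result. The state names, predictor values, vote counts, and coefficients are made-up placeholders, and the 50% award rule is an assumption here, since the exact rule is not spelled out.

```python
def predicted_share(x, slope, intercept):
    """Linear trendline: predicted Obama share (percent) from one predictor."""
    return intercept + slope * x


def retrodict_electoral_votes(states, slope, intercept, threshold=50.0):
    """Sum the electoral votes of states whose predicted share beats the threshold.

    The 50% threshold is an assumed decision rule, not taken from the article.
    """
    return sum(
        votes
        for _name, x, votes in states
        if predicted_share(x, slope, intercept) > threshold
    )


# Hypothetical states: (name, predictor value, electoral votes). Not real data.
toy_states = [("A", 10.0, 10), ("B", 28.0, 20), ("C", 50.0, 5)]
toy_slope, toy_intercept = -0.5, 65.0  # placeholder coefficients

retrodicted = retrodict_electoral_votes(toy_states, toy_slope, toy_intercept)
actual = 35  # placeholder "actual" total for the toy example
error = retrodicted - actual  # negative means an underestimate
print(retrodicted, error)
```

With real inputs, `toy_states` would hold one row per state and the coefficients would come from the fitted regression.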

I didn't expect any big demographic surprises: I think we all knew well before the election that Manhattan was more likely to vote for Obama than a Mormon county in rural Idaho. Furthermore, if what you're looking for is some algorithm for predicting presidential elections in general, then a) what Nate Silver does is more useful because he bases his predictions on pre-election poll outcomes and b) no one knows what the landscape will look like in 2012. The GOP and Democratic parties are brands that get different spokespeople every four to eight years, and if circumstances give the GOP a better spokesperson and better conditions in '12 then all these trends could be out the window. (For some idea of how middle voters shifted, see Andrew Gelman's Democrat-to-Democrat scatter-plot comparisons between elections, which show how the brand changes with conditions.)

Because we don't know what the 2012 zeitgeist will mean in terms of Americans' political affinity, we can only base our predictions on demographics. Still, there are many Americans who are "ethnic" Republicans or Democrats and would probably physically die if they voted for the other party, regardless of their values or demographics. Therefore, I think the real value of my analysis is for Republicans who want to see the folks in the middle who voted for Obama, and whether there will be more or fewer of them four years from now.

This is my longest blog post ever, so if you're easily bored skip ahead to the end ("What This Means for the GOP"). But I think the analysis is interesting on its own merits. The variables are listed below in decreasing order of importance.


I looked at returns in terms of percent vote for Obama at the state and county levels against six individual demographic factors: population density, racial make-up, religion (broken down by faith), income, education, and age. I did not look at individual voter poll data, only returns at the state and county level. Technical caveats for all results that you can skip if you want to get to the good stuff:

- I didn't have county-level data for every variable and the county-level correlations were often weaker anyway, so the multiple regression with all variables was performed at the state level. If you know where I can find county-level data for any variables where I don't already have it, please comment or email.

- Returns are expressed as percent of votes for Obama of all votes; I don't make the mistake of assuming that Obama + McCain = 100%, although I do assume in my discussions of left vs. right movement of voters that the percentage of votes for parties further left than the Democrats is negligible.

- State-level: I pooled all Nebraska's returns; Nebraska reports as three separate districts

- County-level: I excluded Alaska from all county-level analyses. Alaska doesn't have counties and does not report returns by any sub-state level jurisdictions; Louisiana has parishes, but does report returns by them.

- County-level: I excluded Kalawao County, Hawaii; no returns reported
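To make the returns caveat concrete, here is a small Python sketch contrasting Obama's share of all votes (the measure used here) with the two-party share that assumes Obama + McCain = 100%. The vote counts are invented for illustration.

```python
def share_of_all_votes(obama, mccain, other):
    """Obama's percent of every vote cast, including third parties."""
    return 100.0 * obama / (obama + mccain + other)


def two_party_share(obama, mccain):
    """Obama's percent of the Obama + McCain total only."""
    return 100.0 * obama / (obama + mccain)


# Made-up vote counts for one hypothetical county.
obama_votes, mccain_votes, other_votes = 520, 460, 20

share_all = share_of_all_votes(obama_votes, mccain_votes, other_votes)
share_two = two_party_share(obama_votes, mccain_votes)
print(round(share_all, 2), round(share_two, 2))
```

The two-party figure is always at least as large as the all-votes figure, so mixing the two definitions would shift every data point upward by a different amount.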


I had to rewrite the article partly because I was expecting population density, and not religion, to show the strongest correlation with voting. I was intrigued to find that I was wrong.

Most data for this part of the analysis came from the Pew Forum on Religion and Public Life Religious Landscape Survey 2008(1). The Pew paper provides data only at the state level, and not for Alaska and Hawaii. Consequently I only did the analysis at the state level, and for Alaska and Hawaii I estimated the figures from other sources(2).

I included any religion which has a share of at least 10% statewide in at least one state. This excluded all non-Christian religions. Religions which I did include were evangelicals, mainline Protestant denominations (reported as a single figure), black churches (reported as a single figure), Catholic, and Mormon. I also looked at the correlation between the total population of all these religions in a state and the percent voting for Obama in that state, as well as one combination between two religions (which I will discuss further).

Evangelicals had the strongest correlation of any individual religion (described by a second-order polynomial), but there was still a substantial outlier (low Obama returns, but also low evangelism). This was Utah. Indeed, although only 3 states have a Mormon population of greater than 10% of the overall population, they were all outliers on the evangelical curve (low evangelism, low Obama returns), and having only 3 states out of 50 with Mormons above 10% can't give a good signal on the Mormon-only curve. Because evangelism and Mormonism typically don't occur in the same state, I created another group, "Evangelicals + Mormons", and found that it had the strongest correlation of all (R=0.831), described by a logarithmic function:

The logarithmic curve for Evangelical + Mormon was only slightly stronger than a linear relationship (0.828). This combined index is the strongest relationship of any variable I looked at, but in the end did not predict the electoral vote as well as population density.
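The comparison between a linear and a logarithmic trendline can be sketched in Python as follows. A logarithmic trendline y = a*ln(x) + b is just a straight line in ln(x), so its R is the correlation between ln(x) and y. The percentages below are toy values, not the Pew or election data, and the function returns a signed correlation, whereas the R figures quoted in this article are magnitudes.

```python
import math


def pearson_r(xs, ys):
    """Signed Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data: combined evangelical + Mormon share vs. Obama share, both in percent.
evangelical_mormon_pct = [10.0, 15.0, 22.0, 30.0, 41.0, 53.0, 60.0]
obama_pct = [62.0, 58.0, 54.0, 50.0, 45.0, 42.0, 36.0]

# Linear trendline: correlate x with y directly.
r_linear = pearson_r(evangelical_mormon_pct, obama_pct)

# Logarithmic trendline: correlate ln(x) with y.
r_log = pearson_r([math.log(x) for x in evangelical_mormon_pct], obama_pct)

print(round(abs(r_linear), 3), round(abs(r_log), 3))
```

When the two magnitudes are as close as 0.831 and 0.828, the data give little reason to prefer one functional form over the other.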

What's immediately interesting is that some religions had strong negative correlations (evangelicals and Mormons, especially when taken together), some were flat or very weak (mainline Protestant denominations and black churches), while Catholic was actually positively correlated with voting for Obama (linear relationship with R=0.537). It's been known that, at least in the last two elections, religion (defined by service attendance) was a good predictor for individuals voting Republican; here we have supporting data at the state level.

That said, there are no majority Catholic or black church states to test whether it's only religious dominance (and not the political aggressiveness of any specific religion) that affects voting patterns. I tried to create an index of religious dominance for each state by multiplying the standard deviation of religions' shares in the state by the highest percentage of any one religion. This index had a negative correlation similar to or slightly weaker than that of Mormonism alone (Mormons are a majority in only one state and above 10% in only 3). Therefore it appears to be specifically evangelicalism and Mormonism that are most strongly correlated with voting patterns.
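The dominance index described above (standard deviation of the religions' shares times the largest single share) can be sketched in Python like this. Whether the population or the sample standard deviation was used is not stated; the population form (`statistics.pstdev`) is an assumption here, and the share lists are invented.

```python
from statistics import pstdev


def dominance_index(shares_pct):
    """Spread of the religions' shares times the largest single share.

    Uses the population standard deviation, which is an assumed choice.
    """
    return pstdev(shares_pct) * max(shares_pct)


# Hypothetical states with five religious groups each (shares in percent).
evenly_mixed = [20.0, 20.0, 20.0, 20.0, 20.0]
one_dominant = [60.0, 10.0, 10.0, 10.0, 10.0]

print(dominance_index(evenly_mixed), dominance_index(one_dominant))
```

A state where every group is the same size scores zero, while a state with one large group scores high on both factors at once, which is the property the index is meant to capture.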

It's worth pointing out in the discussion of religion and voting that there's an interesting article called "McCain's Atheist Problem" stating that McCain trailed Obama in the atheist vote by 25% (and that's about 2.5% of the total electorate; a campaign strategist would tear his hair out if his candidate did something to lose 2.5% of the electorate). Anecdotal evidence: the day that Palin was announced, that number went up 0.00001% when I switched over myself.


My first inspiration for this investigation was an observation on a network news website (which I can't now dig up) months prior to the election that population density is a major contributor to political leanings (same speculation here on FOXNews in 2004).

Of course, population density was only the second-best predictor of the percentage of Obama votes at both the state and county levels, although retrodicting using the regression equation did give a better electoral outcome than the evangelical+Mormon index. Data came from the 2007 Census estimates(3). Using an exponential model for both state and county levels showed a power function trend with R of 0.578 and 0.427 respectively.
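Because "exponential" and "power function" trendlines are easy to confuse, here is a hedged Python sketch of how each is usually fitted by least squares on transformed data: a power function y = a*x^b is a straight line in (ln x, ln y), while an exponential y = a*e^(b*x) is a straight line in (x, ln y). The densities and shares are synthetic, generated from a known power law so the fit can be checked, and this is not necessarily the procedure behind the R values quoted above.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_power(xs, ys):
    """Fit y = a * x**b by regressing ln(y) on ln(x); returns (a, b)."""
    b, ln_a = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_a), b


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing ln(y) on x; returns (a, b)."""
    b, ln_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


# Synthetic densities (people per square mile) and shares from y = 30 * x**0.1.
densities = [1.0, 10.0, 100.0, 1000.0, 10000.0]
shares = [30.0 * d ** 0.1 for d in densities]

a, b = fit_power(densities, shares)
print(round(a, 3), round(b, 3))
```

Population density spans several orders of magnitude between rural counties and big cities, which is one reason a fit on the log of density tends to behave better than a fit on raw density.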

Looking at fine-grained election maps, you can usually guess where the denser population is based on the blue vs. red returns:

However, sometimes the discontinuities have other causes. For example, notice the blue spots in the middle of South Dakota:

Every single one of those six mid-state blue counties is occupied mostly or totally by an Indian reservation, and the reservation counties all have densities lower than the state average and similar to surrounding counties, meaning that all other things being equal, they should be as red or redder. This is not surprising, since historically, Native Americans have voted Democrat. Similarly, you can compare the county-level voting results in Mississippi to the percentage of African-Americans in Mississippi counties:

Neither of these observations is surprising, but they have to be taken into account in any model predicting American voting patterns based on demographics. This may also explain why the R for the county-level population density regression is lower than for the state-level one: states are much bigger and average out local variation, whereas race and other factors can differ much more from county to county than from state to state. Unfortunately, still in 2008 there's an income gap between whites and non-whites, so it's hard to disentangle race and income as influences on voting.


I thought the tendency of non-white Americans to vote Democrat would be more pronounced in this election given that the Democratic candidate was not white. Census technicality: my definition of non-whiteness at the state level includes Hispanic ethnicity/white race population. Data again comes from U.S. Census estimates for 2007 population(3). At the state level the trend is best described by a second-order polynomial with R=0.371.

Interestingly, at the county level but not including Hispanic whites as non-white, R=0.4 for a linear trend between non-whiteness and voting for Obama. (The reason I didn't include Hispanic as non-white at the county level is because of the annoying way the US Census tracks this data; if you know where I can get this data more easily, let me know; otherwise feel free to cut and paste for 3,115 counties yourself.) In any event, non-whiteness alone, at either level, did not predict outcome as well as population density.

4. Per Capita Income

The GOP used to be seen as the party of the rich, but following the Southern Strategy blue collar workers began voting Republican, becoming a force to be reckoned with in the 1980s. County income data came from 2005 census estimates(4). I expected a polynomial trend with fewer voting for Obama on both ends, given that many upper-income people vote GOP along with rural lower- and middle-class whites (here's one source with links to more, disabusing us of the notion that these days it's only lower-class whites from rural areas voting GOP). Indeed the best fit for the county and state trends was a sixth-order polynomial with R of 0.290 and 0.552 respectively.
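A sixth-order polynomial has six slope coefficients to tune, so its R is not directly comparable with the R of a one-coefficient trendline. One common correction, which is not part of the analysis above and is offered only as a cross-check, is adjusted R-squared. The Python sketch below applies it to the state-level figure; the observation count of 51 (50 states plus D.C.) is an assumption.

```python
def adjusted_r_squared(r, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    r_squared = r ** 2
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R comes from the article; n_obs = 51 is an assumed count of states plus D.C.
state_r = 0.552
adj = adjusted_r_squared(state_r, n_obs=51, n_predictors=6)
print(round(state_r ** 2, 3), round(adj, 3))
```

The adjusted value is always below the raw R-squared, and the gap widens as coefficients are added relative to the number of data points.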

5. Average Years of Education

In the 2008 returns Albany County, Wyoming stands out blue against a background of red counties in the rest of Wyoming, and it's not a reservation. Perhaps not coincidentally it's where you'll find the University of Wyoming.

Average years of education is doubtless higher in rural counties with large post-secondary institutions than without (as in many land-grant state universities, like the University of Wyoming) but large American cities also differ considerably in their level of education based on the industries those towns depend on (the first two are Seattle and San Francisco - not surprisingly, since they're aerospace-tech-biotech central). The data for this regression come from the Manhattan Institute's 2001 education survey(5), which corrects some shortcomings from the Census data.

The correlation was weak; the best fit was with an exponential function with an R of only 0.067. One caveat for this variable: high school graduation rate is certainly not the best indicator of the average education of a state's adult population, given the mobility of modern American workers. If you work in DC or New York or Los Angeles, what percentage of your colleagues are actually from there? Percentage of state population with Bachelor's degrees would be much better, but I couldn't find this data. The disconnect between the two variables occurs because states with large influxes of highly educated workers may not have good native education systems. For example, Seattle and San Francisco may both have the most highly educated workforces, but Washington and California have graduation rates of 70% and 68% respectively, which are in the bottom half of U.S. states.

6. Median Age

Referring to Guizot's (not Churchill's) famous quote about the proper age for liberals and conservatives, the college town effect could also be partly a result of age. There are also parts of the country whose median ages diverge considerably from the national average, for example the retirement haven Florida, and the large-family haven of Utah. Data is from the 2006 U.S. Census.(6)

Before I did the analysis I expected that if there were a strong relationship it would be polynomial, accounting for greater conservatism in late middle-age than in under-30 or post-retirement voters; one individual is more likely to vote Republican at 50 than at 20, and the same person might be more likely to vote Democrat at 70 than at 50, probably because of concerns about fixed income and healthcare. However, my initial guess turned out to be wrong (only a sixth-order polynomial had a greater R, but it was nonsensical in that it predicted a < 0% turnout for Obama for some median ages). The best R I obtained was 0.360 for a power function trendline, although excluding the outlier of D.C. gave an R of 0.452.
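The sanity check that ruled out the high-order polynomial can be sketched in Python: evaluate the fitted curve over the plausible range of median ages and flag any age where the predicted share falls outside 0 to 100 percent. The coefficients below are an invented quadratic chosen to misbehave at the edges, not the fitted sixth-order polynomial.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial with Horner's rule; coefficients are highest degree first."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def impossible_predictions(coeffs, xs, low=0.0, high=100.0):
    """Return the x values whose predicted share is outside [low, high]."""
    return [x for x in xs if not low <= poly_eval(coeffs, x) <= high]


# Hypothetical trendline: share = -age**2 + 74 * age - 1300 (placeholder, not fitted).
toy_coeffs = [-1.0, 74.0, -1300.0]
median_ages = list(range(28, 47))  # assumed plausible range of median ages

bad_ages = impossible_predictions(toy_coeffs, median_ages)
print(bad_ages)
```

A curve can score a higher R on the observed points and still fail this check between or just beyond them, which is the usual symptom of overfitting with too many polynomial terms.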


Once I had looked at each of these demographic variables in isolation I ran a multiple regression with all of them. I did not have all demographic variables by county, so the multiple regression is at the state level. I included separate variables as follows:

- Population density
- % non-white, including Hispanic white as non-white
- % Mormon + evangelical composite
- % Mainline Protestant churches
- % Black churches
- % Catholic
- Median household income
- Education (graduation rate)
- Median age

Multiple regression with these variables returned an R of 0.893 (compare to evangelism + Mormonism alone at 0.831 and population density alone at 0.578).
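For readers who want to reproduce this kind of fit, here is a self-contained Python sketch of ordinary least-squares multiple regression using the normal equations, returning the coefficients and the multiple R (the square root of R-squared). It is a generic textbook method, not necessarily the tool used for the figure above, and the two-predictor rows are synthetic values built from a known formula so the result can be verified.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def multiple_regression(rows, ys):
    """OLS with an intercept; returns (coefficients, multiple R)."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bv * dv for bv, dv in zip(beta, d)) for d in design]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return beta, math.sqrt(max(1.0 - ss_res / ss_tot, 0.0))


# Synthetic rows: (population density, % evangelical + Mormon). Not real data.
toy_rows = [(100.0, 10.0), (50.0, 30.0), (400.0, 20.0), (20.0, 50.0), (900.0, 15.0)]
toy_shares = [50.0 + 0.01 * d - 0.4 * e for d, e in toy_rows]

coefficients, multiple_r = multiple_regression(toy_rows, toy_shares)
print([round(c, 4) for c in coefficients], round(multiple_r, 4))
```

With nine predictors and roughly fifty states, R will rise whenever a variable is added, so the jump from 0.831 to 0.893 overstates how much the extra variables really explain.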

There are a number of factors that clearly influenced the outcome and that, while they could not have been predicted from demographic first principles, still were likely to have had an effect (BLACK SWANS). These include: home state voter loyalty, advertising, and campaign visits. On the second and third points, the campaign strategists do not believe demographics are electoral destiny or they wouldn't waste money sending the candidates around the country.

Another interesting question would be to try to detect a media effect for greater metropolitan areas that often lie across state borders. That is to say, is there a narrative created by local media affiliates which differs from that of the national media, and shows itself in vote returns? One way to quantify it would be to look at the difference in the number of positive vs negative stories run about a candidate in City X vs in the mainstream media and in local media of City Y, then compare the outcome in surrounding counties in X and Y to what your model otherwise predicts. Others have looked at a similar question, the effect of one state on surrounding states, in the context of Virginia's effect on North Carolina.

Events earlier in the campaign also surely played a role. It seemed from the media buzz that social conservatives and evangelicals in particular were energized by the selection of Sarah Palin for Republican VP, and they were likely energized by Mike Huckabee's candidacy during the primary season as well. Of course, I could be wrong about the correlation, but take a look at this map, which shows the difference in voting from 2004 to 2008 (i.e., red means voted more GOP than last time).

In the general blue headwind blowing across the country from out of the imploding corridors of Lehman Bros (to name only one such low pressure system), it's hard to look at this map and not notice the this-time-even-redder stripe from West Virginia, through Kentucky and Tennessee and Arkansas to Oklahoma, bucking the trend. This is also the region with the highest rates of evangelism.

Finally, in my brief career analyzing elections, I've concluded there are just some counties that are liberal or conservative because they just always were and that's that. This observation doesn't help my model but my favorite is Sioux County, Iowa. I'm not the first person to notice this particular outlier and I've scratched my head over it since. They're not Mormons, there's no certain industry concentrated there, there's no televangelist based there, but they've consistently voted at least above 80% GOP for the last three elections, much higher than Iowa as a whole, and almost always more than 10% higher than the highest neighboring county. It would be one thing if they did it once or twice but it's been consistent.

While 2012 will be a different election, to some degree we non-professional-campaign-strategist mortals who nonetheless care about the outcome of elections (we're called "citizens") might be able to adjust models like these to see for ourselves what's likely to happen, without waiting for the campaigns or the media to tell us.

Final Speculation About Population Density's Effect on Politics

The following is speculation, and I have no idea how to render it quantitatively or even how to get data supporting it; a demographer might. The question of how religion affects voting habits isn't that interesting because it appears straightforward. The more curious one is the mechanism by which population density correlates with voting habits. It's a truism that if you rent, when you buy a home you should "be prepared to see your political ideology swing violently to the right" (courtesy The Onion). But I don't think that accounts for the entire phenomenon, since there are plenty of liberal homeowners on the coasts.

My theory is that members of the national racial majority (whites) who live in high density areas are more likely to come into social contact with non-whites. Non-whites, by virtue of being in the minority, are already required to critically analyze their own values where they differ from majority culture. Democrats traditionally have been the American labor party and as such appeal more to minorities (even easier to do in this election). This is probably one reason that minorities have been likely to vote Democrat, regardless of whether they live in densely populated Washington D.C., or on the open prairies of Shannon County, South Dakota.

Republicans have appealed to red-meat cultural conservative values associated with America's Christian, white cultural majority. At the same time, urban whites who frequently come into contact with non-whites similarly become more circumspect about their own (otherwise less-questioned) cultural assumptions and as a result urban whites are less likely to be motivated by those red-meat values. The density-leads-to-interethnic-contact-and-erodes-cultural-conservatism theory also explains why, although 38 of 82 counties in Mississippi are less white than all of the 58 counties in California, the Mississippi counties vote Democrat at substantially lower rates. The graph below is a stark illustration: in Mississippi, county non-whiteness very closely tracks voting, with an incredible linear relationship with R=0.973. In California it doesn't track as closely (linear relationship R=0.536, slightly better as a third-degree polynomial with R=0.561), and California counties clearly vote Democratic more often. Why?

My theory is that this difference results from the higher degree of intermixture between ethnicities in California relative to Mississippi. With no history of segregation or slavery, the economic differences between whites and non-whites in California are less pronounced, and people mingle more; consequently, in California a white person is likely to be more circumspect about his or her own assumptions, just like non-white people elsewhere, whose social assumptions are challenged by having to adapt to mainstream American culture. Interracial marriage rates would be one indirect way to measure the level of inter-mingling. I would bet that most of the California counties have higher interracial marriage rates than the Mississippi counties (even though, assuming independent assortment, you would expect the rate to be higher in those first 38 Mississippi counties than in any county in California). Even though many of the Mississippi counties are more than 50% non-white, still today there probably aren't as many cases of whites living next door to non-whites, or working together, or spending leisure time together, as in a 25% non-white California county. I'm not a demographer or statistician, so I don't know if there's a better way to directly measure the degree of intermingling between whites and other ethnicities or if this data already exists, but it would also start to explain other demographic tendencies: for example, that people are less religious in cities, especially multiethnic ones, and that port cities dependent on foreign trade and tourism tend to be more tolerant of different social norms.
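The "independent assortment" baseline can be made concrete with a short Python sketch. It assumes a simplified two-group population (white and non-white) in which partners pair at random, so a county with non-white share p would expect a fraction 2p(1 - p) of couples to be mixed. The shares below are illustrative, not actual county figures.

```python
def expected_mixed_pair_rate(minority_share):
    """Fraction of randomly formed pairs that are mixed, for a two-group population."""
    p = minority_share
    return 2.0 * p * (1.0 - p)


# Illustrative non-white shares: a 25% county and a 55% county.
for share in (0.25, 0.55):
    print(share, round(expected_mixed_pair_rate(share), 3))
```

Under this simplification the baseline peaks at a 50/50 split and falls off on either side, so an observed interracial marriage rate is best judged against the baseline for that county's own mix rather than against a single national figure.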

What This Means For the GOP

In the US, evangelical and Mormon populations are not growing well in the fast-growing, ethnically mixed, well-educated, and economically strong cities on the coast. The number of atheists in the US is growing (6% of over-30, 12% of under-30 are atheists).(7) Population density, and the resultant admixture of people from different backgrounds, is increasing. The proportion of white voters in the US is dropping. Per capita income will (we hope) continue growing. Hopefully, Americans' average education will continue to increase as the economy increasingly depends on innovation in technical fields. The young voters who helped sweep in Obama will doubtless become more conservative as they age, but whether they will ever become as culturally conservative as their parents is in question. Note that I didn't deliberately set out to pick six demographic variables that are all changing in the Democrats' favor; I picked the six I thought were most clearly relevant to the election. This is not good news for the current incarnation of the GOP, either in California or anywhere else.

At a time when the global credit crisis is causing many inside and outside the US to doubt whether markets are the best mechanism to allocate wealth and promote growth, the GOP cannot afford to allow the Religious Right to continue steering. The market/strong-defense/evangelical alliance is broken, and one of the partners in that alliance has to go.

May I make the unsurprising suggestion to rank-and-file Republicans that you insist on throwing out the partner that, for the last eight years, has revealed itself as a kind of inept religious statist, that can sometimes win elections but has no real governing principles and in the end can't govern its way out of a wet paper bag. Until you do, I'm over here with the Libertarian Party, and a lot of other former Republicans don't know what to do, but they'll be damned if they'll continue voting on the basis of armband religion, as Kathleen Parker put it. The Southern Strategy is dead, it's 2008, and we need new ideas: we need somebody to speak up for the high tech economy that is America's strong suit in this new world, we need somebody to recognize the economic threat-cum-opportunity that is India and China, we need someone to take a principled stand on human rights abuses by our supposed allies and ourselves, and we need someone to show leadership on energy and market reforms and not just let lobbyists write legislation that benefits not just certain industries at the expense of taxpayers and troops, but certain companies. That our government should govern sounds radical, I know; but as an American, I demand the best government in the world. You want to hear it straight from the capitalist horse's mouth? In The Wealth of Nations Adam Smith said "The proposal of any new law or regulation of commerce which comes from [the owners of businesses], ought always to be listened to with great precaution, and ought never to be adopted till after having been long and carefully examined, not only with the most scrupulous, but with the most suspicious attention. It comes from an order of men, whose interest is never exactly the same with that of the publick, who have generally an interest to deceive and even to oppress the publick, and who accordingly have, upon many occasions, both deceived and oppressed it." From Book I.

We don't need any more cheap grand-standing over time-wasting minor social issues - which is about all the GOP seems to know how to do in 2008 - issues that at best are distractions and affronts to privacy and human dignity (like Terri Schiavo) and at worst threaten American business competitiveness (like restricting stem cell research. Notice all those breakthroughs happening in Asia and not here? Surprise!) The strongest Republican governor in the country right now is the governor of California - a centrist who has denounced California's new gay marriage ban as ridiculous. Too bad he wasn't born in the US or I'd already be selling "Presidator 2012" buttons. Religious Right: you broke the GOP and the rationalists want it back - American demographic trends are your worst electoral nightmare, and they're getting worse for you every day. Go off and form the twenty-first century Republican equivalent of the Dixiecrats and win votes in rural Arkansas and Oklahoma if that's your thing. Frankly, after the last election, it seems to be your only thing.

Sun Tzu said the battle is won or lost before it begins; demography is destiny; and David Brooks says the Religious Right will be able to hold onto the GOP's steering wheel at least until 2012. If Brooks is right, fellow moderates and fiscal conservatives, then the battle is lost, and Obama is already a two-term president. For the sake of the country, I hope he's wrong, because I want at least two real American political parties back in action, competing on the merit of their ideas for the next 219 years of the Republic, just as has happened in the previous 219. The party of ideas, the GOP of Reagan and TR and Eisenhower has a golden opportunity here - but if the withered old hands of the Religious Right keep dragging it back, then it's time to consider defection to the Libertarian Party for 2012, or splitting off into the twenty-first century Bull Moose GOP like Teddy Roosevelt did (and kicked Taft's ass, too). And if that's the plan, then we may already have somebody in the wings.


