The Problem with Stratified Sampling

This week, I’m going to spend some time discussing the problems with current polling methodology.  The numbers we’re seeing in today’s polls may not accurately represent the real shape of the electoral landscape.

The stratified sampling problem has been almost completely ignored, in favor of more easily digestible problems like the Bradley Effect and cell-phone user undersampling.  Unfortunately, the stratified sampling problem may prove to be more influential than either of those concerns.

I’m going to do my best to keep this post understandable for anyone with an interest in polling, but be forewarned: I’m dealing with a deep-seated technical problem that suffuses the entire polling industry.  If the stratified sampling problem were easy to understand and manage, we wouldn’t be having this conversation.

---------------------------------

So first, what is a stratified sampling poll?  We all have some idea of how election polls work: a pollster randomly chooses a number of registered or likely voters, contacts them (usually by phone), and asks who they plan to vote for.

Sound good?

Well, that depends on your definition of “random”.

Purely for the sake of argument, let’s assume the US population is 20% black and 80% white.  Let’s also assume that 95% of black voters plan to vote for Obama, and 40% of white voters plan to vote for Obama.  (I’ll repeat this in a second, for the mathematically disinclined.)

Now, let’s assume we poll 100 people and ask them if they plan to vote for Obama or McCain.  We pick these 100 people “randomly”.  That means:

  • There’s a 1 in 5 chance any individual is black.
  • IF that individual is black, there’s a 95% chance they’ll vote for Obama.
  • IF that individual is white, there’s a 40% chance they’ll vote for Obama.

Clearly, it’s very important how many black voters we survey.  A black voter is almost certainly a vote for Obama, while a white voter is somewhat more likely to favor McCain.  But the chances of us getting exactly 20 black voters and 80 white voters are less than 1 in 10 (0.0993 to be precise).  So let’s look at three different scenarios.

  • Our sample is 30 black voters and 70 white voters.  Then we expect Obama to receive (30*.95) + (70*.40) = 56.5% of the vote.
  • Our sample is 20 black voters and 80 white voters (population average).  Then we expect Obama to receive (20*.95) + (80*.4) = 51% of the vote.
  • Our sample is 10 black voters and 90 white voters.  Then we expect Obama to receive (10*.95) + (90*.40) = 45.5% of the vote.
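For anyone who wants to check the arithmetic, here is a minimal sketch of the three scenarios.  The function name `expected_obama_share` is just an illustrative label; the 95% and 40% support rates are the made-up numbers from the example, not real polling data.

```python
def expected_obama_share(n_black, n_white, p_black=0.95, p_white=0.40):
    """Expected percentage of the sample supporting Obama.

    p_black and p_white are the hypothetical support rates from the example.
    """
    total = n_black + n_white
    supporters = n_black * p_black + n_white * p_white
    return 100.0 * supporters / total


# The three sample compositions discussed above (100 respondents each).
for n_black in (30, 20, 10):
    n_white = 100 - n_black
    share = expected_obama_share(n_black, n_white)
    print(f"{n_black} black / {n_white} white -> {share:.1f}% for Obama")
# 30 black / 70 white -> 56.5% for Obama
# 20 black / 80 white -> 51.0% for Obama
# 10 black / 90 white -> 45.5% for Obama
```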

From this example, we can see that if two different demographic groups vote in very different ways, small errors in the representation of each group within a random sample can have profound effects on the outcome of the poll.
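How likely is a random sample to miss the 20/80 split?  The sketch below treats the number of black voters in a 100-person sample as a binomial draw with a 20% chance per person, which is the same assumption behind the 1-in-10 figure quoted earlier.  The helper names are hypothetical.

```python
from math import comb


def prob_exact_count(k, n=100, p=0.20):
    """Binomial probability of exactly k black voters in a sample of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_count_between(lo, hi, n=100, p=0.20):
    """Probability that the count falls in the inclusive range lo..hi."""
    return sum(prob_exact_count(k, n, p) for k in range(lo, hi + 1))


print(f"P(exactly 20 black voters)       = {prob_exact_count(20):.4f}")
print(f"P(off by 5 or more, either way)  = {1 - prob_count_between(16, 24):.4f}")
```

The second line is the more useful one: it estimates how often a purely random sample would be off by five or more respondents in one group, under the example’s assumptions.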

The solution to this problem is to sample demographic groups independently.  This is called stratified sampling.  Instead of randomly picking 100 voters from an 80% white, 20% black population, we control for demographic groups.  We pick 80 white voters and 20 black voters.  By making sure each group’s share of the sample matches its share of the population, we can curtail the errors we get from oversampling one group over another.
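To see what stratifying buys us, here is a rough simulation sketch using the same invented numbers.  It runs many 100-person polls two ways, purely random and stratified, and compares how much the topline bounces around.  Everything here (function names, trial count, seed) is an assumption for illustration, and it assumes the pollster knows the true 20/80 split.

```python
import random
import statistics

P_BLACK = 0.20                              # assumed population share (from the example)
SUPPORT = {"black": 0.95, "white": 0.40}    # assumed Obama support by group


def simple_random_poll(rng, n=100):
    """Each respondent's group is left to chance."""
    votes = 0
    for _ in range(n):
        group = "black" if rng.random() < P_BLACK else "white"
        votes += rng.random() < SUPPORT[group]
    return 100.0 * votes / n


def stratified_poll(rng, n=100):
    """Group sizes are fixed to match the assumed population shares."""
    n_black = round(n * P_BLACK)
    n_white = n - n_black
    votes = sum(rng.random() < SUPPORT["black"] for _ in range(n_black))
    votes += sum(rng.random() < SUPPORT["white"] for _ in range(n_white))
    return 100.0 * votes / n


def spread(poll, trials=5000, seed=0):
    """Standard deviation of the topline (in points) across repeated polls."""
    rng = random.Random(seed)
    return statistics.pstdev([poll(rng) for _ in range(trials)])


print(f"simple random: {spread(simple_random_poll):.2f} points")
print(f"stratified:    {spread(stratified_poll):.2f} points")
```

Under these assumptions, theory puts the spread at roughly 5.0 points for the simple random poll and roughly 4.5 points for the stratified one, so the simulated values should land near those figures.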

Okay, that was a lot of numbers.  Take a break.  Breathe.  What did we learn?

When we’re faced with demographic groups that vote differently, we should treat each group independently.  The problem is biggest when the difference between how groups vote is biggest.  If blacks and whites vote almost the same, we don’t gain a lot of information by treating them separately.  But if blacks and whites vote very differently, it’s important that we distinguish groups in our polls.

For statistics nerds, the importance of distinguishing groups grows with the (tetrachoric) correlation between group membership and voting choice.  If there’s a large correlation, it’s important to distinguish.  If there’s a small correlation, the groups are pretty much the same.

---------------------------------

Stratified sampling is the modus operandi of most or all modern polls.  It helps eliminate error created through non-representative sampling techniques.  Pollsters survey individuals in a number of demographic groups, and then assign those demographic groups proportional shares of the electorate.  For instance, a pollster would assume that the black vote accounts for a large proportion of votes in South Carolina (where 30% of the population is black), but would predict a smaller share of black voters in Iowa (where less than 3% of the population is black).  By modelling the electorate, instead of relying on pure random sampling, pollsters are able to parcel out some unwanted error that would come from uneven demographic distributions in the polling sample.

So stratified sampling is good, right?  It helps us learn more about how voters act in different demographic groups.  It gives us more detailed information.

Yes, it does, but there’s a very real problem with this sort of approach: no one knows the true sizes of the demographic groups.

Sure, the US Census Bureau has race and gender data, but that’s for the population at large.  For polls, we only care about the people who vote on Election Day.  No one knows what voter turnout will look like beforehand.  The best we can do is guess that it will look the same as it did four years ago.  This may not, however, be a reasonable assumption.

Consider this election season.  We know anecdotally that Barack Obama energizes young voters and black voters.  We can make a reasonable assumption that turnout will be higher for both of these groups, but we don’t know how much higher.  There are many, many new registered voters this year, and we’re only starting to get data on how this may change the polling landscape.  Furthermore, while Sarah Palin energizes the conservative base, many conservatives have always been wary of John McCain.  It’s difficult to predict how much of the Republican base will come out to support McCain on Election Day.

This is not a hopeless cause: pollsters can make educated guesses about the shape of the electorate.  The problem is, there’s no way to be certain whose demographic projections are most accurate.  And as we saw in the example above, even relatively minor shifts in demographic representation can have profound consequences for topline numbers when demographic groups differ sharply in which candidate they prefer.

With about 88% of Democrats supporting Obama and 88% of Republicans supporting McCain, the biggest question is, what will be the relative sizes of Democratic and Republican voting blocs on Nov. 4th?  The correlation between party identification and voting intention is by far the biggest demographic split.  A poll that predicts about equal turnout for Democrats and Republicans will paint a very different picture from a poll that predicts a 38% DEM / 30% GOP split.
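Here is a rough sketch of how much the turnout model alone can move the topline.  The 88% loyalty figure and the 38/30 split come from the paragraph above; everything else is an assumption made up for illustration: that the other 12% of each party votes for the opposing candidate, that independents make up 32% of voters and split evenly, and that “equal turnout” means 34% for each party.

```python
LOYALTY = 0.88       # share of partisans backing their own candidate (from the post)
IND_OBAMA = 0.50     # ASSUMPTION: independents split evenly


def obama_topline(dem_share, rep_share):
    """Obama's expected percentage under a given party turnout model.

    Whatever is left after Democrats and Republicans is treated as independents.
    ASSUMPTION: partisans who are not loyal vote for the other candidate.
    """
    ind_share = 1.0 - dem_share - rep_share
    obama = (dem_share * LOYALTY
             + rep_share * (1.0 - LOYALTY)
             + ind_share * IND_OBAMA)
    return 100.0 * obama


print(f"34% DEM / 34% GOP: Obama {obama_topline(0.34, 0.34):.1f}%")
print(f"38% DEM / 30% GOP: Obama {obama_topline(0.38, 0.30):.1f}%")
# 34% DEM / 34% GOP: Obama 50.0%
# 38% DEM / 30% GOP: Obama 53.0%
```

Same voters, same preferences within each group: under these assumptions, the roughly three-point gap in Obama’s share (about six points on the margin) comes entirely from the turnout model.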

Stratified sampling is extremely problematic this election cycle.  Some demographic groups show big correlations with voting intentions.  And no one is really sure what the electorate will look like on Nov. 4th, which voters will come out to cast their ballots.  Without a good model for turnout, our polls aren’t much better than unstratified random samples.

In addition, the margin of error on polls doesn’t include error from incorrect turnout predictions.  Poll margins of error are based solely on the size of a sample.  It’s entirely possible that two polls can disagree by 10 points, and yet represent exactly the same demographic trends.  Look at the example above, again.  When we predict 30% of voters will be black, Obama scores 11 points higher (56.5%) than when we predict only 10% will be black (45.5%).  The two polls follow identical demographic trends.  The only difference, the sole explanation for that 11-point discrepancy, is demographic representation within the sample.
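For reference, this is a sketch of the textbook margin-of-error calculation for a simple random sample at 95% confidence.  Individual pollsters may adjust it in ways not covered here; the point is only that nothing in the formula knows anything about turnout assumptions.

```python
from math import sqrt


def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error, in points, for a simple random sample of size n.

    p=0.5 is the conventional worst case; z=1.96 corresponds to 95% confidence.
    """
    return 100.0 * z * sqrt(p * (1 - p) / n)


for n in (100, 500, 1000):
    print(f"n = {n:4d}: +/- {margin_of_error(n):.1f} points")
```

A 1,000-person poll reports about +/- 3.1 points no matter which turnout model went into it, so a turnout-driven gap of several points between two polls can sit entirely outside their stated margins.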

The media has spent considerable time focusing on poll accuracy – on whether voters are lying to pollsters (the Bradley Effect), or on whether some groups are being adequately sampled (cell-phone users).  But there’s one important question that the media keeps dodging, over and over again.

How wrong could our polls be, simply for predicting the wrong type of turnout on Election Day?

Question

Very interesting read--thanks.  Perhaps this betrays some staggering ignorance on my part, but I think it would be helpful if you could clarify the difference between this and weighting of the results of a poll.  Are the two complementary, does one cancel out the need for the other, or what?