Problems with Sampling II - Weights and Clusters
Pursuant to some correspondence with my colleagues Blue Leader and Dirty D, I'd like to give a brief follow-up to my post from Monday about the problems associated with stratified sampling. BL and DD brought up a pair of topics related to my last post but not covered therein: demographic weighting in poll samples, and cluster sampling. These are alternate methodologies that can be used in polling, and both deserve separate discussion. Now that I've established some groundwork on the topic of stratified sampling, I think it should be easier to tackle these two topics.
-------------------------------------
Let's begin with demographic weighting. Blue Leader asked me to explain how stratified sampling and demographic weighting differ, and whether they're complementary or competing methodologies.
Basically, demographic weighting is a more easily executed solution to the same problem stratified sampling tries to resolve - namely, imbalance in sample demographics. Where stratified sampling independently samples from each identified demographic group, in proportions according to voter registration or predicted voter turnout, demographic weighting takes regular random sample data and corrects for biases created by over- or under-representation of groups within the random sample. That was sort of a dense explanation, so let's give a practical example. I'll be using the framework from the stratified sampling post again (80% white, 20% black population; 100 person poll).
So, we want to make sure our poll adequately represents the demographic reality of the population. The stratified sampling solution is to poll the demographic groups separately. Randomly sample 80 white voters and 20 black voters, the appropriate demographic sizes, and then combine the results to get the full picture.
Demographic weighting corrects for biases in sample composition, rather than requiring a different sampling procedure. So, say we pull a random sample of 100 voters. It turns out that our sample is pretty weird: we get 10 black voters and 90 white voters. This considerably under-represents black voters relative to what we know about demographic group sizes. Weighting treats the demographic groups as if they HAD been drawn in appropriate ratios. So, for example, since black voters should make up 20% of the electorate but make up only 10% of our sample, we double-weight these data. Essentially, each black voter in the poll counts twice because the population includes twice the sample proportion of black voters. Polled white voters each count for 8/9 of a voter, because white voters are overrepresented in the sample (and because 8/9 * 90 = 80).
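The arithmetic above can be sketched in a few lines of Python. This is only an illustration under the running assumptions of this post (an 80/20 population and a 90/10 sample). The function name is hypothetical, and the per-group support counts are invented purely to show how a weighted topline gets computed; they are not from any real poll.

```python
from fractions import Fraction

def poststratification_weights(population_share, sample_counts):
    """Weight for each group = population share / sample share."""
    n = sum(sample_counts.values())
    return {
        group: Fraction(population_share[group]) / Fraction(sample_counts[group], n)
        for group in sample_counts
    }

# Running example from the post: 80/20 population, 90/10 sample of 100.
population_share = {"white": Fraction(80, 100), "black": Fraction(20, 100)}
sample_counts = {"white": 90, "black": 10}

weights = poststratification_weights(population_share, sample_counts)
# weights["black"] is 2 and weights["white"] is 8/9, matching the text.

# ASSUMPTION: made-up counts of respondents supporting some candidate A.
supporters = {"white": 36, "black": 9}

# Unweighted share: every respondent counts once.
raw_share = Fraction(sum(supporters.values()), sum(sample_counts.values()))

# Weighted share: every respondent counts as their group's weight.
weighted_share = sum(weights[g] * supporters[g] for g in supporters) / sum(
    weights[g] * sample_counts[g] for g in sample_counts
)

print(float(raw_share), float(weighted_share))
```

With these invented counts, the unweighted share is 45% and the weighted share is 50%, because the ten black respondents now stand in for twenty.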
Weighting portions of random samples to account for population characteristics has the same problems as stratified sampling. Both methods rely on turnout modeling, which may prove inaccurate for the reasons described in Monday's post. However, weighting provides a unique advantage, and a unique disadvantage, when compared to stratified sampling procedures.
The advantage comes from the use of unrestricted random sampling. Stratified sampling requires an a priori specification of what demographic categories will be polled, and in what rates. Say a pollster is potentially interested in both race and socioeconomic status. To use stratified sampling appropriately, the pollster will have to create an exhaustive list of possible categories and poll each independently. So now it's not enough to poll just white voters and black voters, or even poor, middle-class, and rich voters. In stratified sampling, it's necessary to separately poll poor white, poor black, middle-class white, middle-class black, rich white, and rich black populations, treating each independently. Demographic weighting allows the analyst to break poll results into demographic categories on the fly, instead of requiring exact proportionality in collected samples.
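To see how quickly the list of strata grows, here is a small sketch that enumerates the cells for the race and socioeconomic example above. The variable names are just for illustration.

```python
from itertools import product

# The two demographic variables from the example above.
race = ["white", "black"]
income = ["poor", "middle-class", "rich"]

# Stratified sampling needs one independently polled cell per combination.
strata = [f"{level} {group}" for level, group in product(income, race)]

for cell in strata:
    print(cell)

# The number of cells is the product of the category counts.
print(len(strata))
```

Every additional variable multiplies the number of cells, and each cell needs its own sample and its own turnout estimate.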
The disadvantage comes from the inherent variability of random sampling. Say we get a poll result like that described above, where 10% of polled voters are black and 90% are white. When a minority population is undersampled, the sampling error for that subgroup increases. Because sampling error is inversely proportional to the square root of sample size, each additional respondent reduces error less and less at higher sample sizes. At very low sample sizes, however, sampling error can become a serious problem. When random undersampling occurs in already-small minority populations, the error within that weight-adjusted portion of the poll becomes much larger.
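A rough sketch of that square-root relationship, using the textbook normal-approximation margin of error for a proportion. The defaults (worst-case p = 0.5, z = 1.96 for roughly 95% confidence) are my assumptions, and the approximation is crude at sample sizes as small as 10, so treat the numbers as order-of-magnitude only.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    ASSUMPTIONS: normal approximation, worst-case p = 0.5, z = 1.96 (~95%).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Subgroup sizes from the running example: 10 black voters sampled
# where 20 were expected, versus the full 100-person poll.
for n in (10, 20, 100):
    print(n, round(100 * margin_of_error(n), 1))
```

Under these assumptions, the 10-person subgroup carries a margin of error of about 31 points, against about 22 points had 20 been sampled and about 10 points for the full poll. Doubling the weight on those 10 respondents does nothing to shrink that error; it only carries it into the topline.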
The most important problem, though, remains the same problem I described in Monday's post on stratified sampling. Poor turnout modeling essentially overpowers any accuracy within the poll itself. No matter how good the raw data may be, it provides very little value when plugged into a faulty prediction equation.
-------------------------------------
Now, for cluster sampling. Dirty D considers this a potentially more dangerous problem than stratified sampling procedures. I heartily agree that cluster sampling is an important issue, but it seems to me that the problems of cluster sampling are much more obvious and more easily corrected.
So what is cluster sampling? Cluster sampling is, in some ways, the geographical analog to stratified sampling. In cluster sampling, the electorate is divided into regional “clusters”. Then, clusters are randomly chosen and voters are surveyed only within those clusters. Clusters are considered representative of the total electorate.
Cluster sampling requires two conditions: the cluster must resemble the general electorate (mean scores should be the same within all clusters) and the cluster must include considerable diversity (high variance around the means within every cluster). Without these constraints, clusters can’t be assumed to represent the whole electorate, and can’t serve as effective proxies.
So cluster sampling requires new constraints on our data. What do we get in return? More precision?
No. We get cheaper polls. It’s administratively less expensive to poll individuals from a group of randomly selected geographical clusters than it is to poll a random sample of all Americans. The price of cost-cutting, however, is an increase in the size of the sampling error within the poll. It’s difficult to estimate how much the sampling error is increased, because this depends on the representativeness of the clusters under analysis. It is impossible to know the representativeness of the clusters without doing a regular random sample, but to do a random sample would increase costs and negate the entire purpose of cluster sampling.
The more fundamental problem, however, is that cluster sampling is plainly unreasonable for the problems at hand. Cluster sampling works best when the primary source of variability is within clusters, not between them. With geographical clusters, this assumption hinges on the idea that there is little real difference between voters in rural Kansas and urban Chicago. There should be more variability within each cluster than there is between one cluster and the next.
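One common way to put a number on this is the "design effect" approximation usually attributed to Kish, which is not something from this post but a standard survey-statistics rule of thumb: deff = 1 + (m - 1) * rho, where m is the number of respondents per cluster and rho is the intraclass correlation (roughly, the share of total variance that lies between clusters rather than within them). The sketch below assumes equal-sized clusters, and the sample size, cluster size, and rho values are purely hypothetical.

```python
def design_effect(cluster_size, icc):
    """Kish-style approximation: deff = 1 + (m - 1) * rho.

    ASSUMPTIONS: equal-sized clusters; icc is the intraclass correlation.
    """
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Size of the simple random sample with roughly the same precision."""
    return n / design_effect(cluster_size, icc)

# Hypothetical poll: 1000 respondents, 50 per cluster.
for icc in (0.0, 0.05, 0.2):
    deff = design_effect(50, icc)
    n_eff = effective_sample_size(1000, 50, icc)
    print(icc, round(deff, 2), round(n_eff))
```

When rho is zero, meaning all the variability is within clusters, the cluster sample is as good as a simple random sample. Even a modest rho of 0.05 cuts this hypothetical 1000-person poll to the precision of roughly 290 randomly sampled respondents.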
Look at an electoral projection map, and it should be immediately clear that this is not a sound premise. The traditional red state / blue state divide is illustrative here. If geographical cluster sampling were a reasonable alternative, we would expect polls in all 50 states to closely mirror the national polls. Instead we see some states where John McCain has a clear advantage, and other states where Obama is expected to win easily. Obviously, voting patterns are not the same in these states – if they were, we should expect to see the same candidate ahead in both, and by roughly equivalent margins.
Even for internal state polls, the assumptions of cluster sampling are often violated. Most states in the USA have one or two large cities and a number of rural counties. Recent electoral history consistently shows that Democrats tend to make stronger showings in large cities, and Republicans dominate rural areas. Even within individual states, then, the assumption that mean scores are equal - that a candidate should hold uniform levels of support across a statewide collection of geographic clusters - seems demonstrably false.
So if the premises of cluster sampling are invalid, what recourse do we have? Perhaps the most direct is for us to simply stop engaging in this questionable practice. Yes, it will cost more money to abandon cluster sampling. But what good is it to have data we cannot trust?
Tomorrow I'll be writing my regularly scheduled feature - "The Plebian's Guide to Polling" - so expect me to return to this conversation on Thursday. At that time, I'll pick up with the problem of polls-of-polls and some of the faulty assumptions therein.













