Building a Voter File Part 1: From the Raw Data Up
Karl Rove got your mom's phone number the other day. Not directly, but through a mutual acquaintance--probably a secretary of state. Both parties--and plenty of other organizations besides--are engaged in creating massive stores of information on voters all across the country. This is their story.
In order to run any sort of political campaign, it’s crucial to know something about your voters. Who in a given neighborhood is registered to vote? Who isn’t, but should be? Who falls into what demographic group? Polling can tell you something about people—even a small group can contain useful information about an entire demographic, thanks to the magic of sampling—but to get bulk information about who lives where and what they’re like, you need a full-fledged voter file.
In a very basic outline, the process goes something like this: data is collected, combined with other sources, and put into a useful format. This data can be analyzed, improved, added to, and eventually used to run a successful campaign (Nate Silver has a useful primer). Today, we’re going to focus on the very beginning of this process.
One symptom of America’s unupgraded beta of democracy is our insistence on federalism, even when it makes our lives difficult, and the way we collect voter data is no exception. Every state’s voter file contains different information, arranged in different ways, with different formats for presenting the information. Better than nothing, but not very useful if you want to run a national campaign, or even append information to everyone in the country (as opposed to a particular state). Before any of the interesting stuff can begin, the information needs to be in one single format.
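To make the "one single format" step concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the state codes "XA" and "XB", the column names, and the date formats are invented and do not describe any real state's file, and `normalize_record` is a hypothetical helper rather than anyone's actual pipeline.

```python
from datetime import datetime

# Hypothetical per-state layouts. The state codes, column names, and date
# formats are made up for illustration only.
STATE_LAYOUTS = {
    "XA": {
        "fields": {
            "state_voter_id": "VOTER_ID",
            "first_name": "FIRST_NAME",
            "last_name": "LAST_NAME",
            "birth_date": "DATE_OF_BIRTH",
        },
        "date_format": "%Y-%m-%d",
    },
    "XB": {
        "fields": {
            "state_voter_id": "RegNum",
            "first_name": "NameFirst",
            "last_name": "NameLast",
            "birth_date": "BirthDate",
        },
        "date_format": "%m/%d/%Y",
    },
}


def normalize_record(state, raw):
    """Map one raw row (a dict) from a state's layout onto the common schema."""
    layout = STATE_LAYOUTS[state]
    record = {"state": state}
    for common_name, source_name in layout["fields"].items():
        # Missing or blank source fields become None instead of raising.
        value = (raw.get(source_name) or "").strip()
        record[common_name] = value or None
    if record["birth_date"]:
        # Rewrite every state's date format as ISO 8601 (YYYY-MM-DD).
        parsed = datetime.strptime(record["birth_date"], layout["date_format"])
        record["birth_date"] = parsed.date().isoformat()
    return record
```

Calling `normalize_record("XB", row)` on a row keyed by the second layout's column names returns a dict with the same keys as a row from the first layout, which is what lets records from different states sit in one table. A field the state does not supply simply comes back as `None`.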
First of all, the file needs to contain a unique identifier for each person. The states are required to assign these IDs, thanks to the Help America Vote Act (HAVA), but their identifiers are only unique statewide, and sometimes not even that (for example, an ID might have to be combined with the county a voter is registered in). Only an ID assigned by the organization creating the file can be guaranteed to be unique. And without a unique ID to track people by, verifying their identities is extremely difficult and unreliable, thanks to inconsistencies in how various types of data are reported. If your name is John in one source of information, Johnny in another, and Jhon in yet a third, it’s a challenge to ensure that all of these really refer to the same person.
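As a rough sketch of the ID problem, assuming nothing about how any real organization does it: the state-issued ID is combined with the state (and the county, where the state's IDs are only unique per county) to form a source key, and the organization then hands out its own identifier for each key. The names `source_key` and `IdRegistry` are hypothetical, and the in-memory dict stands in for whatever database would actually hold the mapping.

```python
import uuid


def source_key(state, state_voter_id, county=None):
    """Build a key that is unique across states from a state-issued ID.

    The county is included only for states whose IDs are not unique statewide.
    """
    parts = [state.strip().upper()]
    if county:
        parts.append(county.strip().upper())
    parts.append(state_voter_id.strip())
    return ":".join(parts)


class IdRegistry:
    """Assigns one organization-level ID per source key (illustrative only)."""

    def __init__(self):
        self._ids = {}

    def internal_id(self, key):
        # The same source key always maps to the same internal ID.
        if key not in self._ids:
            self._ids[key] = uuid.uuid4().hex
        return self._ids[key]
```

Note what this does not solve: it keeps one registration record from colliding with another, but it cannot tell you that "John", "Johnny", and "Jhon" in three different sources are the same person. That matching still has to be done on names, addresses, and birth dates.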
In addition to a simple list of who lives in the state, the voter file will usually contain other useful information--address, previous vote history, registration information, date of birth. Of course, not all of these are available from every state. Not every state keeps track of party registration, for instance. Some states have detailed vote history, others record only a few elections. But at the end of this process, a raw info-dump has been turned into something that can be combined with information from any other state. Tomorrow: appending!













