Data Mining

Data Mining for Journalists

Via Slashdot, investigative journalist John Mecklin lays out a way that the Internet revolution is actually helping journalism (crazy, I know):

Now, in the post-Google Age, Allison sees the possibility that computer algorithms can sort through the huge amounts of databased information available on the Internet, providing public interest reporters with sets of potential story leads they otherwise might never have found. The programs could only enhance, not replace, the reporter, who would still have to cultivate the human sources and provide the context and verification needed for quality journalism. But the data-mining programs could make the reporters more efficient — and, perhaps, a less appealing target for media company bean counters looking for someone to lay off.

IMHO, the part about investigative reporters dodging layoffs seems increasingly far-fetched. There are problems in the news business that a few new reporting techniques won't solve. Still, making it easier for the public to benefit from its own data is something worth cheering. As I've tried to stress throughout my posts, our ability to search massive databases like these is still in its infancy. Our ability to collect information has outstripped our ability to make sense of it, and we're still discovering all the things we can do with this data.

Data Mining Epic Fail

The ever-amazing Cory Doctorow has unearthed this little beauty, which argues that government data mining is best described as a catastrophic failure:

They admit that far more Americans live their lives online, using everything from VoIP phones to Facebook to RFID tags in automobiles, than a decade ago, and the databases created by those activities are tempting targets for federal agencies. And they draw a distinction between subject-based data mining (starting with one individual and looking for connections) compared with pattern-based data mining (looking for anomalous activities that could show illegal activities).

But the authors conclude the type of data mining that government bureaucrats would like to do--perhaps inspired by watching too many episodes of the Fox series 24--can't work. "If it were possible to automatically find the digital tracks of terrorists and automatically monitor only the communications of terrorists, public policy choices in this domain would be much simpler. But it is not possible to do so."

There are several points here worth discussing. Let's start by conceding that the government's massive intrusion into our daily lives through data collection and profiling is absolutely terrifying. That much is uncontroversial, if not trivial, so I'll take it as axiomatic that we all find it pretty scary. The real question is how scary it is when they can't even do it correctly, and the answer is: even more terrifying. The fact that an aggressive lunatic with a rifle has terrible aim doesn't reassure the people standing near his target that they won't get hit. The phrase "false positive", while technically correct, is something of a euphemism for what's actually happening: the government is arresting and charging the wrong people.
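To see why "false positive" understates the problem, here is a back-of-the-envelope look at the base-rate effect behind pattern-based data mining. It is a minimal Python sketch, and every number in it (the population, the count of real targets, and the detector's hit and false-alarm rates) is a hypothetical figure chosen for illustration, not a number from the report.

```python
def flagged_counts(population, true_targets, sensitivity, false_positive_rate):
    """Return the expected (correctly flagged, wrongly flagged) counts."""
    innocent = population - true_targets
    true_positives = true_targets * sensitivity        # real targets caught
    false_positives = innocent * false_positive_rate   # innocent people flagged
    return true_positives, false_positives


# Hypothetical figures, for illustration only.
POPULATION = 300_000_000
TRUE_TARGETS = 3_000
SENSITIVITY = 0.99          # assume 99% of real targets get flagged
FALSE_POSITIVE_RATE = 0.01  # assume 1% of innocent people get flagged anyway

tp, fp = flagged_counts(POPULATION, TRUE_TARGETS, SENSITIVITY, FALSE_POSITIVE_RATE)
# Share of flagged people who are actually targets (the detector's precision).
precision = tp / (tp + fp)

print(f"Correctly flagged: {tp:,.0f}")
print(f"Wrongly flagged:   {fp:,.0f}")
print(f"Chance a flagged person is a real target: {precision:.3%}")
```

Even with a detector that is right 99% of the time, these made-up numbers flag roughly three million innocent people against fewer than three thousand real targets, so only about one flagged person in a thousand is actually a target.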

