April 30, 2017

Drowning in Data

It looks like the government (specifically the intelligence community) has the same problem with data that many companies now face with social media. With the proliferation of digital intelligence collection, there’s more data than they can effectively process.

From the Outside The Beltway post:

[…] The main problem is that there aren’t enough people who can intelligently skim all this data and decide what’s worth passing on, so they pass it all on. The corollary problem is that agencies have different priorities and missions and information that’s completely worthless to most can be essential to one or two. So, the raw information really has to be made available to everyone and the culling decisions have to be made at the local level.

Except that it’s really impossible to do. Even with unlimited budgets, it would be impossible to recruit and train enough competent people.

Collecting information, at least this sort of information, is easy. Performing competent analysis quickly enough to be useful to policy-makers and action officers is really, really hard.

If you are looking for insights into a company’s product or service, the answer to too much data is sampling (analyzing a random sample and extrapolating the findings to the larger data set) or, perhaps, triage: reviewing the major sources or sites first, then the smaller ones.
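For readers who want to see what sampling and triage look like in practice, here is a minimal Python sketch. It is illustrative only and rests on several assumptions of mine: the function names `estimate_count` and `triage_order` are hypothetical, each post is assumed to be a dict with `"source"` and `"text"` keys, "major" sources are approximated by post volume, the 95% range uses a simple normal approximation, and the data set is synthetic.

```python
import math
import random
from collections import Counter

def estimate_count(posts, is_relevant, sample_size, seed=None):
    """Hypothetical helper: estimate how many posts satisfy `is_relevant` by
    reviewing a simple random sample and scaling its share up to the full set.
    Returns (estimate, (low, high)) with an approximate 95% range
    (normal approximation plus a finite-population correction)."""
    N = len(posts)
    if N == 0:
        return 0.0, (0.0, 0.0)
    n = min(sample_size, N)
    sample = random.Random(seed).sample(posts, n)
    p_hat = sum(1 for post in sample if is_relevant(post)) / n
    # The correction shrinks the error toward zero as the sample approaches N.
    fpc = math.sqrt((N - n) / (N - 1)) if N > 1 else 0.0
    se = math.sqrt(p_hat * (1 - p_hat) / n) * fpc
    low = max(0.0, p_hat - 1.96 * se) * N
    high = min(1.0, p_hat + 1.96 * se) * N
    return p_hat * N, (low, high)

def triage_order(posts):
    """Hypothetical helper: list sources from most to fewest posts, so the
    biggest sites get reviewed first (volume stands in for 'major')."""
    counts = Counter(post["source"] for post in posts)
    return [source for source, _ in counts.most_common()]

# Synthetic example: 10,000 posts, one in ten a complaint.
posts = [
    {"source": "bigforum.example" if i % 3 else "smallblog.example",
     "text": "broken again" if i % 10 == 0 else "love it"}
    for i in range(10_000)
]
est, (low, high) = estimate_count(
    posts, lambda p: "broken" in p["text"], sample_size=500, seed=42)
print(f"~{est:.0f} complaints (95% range {low:.0f} to {high:.0f})")
print(triage_order(posts))  # ['bigforum.example', 'smallblog.example']
```

Reading 500 posts instead of 10,000 gives a usable estimate of overall sentiment. Note that this is exactly the kind of shortcut that can miss a single important post, which is the limitation the next paragraph describes.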

It gets far trickier when you are looking for intelligence, whether for the government or a company. (I’m not talking about industrial espionage; I mean the kind of information a company would find useful as an early warning: a product problem that one user experiences might become a bigger problem down the line, or word of a planned protest or boycott.) A tidbit of information on a small blog, connected with something analyzed earlier on another site, could turn out to be crucial. Turning up that needle in a haystack takes an analyst who knows what he or she is looking for, coupled with the ability to sort through the content. You can’t automate your way through that challenge.

The kicker to all of this, at both the government and company level, is highlighted by one of the commenters on the Outside The Beltway post:

[…] If you choose to throw out something that later proves to be important, even if [it] couldn’t reasonably have been known to be important at the time, you’re the one who’s going to be punished for it.

If you pass it on and the receiver fails to act on it, they’re the one who’s going to be punished for it.

So there’s a perverse incentive to pass on pretty much everything, no matter how useless.

I think the one consolation for those doing this type of data mining for companies is that lives are rarely at stake if they miss something. Not so for the government.

And you think PR is stressful?

