Metadata could kill you
Not really. But it could put a serious cramp in your plans to evade Big Brother, or bring down your credibility.
Recently, the Wendell Smith episode dominated the Nigerian internet scene. What happened? The author metadata of a Microsoft Word document contained some incriminating information. Skipping the politics and dwelling on the technology behind the episode: what is metadata?
Metadata is data about data. It describes the content, context and other qualities of the original data, which makes that data more useful and easier to understand. For example, a webpage may include metadata specifying what language it is written in, what tools were used to create it, and where to go for more on the subject, allowing browsers to automatically improve the experience of users.
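To make the webpage example concrete, here is a minimal sketch that pulls the declared language and any named meta tags (such as the "generator" tag many site builders add) out of a page's source. It uses only Python's standard library; the names PageMetadata and page_metadata are made up for this illustration.

```python
from html.parser import HTMLParser


class PageMetadata(HTMLParser):
    """Collects the <html lang="..."> attribute and named <meta> tags."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.found["language"] = attrs["lang"]
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            # e.g. <meta name="generator" content="..."> names the authoring tool
            self.found[attrs["name"].lower()] = attrs["content"]


def page_metadata(html):
    """Return a dict of the metadata found in an HTML string."""
    parser = PageMetadata()
    parser.feed(html)
    return parser.found
```

Feeding it the source of almost any page built with a popular site builder will likely show which tool produced it.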
You may think you are more tech-savvy than Wendell Smith and have scrubbed the offending author metadata from that document you want to put out there. But is that all of it? Here are some ways metadata leaks out more than you think.
Microsoft Word is not the only application that tags its files with descriptive data. People creating PDF files tend to forget that these also have author and subject metadata fields. Several papers, publications and other media in this form carry improperly scrubbed identifying metadata. It is a pity the real Satoshi Nakamoto didn't slip up when publishing the bitcoin paper. Or did he?
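Checking what a Word file says about its author takes very little effort. The sketch below assumes a modern .docx file, which is a zip archive whose author fields live in docProps/core.xml; older .doc files and PDFs store the same kind of information differently and would need other tools. The function name docx_authors is made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml inside a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def docx_authors(path_or_file):
    """Return the creator and last-modified-by fields of a .docx file."""
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "last_modified_by": root.findtext(
            "cp:lastModifiedBy", default="", namespaces=NS
        ),
    }
```

Running this on a document before publishing it is a quick way to see what a curious reader would see.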
Desktop applications are not the only ones that tag the files they produce; mobile apps do too. Remember Oprah's famous tweet about her love for Microsoft's Surface, sent from an iPad? Mobile apps such as Twitter and Facebook tag your tweets and posts with metadata such as your phone type and location.
During the TechCabal Battlefield, before details about the contestants were posted, there was speculation that Callbase was eerily similar to Fonenode. How did we confirm that? A whois lookup on both domains. You may work around this by using different details when buying domains, but forget that there are tools which can tell whether two sites gather user stats via the same Google Analytics account.
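The Google Analytics comparison is simple enough to sketch. The code below assumes the classic tracking ID format, UA-<account>-<property>, which appears in plain text in the page source; newer ID formats would need a different pattern. The function names are made up for this illustration, and fetching the two pages is left out.

```python
import re

# Classic Google Analytics IDs look like UA-<account number>-<property number>.
GA_ID = re.compile(r"\bUA-(\d{4,10})-(\d{1,4})\b")


def analytics_accounts(html):
    """Return the set of Analytics account numbers found in a page's source."""
    return {match.group(1) for match in GA_ID.finditer(html)}


def share_account(html_a, html_b):
    """True if both pages report to at least one common Analytics account."""
    return bool(analytics_accounts(html_a) & analytics_accounts(html_b))
```

Two sites with different property numbers but the same account number are, in all likelihood, run by the same people.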
Remember John McAfee? He was on the run from a police investigation and allowed his picture to be taken by a journalist during an interview. What he didn't know was that the camera embedded location data in the picture. Big oops.
A recent article on Torrentfreak was about a music group which made copies of its own music available to its fans. Small problem though: the tracks seemed to have been downloaded from a pirate site. How was that information gleaned? Metadata in the mp3 files.
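The article does not say which fields gave the game away, but mp3 tags are easy to read. As an illustration only, the sketch below assumes the old ID3v1 layout, a fixed 128-byte block at the very end of the file; most modern files also carry the richer ID3v2 tag at the start, which this does not parse. The function name read_id3v1 is made up for this illustration.

```python
def read_id3v1(data):
    """Parse an ID3v1 tag from the last 128 bytes of mp3 data, or return None."""
    tag = data[-128:]
    if len(tag) < 128 or not tag.startswith(b"TAG"):
        return None

    def text(start, end):
        # Fields are fixed width, padded with NUL bytes or spaces.
        return tag[start:end].split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(3, 33),
        "artist": text(33, 63),
        "album": text(63, 93),
        "year": text(93, 97),
        "comment": text(97, 127),  # a free-text field, often left untouched
    }
```

Called as read_id3v1(open("track.mp3", "rb").read()), with track.mp3 standing in for any file you have, it returns whatever the last tool to touch the file left behind.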
There are many more examples which show metadata behaving like Freudian slips. Technology really wants to tell the truth (after all, progress bars don't lie). But before you think you are far beyond these slips and that every piece of identifying information has been scrubbed, remember that the path to that source code you are compiling right now might be embedded somewhere in your compiler's output.
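If you want to check your own binaries, a rough sketch in the spirit of the Unix strings tool is below. It assumes the paths of interest sit under a home directory (/home/, /Users/ or C:\Users\) and contain no spaces, so it will miss or truncate others. The function name embedded_paths is made up for this illustration.

```python
import re

# Runs of printable, non-space ASCII that begin like a home-directory path.
PATH_LIKE = re.compile(rb"(?:/home/|/Users/|[A-Za-z]:\\Users\\)[\x21-\x7e]{3,}")


def embedded_paths(blob):
    """Return the sorted, de-duplicated path-like strings found in raw bytes."""
    return sorted({m.group(0).decode("ascii") for m in PATH_LIKE.finditer(blob)})
```

Pass it the bytes of a compiled program or library and see whether your username turns up.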