The term “Big Data” has become virtually synonymous with “schema on read” techniques for analyzing and handling unstructured data, such as Hadoop. With schema on read, a plan or schema is applied to the data as it is pulled out of a stored location, rather than being imposed when the data is first written. These techniques have been most famously exploited on relatively ephemeral, human-readable data such as retail trends, Twitter sentiment, social network data, and log files.
But what if you have unstructured data that, on its own, is hugely valuable, enduring, and created at great expense? Data that may not be immediately human-readable or indexable by search? This is exactly the kind of data most commonly created and analyzed in science and HPC. Research institutions are awash with such data from large-scale experiments and extreme-scale computing that is used for high-consequence.