Data

GrayMeta: The Variety of Data Could be Your Biggest Challenge

By Chris Tribbey

Aaron Edell, VP of operations and professional services for GrayMeta, isn’t a fan of the term “big data.”

Sure, the term can describe the sheer amount of data coming in (for example, 50 years’ worth of video is uploaded to YouTube every day). But Edell prefers to use “big data” to mean the challenges of storing and making use of the data, not the volume alone (even though, obviously, in media production you’re dealing with some of the biggest files in the world).

“Metadata is a problem and we’re all trying to solve it in different ways, and the amount of data being ingested is increasing, as is the variety of data,” he said, speaking at the March 16 Metadata Madness event in New York.

“It’s not a storage problem, it’s a retrieval problem. We know how to store it … you’ve got to store this stuff somewhere, and it needs to be efficient, it needs to scale. If you can’t find something, that’s a big data problem.”

Edell relayed the story of a broadcaster on deadline who couldn’t find a stock photo they owned of a celebrity. With time running out before going to air, they had to buy the photo (again, a photo they owned) from Shutterstock, which had purchased it from them. “That’s a big data problem, and we need to work on solving it,” he said.

But the variety of the data may be the biggest issue people are dealing with today. He used the example of two spreadsheets, one with two columns of data, the other with three. Now think about those two spreadsheets with thousands of columns of data instead of just a handful.

“How do you combine them, how do you add them, and how do you derive value from that and search that, when the structure is just slightly different?” he said. “Take that problem and extend it to your video files, your media content.”
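The two-spreadsheet example can be sketched in a few lines of Python. This is only an illustration of the mismatch Edell describes, not a method presented at the event: the sample rows, the column names, and the `combine` helper are all hypothetical, and filling missing cells with `None` is just one possible policy.

```python
# Hypothetical sample data: two "spreadsheets" whose columns only partly overlap.
sheet_a = [
    {"title": "Clip 1", "duration_s": 30},
    {"title": "Clip 2", "duration_s": 45},
]
sheet_b = [
    {"title": "Clip 3", "duration_s": 60, "codec": "h264"},
]


def combine(*sheets):
    """Merge lists of rows into one table, filling absent columns with None."""
    # Collect the union of column names, keeping first-seen order.
    columns = []
    for sheet in sheets:
        for row in sheet:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Rebuild every row against the full column list.
    rows = [
        {name: row.get(name) for name in columns}
        for sheet in sheets
        for row in sheet
    ]
    return columns, rows


columns, rows = combine(sheet_a, sheet_b)
```

With two or three columns the gaps are easy to see and fill. With thousands of columns that differ slightly from source to source, deciding which ones really mean the same thing becomes the hard part.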

He said media and entertainment companies must address the three “Vs” when approaching their data: velocity, volume, and variety, and put a plan in place that takes into account the obstacles of having content be a huge proportion of the volume.

“We’re in a zettabyte era,” he said. “And 90% of the world’s data was created in the last two years. That’s kind of insane, and it’s a trend upwards that we’re already feeling the effects of.”

The anatomy of the solution is being able to extract the data without altering it, make it searchable in a way you can correlate (like avoiding searching through duplicates) and keep what’s on the horizon in mind when data’s being entered, he concluded.
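
One way to picture the “extract without altering” and “avoid duplicates” points is a read-only content fingerprint. The sketch below is an assumption for illustration, not GrayMeta’s approach: the asset names, the byte strings standing in for media files, and the `fingerprint` and `build_index` helpers are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest of the content; the input is only read."""
    return hashlib.sha256(data).hexdigest()


def build_index(assets):
    """Group asset names by content fingerprint so duplicates share one entry."""
    index = {}
    for name, data in assets:
        index.setdefault(fingerprint(data), []).append(name)
    return index


# Hypothetical in-memory assets standing in for media files.
assets = [
    ("photo_final.jpg", b"celebrity-photo-bytes"),
    ("photo_copy.jpg", b"celebrity-photo-bytes"),
    ("promo.mov", b"promo-video-bytes"),
]

index = build_index(assets)
# Any fingerprint that maps to more than one name marks a set of duplicates.
duplicates = [names for names in index.values() if len(names) > 1]
```

Because the digest depends only on the bytes, two copies of the same file land under one entry no matter what they are named, and a search can return that entry once instead of wading through every copy.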