Smart Screen Exclusive

C4 IDs: Internal Medicine for Enterprises

The second of three articles.

The Cinema Content Creation Cloud (C4) is an asset identification framework developed by the Entertainment Technology Center (ETC) at USC, along with its major studio partners, to help manage today’s complex and distributed workflows.

As described in the first article in this series, the C4 ID system creates an unambiguous, universally unique ID for any file or block of data anywhere in the world, regardless of origin or local storage system, so that users in different locations can be certain of working on or referring to the same file.

But C4 IDs can also benefit internal asset management systems by reducing storage costs, improving network efficiency, managing version control and monitoring the health of digital archives.

Data de-duping and archiving: By unambiguously identifying files based on the content of the files themselves, C4 IDs can readily reveal which files are identical copies of each other so that unneeded copies can be removed. The storage recovered by that process can be “quite significant,” C4’s developers claim, delivering immediate real value to the enterprise.
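The de-duplication idea can be sketched in a few lines. This is a minimal illustration, not the C4 tooling itself: it assumes a SHA-512 hex digest as a stand-in for a real C4 ID (the actual encoding is defined by the C4 framework), and the function names `content_id` and `find_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_id(path, chunk_size=1 << 20):
    """Stand-in for a C4 ID (assumption): SHA-512 hex digest of the file's bytes."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read in chunks so large media files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by content ID; keep only IDs shared by 2+ files."""
    by_id = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_id[content_id(path)].append(path)
    return {cid: paths for cid, paths in by_id.items() if len(paths) > 1}
```

Each group returned by `find_duplicates` holds files with identical content, whatever their names or locations, so all but one copy in each group are candidates for removal.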

Computing C4 IDs for archived files and comparing them with the IDs of the unarchived originals also provides unequivocal confirmation of which files have been backed up successfully. Periodically recomputing the C4 IDs of archived files and comparing them with the recorded IDs will also readily reveal any data degradation, since any alteration of the data produces a different C4 ID.
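A periodic archive health check along these lines might look like the following sketch. It again assumes a SHA-512 hex digest in place of a real C4 ID, and `record_ids` and `audit_archive` are hypothetical helper names rather than part of any C4 tool.

```python
import hashlib
from pathlib import Path


def content_id(path, chunk_size=1 << 20):
    """Stand-in for a C4 ID (assumption): SHA-512 hex digest of the file's bytes."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_ids(root):
    """Build the reference record: relative path -> content ID."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): content_id(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit_archive(recorded_ids, archive_root):
    """Recompute IDs in the archive and return (missing, altered) path lists."""
    missing, altered = [], []
    for rel_path, recorded in sorted(recorded_ids.items()):
        target = Path(archive_root) / rel_path
        if not target.is_file():
            missing.append(rel_path)
        elif content_id(target) != recorded:
            # Any change to the bytes yields a different ID.
            altered.append(rel_path)
    return missing, altered
```

Running `record_ids` once at archive time and `audit_archive` on a schedule would flag both lost files and silently altered ones.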

Network optimization: As with data de-duplication, C4 IDs can reduce transmission costs across a network. If a record of C4 IDs is kept for all files on local and remote systems, data that already exists on the remote system need not be sent again, and recomputing IDs after a transfer readily exposes any transmission errors.
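As a rough sketch of that bookkeeping, the snippet below decides which files actually need to be sent and checks a received payload against its expected ID. The SHA-512 hex digest is an assumed stand-in for a real C4 ID, and `plan_transfer` and `arrived_intact` are hypothetical names; no actual network code is shown.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Stand-in for a C4 ID (assumption): SHA-512 hex digest of the bytes."""
    return hashlib.sha512(data).hexdigest()


def plan_transfer(local_files, remote_ids):
    """Return names of local files whose content the remote system lacks.

    local_files: dict mapping file name -> bytes
    remote_ids:  set of content IDs already present on the remote system
    """
    known = set(remote_ids)
    to_send = []
    for name, data in sorted(local_files.items()):
        cid = content_id(data)
        if cid not in known:
            to_send.append(name)
            known.add(cid)  # identical content is sent only once
    return to_send


def arrived_intact(expected_id: str, received: bytes) -> bool:
    """Recompute the ID on the receiving side and compare with the record."""
    return content_id(received) == expected_id
```

Because the decision rests on content rather than file names, a file that exists remotely under a different name is still recognized and skipped.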

Further, since C4 computations are deterministic, they provide a way to create a new remote file from source data via computation when that’s more cost-effective than transmitting a file over a network, such as when transcoding a source video file to a desired result is faster than transmitting a previously transcoded file. If you know the C4 ID of the result, and you have a record of how it was produced from the original, you can reproduce the result on a remote system computationally.
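One way to picture this is a small "recipe" record that names the source ID, the operation, and the expected result ID. The sketch below is illustrative only: the recipe format, the `OPERATIONS` registry, and the toy `upper` and `reverse` operations (standing in for something like a transcode) are all assumptions, as is the use of a SHA-512 hex digest in place of a real C4 ID.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Stand-in for a C4 ID (assumption): SHA-512 hex digest of the bytes."""
    return hashlib.sha512(data).hexdigest()


# Hypothetical registry of deterministic operations; a real system might
# map names to transcode presets instead of these toy byte transforms.
OPERATIONS = {
    "upper": lambda data: data.upper(),
    "reverse": lambda data: data[::-1],
}


def reproduce(source: bytes, recipe: dict) -> bytes:
    """Rebuild a result locally from its recipe instead of downloading it."""
    if content_id(source) != recipe["source_id"]:
        raise ValueError("local source does not match the recipe's source ID")
    result = OPERATIONS[recipe["operation"]](source)
    if content_id(result) != recipe["result_id"]:
        raise ValueError("recomputed result does not match the expected ID")
    return result
```

The final ID check is what makes the shortcut safe: it only succeeds if the operation is truly deterministic on the remote system, which for real encoders would need to be verified rather than assumed.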

Version control: One of the core characteristics of the C4 framework is that it excludes any concept of modifying a file. A modified file is a new file for the purposes of C4 identification — exactly what version control systems need to differentiate between versions.

As a file changes, the C4 ID and a copy of each successive state of the file can be retained as a version. For large projects with multiple files, the C4 IDs for a version can be stored in a text file or database and used to restore the entire state of that version of the project at any time.
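A project-level version record of this kind could be sketched as follows. The JSON manifest layout, the flat content store keyed by ID, and the names `snapshot` and `restore` are assumptions made for illustration, and a SHA-512 hex digest again stands in for a real C4 ID. The store directory is assumed to live outside the project directory.

```python
import hashlib
import json
from pathlib import Path


def content_id(data: bytes) -> str:
    """Stand-in for a C4 ID (assumption): SHA-512 hex digest of the bytes."""
    return hashlib.sha512(data).hexdigest()


def snapshot(project_root, store_root):
    """Save every file under its content ID; return a JSON manifest as text."""
    project_root, store_root = Path(project_root), Path(store_root)
    store_root.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in sorted(project_root.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            cid = content_id(data)
            # Unchanged files map to the same ID, so they are stored once.
            (store_root / cid).write_bytes(data)
            manifest[path.relative_to(project_root).as_posix()] = cid
    return json.dumps(manifest, indent=2, sort_keys=True)


def restore(manifest_text, store_root, target_root):
    """Recreate the project state described by a manifest in target_root."""
    for rel_path, cid in json.loads(manifest_text).items():
        destination = Path(target_root) / rel_path
        destination.parent.mkdir(parents=True, exist_ok=True)
        destination.write_bytes((Path(store_root) / cid).read_bytes())
```

Each manifest is a small text record of one version, and files that did not change between versions occupy the store only once.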

Once files can be differentiated unambiguously, moreover, it’s possible to establish immutable relationships between those files. Just as a C4 ID immutably identifies a particular file, relationships between files can also be defined immutably.

There are many other benefits to an enterprise from using C4 IDs internally. Further information can be found in the white paper describing the C4 framework, published by ETC and available here.

In the third part of this series we’ll look at how C4 IDs can be used between and among enterprises to coordinate distributed workflows.