Asset Pipeline Design

Over the span of my career, I have worked at several companies, each with its own approach to building asset pipelines. I have seen strategies that work, and strategies that seem simple but cause problems. I’ve also seen pipelines that are too complicated and delicate to change. Designing a solid pipeline is simpler than it looks: rather than thinking of a pipeline as a flow from left to right, it’s far more helpful to conceptualize it as a series of layers.

The design of a pipeline is an example of what Amazon describes as a One-Way Door: once it’s implemented, it is incredibly hard to go back on. That’s why, if you have a say in how the pipeline will operate at the start of a project, it’s critical to get the design right.

The core of a pipeline is the key reference point for the assets in the project, which we describe as the source of truth: the single place we go to for information about assets within the project.

Asset-Centric Pipeline

This design is the most common format I have seen in industry. All the assets are stored in source control, so source control becomes the source of truth. Perforce is a strong candidate for this, but I’ve also seen Git, Subversion, and, at one company, even Google Drive pressed into service as the source control platform (which didn’t go well). The Asset Manager’s key job is to shuttle data between the domains (DCC/Engine/Project Tracking) and source control.
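
As a rough sketch of what that means in practice (the depot layout and convention fields here are hypothetical, not from any real project), an asset-centric Asset Manager spends much of its life doing path arithmetic: deriving where an asset lives in source control purely from the naming convention.

  # Sketch of asset-centric resolution: the "lookup" is just string-building,
  # because the naming convention and folder hierarchy ARE the database.
  # The depot layout and field order here are hypothetical.
  DEPOT_ROOT = "//depot/project/textures"

  def depot_path(domain: str, category: str, asset: str,
                 part: str, version: int) -> str:
      """Build the source-control path for a texture from its convention fields."""
      name = f"{domain}_{category}_{asset}_{part}_{version:02d}.png"
      return f"{DEPOT_ROOT}/{domain}/{category}/{name}"

  # Everything downstream trusts these fields; one typo or casing slip
  # and the asset is effectively lost to the tools.
  print(depot_path("ENV", "PLANT", "WILLOW", "LEAF", 1))
  # -> //depot/project/textures/ENV/PLANT/ENV_PLANT_WILLOW_LEAF_01.png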

Data-Centric Pipeline

Rather than using the assets themselves as the source of truth, this design makes the source of truth a database of abstract data about those assets. In my time in industry I have only seen this system implemented at one company, but it was eye-opening to experience how much it can benefit a pipeline.
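
As a minimal sketch of the idea (the schema here is entirely hypothetical), the source of truth becomes a set of database rows rather than a folder tree; the file on disk is just a payload the database points at:

  import sqlite3

  # Hypothetical minimal schema for a data-centric source of truth.
  # The human-readable identity of an asset lives in the database;
  # the file itself can be named anything, including a hash.
  con = sqlite3.connect("assets.db")
  con.executescript("""
      CREATE TABLE IF NOT EXISTS assets (
          asset_id     INTEGER PRIMARY KEY,
          display_name TEXT NOT NULL,   -- what artists search for
          asset_type   TEXT NOT NULL,   -- e.g. 'texture', 'mesh', 'rig'
          file_name    TEXT NOT NULL,   -- obfuscated name in source control
          tags         TEXT             -- free-form search terms
      );
  """)
  con.execute(
      "INSERT INTO assets (display_name, asset_type, file_name, tags)"
      " VALUES (?, ?, ?, ?)",
      ("Willow Leaf", "texture", "a8dkja98uj0sj.png", "env plant willow leaf"),
  )
  con.commit()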

Comparison Of Pipeline Designs

Data Duplication
  • Asset-centric: Assets that share components represent the same data multiple times, so updating a single component may mean updating every identical copy.
  • Data-centric: Normalizing the data avoids duplication; a component only has to be updated once for the change to propagate everywhere (see the sketch after this comparison).

Engineering Cost
  • Asset-centric: Fewer systems are involved, so the initial setup is less costly. However, extra engineering effort is needed in the Asset Manager layer.
  • Data-centric: Designing, building and maintaining a database comes at a cost, but that cost is justified by the time saved and the reduced scope for error.

Data Format
  • Asset-centric: A rigid naming convention is critical for this to work. It has to be policed, and critical decisions have to be made about the folder hierarchy.
  • Data-centric: Asset names can be obfuscated, even hashed, because the database provides the human readability.

Offline Practicality
  • Asset-centric: Conflicts are the danger when working offline. Either implement locking or engineer a solution to aid merging when the network is restored.
  • Data-centric: An SQLite proxy could stand in when network access is unavailable. More sensibly, locking should be in place before going offline, so that conflicts are avoided when the network is restored.

Financial Cost
  • Asset-centric: No extra cost.
  • Data-centric: Running a database is an added cost because it’s an extra system.
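
To make the normalization point concrete, here is a hypothetical sketch of the data-duplication win: a shared component lives in exactly one row, every asset that uses it holds a reference, and changing the component is a single update rather than one edit per asset.

  import sqlite3

  # Hypothetical sketch of normalization: many assets reference one
  # shared component row, so a change is made exactly once.
  con = sqlite3.connect(":memory:")
  con.executescript("""
      CREATE TABLE components (
          component_id INTEGER PRIMARY KEY,
          name         TEXT NOT NULL,
          file_name    TEXT NOT NULL
      );
      CREATE TABLE asset_components (
          asset_id     INTEGER NOT NULL,
          component_id INTEGER NOT NULL REFERENCES components(component_id)
      );
  """)
  con.execute("INSERT INTO components VALUES (1, 'bark material', 'f31c0de9.mat')")
  # Three different tree assets all point at the same component...
  con.executemany("INSERT INTO asset_components VALUES (?, 1)", [(10,), (11,), (12,)])
  # ...so swapping the material's file is one UPDATE, not three edits.
  con.execute("UPDATE components SET file_name = '9b2e44aa.mat' WHERE component_id = 1")
  con.commit()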

Manual Asset-Naming vs Obfuscation

Asset-centric pipelines have one major problem: their dependence on a rigid naming convention. You can establish a strict naming convention and communicate it to the artists working on the project. However, in the real world, artists get sloppy and don’t stick to naming conventions. Camel-casing gets mixed up with Pascal-casing. Extra mystery tags get added to file names. Underscores are underused, overused or missing altogether. You can check the naming convention as part of an asset validation tool, but what I found in reality is that such tools get intentionally deactivated because artists struggle with that level of discipline.

With a data-driven pipeline, you can still use a naming convention for file names if you want. But the database itself can act as the agent that handles that side of things, meaning you can obfuscate the names, or let artists name things whatever they want, without worrying about how that will affect the project. Consider the following texture example:

  • Convention-driven name: ENV_PLANT_WILLOW_LEAF_01.png
  • Obfuscated name: a8dkja98uj0sj.png

The second example is not readable, is it? Well, it doesn’t need to be, because the artist never handles it directly. Using an Asset Manager, they access the asset by searching the database for willow leaf textures, and get the option to edit or update it directly in the tool. The other advantage is that this allows a much simpler folder structure for storing such textures, since the location of a texture is handled not by the artist but by the Asset Manager itself. Missing textures, likewise, become the Asset Manager’s problem to solve, since it holds all the specific data for each asset rather than relying on the collective discipline of the art team.

I’m not saying that manually-named assets should never be used, but for assets as numerous as textures, manual naming starts to become a hindrance. And crucially, if an artist does name a texture by hand, the whole system won’t break, because nothing depends on a texture being named a certain way.
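
Continuing the hypothetical schema sketched earlier, the search behind the Asset Manager’s UI might be no more than a query like this; the artist types "willow leaf" and only ever sees the display name, never the hash:

  # Hypothetical lookup behind the Asset Manager's search box, reusing
  # the 'assets' table (and connection) sketched earlier.
  def find_textures(con, search_term: str):
      """Return (display_name, file_name) pairs for textures matching a search."""
      rows = con.execute(
          "SELECT display_name, file_name FROM assets"
          " WHERE asset_type = 'texture' AND tags LIKE ?",
          (f"%{search_term}%",),
      )
      return rows.fetchall()

  for display_name, file_name in find_textures(con, "willow leaf"):
      print(f"{display_name} -> {file_name}")   # Willow Leaf -> a8dkja98uj0sj.png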

Buy-In

For me, there’s a clear winner between the two systems, but getting buy-in from leadership is often going to be a challenge. The big win for data-centric pipelines is the avoidance of data duplication and the reduction in tech debt that comes with it. However, building a quality database is no mean feat, and it takes an unavoidable amount of up-front effort to get right. If you find yourself in this situation as a pipeline developer, I hope this discussion is something you can use to present the idea to leadership.
