Deep Learning for Technical Artists and Technical Directors in 3D Animation, VFX, and Games

Deep Learning is a rapidly growing branch of Artificial Intelligence that is being used in fields such as consumer electronics, medicine, and law. There are already many research papers outlining the contributions Deep Learning can bring to the 3D pipeline, be that in Animation, Games or VFX.

But research papers are hard.

I have 10 years of experience as a Generalist and Technical Director in 3D animation, and I have spent the last 3 years researching Deep Learning in that context.

Having spent a significant amount of time in this field, I know I can share valuable, straightforward content and code that would otherwise be hard to extract from research papers. And this is what 3DeepLearner.com is about: sharing the know-how needed to accelerate the adoption of Deep Learning techniques in our field.

I invite you to explore the possibilities opened up by this technology. If you are ready to join the movement, sign up to our mailing list!

A little backstory

My name is Gustavo Boehs and I have been working as a Generalist and Technical Director in 3D animation since 2008. Three years ago (2015) I was presented with a challenge: how to control the rig of a worm character using human motion capture as input. My initial approach was to use regular rigging techniques for motion retargeting, but the results were far from amazing. Eventually, we decided to build a sort of exoskeleton that we used for puppeteering, and then we created a custom motion capture rig for that skeleton in Vicon’s Blade software. This time the results were pretty good, but the whole process was time-consuming and the solution could not be easily replicated or generalized.

A CGI worm driven by motion capture
Puppeteering is an interesting but non-generalizable way to animate non-bipedal characters through motion capture.

This project ended at the same time I was starting to write my Ph.D. proposal, and it influenced my choice of topic: finding a more generalizable solution to the problem of retargeting human motion to non-human characters. As I researched previous work in this field, it became clear that traditional rigging techniques were not the solution. Instead, machine learning was the way to go.

Machine learning enables one to compute tasks that are difficult to program explicitly, using knowledge that is implicit in data

In traditional rigging (and in computer programming in general), we explicitly declare the rules that control the computation. In machine learning, we use pattern recognition techniques on a series of examples (data) to program the computation implicitly. In short, it is a way to program things that we can do intuitively but that are hard to describe. For example, the problem of transferring human motion to non-human characters can be dealt with through ML in two ways: Motion Classification or Pose Mapping. Motion classification entails correlating a set of captured examples with corresponding motion classes, and using the predicted classes to drive a character.
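To make motion classification a bit more concrete, here is a minimal Python sketch. It uses a nearest-centroid classifier instead of a deep net to keep things short, and everything in it is an assumption made for illustration: the two features per example, the "walk" and "jump" labels, and the `CHARACTER_CLIPS` table that links a class to a character animation.

```python
import math

# Hypothetical training data: each example is a small feature vector
# (say, average hip height and average hip speed) with a motion label.
EXAMPLES = [
    ([0.95, 1.2], "walk"),
    ([1.00, 1.4], "walk"),
    ([1.40, 0.3], "jump"),
    ([1.50, 0.2], "jump"),
]

# Hypothetical table linking each motion class to a character animation.
CHARACTER_CLIPS = {"walk": "worm_crawl", "jump": "worm_rear_up"}


def fit_centroids(examples):
    """Average the feature vectors of each class (the 'training' step)."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(features, centroids):
    """Return the label whose centroid is closest to the input features."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


centroids = fit_centroids(EXAMPLES)
label = classify([1.45, 0.25], centroids)  # features of a newly captured motion
clip = CHARACTER_CLIPS[label]              # animation that drives the character
```

The rules that separate a walk from a jump are never written down; they come from the labeled examples. A deep net would replace the centroid step with a learned model, but the flow of examples, classes, and character actions stays the same.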

Pose mapping entails correlating a set of key poses in a human model to a set of equivalent key poses in a character model. This correlation creates an implicit mapping from one model to another.
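As a rough illustration of pose mapping, the Python sketch below blends a few character key poses according to how similar the incoming human pose is to each human key pose. It uses Gaussian-kernel regression rather than a neural network, purely to stay short, and the key poses, the `bandwidth` value, and the function name `map_pose` are all hypothetical.

```python
import math

# Hypothetical key-pose pairs: (human pose, character pose).
# Each pose is a flat list of values such as joint angles in degrees.
KEY_POSES = [
    ([0.0, 0.0], [0.0, 0.0, 0.0]),      # rest
    ([90.0, 0.0], [30.0, 60.0, 10.0]),  # arm raised -> worm rears up
    ([0.0, 90.0], [-20.0, 5.0, 45.0]),  # knee bent -> worm curls
]


def map_pose(human_pose, key_poses, bandwidth=30.0):
    """Blend character key poses, weighted by similarity to the human pose."""
    sq_dists = [math.dist(human_pose, human_key) ** 2 for human_key, _ in key_poses]
    nearest = min(sq_dists)
    # Subtracting the smallest distance keeps at least one weight equal to 1,
    # so the sum of weights can never underflow to zero.
    weights = [math.exp(-(d - nearest) / bandwidth ** 2) for d in sq_dists]
    total = sum(weights)
    size = len(key_poses[0][1])
    return [
        sum(w * char_key[i] for w, (_, char_key) in zip(weights, key_poses)) / total
        for i in range(size)
    ]


# A human pose halfway between "rest" and "arm raised".
character_pose = map_pose([45.0, 0.0], KEY_POSES)
```

Nobody writes an explicit rule saying which human joint drives which character joint; the mapping falls out of the paired examples. A smaller `bandwidth` makes the result snap to the nearest key pose, while a larger one blends more broadly.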

I quickly discovered that sticking to the traditional rigging approaches and avoiding machine learning would seriously hinder my work. Conversely, ML would take my research to a whole new level.

Opening the door to a new dimension

The examples above are just a drop in the ocean. Machine learning techniques can be used, among other things, to filter noisy motions, create full-body IK solutions, determine the most relevant blendshapes to describe an anatomy, and infer anatomy from motion capture markers, to name just a few applications.

Despite what some technical papers might make you think, you don’t need a PhD in math to understand deep learning

One would expect such a wonderful toolset to be extremely complicated, technical, and math-intensive. Surprisingly, it’s not like that at all. Although some machine learning techniques rely heavily on statistics, the most prominent approach nowadays, Deep Neural Networks (Deep Nets), is not markedly heavy on math, despite what you might read in the odd technical paper here and there. Google, Facebook, Microsoft and other players make their production-proven Deep Net frameworks available for free, most of them accessible through easy-to-use Python interfaces. The internet is also full of free and paid training material on deep nets.

Great. I am sold. Where do I start?

Sadly, most blog posts, tutorials, and courses on deep learning focus on tasks such as image classification and natural language processing. While it is true that the basic concepts apply to most applications, I find this content unappealing to TDs for two reasons: (1) the types of data, mostly letters and pixels, are very different from 3D kinematics or geometry; and (2) the applications themselves are too different from our bread and butter as TDs.

80% of TDs I’ve interviewed said applying Deep Learning to their work was not easy

And this isn’t just my own opinion. I ran a survey with several senior TDs and developers working at companies with 300+ people, and all of them expressed interest in deep learning. However, 80% of them said they were either overwhelmed by the existing training material or simply could not apply the knowledge to their field of work. For example, a classic ‘Hello World’ in Deep Learning is training a network for the task of digit classification, using the MNIST Dataset (image below).

Many handwritten digits
The MNIST dataset is commonly used to train models for digit classification.

While it is nice to learn how to code a neural net for digit classification, how will it translate to your work as a TD? How can you feed this same neural network 3D geometry or kinematics? What if the data varies over time? Should we expect comparable results? Are adaptations needed? And how can we load Alembic, FBX or Maya files into these frameworks? How can one bring the results back to the host DCC and run them with good performance, preferably in real time?
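To hint at one possible answer to the kinematics and time questions, here is a small Python sketch. It flattens per-frame joint rotations into vectors and stacks consecutive frames into fixed-size windows, which is one common way to hand time-varying data to a network that expects fixed-length inputs. The data layout (a dict of Euler rotations per joint), the joint names, and both function names are assumptions, and the sketch does not cover reading Alembic, FBX or Maya files.

```python
def flatten_frame(frame):
    """Turn {joint: (rx, ry, rz)} into a flat list with a stable joint order."""
    values = []
    for joint in sorted(frame):
        values.extend(frame[joint])
    return values


def sliding_windows(frames, window_size, step=1):
    """Concatenate consecutive flattened frames into fixed-length samples."""
    flat = [flatten_frame(frame) for frame in frames]
    samples = []
    for start in range(0, len(flat) - window_size + 1, step):
        sample = []
        for values in flat[start:start + window_size]:
            sample.extend(values)
        samples.append(sample)
    return samples


# Hypothetical clip: 4 frames, 2 joints, Euler rotations in degrees.
clip = [
    {"hip": (0.0, 0.0, 0.0), "knee": (10.0, 0.0, 0.0)},
    {"hip": (1.0, 0.0, 0.0), "knee": (20.0, 0.0, 0.0)},
    {"hip": (2.0, 0.0, 0.0), "knee": (30.0, 0.0, 0.0)},
    {"hip": (3.0, 0.0, 0.0), "knee": (40.0, 0.0, 0.0)},
]
samples = sliding_windows(clip, window_size=2)
```

Each sample now plays the role that a flattened image plays in the MNIST example: a fixed-length list of numbers. Whether windows like these are the right representation depends on the task.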

Walking the path

Figuring all this out was critical in bringing my Ph.D. project to life. Fortunately, I have already done that, and I am about to share what I learned with you, so make sure you read on.

I’ve spent the last 3 years doing deep learning as a Character TD and want to share that know-how here

In essence, I had to create a workflow to get the data in and out of my ML library of choice (MATLAB at the time), visualize it (which is crucial when debugging deep nets), and calculate its predictions in real time. This entailed creating a library to pre-process and post-process kinematic data, and creating my own implementation of MATLAB-compatible neural networks to run in DCCs such as Maya. The performance was great, and one could even design the topology of the neural network using a nodal interface, which was quite cool.
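To give a feel for what running a trained regression network outside the training library involves, here is a minimal Python sketch of the three steps mentioned above: pre-process the input, run the forward pass, and post-process the output. It is not the implementation described in this post. The weights, the normalization constants, the layer sizes, and the `tanh` activation are all made up for illustration; in practice they would be exported from whatever library trained the network.

```python
import math

# Hypothetical weights for a tiny regression network with
# 2 inputs, 3 hidden units, and 1 output.
W1 = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]  # one row per hidden unit
B1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -0.5, 0.25]]                     # one row per output
B2 = [0.2]

# Hypothetical normalization constants taken from the training data.
IN_MEAN, IN_STD = [45.0, 10.0], [30.0, 5.0]
OUT_MEAN, OUT_STD = [20.0], [15.0]


def dense(weights, biases, inputs):
    """One fully connected layer: weights times inputs, plus biases."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def predict(raw_inputs):
    """Pre-process, run the forward pass, and post-process one sample."""
    x = [(v - m) / s for v, m, s in zip(raw_inputs, IN_MEAN, IN_STD)]
    hidden = [math.tanh(v) for v in dense(W1, B1, x)]
    y = dense(W2, B2, hidden)
    return [v * s + m for v, m, s in zip(y, OUT_MEAN, OUT_STD)]


# For example, two joint angles in and one character control value out.
prediction = predict([45.0, 10.0])
```

Once a network is trained, inference is just a handful of multiplications and additions per frame, which is why it can be fast enough to evaluate interactively inside a DCC.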

Autodesk Maya user interface running a neural network.
A regression neural network running in real time inside Maya.

In total, I spent the last 3 years developing deep learning solutions for motion capture retargeting. My initial prototype work was published at SIGGRAPH (in 2016), and I have a couple of papers describing the final work that went into the thesis coming out in respectable CG journals. In addition, some patents have been filed on the most innovative parts of the work. This site is about sharing that know-how with other people to accelerate the adoption of deep learning techniques in our field.

Why this blog is a great resource for TDs

I am absolutely convinced that deep learning will make a definitive entrance into the 3D pipeline, be that in 3D Animation, Games or VFX, and I want to play an active role in it. Having spent a significant amount of time in this field, I know that technical papers can be a bit daunting. I feel I can share valuable, straightforward content and code that will make the path to deep learning for TDs a short one. And this is what 3DeepLearner.com is all about!

3DeepLearner is about sharing content and code that can make Deep Learning easy for TDs

I invite you to explore the possibilities that deep learning specifically, and machine learning in general, open up for our field. If you are ready to join the movement, sign up to our mailing list!