Soon Your Film Can Have AI Insert Sound Automatically

Imagine editing your film and reaching the phase where the production sound needs to be added or perfected. This can be a fun part of editing, but it can also be a very time-consuming one. If you didn’t record live sound, you need to go through the samples in your library, or find them online, and see which sound works best. If you have the budget, you can add a Foley artist to your team to physically produce the sounds for your video. These sounds must then be edited in wherever you need them, be it the footsteps down the alley or the gunshots fired by the assassin in your story. What if this could all be done automatically?

MIT just released an article describing technology that predicts the sound for a video the researchers "show" to the software. Without being given a description of what is making the sound, the software can simulate what it thinks the sound should be just by "looking" at the video, and the result is good enough to fool humans.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have demonstrated an algorithm that has effectively learned how to predict sound: When shown a silent video clip of an object being hit, the algorithm can produce a sound for the hit that is realistic enough to fool human viewers.

I see a future scenario where you have your project open in your NLE, be it Adobe Premiere or Final Cut Pro, you’ve completed the rough cut, and you need to move on to sound editing. Imagine that, instead of importing your Foley library and going through each sample to see if it fits the scene, you can add an effect, as you would add Warp Stabilizer in Adobe Premiere to stabilize footage, and the effect runs through the shot and adds what it thinks the sounds should be for the events it finds in the various cuts. This would be an amazing addition to the video and film production industry. It would make the workflow smoother, and you could possibly do it all by yourself.

‘When you run your finger across a wine glass, the sound it makes reflects how much liquid is in it,’ says CSAIL PhD student Andrew Owens, who was lead author on an upcoming paper describing the work. ‘An algorithm that simulates such sounds can reveal key information about objects' shapes and material types, as well as the force and motion of their interactions with the world.’

With this AI there are positives and negatives. The cost of producing a video or film can be lowered: technology like this can produce the sound for a video without the labor costs associated with the Foley production process. That’s the positive. But in the epic video featured in Alex Cooke’s article “‘The Secret World of Foley’ Is a Delightful Look at the Unsung Heroes of Film,” you can observe the process that Foley artists go through to make the sounds for a filmed story. It is such a gracious, magical, and serene process, and one I never thought of as important until I saw the article and video.

Will the technology described in MIT's article take this process out of human hands? Maybe, but these are inevitable changes that people in the industry will need to adapt to, just as photography switched from film to digital. I mean, would I rather have a DJI Ronin stabilizing the shot while shooting than add Warp Stabilizer in post? Sure I would. Can it be done with the budget I have for a personal project? Not quite. If it's cheaper to add this AI to my project to give me the Foley I need than it is to ask someone to actually produce it, I'll choose the AI.

I guess you're probably thinking all we need now are robots photographing weddings, events, the streets, landscapes, and fashion and we're done for. What will really freak me out is someone in the comments telling me it’s already happening.

What I know is that it's the people who adapt who survive. But I've also learned that it's the passionate people, the ones who keep doing what they do to the best of their ability, maybe because they believe in it or maybe because it's the form of expression they choose and trust, who get the work no matter what technology they use.

Wouter is a portrait and street photographer based in Paris, France. He's originally from Cape Town, South Africa. He does image retouching for clients in the beauty and fashion industry and enjoys how technology makes new ways of photography possible.

1 Comment

Well, robots taking pictures is already happening....

Just kidding, but it's getting there. Computers can create visually interesting art relatively consistently, so it's only a matter of time until someone integrates image recognition and makes a photograbot. While this may replace some sound production, there are still various sounds people don't think about that I'm sure the robots won't account for.