Producing training sets for Extract: Dialogue by Acon Digital


This article summarises the extensive experimental work we did to develop and build a training data set for Acon Digital's Extract:Dialogue, an audio plugin that removes common noise problems, such as those encountered with lavaliere microphones. High-quality training sets are crucial for training the HANCE Audio Engine efficiently. In the following text, we detail one example of how HANCE has tailored training sets to solve real-world audio problems.

Lavaliere microphones are small microphones placed on actors to capture dialogue for movies and TV. Lavalieres typically need to be hidden under the actor's clothes, which creates a distinct set of problems, the most common being noise from clothes touching the microphone membrane. Other issues arise from the conditions in which scenes are filmed, such as broadband or static noise from fans, lighting, generators, traffic, and other unwanted audio sources.

Figure 1: A typical lavaliere microphone with a paper clip for size comparison.

We started by dividing the noise palette into two categories: static and burst. We define static noise as long stretches of unwanted signal, such as traffic, rain, air conditioning units, ocean waves, and so forth. We gathered a wide variety of material from pre-recorded audio assets provided by Soundly, a leading sound effects platform and library used by professional film and TV producers worldwide. We also compiled an extensive collection of static noise from real-life recording sessions done with lavaliere microphones.

Bursts are short periods of noise that typically arise when the lavaliere microphone touches clothes, hair, or hands. We used three different lavaliere microphones to record typical bursts. In addition, we collected natural bursts from on-set recording sessions. Mouth clicks, mic thumps, and other short burst noises were also added to the training set to broaden the range of noises it covers.

Having compiled a comprehensive set of unwanted noise, we could finally focus on what we were after: clean, artifact-free voice recordings! To get as close to real-life examples as possible, we recorded voices of different ages and genders in a controlled environment using lavaliere microphones.

To create quality models, we feed our algorithms wanted audio (voice), unwanted audio (noise), and the two combined (noise + voice). We used the Python programming language to generate training sets, and HANCE developed several advanced applications to streamline this process. One of these applications, jokingly named the merge-divide-and-conquer script by our engineers, executed, for example, the following procedure:

  1. Extract a random 11-second stretch of a voice recording.
  2. Pick a noise file from the static sound pool, looping it seamlessly if it was shorter than the voice segment.
  3. Mix bursts into the pulse-code modulated (PCM) data at random intervals.
  4. Normalize the voice and noise to a set limit.
  5. Save the voice, noise, and combined data separately.
  6. Export metadata files describing the training set and which parts to use for validation.
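The procedure above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not HANCE's actual script: the 48 kHz mono float signals, the peak-normalization target, the crossfade used for seamless looping, mixing the bursts into the noise track (so the saved noise file holds both static and burst noise), 16-bit WAV output, and all function and file names are hypothetical.

```python
# Hypothetical sketch of the six merge-divide-and-conquer steps; not HANCE's code.
import json
import wave
from pathlib import Path

import numpy as np

SAMPLE_RATE = 48_000   # assumption: mono float audio at 48 kHz
SEGMENT_SECONDS = 11   # step 1: length of each training example
PEAK_LIMIT = 0.5       # hypothetical normalization target (peak amplitude)


def random_segment(voice, length, rng):
    """Step 1: pick a random window of `length` samples (voice must be long enough)."""
    start = rng.integers(0, len(voice) - length + 1)
    return voice[start:start + length]


def loop_to_length(noise, length, fade=1024):
    """Step 2: repeat `noise` until it covers `length` samples, crossfading
    each loop point linearly to avoid clicks; longer noise is truncated."""
    fade = min(fade, len(noise) // 2)
    if fade == 0:
        return np.resize(noise, length)  # too short to crossfade: plain tiling
    ramp = np.linspace(0.0, 1.0, fade)
    out = noise.copy()
    while len(out) < length:
        joint = out[-fade:] * (1.0 - ramp) + noise[:fade] * ramp
        out = np.concatenate([out[:-fade], joint, noise[fade:]])
    return out[:length]


def add_bursts(noise, bursts, count, rng):
    """Step 3: add `count` randomly chosen bursts at random positions."""
    out = noise.copy()
    for _ in range(count):
        burst = bursts[rng.integers(len(bursts))][:len(out)]
        pos = rng.integers(0, len(out) - len(burst) + 1)
        out[pos:pos + len(burst)] += burst
    return out


def normalize(x, peak=PEAK_LIMIT):
    """Step 4: scale so the absolute peak equals `peak`; silence is left as is."""
    m = np.max(np.abs(x))
    return x if m == 0 else x * (peak / m)


def write_wav(path, x, sr):
    """Step 5: store a signal as 16-bit mono PCM."""
    pcm = (np.clip(x, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sr)
        f.writeframes(pcm.tobytes())


def make_example(voice, static_pool, burst_pool, out_dir, index, rng, sr):
    """Steps 1-5 for one example; returns the written file names."""
    length = SEGMENT_SECONDS * sr
    v = normalize(random_segment(voice, length, rng))
    n = loop_to_length(static_pool[rng.integers(len(static_pool))], length)
    n = normalize(add_bursts(n, burst_pool, rng.integers(1, 6), rng))
    files = {}
    for kind, signal in (("voice", v), ("noise", n), ("mix", v + n)):
        files[kind] = f"{index:05d}_{kind}.wav"
        write_wav(Path(out_dir) / files[kind], signal, sr)
    return files


def build_set(voices, static_pool, burst_pool, out_dir, n_examples,
              val_fraction=0.1, sr=SAMPLE_RATE, seed=0):
    """Steps 1-6 for `n_examples` examples, plus a metadata.json index."""
    rng = np.random.default_rng(seed)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    meta = []
    for i in range(n_examples):
        voice = voices[rng.integers(len(voices))]
        files = make_example(voice, static_pool, burst_pool, out_dir, i, rng, sr)
        # Step 6: flag a random subset of examples for validation.
        meta.append({"files": files, "validation": bool(rng.random() < val_fraction)})
    (Path(out_dir) / "metadata.json").write_text(json.dumps(meta, indent=2))
    return meta
```

For example, `build_set(voices, static_pool, burst_pool, "train_set", n_examples=1000)` would write 1,000 voice/noise/mix WAV triplets plus `metadata.json`, assuming `voices`, `static_pool`, and `burst_pool` are lists of NumPy arrays loaded from the recordings. Keeping the three signals in separate files lets a model take the mix as input and the voice (or noise) as its target.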

Figure 2: The noise sets as seen in the Soundly application.

We were excited to hear how effectively our first model reduced noise, but we were still a long way from a usable product. HANCE spent several months tweaking the set, adding more noise and voice material as needed.

One particular problem we discovered was that our voice recordings were too controlled compared to real-world recordings. When we applied the model to recordings from noisy film sets and similar environments, we heard a gating effect at the end of words. After researching several options, we found that adding a short reverb to some of the clean voice recordings in the training set gave the model a more natural transition between voice and silence.

Building the lavaliere training set gave us valuable insight into the kinds of problems audio engineers face when working with film and TV. The work resulted in an extensive training set for this specific purpose. We believe this is one of the great strengths of HANCE's algorithms and the HANCE Audio Engine: combining training data from several different applications to find solutions to unique problems.
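Adding a short reverb to a clean voice recording can be sketched as convolution with a synthetic impulse response. The article does not say which reverb HANCE used, so everything here is an illustrative assumption: the impulse response is exponentially decaying white noise, the decay time (`rt60`) and wet/dry balance (`wet`) are arbitrary example values, and `add_short_reverb` is a hypothetical name.

```python
import numpy as np


def add_short_reverb(voice, sr, rt60=0.25, wet=0.2, rng=None):
    """Blend `voice` with a copy convolved by a synthetic short reverb.

    Illustrative assumption: the impulse response is white noise with an
    exponential envelope that falls by 60 dB after `rt60` seconds.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = max(1, int(rt60 * sr))
    t = np.arange(n) / sr
    envelope = 10.0 ** (-3.0 * t / rt60)      # amplitude is 1e-3 (-60 dB) at t = rt60
    ir = rng.standard_normal(n) * envelope
    ir /= np.sqrt(np.sum(ir ** 2))            # unit-energy impulse response

    # Linear convolution via zero-padded FFTs (much faster than np.convolve here).
    size = len(voice) + n - 1
    nfft = 1 << (size - 1).bit_length()
    reverberant = np.fft.irfft(np.fft.rfft(voice, nfft) * np.fft.rfft(ir, nfft), nfft)

    # Keep the original length so the voice stays aligned with its noise track.
    return (1.0 - wet) * voice + wet * reverberant[:len(voice)]
```

In a pipeline like the one sketched for the numbered steps, the reverberant output would replace the clean voice wherever that recording is used, so the target and the voice + noise mix stay consistent; applying it to a random subset of recordings matches the "some of the clean voice recordings" approach described in the article.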
