Slight Distortions in Speech Recognition Create Audio Hallucinations
Everyone else agrees that’s not at all what was said, and eventually Brian is convinced as well. But still, he heard what he heard. Something, somewhere, changed the signal for him alone, producing the audio hallucination. Audio hallucinations in human brains are tricky things that happen for a variety of reasons, but they generally come down to faulty information processing: a misinterpreted signal.
It’s hard to imagine how another person might misinterpret a signal that, to us, seems so clear. But we can look to machines, and machine learning, for examples of how speech recognition can go awry: how a simple phrase might be heard as something completely different in the presence of a slight distortion. Two computer science researchers at the University of California, Berkeley, Nicholas Carlini and David Wagner, have demonstrated just this. They crafted finely tuned audio hallucinations by tricking DeepSpeech, a state-of-the-art speech recognition neural network, into transcribing almost any audio (speech or even plain noise) as whatever text they choose.
“With powerful iterative optimization-based attacks applied completely end-to-end, we are able to turn any audio waveform into any target transcription with 100 percent success by only adding a slight distortion,” Carlini and Wagner report in a paper recently posted to the arXiv preprint server. “We can cause audio to transcribe up to 50 characters per second (the theoretical maximum), cause music to transcribe as arbitrary speech, and hide speech from being transcribed.”
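To make "iterative optimization-based attack" a little more concrete, here is a toy sketch of the general idea: repeatedly nudge a perturbation in the direction that makes a differentiable recognizer favor a chosen target transcription, while penalizing the perturbation's size and clipping each sample to a bound. This is not Carlini and Wagner's code and does not use DeepSpeech. The "recognizer" is a hypothetical three-sample-per-frame linear classifier, and the names `ALPHABET`, `FRAME`, `transcribe`, and `targeted_attack`, along with the values of `tau`, `lam`, `lr`, and `steps`, are all illustrative assumptions. The attack itself is plain projected gradient descent on per-frame cross-entropy plus a squared-size penalty.

```python
import math
from typing import List, Sequence

# Hypothetical toy "recognizer" (an assumption, not DeepSpeech): audio is cut
# into frames of FRAME samples; each frame is scored by a linear layer W
# (one row per character) and the highest-scoring character is emitted.
ALPHABET = "abc"
FRAME = 3


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_probs(W, frame):
    return softmax([sum(w * s for w, s in zip(row, frame)) for row in W])


def transcribe(W, audio):
    out = []
    for j in range(0, len(audio), FRAME):
        probs = frame_probs(W, audio[j:j + FRAME])
        out.append(ALPHABET[probs.index(max(probs))])
    return "".join(out)


def targeted_attack(W, audio, target, tau=0.6, lam=0.01, lr=0.1, steps=200):
    """Projected gradient descent on delta: minimize per-frame cross-entropy
    toward `target` plus lam * ||delta||^2, keeping every |delta_i| <= tau."""
    delta = [0.0] * len(audio)
    for _ in range(steps):
        grad = [2 * lam * d for d in delta]  # gradient of the size penalty
        for f, ch in enumerate(target):
            lo = f * FRAME
            frame = [a + d for a, d in zip(audio[lo:lo + FRAME], delta[lo:lo + FRAME])]
            probs = frame_probs(W, frame)
            y = ALPHABET.index(ch)
            for k, row in enumerate(W):
                err = probs[k] - (1.0 if k == y else 0.0)  # dCE/dscore_k
                for i, w in enumerate(row):
                    grad[lo + i] += err * w  # chain rule through the linear layer
        # Take a gradient step, then project back into the box [-tau, tau].
        delta = [max(-tau, min(tau, d - lr * g)) for d, g in zip(delta, grad)]
    return delta


# Usage: sample k of each frame "votes" for ALPHABET[k].
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
audio = [0.5, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.5]
print(transcribe(W, audio))  # abc
delta = targeted_attack(W, audio, "cab")
adv = [a + d for a, d in zip(audio, delta)]
print(transcribe(W, adv))  # cab
```

The toy makes no attempt to reproduce how small the real perturbations are; with only three samples per frame, the bound `tau` has to be comparable to the signal itself. What it does show is the loop structure: compute how the target's loss changes with each input sample, step against that gradient, and clip to keep the distortion bounded.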