Translating video into sound


We know that sound consists of vibrations in the air or other medium transmitting it, and that it causes objects on which the waves impinge to vibrate as well, even if we cannot see this with the naked eye. So in theory at least, by closely observing the vibrations of an object, you should be able to reconstruct the sound that caused it to vibrate.

It turns out that you can now do it in practice, and it is the great sensitivity of modern instrumentation that makes this possible. Researchers from MIT, Adobe, and Microsoft played music to plants while taking high-speed video. Then, using sophisticated algorithms, they were able to reconstruct from the silent video of the plants a pretty good reproduction of the original sound. They repeated the exercise with someone speaking near a bag of potato chips in a sealed room with soundproof glass while taking the video through the glass, and you can actually make out the words.

One obvious application of this is for espionage, to figure out what people are saying from video even when you are not close enough to hear.

But there are many other possible uses. A more interesting one (to me, at least) is that we may be able to reconstruct sound from old silent video. Of course, the researchers were able to use high-quality, high-speed video for their work, not grainy old footage. But they were able to get some results using regular video cameras as well, and as the sensitivity of the detectors increases, it may become feasible to figure out, for example, whether the actors in silent films were actually saying what the subtitles say.

Comments

  1. Chiroptera says

    Weren’t they making progress on doing just this by observing the vibrations on glass windows to eavesdrop on the conversation going on in the room?

    I know I’ve read a science fiction story where the government officials (I can’t remember whether they were the good guys or the bad guys) met in rooms where devices induced random vibrations to the windows to prevent just this sort of thing.

  2. says

    Weren’t they making progress on doing just this by observing the vibrations on glass windows to eavesdrop on the conversation going on in the room?

    NSA was doing that back in the 80s. You bounce a laser off a window and then demodulate the vibrations in the beam on the other end. That’s one reason why the NSA’s OPS1 building is a glass-cased frame enclosing dead air over what amounts to a giant faraday cage. It also cuts energy costs somewhat because the dead air is good insulation.

  3. Lonely Panda, e.s.l. says

    Fascinating. Thanks for posting this.

    But there are many possible uses. A more interesting (to me, at least) is that we may be able to reconstruct sound from old silent video. Of course, the researchers were able to use high quality, high frequency video for their work, not grainy old footage. But they were able to get some results using regular video cameras as well and as the sensitivity of the detectors increase, it may become feasible to figure out, for example, if the actors in silent films were actually saying what the subtitles say.

    Unfortunately, I don’t think this would work for the silent films. The reason they can do this with non-high-speed cameras is that the sensors on basic video cameras use something called rolling shutter. Each horizontal line is recorded at a slightly later time than the previous line. So even though the frame rate may only be 30 Hz, the line sampling rate would be hundreds of times faster. There are other complications that would reduce the usable sampling rate from there (the exposures for the different lines still overlap, and you have fewer pixels to average together to infer the subpixel vibrations). The bandwidth of the reconstructed audio is ultimately limited to half of the effective sampling rate.

    Film cameras (ignoring some special cases such as roll-out photography) use a global shutter; the complete image is recorded simultaneously. Thus their technique for teasing a higher sampling rate out of the image data wouldn’t work.
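The arithmetic in comment 3 can be made concrete with a minimal sketch. It assumes a hypothetical camera running at 30 frames per second with 720 rows per frame, and it idealizes the readout by ignoring overlapping row exposures and any gap between frames, so the real usable rate would be lower. The function names are illustrative only and do not come from the researchers' work.

```python
def rolling_shutter_line_rate(frame_rate_hz: float, rows_per_frame: int) -> float:
    """Idealized number of row samples per second for a rolling-shutter sensor.

    Each row is read out slightly later than the previous one, so one frame
    contributes rows_per_frame samples in time instead of a single sample.
    """
    return frame_rate_hz * rows_per_frame


def nyquist_bandwidth(sampling_rate_hz: float) -> float:
    """Highest recoverable frequency: half of the sampling rate."""
    return sampling_rate_hz / 2.0


# Hypothetical camera parameters (assumptions, not taken from the study).
frame_rate = 30.0   # frames per second
rows = 720          # rows read out per frame

line_rate = rolling_shutter_line_rate(frame_rate, rows)

# Rolling shutter: every row is a separate sample in time.
print("rolling shutter:", line_rate, "samples/s, bandwidth up to",
      nyquist_bandwidth(line_rate), "Hz")

# Global shutter (as with film): one sample in time per frame.
print("global shutter:", frame_rate, "samples/s, bandwidth up to",
      nyquist_bandwidth(frame_rate), "Hz")
```

Under these assumptions the global-shutter case tops out at half the frame rate, far below the range of speech, which is the point comment 3 makes about film cameras.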

  4. Mano Singham says

    @1&2,

    The difference is (I think) that in those cases, you were detecting actual vibrations. In this case, you are reconstructing the vibrations from video of the vibrations.

    @#3,

    That’s too bad!

  5. AsqJames says

    Should have just asked the plants what they’d heard.

    Plants can hear. Well, they can sense sound-vibrations. New research from the University of Missouri shows that when the mustard-like Arabidopsis senses the chomping sounds of a caterpillar munching on leaves, it primes itself for a chemical response.

    -- http://www.bbc.co.uk/programmes/b0499llm

    And we all thought Prince Charles was bonkers for talking to his plants!

  6. moarscienceplz says

    Plants can hear. Well, they can sense sound-vibrations. New research from the University of Missouri shows that when the mustard-like Arabidopsis senses the chomping sounds of a caterpillar munching on leaves, it primes itself for a chemical response.

    Hold it. This raises my BS alarms. Did the caterpillars munch on neighboring plants, or were they munching on the test subject plant that released the chemical response? If it was being chomped itself, a plant could have received that news from its own leaves chemically. Similarly, if it was indeed neighbor plants getting chomped, how do the researchers know for sure those plants weren’t sending out chemical alarms that their test subject plant responded to, and NOT to the sounds of chomping?

  7. moarscienceplz says

    A more interesting (to me, at least) is that we may be able to reconstruct sound from old silent video.

    I’m not sure what you are hoping to hear. I suppose you might be able to hear the directors of silent films giving instructions to the actors. I guess that’s sorta interesting. There might be a handful of famous people who were filmed but didn’t live long enough to have their voices recorded, possibly Rudolph Valentino is among those, but there can’t be many like that. If all you want is to add a sound track to a silent movie, we can do that now with voice actor dubbing and foley artists. But remember what a lead balloon colorizing B&W movies turned out to be.
    Or, possibly you are thinking of restoring sound to old home movies. I think most of that would consist of, “Look at me, daddy!” and, “Don’t point that thing at me, Henry!”

  8. AsqJames says

    @moarscienceplz,

    TBH I just threw it in there ‘cos this post reminded me of it and I thought it was amusing.

    I heard the original transmission of the programme (while doing something else) and at the time I was less than 100% convinced by the explanation of the protocols, but that may have been more to do with my inattention or the programme’s time constraints. It’s a pop science programme so they want to talk about (and listeners want to hear) what’s interesting in the research rather than spend time establishing its credibility (I would hope they do due diligence beforehand though).

    As I recall (you’ll have to check by listening if you can or seeking out the research if it’s available) the caterpillars were never actually present -- the sound of them munching was played to the plants. Other “similar” sounds were played too without the same effect, though here we come to one of my concerns. I can’t remember what it was, but one of the alternative sounds they mentioned didn’t strike me as being easily mistakable for a chomping caterpillar. On the other hand my knowledge of the sounds caterpillars make could charitably be described as “limited”.

  9. OverlappingMagisteria says

    #3 Lonely Panda -- Thanks. I was going to write the same thing about the rolling shutter, but couldn’t find the right way to explain why it wouldn’t work for silent films.

    However, if your goal is to see what the silent film stars actually say, I’d think that lip reading would be an easier method. Of course, lip reading is not as precise as most people expect but you could probably get a good idea of their words.
