A team created Audeo, a system that can generate music using only visual cues of someone playing the piano.
From the University of Washington
February 4, 2021 – Anyone who's been to
a concert knows that something magical happens between performers and their
instruments. It transforms music from mere "notes on a page" into
a satisfying experience.
A University of Washington team wondered
if artificial intelligence could recreate that delight using only visual cues
-- a silent, top-down video of someone playing the piano. The researchers used
machine learning to build a system, called Audeo, that generates audio from
silent piano performances. When the group tested the music Audeo created with
music-recognition apps, such as SoundHound, the apps correctly identified the
piece Audeo played about 86% of the time. For comparison, these apps identified
the piece in the audio tracks from the source videos 93% of the time.
The researchers presented Audeo Dec. 8
at the NeurIPS 2020 conference.
"To create music that sounds like
it could be played in a musical performance was previously believed to be
impossible," said senior author Eli Shlizerman, an assistant professor in
both the applied mathematics and the electrical and computer engineering
departments. "An algorithm needs to figure out the cues, or 'features,' in
the video frames that are related to generating music, and it needs to
'imagine' the sound that's happening in between the video frames. It requires a
system that is both precise and imaginative. The fact that we achieved music
that sounded pretty good was a surprise."
Audeo uses a series of steps to decode
what's happening in the video and then translate it into music. First, it has
to detect which keys are pressed in each video frame and assemble those
detections into a diagram of key presses over time. Then it needs to translate that diagram into something that a music
synthesizer would actually recognize as a sound a piano would make. This second
step cleans up the data and adds in more information, such as how strongly each
key is pressed and for how long.
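The article doesn't specify Audeo's internal data formats, but the first step can be pictured as a frames-by-keys grid (often called a "piano roll") that is later turned into discrete notes with start and end times. The sketch below is a hypothetical illustration under those assumptions only: `roll_to_notes`, `Note`, the 88-key layout starting at MIDI pitch 21 and the fixed placeholder velocity are our own choices, and the `min_frames` filter is a crude stand-in for the learned cleanup step, not Audeo's actual method.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Note:
    pitch: int       # MIDI pitch number; 21 is A0, the lowest key on an 88-key piano
    onset: float     # start time in seconds
    offset: float    # end time in seconds
    velocity: int    # key-strike strength, 1-127 (a fixed placeholder in this sketch)


def roll_to_notes(
    roll: Sequence[Sequence[int]],
    fps: float,
    lowest_pitch: int = 21,
    velocity: int = 64,
    min_frames: int = 1,
) -> List[Note]:
    """Turn a frames-by-keys press matrix into a sorted list of note events.

    roll[t][k] is truthy when key k is detected as pressed in frame t.
    Presses shorter than min_frames are dropped (hypothetical noise filter).
    """
    notes: List[Note] = []
    n_frames = len(roll)
    n_keys = len(roll[0]) if n_frames else 0
    for k in range(n_keys):
        start = None  # frame index where the current press began
        # Loop one frame past the end so a key held until the last frame is closed.
        for t in range(n_frames + 1):
            pressed = t < n_frames and bool(roll[t][k])
            if pressed and start is None:
                start = t
            elif not pressed and start is not None:
                if t - start >= min_frames:
                    notes.append(Note(lowest_pitch + k, start / fps, t / fps, velocity))
                start = None
    notes.sort(key=lambda n: (n.onset, n.pitch))
    return notes


# Tiny example: 4 frames at 4 frames per second, 3 keys.
example_roll = [
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
for note in roll_to_notes(example_roll, fps=4, min_frames=2):
    print(note.pitch, note.onset, note.offset)
# prints:
# 21 0.0 0.5
# 23 0.25 0.75
```

The single-frame press on the middle key is discarded by `min_frames=2`, which hints at why a raw per-frame detection alone can sound poor and benefits from a refinement pass.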
"If we attempt to synthesize music
from the first step alone, we would find the quality of the music to be
unsatisfactory," Shlizerman said. "The second step is like how a
teacher goes over a student composer's music and helps enhance it."
The researchers trained and tested the
system using YouTube videos of the pianist Paul Barton. The training consisted
of about 172,000 video frames of Barton playing music from well-known classical
composers, such as Bach and Mozart. Then they tested Audeo with almost 19,000
frames of Barton playing different music from these composers and others, such
as Scott Joplin.
Once Audeo has generated a transcript of
the music, it's time to give it to a synthesizer that can translate it into
sound. Every synthesizer will make the music sound a little different -- this
is similar to changing the "instrument" setting on an electric
keyboard. For this study, the researchers used two different synthesizers.
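The article doesn't name the transcript format. Assuming it is a list of MIDI-style note events, handing it to a conventional synthesizer such as FluidSynth could be as simple as writing a Standard MIDI File. The sketch below is hypothetical and uses only the Python standard library: `notes_to_midi`, `vlq`, the fixed 120 BPM tempo and 480 ticks per quarter note are our own choices. It emits a single-track (format 0) file made of note-on/note-off pairs.

```python
import struct
from typing import Iterable, List, Tuple

TICKS_PER_QUARTER = 480
TEMPO_US = 500_000  # microseconds per quarter note (120 BPM)
TICKS_PER_SECOND = TICKS_PER_QUARTER * 1_000_000 // TEMPO_US  # 960


def vlq(value: int) -> bytes:
    """Encode a non-negative int as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # continuation bit on all but the last byte
        value >>= 7
    return bytes(reversed(out))


def notes_to_midi(notes: Iterable[Tuple[int, float, float, int]], channel: int = 0) -> bytes:
    """Build a format-0 Standard MIDI File from (pitch, onset_s, offset_s, velocity) tuples."""
    events: List[Tuple[int, int, bytes]] = []
    for pitch, onset, offset, velocity in notes:
        vel = max(1, min(127, velocity))  # a note-on with velocity 0 would act as a note-off
        on_tick = round(onset * TICKS_PER_SECOND)
        off_tick = max(on_tick + 1, round(offset * TICKS_PER_SECOND))
        # Second field orders events at the same tick: note-offs (0) before note-ons (1).
        events.append((on_tick, 1, bytes([0x90 | channel, pitch, vel])))
        events.append((off_tick, 0, bytes([0x80 | channel, pitch, 0])))
    events.sort(key=lambda e: (e[0], e[1]))

    # Set-tempo meta event at time 0.
    track = bytearray(b"\x00\xff\x51\x03" + TEMPO_US.to_bytes(3, "big"))
    last_tick = 0
    for tick, _, msg in events:
        track += vlq(tick - last_tick) + msg  # delta time, then the MIDI message
        last_tick = tick
    track += b"\x00\xff\x2f\x00"  # end-of-track meta event

    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, TICKS_PER_QUARTER)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)


# Two consecutive half-second notes: middle C, then the E above it.
midi_bytes = notes_to_midi([(60, 0.0, 0.5, 100), (64, 0.5, 1.0, 90)])
```

Writing `midi_bytes` to a `.mid` file yields something FluidSynth can render to audio with a piano SoundFont. Changing the SoundFont changes the timbre, much like the "instrument" setting mentioned above, while a learned synthesizer such as PerfNet works differently and is not modeled here.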
"Fluidsynth makes synthesizer piano
sounds that we are familiar with. These are somewhat mechanical-sounding but
pretty accurate," Shlizerman said. "We also used PerfNet, a new AI
synthesizer that generates richer and more expressive music. But it also
generates more noise."
Audeo was trained and tested only on
Paul Barton's piano videos. Future research is needed to see how well it could
transcribe music for any musician or piano, Shlizerman said.
"The goal of this study was to see
if artificial intelligence could generate music that was played by a pianist in
a video recording -- though we were not aiming to replicate Paul Barton because
he is such a virtuoso," Shlizerman said. "We hope that our study
enables novel ways to interact with music. For example, one future application
is that Audeo can be extended to a virtual piano with a camera recording just a
person's hands. Also, by placing a camera on top of a real piano, Audeo could
potentially assist in new ways of teaching students how to play."
Kun Su and Xiulong Liu, both doctoral
students in electrical and computer engineering, are co-authors on this paper.
This research was funded by the Washington Research Foundation Innovation Fund
as well as the applied mathematics and electrical and computer engineering
departments.
https://www.sciencedaily.com/releases/2021/02/210204192543.htm