Applications range from autonomous robotics to AI training
From: North Carolina State University
June 1, 2021 -- At issue is a type of AI task called conditional image generation,
in which AI systems create images that meet a specific set of conditions. For
example, a system could be trained to create original images of cats or dogs,
depending on which animal the user requested. More recent techniques have built
on this to incorporate conditions regarding an image layout. This allows users
to specify which types of objects they want to appear in particular places on
the screen. For example, the sky might go in one box, a tree might be in
another box, a stream might be in a separate box, and so on.
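To make the idea of a layout condition concrete, here is a minimal sketch in Python of how such a set of labeled boxes might be written down and turned into a coarse label map. It is an illustration only, not the researchers' code or data format: the names `LayoutBox` and `rasterize`, the normalized 0-to-1 coordinates, and the 8-by-8 grid are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutBox:
    """One labeled box in a layout (hypothetical structure for illustration)."""
    label: str
    x: float  # left edge, as a fraction of image width (0 to 1)
    y: float  # top edge, as a fraction of image height (0 to 1)
    w: float  # box width, as a fraction of image width
    h: float  # box height, as a fraction of image height


def rasterize(layout, width, height, background="unlabeled"):
    """Paint each box's label onto a height-by-width grid of cells.

    Later boxes overwrite earlier ones where they overlap.
    """
    grid = [[background] * width for _ in range(height)]
    for box in layout:
        x0 = max(0, round(box.x * width))
        y0 = max(0, round(box.y * height))
        x1 = min(width, round((box.x + box.w) * width))
        y1 = min(height, round((box.y + box.h) * height))
        for row in range(y0, y1):
            for col in range(x0, x1):
                grid[row][col] = box.label
    return grid


# The example from the text: sky in one box, a tree in another, a stream in a third.
layout = [
    LayoutBox("sky", 0.0, 0.0, 1.0, 0.5),
    LayoutBox("tree", 0.0, 0.5, 0.5, 0.5),
    LayoutBox("stream", 0.5, 0.5, 0.5, 0.5),
]
label_map = rasterize(layout, width=8, height=8)
```

In this sketch the top half of the grid is labeled as sky, the lower left as tree, and the lower right as stream. A layout-conditioned generator would take a description along these lines as its input, though the exact representation a given system uses may differ.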
The new work builds on those techniques to give users
more control over the resulting images, and to retain certain characteristics
across a series of images.
"Our approach is highly
reconfigurable," says Tianfu Wu, co-author of a paper on the work and an
assistant professor of computer engineering at NC State. "Like previous
approaches, ours allows users to have the system generate an image based on a
specific set of conditions. But ours also allows you to retain that image and
add to it. For example, users could have the AI create a mountain scene. The
users could then have the system add skiers to that scene."
In addition, the new approach allows
users to have the AI manipulate specific elements so that they are identifiably
the same, but have moved or changed in some way. For example, the AI might
create a series of images showing skiers turn toward the viewer as they move
across the landscape.
"One application for this would be
to help autonomous robots 'imagine' what the end result might look like before
they begin a given task," Wu says. "You could also use the system to
generate images for AI training. So, instead of compiling images from external
sources, you could use this system to create images for training other AI
systems."
The researchers tested their new
approach using the COCO-Stuff dataset and the Visual Genome dataset. Based on
standard measures of image quality, the new approach outperformed the previous
state-of-the-art image creation techniques.
"Our next step is to see if we can
extend this work to video and three-dimensional images," Wu says.
Training for the new approach requires a
fair amount of computational power; the researchers used a 4-GPU workstation.
However, deploying the system is less computationally expensive.
"We found that one GPU gives you
almost real-time speed," Wu says.
"In addition to our paper, we've
made our source code for this approach available on GitHub. That said, we're
always open to collaborating with industry partners."
https://www.sciencedaily.com/releases/2021/06/210601135751.htm