DeepMind’s AlphaCode outperforms many human programmers in tricky software challenges
By Matthew Hutson for Science
December 8, 2022 -- Software runs the world. It
controls smartphones, nuclear weapons, and car engines. But there’s a global shortage
of programmers. Wouldn’t it be nice if anyone could explain what they want a
program to do, and a computer could translate that into lines of code?
A new artificial intelligence (AI) system called
AlphaCode is bringing humanity one step closer to
that vision, according to a new study. Researchers say the system—from the
research lab DeepMind, a subsidiary of Alphabet (Google’s parent company)—might
one day assist experienced coders, but probably cannot replace them.
“It’s very impressive,
the performance they’re able to achieve on some pretty challenging problems,”
says Armando Solar-Lezama, head of the computer assisted programming group at
the Massachusetts Institute of Technology.
AlphaCode goes beyond
the previous standard-bearer in AI code writing: Codex, a system released in
2021 by the nonprofit research lab OpenAI. The lab had already developed GPT-3,
a “large language model” that is adept at imitating and interpreting human text
after being trained on billions of words from digital books, Wikipedia
articles, and other pages of internet text. By fine-tuning GPT-3 on more than
100 gigabytes of code from GitHub, an online software repository, OpenAI came
up with Codex. The software can write code when prompted with an everyday
description of what it’s supposed to do, such as counting the vowels in a
string of text. But it performs poorly when tasked with tricky problems.
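To give a sense of the "everyday description" level of difficulty, here is a minimal sketch of the vowel-counting task mentioned above. It is an illustration written for this article, not output from Codex; the function name `count_vowels` and the choice to treat only a, e, i, o, u as vowels are assumptions.

```python
def count_vowels(text: str) -> int:
    """Return how many characters in text are vowels (a, e, i, o, u).

    Assumption: case is ignored and 'y' is not counted as a vowel.
    """
    return sum(1 for ch in text.lower() if ch in "aeiou")


if __name__ == "__main__":
    print(count_vowels("AlphaCode"))
```

A task like this needs only one pass over the input and no algorithmic insight, which is what separates it from the competition problems described next.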
AlphaCode’s creators
focused on solving those difficult problems. Like the Codex researchers, they
started by feeding a large language model many gigabytes of code from GitHub,
just to familiarize it with coding syntax and conventions. Then, they trained
it to translate problem descriptions into code, using thousands of problems
collected from programming competitions. For example, a problem might ask for a
program to determine the number of binary strings (sequences of zeroes and
ones) of length n that don’t have any consecutive zeroes.
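As an illustration of why such a problem calls for more than syntax, here is one possible solution sketch for the binary-string example. It is not taken from AlphaCode or the study; the function names are hypothetical. The idea is to track how many valid strings end in 1 and how many end in 0: a 1 may follow anything, but a 0 may only follow a 1.

```python
from itertools import product


def count_no_consecutive_zeros(n: int) -> int:
    """Count binary strings of length n that contain no two adjacent zeroes."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # For length 0 there is one valid string (the empty one); counting it
    # under "ends in one" lets either digit be appended to it.
    ends_in_one, ends_in_zero = 1, 0
    for _ in range(n):
        # Appending 1 is always allowed; appending 0 requires a trailing 1.
        ends_in_one, ends_in_zero = ends_in_one + ends_in_zero, ends_in_one
    return ends_in_one + ends_in_zero


def brute_force(n: int) -> int:
    """Check every binary string of length n directly (only for small n)."""
    return sum(
        1 for bits in product("01", repeat=n) if "00" not in "".join(bits)
    )


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, count_no_consecutive_zeros(n), brute_force(n))
```

The brute-force version enumerates all 2^n strings, so it is useful only as a cross-check for small n; the loop-based version runs in time proportional to n.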
When presented with a
fresh problem, AlphaCode generates candidate code solutions (in Python or C++)
and filters out the bad ones. But whereas researchers had previously used
models like Codex to generate tens or hundreds of candidates, DeepMind had
AlphaCode generate as many as 1 million or more.
To filter them,
AlphaCode first keeps only the 1% of programs that pass the test cases
accompanying each problem. To further narrow the field, it clusters the keepers based
on the similarity of their outputs to made-up inputs. Then, it submits programs
from each cluster, one by one, starting with the largest cluster, until it
alights on a successful one or reaches 10 submissions (about the maximum that
humans submit in the competitions). Submitting from different clusters allows
it to test a wide range of programming tactics. That’s the most innovative step
in AlphaCode’s process, says Kevin Ellis, a computer scientist at Cornell
University who works on AI coding.
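The filter-then-cluster idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not DeepMind's implementation: candidates are modeled as plain Python functions rather than generated source code, the made-up inputs are supplied by hand, and all names (`filter_by_examples`, `cluster_by_behavior`, `pick_submissions`) are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

# Assumption: a candidate "program" is a function from one int to a hashable result.
Program = Callable[[int], Hashable]


def safe_run(program: Program, x: int) -> Hashable:
    """Run a candidate, mapping any crash to a sentinel value."""
    try:
        return program(x)
    except Exception:
        return "<error>"


def filter_by_examples(
    candidates: Sequence[Program], examples: Sequence[Tuple[int, Hashable]]
) -> List[Program]:
    """Keep only candidates that reproduce every example input/output pair."""
    return [
        p for p in candidates
        if all(safe_run(p, x) == expected for x, expected in examples)
    ]


def cluster_by_behavior(
    programs: Sequence[Program], probe_inputs: Sequence[int]
) -> List[List[Program]]:
    """Group programs whose outputs agree on every probe input, largest group first."""
    clusters = defaultdict(list)
    for p in programs:
        signature = tuple(safe_run(p, x) for x in probe_inputs)
        clusters[signature].append(p)
    return sorted(clusters.values(), key=len, reverse=True)


def pick_submissions(
    candidates: Sequence[Program],
    examples: Sequence[Tuple[int, Hashable]],
    probe_inputs: Sequence[int],
    limit: int = 10,
) -> List[Program]:
    """Return one representative per cluster, up to the submission limit."""
    survivors = filter_by_examples(candidates, examples)
    clusters = cluster_by_behavior(survivors, probe_inputs)
    return [cluster[0] for cluster in clusters[:limit]]


# Toy problem: "square the input". One example test (2 -> 4) is provided.
candidates = [
    lambda x: x * x,   # correct
    lambda x: x ** 2,  # correct, behaves the same as the first
    lambda x: x + x,   # passes the example but is wrong in general
    lambda x: 4,       # passes the example but is wrong in general
    lambda x: x * 3,   # fails the example and is filtered out
]
picks = pick_submissions(candidates, examples=[(2, 4)], probe_inputs=[0, 1, 3, 5])

if __name__ == "__main__":
    print(len(picks), picks[0](7))
```

In the toy run, the two correct candidates land in the same cluster because they behave identically on the probe inputs, so that cluster is the largest and its representative is tried first. The sketch stops at choosing what to submit; actually submitting each pick in turn and stopping at the first accepted one is left out.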
After training,
AlphaCode solved about 34% of assigned problems, DeepMind reports this
week in Science. (On similar benchmarks, Codex achieved
single-digit-percentage success.)
To further test its
prowess, DeepMind entered AlphaCode into online coding competitions. In
contests with at least 5000 participants, the system outperformed 45.7% of
programmers. The researchers also compared its programs with those in its
training database and found it did not duplicate large sections of code or
logic. It generated something new—a creativity that surprised Ellis.
“It continues to be
impressive how well machine-learning methods do when you scale them up,” he
says. The results are “stunning,” adds Wojciech Zaremba, a co-founder of OpenAI
and co-author of their Codex paper.
AI coding might have
applications beyond winning competitions, says Yujia Li, a computer scientist
at DeepMind and paper co-author. It could do software grunt work, freeing up
developers to work at a higher, more abstract level, or it could help
noncoders create simple programs.
David Choi, another
study author at DeepMind, imagines running the model in reverse: translating
code into explanations of what it’s doing, which could benefit programmers
trying to understand others’ code. “There are a lot more things you can do with
models that understand code in general,” he says.
For now, DeepMind wants
to reduce the system’s errors. Li says even if AlphaCode generates a functional
program, it sometimes makes simple mistakes, such as creating a variable and
not using it.
There are other
problems. AlphaCode requires tens of billions of trillions of operations per
problem—computing power that only the largest tech companies have. And the
problems it solved from the online programming competitions were narrow and
self-contained. But real-world programming often requires managing large code
packages in multiple places, which requires a more holistic understanding of
the software, Solar-Lezama says.
The study also notes
the long-term risk of software that recursively improves itself. Some experts
say such self-improvement could lead to a superintelligent AI that takes over
the world. Although that scenario may seem remote, researchers still want the
field of AI coding to institute guardrails: built-in checks and balances.
“Even if this kind of
technology becomes supersuccessful, you would want to treat it the same way you
treat a programmer within an organization,” Solar-Lezama says. “You never want
an organization where a single programmer could bring the whole organization
down.”
https://www.science.org/content/article/ai-learns-write-computer-code-stunning-advance