Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text.
It is the third-generation language
prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI,
a San Francisco-based artificial intelligence research laboratory. GPT-3's full version has a capacity of 175
billion machine learning parameters. GPT-3, introduced in May 2020 and in beta
testing as of July 2020, is part of a trend in natural language processing
(NLP) toward systems built on pre-trained language representations.
Before the release of GPT-3, the largest
language model was Microsoft's Turing NLG, introduced in February 2020, with a
capacity of 17 billion parameters—less than a tenth of GPT-3's.
The quality of the text generated by
GPT-3 is so high that it can be difficult to determine whether or not it was
written by a human, which has both benefits and risks. Thirty-one OpenAI researchers and engineers
presented the original May 28, 2020 paper introducing GPT-3. In their paper,
they warned of GPT-3's potential dangers and called for research to mitigate
risk. David Chalmers, an Australian
philosopher, described GPT-3 as "one of the most interesting and important
AI systems ever produced."
Microsoft announced on September 22,
2020 that it had licensed "exclusive" use of GPT-3; others can still
use the public API to receive output, but only Microsoft has access to GPT-3's
underlying model.
Background
According to The Economist,
improved algorithms, powerful computers, and an increase in digitized data have
fueled a revolution in machine learning, with new techniques in the 2010s
resulting in "rapid improvements in tasks" including manipulating
language. Software models are trained to
learn by using thousands or millions of examples in a "structure ...
loosely based on the neural architecture of the brain". One architecture used in
natural language processing (NLP) is the Transformer, a deep learning model
first introduced in 2017. GPT-n models are based on this Transformer
architecture (its core attention operation is sketched after this paragraph).
There are a number of NLP systems capable of processing,
mining, organizing, connecting, contrasting, understanding and generating
answers to questions.
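The Transformer's defining operation is self-attention, in which each position in a
sequence weighs earlier positions when computing its own representation. The sketch
below is a minimal, single-head illustration in NumPy with the causal mask used in
GPT-style decoder-only models; the toy dimensions and random inputs are purely
illustrative and are not drawn from any GPT implementation.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention with a causal mask.

    x: (seq_len, d_model) token representations
    w_q, w_k, w_v: (d_model, d_head) learned projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v                  # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])              # similarity of every pair of positions
    mask = np.triu(np.ones_like(scores, dtype=bool), 1)
    scores[mask] = -np.inf                               # causal mask: ignore future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ v                                   # attention-weighted mix of values

# Toy usage with random data (illustrative shapes only)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                              # 4 tokens, model width 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)     # (4, 8)
```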
On June 11, 2018, OpenAI researchers and engineers posted their original paper
on generative language models: artificial intelligence systems that could be
pre-trained on an enormous and diverse corpus of text, in a process they called
generative pre-training (GP). The authors described how performance at language
understanding tasks in natural language processing (NLP) was improved in GPT-n
through a process of "generative pre-training of a language model on a diverse
corpus of unlabeled text, followed by discriminative fine-tuning on each
specific task." This eliminated the need for human supervision and for
time-intensive hand-labeling.
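In concrete terms, the generative pre-training stage is ordinary next-token
prediction on unlabeled text: the model is optimized to assign high probability to
each token given the tokens that precede it. The sketch below shows that training
signal as a cross-entropy loss over a toy vocabulary; the model itself is replaced
with random scores, since only the objective, not OpenAI's implementation, is being
illustrated.

```python
import numpy as np

def next_token_loss(logits, token_ids):
    """Average negative log-likelihood of each token given its left context.

    logits:    (seq_len - 1, vocab_size) model scores, where logits[t]
               predicts token_ids[t + 1]
    token_ids: (seq_len,) a tokenized stretch of training text
    """
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))  # log-softmax
    targets = token_ids[1:]                                # each position predicts the next token
    picked = log_probs[np.arange(len(targets)), targets]   # log-prob of the true next token
    return -picked.mean()                                  # minimizing this maximizes likelihood

# Toy usage: a 6-token "document" over a 10-word vocabulary, scored at random
rng = np.random.default_rng(0)
token_ids = rng.integers(0, 10, size=6)
logits = rng.normal(size=(len(token_ids) - 1, 10))
print(next_token_loss(logits, token_ids))
```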
In February 2020, Microsoft introduced
its Turing Natural Language Generation (T-NLG), which was then the
"largest language model ever published at 17 billion parameters." It performed better than any other language
model at a variety of tasks which included summarizing texts and answering
questions.
Capabilities
On May 28, 2020, an arXiv preprint by a
group of 31 engineers and researchers at OpenAI described the development of
GPT-3, a third-generation "state-of-the-art language model". The team increased
the capacity of GPT-3 by over two orders of magnitude from that of its
predecessor, GPT-2 (from 1.5 billion parameters to 175 billion), making GPT-3
the largest non-sparse language model to date.
Because GPT-3 is structurally similar to its predecessors, its higher
level of accuracy is attributed to its increased capacity and higher number of
parameters. GPT-3's capacity is ten
times larger than that of Microsoft's Turing NLG, the next largest NLP model.
Sixty percent of the weighted
pre-training dataset for GPT-3 comes from a filtered version of Common Crawl consisting
of 410 billion byte-pair-encoded tokens.
Other sources are 19 billion tokens from WebText2 representing 22% of
the weighted total, 12 billion tokens from Books1 representing 8%, 55 billion
tokens from Books2 representing 8%, and 3 billion tokens from Wikipedia
representing 3% (this weighting is sketched after this paragraph). GPT-3 was
trained on hundreds of billions of words and is capable of coding in languages
such as CSS, JSX, and Python. Since GPT-3's training
data was all-encompassing, it does not require further training for distinct
language tasks. The training data
contains occasional toxic language and GPT-3 occasionally generates toxic
language as a result of mimicking its training data. A study from the
University of Washington found that GPT-3 produced toxic language at a level
comparable to that of similar natural language processing models such as GPT-2
and CTRL. GPT-3 produced less toxic language than its predecessor, GPT-1,
although it produced both more toxic generations and more severely toxic
language than CTRL Wiki, a language model trained entirely on Wikipedia data.
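To make the weighting described at the start of this paragraph concrete, the sketch
below records the five sources with the token counts and sampling weights quoted
above and draws each training batch's source in proportion to weight rather than raw
size, which is why the small Wikipedia corpus is seen relatively often. The figures
are the published ones; the sampling routine itself is an illustrative assumption,
not OpenAI's actual data pipeline.

```python
import numpy as np

# GPT-3's reported pre-training mixture: token counts (billions) and sampling weights.
# Weights are not proportional to raw size, so smaller, curated sources such as
# Wikipedia contribute more often per token than Common Crawl does.
datasets = {
    "Common Crawl (filtered)": {"tokens_bn": 410, "weight": 0.60},
    "WebText2":                {"tokens_bn": 19,  "weight": 0.22},
    "Books1":                  {"tokens_bn": 12,  "weight": 0.08},
    "Books2":                  {"tokens_bn": 55,  "weight": 0.08},
    "Wikipedia":               {"tokens_bn": 3,   "weight": 0.03},
}

names = list(datasets)
weights = np.array([datasets[n]["weight"] for n in names])
weights = weights / weights.sum()        # quoted weights sum to 1.01 due to rounding

def sample_source(rng):
    """Pick which corpus the next training batch is drawn from."""
    return rng.choice(names, p=weights)

rng = np.random.default_rng(0)
print([sample_source(rng) for _ in range(5)])
```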
On June 11, 2020, OpenAI announced that
users could request access to its user-friendly GPT-3 API—a "machine
learning toolset"—to help OpenAI "explore the strengths and
limits" of this new technology. The
invitation described how this API had a general-purpose "text in, text
out" interface that can complete almost "any English language
task", instead of the usual single use-case. According to one user, who had access to a
private early release of the OpenAI GPT-3 API, GPT-3 was "eerily
good" at writing "amazingly coherent text" with only a few
simple prompts. In an initial experiment, 80 US subjects were asked to judge
whether short (roughly 200-word) articles were written by humans or by GPT-3.
The participants judged incorrectly 48% of the time, doing only slightly better
than random guessing.
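For illustration, a call against the beta-era Python client looked roughly like the
sketch below: plain text goes in as a prompt, and the completion comes back as plain
text. The engine name, sampling parameters, and placeholder API key are assumptions
made for the example, and the client library has changed substantially since 2020.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; keys were issued to beta users on request

# "Text in, text out": the task is implied by the prompt rather than selected
# from a fixed menu of single-purpose endpoints.
response = openai.Completion.create(
    engine="davinci",        # the largest GPT-3 engine exposed by the beta API
    prompt="Summarize in one sentence: GPT-3 is an autoregressive language model "
           "with 175 billion parameters introduced by OpenAI in May 2020.",
    max_tokens=60,
    temperature=0.7,
)
print(response.choices[0].text.strip())
```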
Because GPT-3 can "generate news
articles which human evaluators have difficulty distinguishing from articles
written by humans," GPT-3 has the "potential to advance both the
beneficial and harmful applications of language models." In their May 28, 2020 paper, the researchers
described in detail the potential "harmful effects of GPT-3" which
include "misinformation, spam, phishing, abuse of legal and governmental
processes, fraudulent academic essay writing and social engineering
pretexting". The authors drew attention to these dangers in order to call for
research on risk mitigation.
GPT-3 is capable of performing zero-shot, one-shot, and few-shot learning:
completing a task from zero, one, or a handful of examples supplied in its
prompt, without any further training.
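These terms come from the GPT-3 paper: the model's weights are never updated for the
task; instead, zero, one, or a few worked examples are placed directly in the prompt.
The sketch below builds such prompts in the translation format the paper uses as an
example; the exact formatting is an illustrative assumption.

```python
def build_prompt(task_description, examples, query):
    """Assemble a zero-, one- or few-shot prompt.  Only the number of worked
    examples embedded in the text changes; no gradient updates are involved."""
    lines = [task_description]
    for source, target in examples:      # 0 pairs = zero-shot, 1 = one-shot, k = few-shot
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")          # the model is expected to continue from here
    return "\n".join(lines)

# Few-shot (two demonstrations) versus zero-shot, English-to-French translation
demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
print(build_prompt("Translate English to French:", demos, "peppermint"))
print(build_prompt("Translate English to French:", [], "peppermint"))
```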
Controversy
GPT-3's builder, OpenAI, was initially
founded as a non-profit in 2015. In 2019, OpenAI did not publicly release
GPT-3's precursor model, GPT-2, breaking from its previous open-source
practices and citing concerns that the model would
perpetuate fake news. OpenAI eventually released a version of GPT-2 that was 8%
of the original model's size. In the
same year, OpenAI restructured to be a for-profit company. In 2020, Microsoft announced the company had
exclusive licensing of GPT-3 for Microsoft's products and services following a
multi-billion dollar investment in OpenAI. The agreement permits OpenAI to
offer a public-facing API through which users can send text to GPT-3 and
receive the model's output, but only Microsoft would have access to GPT-3's
source code.
Large language models, such as GPT-3,
have come under criticism from Google's AI ethics researchers for the
environmental impact of training and storing the models, detailed in a paper
co-authored by Timnit Gebru and Emily M. Bender in 2021.
More (including reviews) at: https://en.wikipedia.org/wiki/GPT-3