Google Health and Academics Battle It Out
By Shelly Fan for Singularity Hub
Oct 20, 2020 – Machine learning is taking medical diagnosis by storm. From eye disease to breast and other cancers to more amorphous neurological disorders, AI is routinely matching physicians’ performance, if not beating them outright.
Yet how far can we take those results at face value? When it comes to life-and-death decisions, when can we put our
full trust in enigmatic algorithms—“black boxes” that even their creators
cannot fully explain or understand? The problem gets more complex as medical AI
crosses multiple disciplines and developers, including academic groups and industry powerhouses such as Google, Amazon, and Apple, each with disparate incentives.
This week, the two sides battled it out
in a heated duel in one of the most prestigious science journals, Nature.
On one side are prominent AI researchers at the Princess Margaret Cancer
Centre, University of Toronto, Stanford University, Johns Hopkins, Harvard,
MIT, and others. On the other side is the titan Google Health.
The trigger was an explosive study by
Google Health for breast cancer screening, published in January this year. The
study claimed to have developed an AI system that vastly outperformed radiologists at diagnosing breast cancer and could generalize to populations beyond those used for training—a holy grail of sorts that’s incredibly difficult to achieve due to the lack of large medical imaging datasets. The
study made waves across the media landscape, and created a buzz in the public
sphere for medical AI’s “coming of age.”
The problem, the academics argued, is that the study lacked sufficient descriptions of the code and model for others to replicate it. In other words, we can only take the study at its word—something that’s just not done in scientific research. Google Health, in turn, penned a polite, nuanced, but assertive rebuttal arguing for its need to protect patient information and shield the AI from malicious attacks.
Academic exchanges like these form the bedrock of science, and they may seem incredibly nerdy and outdated—especially because, rather than using online channels, the two sides resorted to a centuries-old pen-and-paper discussion. By doing so, however, they elevated a necessary
debate to a broad worldwide audience, each side landing solid punches that, in
turn, could lay the basis of a framework for trust and transparency in medical
AI—to the benefit of all. Now if they could only rap their arguments in the
vein of Hamilton and Jefferson’s Cabinet Battles in Hamilton.
Academics, You Have the Floor
It’s easy to see where the academics’ arguments come from. Science is often painted as a holy endeavor embodying objectivity and truth. But like any discipline touched by people, it’s prone to errors, poor designs, unintentional biases, or—in very small numbers—conscious manipulation to skew the results. Because of this, when publishing results, scientists carefully describe their methodology so that others can replicate the findings. If a conclusion, say that a vaccine protects against Covid-19, holds up in nearly every lab regardless of the scientist, the materials, or the subjects, then we have stronger proof that the vaccine actually works. If not,
it means that the initial study may be wrong—and scientists can then delineate
why and move on. Replication is critical to healthy scientific evolution.
But AI research is shredding the dogma.
“In computational research, it’s not yet
a widespread criterion for the details of an AI study to be fully accessible.
This is detrimental to our progress,” said author Dr. Benjamin Haibe-Kains at
Princess Margaret Cancer Centre. For example, nuances in computer code or
training samples and parameters could dramatically change training and
evaluation of results—aspects that can’t be easily described using text alone,
as is the norm. The consequence, said the team, is that attempting to verify the complex computational pipeline becomes “not possible.” (For academics, that’s the equivalent of the gloves coming off.)
Although the academics took Google
Health’s breast cancer study as an example, they acknowledged the problem is
far more widespread. By examining the shortfalls of the Google Health study in
terms of transparency, the team said, “we provide potential solutions with
implications for the broader field.” It’s not an impossible problem. Online repositories such as GitHub and Bitbucket already allow the sharing of code. Other platforms, such as ModelHub.ai, allow sharing of trained deep learning models, with support for frameworks such as TensorFlow, which the Google Health team used.
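To get a concrete sense of what that kind of sharing looks like, here is a minimal TensorFlow sketch of packaging a trained model so that others can reload and test it. The tiny toy classifier and the file name are hypothetical stand-ins, not Google Health’s actual system.

```python
# A minimal, hypothetical sketch of sharing a trained model for replication.
# The toy architecture and file name are illustrative stand-ins, not
# Google Health's breast cancer system.
import tensorflow as tf

# Build (and, in practice, train) a small image classifier.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # e.g., benign vs. malignant
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Export the architecture, learned parameters, and training configuration
# as a single artifact that can be posted to a repository.
model.save("shared_model.h5")

# Anyone who downloads the file can reload the model and evaluate it on
# their own data, without ever seeing the original training set.
reloaded = tf.keras.models.load_model("shared_model.h5")
reloaded.summary()
```

Publishing an artifact like this, alongside the exact preprocessing steps and training parameters, is what would let outside groups rerun and stress-test a result.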
The ins and outs of AI models aside, there’s also the question of sharing the data those models were trained on. It’s a particularly thorny problem for medical AI, because many of those datasets are under license and sharing them can raise privacy concerns. Yet it’s
not unheard of. For example, genomics has leveraged patient datasets for
decades—essentially each person’s genetic “base code”—and extensive guidelines
exist to protect patient privacy. If you’ve ever used a 23andMe ancestry spit
kit and provided consent for your data to be used for large genomic studies,
you’ve benefited from those guidelines. Setting up something similar for
medical AI isn’t impossible.
In the end, a higher bar for
transparency for medical AI will benefit the entire field, including doctors
and patients. “In addition to improving accessibility and transparency, such
resources can considerably accelerate model development, validation and transition
into production and clinical implementation,” the authors wrote.
Google Health, Your Response
Led by Dr. Scott McKinney, Google Health
did not mince words. Their general argument: “No doubt the commenters are
motivated by protecting future patients as much as scientific principle. We
share that sentiment.” But, they argued, under current regulatory frameworks their hands are tied when it comes to open sharing.
For example, when it comes to releasing
a version of their model for others to test on different sets of medical
images, the team said they simply can’t because their AI system may be
classified as “medical device software,” which is subject to oversight.
Unrestricted release may lead to liability issues that place patients,
providers, and developers at risk.
As for sharing datasets, Google Health
argued that the largest dataset they used is already available online, with access granted upon application (noting, with just a hint of sass, that their organization helped to fund the resource). Other datasets simply cannot be shared due to ethics board restrictions.
Finally, the team argued that sharing a
model’s “learned parameters”—that is, the bread and butter of how it’s constructed—can inadvertently expose the training dataset and the model itself to malicious attacks or misuse. It’s certainly a concern: you may have heard of GPT-3,
the OpenAI algorithm that writes unnervingly like a human—enough to fool
Redditors for a week. But it would take a really sick individual to bastardize
a breast cancer detection tool for some twisted gratification.
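To see why shared weights raise that worry in the first place, consider the simplest form of what researchers call a membership-inference attack: guessing whether a record was in the training set by checking how confident the model is about it. The sketch below is a deliberately crude, hypothetical illustration that reuses the toy model file from the earlier example; it is not how Google Health’s system works.

```python
# A deliberately crude, hypothetical illustration of membership inference:
# models tend to be more confident on examples they were trained on, so
# unusually high confidence can hint that a record was in the training set.
# The model file and threshold are illustrative stand-ins.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("shared_model.h5")  # publicly shared weights

def looks_like_training_member(image, threshold=0.99):
    """Flag an image as a likely training-set member if the model is
    unusually confident about its prediction."""
    prob = float(model.predict(image[np.newaxis, ...], verbose=0)[0, 0])
    confidence = max(prob, 1.0 - prob)  # confidence in the predicted class
    return confidence > threshold

# candidate = a patient image obtained elsewhere, shaped (224, 224, 1)
# print(looks_like_training_member(candidate))
```

Real attacks are far more sophisticated, but the basic point stands: learned parameters carry traces of the data they were trained on, which is exactly what worries Google Health.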
The Room Where It Happens
The academic-Google Health debate is
just a small corner of a worldwide reckoning for medical AI. In September of this year,
an international consortium of medical experts introduced a set of official
standards for clinical trials that deploy AI in medicine, with the goal of
plucking out AI snake oil from trustworthy algorithms. One point may sound
familiar: how reliably a medical AI functions in the real world, away from favorable training sets or conditions in the lab. The guidelines are among the first for medical AI, but they won’t be the last.
If this all seems abstract and high up
in the ivory tower, think of it another way: you’re now witnessing the room
where it happens. By conducting their negotiations and discourse in public, AI developers are inviting additional stakeholders to join the conversation.
Like self-driving cars, medical AI seems like an inevitability. The question is
how to judge and deploy it in a safe, equitable manner—while earning a hefty dose of public trust.