The digital dark matter clouding AI

Read time 2 minutes | Monday, 5 June 2023

The Takeaway

Scientists using artificial intelligence technology may be inviting unwanted noise into their genome analyses. Now, CSHL researchers have created a computational correction that will allow them to see through the fog and find genuine DNA features that could signal breakthroughs in health and medicine.

Artificial intelligence has entered our daily lives. First, it was ChatGPT. Now, it’s AI-generated pizza and beer commercials. While we can’t trust AI to be perfect, it turns out that sometimes we can’t trust ourselves with AI either.

Cold Spring Harbor Laboratory (CSHL) Assistant Professor Peter Koo has found that scientists using popular computational tools to interpret AI predictions are picking up too much “noise,” or extra information, when analyzing DNA. And he’s found a way to fix this. Now, with just a couple of new lines of code, scientists can get more reliable explanations out of powerful AIs known as deep neural networks. That means they can continue chasing down genuine DNA features. Those features might just signal the next breakthrough in health and medicine. But scientists won’t see the signals if they’re drowned out by too much noise.

So, what causes the meddlesome noise? It’s a mysterious and invisible source, a kind of digital “dark matter.” Physicists and astronomers believe most of the universe is filled with dark matter, a material that exerts gravitational effects but that no one has yet seen. Similarly, Koo and his team discovered that the data AI is trained on lacks critical information, leading to significant blind spots. Even worse, those blind spots get factored in when interpreting AI predictions of DNA function. Koo says:

“The deep neural network is incorporating this random behavior because it learns a function everywhere. But DNA is only in a small subspace of that. And it introduces a lot of noise. And so we show that this problem actually does introduce a lot of noise across a wide variety of prominent AI models.”

The digital dark matter is a result of scientists borrowing computational techniques from computer vision AI. DNA data, unlike image data, is confined to combinations of four nucleotide letters: A, C, G, and T. Pixel values in an image, by contrast, can vary continuously. In other words, we’re feeding AI an input it doesn’t know how to handle properly.
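To make the contrast concrete, here is a minimal sketch in Python. It assumes DNA is fed to the network as a one-hot array, a common encoding that this article does not itself specify. The function name `one_hot` and the example sequence are hypothetical and chosen only for illustration.

```python
import numpy as np

ALPHABET = "ACGT"  # the four nucleotide letters

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) array with a single 1 per row."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(seq), len(ALPHABET)))
    for pos, base in enumerate(seq.upper()):
        encoded[pos, index[base]] = 1.0
    return encoded

x = one_hot("ACGTTG")

# Every row of a one-hot DNA input sums to exactly 1.
dna_row_sums = x.sum(axis=1)

# A pixel-like input has no such constraint: any value in [0, 1) is allowed.
rng = np.random.default_rng(0)
pixel_like = rng.random((6, 4))
pixel_row_sums = pixel_like.sum(axis=1)
```

Real DNA only ever occupies the rows that sum to 1, but a network borrowed from computer vision still defines an output for every other combination of values. That unused region is where the article's "dark matter" would live.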

By applying Koo’s computational correction, scientists can interpret AI’s DNA analyses more accurately. He says:

“We end up seeing sites that become much more crisp and clean, and there is less spurious noise in other regions. One-off nucleotides that are deemed to be very important all of a sudden disappear.”
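The article does not show the correction itself. The sketch below assumes it amounts to subtracting, at each sequence position, the average gradient across the four nucleotides. That reading is an assumption, and this is not the authors' code. The function name `correct_gradients` and the synthetic arrays are hypothetical.

```python
import numpy as np

def correct_gradients(grad: np.ndarray) -> np.ndarray:
    """Subtract each position's mean gradient across the nucleotide axis.

    grad has shape (length, 4): one gradient value per position and nucleotide.
    """
    return grad - grad.mean(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
length = 8

# Synthetic "informative" part: differences among the four nucleotides
# at each position, constructed to have zero mean per position.
informative = rng.normal(size=(length, 4))
informative -= informative.mean(axis=-1, keepdims=True)

# Synthetic "noise" part: one arbitrary offset per position, added equally
# to all four nucleotides. It reflects behavior away from valid DNA inputs.
offset = rng.normal(scale=3.0, size=(length, 1))

raw = informative + offset
corrected = correct_gradients(raw)
```

In this toy setup the correction recovers the informative part exactly, because the offset is the same for all four nucleotides at a position. Real attribution maps would not separate this cleanly, so treat the example as an illustration of the idea, not as evidence of how well it works.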

Koo believes noise disturbance affects more than AI-powered DNA analyzers. He thinks it’s a widespread affliction among computational processes involving similar types of data. Remember, dark matter is everywhere. Thankfully, Koo’s new tool can help bring scientists out of the darkness and into the light.

Written by: Luis Sandoval, Communications Specialist | sandova@cshl.edu | 516-367-6826

Funding

National Institutes of Health, Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory

Citation

Majdandzic, A., et al., “Correcting gradient-based interpretations of deep neural networks for genomics”, Genome Biology, May 9, 2023. DOI: 10.1186/s13059-023-02956-3

Principal Investigator

Peter Koo

Associate Professor
Cancer Center Member
Ph.D., Yale University, 2015
