Scientists have developed artificial intelligence software capable of creating proteins that can be useful as vaccines, cancer treatments, or even as tools to remove carbon pollution from the air.
This research, reported today in the journal Science, was conducted by researchers at the University of Washington School of Medicine and Harvard University. The article is titled “Scaffolding protein functional sites using deep learning”.
“The proteins we find in nature are amazing molecules, but engineered proteins can do so much more,” said senior author David Baker, an HHMI investigator and professor of biochemistry at UW Medicine. “In this work, we show that machine learning can be used to design proteins with a wide variety of functions.”
For decades, scientists have used computers to try to engineer proteins. Some proteins, such as antibodies and synthetic binding proteins, have been adapted into drugs to fight COVID-19. Others, like enzymes, aid in industrial manufacturing. But a single protein molecule often contains thousands of bonded atoms, so even with specialized scientific software, proteins are difficult to study and engineer.
Inspired by how machine learning algorithms can generate stories or even images from prompts, the team set out to build similar software for designing new proteins. “The idea is the same: neural networks can be trained to see patterns in data. Once trained, you can give them a prompt and see if they can generate an elegant solution. Often the results are compelling, even beautiful,” said lead author Joseph Watson, a postdoctoral researcher at UW Medicine.
The team trained several neural networks using information from the Protein Data Bank, a public repository of hundreds of thousands of protein structures from all kingdoms of life. The resulting neural networks surprised even the scientists who created them.
The team developed two approaches to design proteins with new functions. The first, dubbed “hallucination”, is akin to DALL-E or other generative AI tools that produce new output based on simple prompts. The second, called “inpainting”, is analogous to the autocomplete feature found in modern search bars and email clients.
“Most people can come up with new pictures of cats or write a paragraph from a prompt if asked, but with protein design, the human brain can’t do what computers now can,” said lead author Jue Wang, a postdoctoral researcher at UW Medicine. “Humans just can’t imagine what the solution might look like, but we have set up machines that do.”
To explain how the neural networks “hallucinate” a new protein, the team compares the process to writing a book: “You start with a random assortment of words, total gibberish. Then you impose a requirement, such as that in the opening paragraph, it has to be a dark and stormy night. Then the computer will change the words one at a time and ask itself, ‘Does this make my story make more sense?’ If it does, it keeps the changes until a complete story is written,” says Wang.
Books and proteins can be understood as long sequences of letters. In the case of proteins, each letter corresponds to a chemical building block called an amino acid. Starting with a random string of amino acids, the software mutates the sequence over and over until a final sequence that encodes the desired function is generated. These final amino acid sequences code for proteins that can then be made and studied in the laboratory.
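The mutate-and-keep loop described above can be sketched in a few lines of Python. This is only a toy illustration, not the team's software: the scoring function `toy_score` is a hypothetical stand-in for the trained neural network, and the target motif `HEAGH` is an arbitrary string chosen for the example. The sketch starts from a random sequence, changes one letter at a time, and keeps a change only if the score does not get worse.

```python
import random

# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(sequence, motif):
    """Hypothetical stand-in for the neural network's judgment.

    Returns the largest number of positions at which any window of the
    sequence agrees with the motif. A real system would instead ask a
    trained network how well the sequence meets the design goal.
    """
    best = 0
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        matches = sum(a == b for a, b in zip(window, motif))
        best = max(best, matches)
    return best


def hallucinate(length, motif, steps=20000, seed=0):
    """Mutate a random sequence until it satisfies the toy requirement."""
    rng = random.Random(seed)
    sequence = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = toy_score(sequence, motif)
    for _ in range(steps):
        if score == len(motif):
            break  # requirement fully met
        position = rng.randrange(length)
        previous = sequence[position]
        sequence[position] = rng.choice(AMINO_ACIDS)  # try one mutation
        new_score = toy_score(sequence, motif)
        if new_score >= score:
            score = new_score  # keep the change
        else:
            sequence[position] = previous  # undo the change
    return "".join(sequence), score


design, final_score = hallucinate(length=30, motif="HEAGH")
print(design, final_score)
```

In this toy, "making more sense" simply means containing more of the motif. In the approach the article describes, that judgment comes from the neural network itself.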
The team also showed that neural networks can fill in missing pieces of a protein structure in just seconds. Such software could help in the development of new drugs.
“With autocomplete, or ‘protein inpainting’, we start with the key features we want to see in a new protein and then let the software come up with the rest. These features can be known binding motifs or even enzyme active sites,” says Watson.
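The autocomplete idea can be illustrated with a similar toy sketch. The assumptions are loud ones: the template format (an underscore for each unknown position), the motif `HEAGH`, and the `random_proposer` function are all hypothetical, and the proposer picks letters at random where a trained network would predict them from context. The sketch only shows the shape of the task: the key feature stays fixed and everything around it is filled in.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "_"  # hypothetical marker for a position the software must fill in


def random_proposer(rng):
    """Build a placeholder 'model' that proposes a random amino acid.

    A trained network would use the surrounding context; this stand-in
    ignores it and exists only so the example runs.
    """
    def propose(partial_sequence, position):
        return rng.choice(AMINO_ACIDS)
    return propose


def inpaint(template, propose):
    """Fill every masked position while leaving fixed residues untouched."""
    residues = list(template)
    for position, residue in enumerate(residues):
        if residue == MASK:
            residues[position] = propose("".join(residues), position)
    return "".join(residues)


# The fixed letters play the role of a known binding motif; the underscores
# are the supporting structure the software is free to invent.
template = "____HEAGH______"
filled = inpaint(template, random_proposer(random.Random(0)))
print(filled)
```

Passing the proposer in as a function keeps the fill-in loop separate from whatever model supplies the guesses, so a better predictor could be swapped in without changing the loop.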
Lab tests revealed that many of the proteins generated by hallucination and inpainting worked as expected. This included new proteins capable of binding metals as well as those that bind to the cancer receptor PD-1.
The new neural networks can generate several different types of proteins in as little as a second. These include potential vaccines against respiratory syncytial virus, or RSV, a deadly respiratory virus.
All vaccines work by presenting part of a pathogen to the immune system. Scientists often know which piece would work best, but creating a vaccine that achieves the desired molecular shape can be difficult. Using the new neural networks, the team prompted a computer to create new proteins that included the necessary pathogen fragment as part of their final structure. The software was free to create all the supporting structures around the key fragment, yielding several potential vaccines with various molecular shapes.
In a lab test, the team found that known RSV antibodies stuck to three of their hallucinated proteins. This confirms that the new proteins have adopted their intended shapes and suggests that they could be viable vaccine candidates that could prompt the body to generate its own highly specific antibodies. Further testing, including on animals, is still needed.
“I started working on the vaccine stuff just as a way to test our new methods, but in the middle of working on the project my two-year-old son got infected with RSV and spent an evening in the ER to have his lungs cleared. It made me realize that even the ‘test’ problems we were working on were actually quite meaningful,” Wang said.
“These are very powerful new approaches, but there’s still a lot of room for improvement,” said Baker, who received the 2021 Breakthrough Prize in Life Sciences. “Designing high-activity enzymes, for example, is still very challenging. But each month, our methods continue to improve! Deep learning has transformed protein structure prediction over the past two years, and we are now in the midst of a similar transformation of protein design.”
This project was led by Jue Wang, Doug Tischer, and Joseph L. Watson, who are postdoctoral fellows at UW Medicine, and Sidney Lisanza and David Juergens, who are graduate students at UW Medicine. Senior authors include Sergey Ovchinnikov, a John Harvard Distinguished Science Fellow at Harvard University, and David Baker, professor of biochemistry at UW Medicine.