Biotech Labs Bank on Generative AI to Design New Protein Structures

OpenAI’s DALL-E 2 has been making headlines as a text-to-image model that generates pictures from textual descriptions. Earlier this week, two biotech labs, Generate Biomedicines and David Baker’s group, showed how they are using generative AI, particularly diffusion models, to come up with new protein structures and, eventually, better drugs.

Boston-based therapeutics company Generate Biomedicines announced a program called Chroma, which the company describes as the “DALL-E 2 of biology.” Similarly, biologist David Baker’s team at the University of Washington has developed RoseTTAFold Diffusion, a model that can produce accurate designs for new proteins that can then be brought to life in the lab.

Why does it matter? These AI generators can create designs for proteins with particular characteristics, such as structure, size or function, enabling the development of novel proteins that perform specific tasks on demand. Such proteins could then be used to create or identify drugs. Proteins regulate basic health processes in living beings; for example, when we fall sick, proteins help us get better. The aim of AI-driven protein design is to help biologists extend the list of natural proteins and make new medications on demand.

Computer-aided protein design is not new, but earlier tools have been slow and have struggled with large, complicated proteins, which are important for treating difficult diseases. Chroma and Baker’s method are the first full-fledged programs that can produce precise designs for a wide variety of proteins.

How did they do it? Diffusion models learn to generate data by reversing a process that gradually adds noise to training examples. In Chroma, noise is added by unravelling the amino acid chains that make up a protein; given a random clump of these chains, Chroma learns to assemble them into a protein. RoseTTAFold Diffusion pairs its diffusion model with a second neural network, trained to predict protein structure, that captures how the parts of a protein fit together, and uses this information to guide the whole generation process.
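The noising half of a diffusion model can be illustrated with a toy sketch. The Python below applies the generic DDPM-style forward step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε drawn from a standard Gaussian, to a list of 3D backbone coordinates. It is an illustration under stated assumptions, not Chroma’s or RoseTTAFold Diffusion’s actual method: Chroma uses its own polymer-aware noise, and the linear beta schedule, the function names (`linear_alpha_bars`, `noise_coords`) and the toy coordinates are all hypothetical. A trained model would learn to run this process in reverse, from noise back to a plausible structure.

```python
import math
import random
from typing import List, Tuple

Coord = Tuple[float, float, float]


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t for a linear beta schedule.

    Hypothetical defaults borrowed from generic image DDPMs, not from Chroma
    or RoseTTAFold Diffusion.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        # Interpolate beta linearly from beta_start (t = 0) to beta_end (last step).
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_coords(x0: List[Coord], alpha_bar: float,
                 rng: random.Random) -> List[Coord]:
    """Sample x_t ~ q(x_t | x_0) per atom: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(signal * c + noise * rng.gauss(0.0, 1.0) for c in atom)
        for atom in x0
    ]


# Toy "backbone": five C-alpha positions spaced 3.8 Å apart along x (illustrative only).
backbone = [(3.8 * i, 0.0, 0.0) for i in range(5)]
alpha_bars = linear_alpha_bars(num_steps=1000)
rng = random.Random(0)
early = noise_coords(backbone, alpha_bars[10], rng)  # still close to the input
late = noise_coords(backbone, alpha_bars[-1], rng)   # almost pure Gaussian noise
```

The schedule is the only knob here: the closer ᾱ_t gets to zero, the less of the original structure survives, which is why late steps look like a random clump of points.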

Baker’s group and Generate Biomedicines have both created proteins with different degrees of symmetry, such as circular, triangular or hexagonal proteins. Generate Biomedicines went a step further and designed proteins in the shapes of the 26 letters of the Latin alphabet and the digits 0 to 9. Both groups can also generate new proteins that match preexisting structures.

To test whether Chroma produced designs that could become real medicines, Generate Biomedicines took the sequences for some of its designs (the amino acid strings that make up each protein) and ran them through another AI program. That program predicted that 55% of them would fold into the structure generated by Chroma, which suggests that these are designs for viable proteins. Baker’s team, for its part, produced some of RoseTTAFold Diffusion’s designs in the lab. One of them is a novel protein that binds to the parathyroid hormone, which regulates blood calcium levels.
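The check Generate Biomedicines describes, comparing a design with the structure another model predicts for its sequence, can be sketched as a simple self-consistency test. The Python below assumes each design comes with two equal-length lists of already-superimposed C-alpha coordinates (designed and predicted), computes the root-mean-square deviation (RMSD) between them, and reports the fraction that falls under a cutoff. The 2 Å cutoff, the function names (`rmsd`, `fraction_designable`) and the toy data are hypothetical; the article does not say which metric or threshold was used, and a real pipeline would first optimally superimpose the two structures (for example with the Kabsch algorithm).

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(designed: Sequence[Coord], predicted: Sequence[Coord]) -> float:
    """RMSD between two pre-aligned coordinate sets (no superposition is done here)."""
    if len(designed) != len(predicted) or not designed:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(designed, predicted)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(designed))


def fraction_designable(pairs: List[Tuple[Sequence[Coord], Sequence[Coord]]],
                        cutoff: float = 2.0) -> float:
    """Share of designs whose predicted structure lies within `cutoff` Å RMSD of the design."""
    if not pairs:
        return 0.0
    passing = sum(1 for designed, predicted in pairs
                  if rmsd(designed, predicted) <= cutoff)
    return passing / len(pairs)


# Toy example: one design predicted almost exactly, one predicted 5 Å off.
design = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
close = [(0.1, 0.0, 0.0), (3.9, 0.0, 0.0), (7.7, 0.0, 0.0)]
far = [(0.0, 5.0, 0.0), (3.8, 5.0, 0.0), (7.6, 5.0, 0.0)]
print(fraction_designable([(design, close), (design, far)]))  # 0.5
```

Under this framing, a figure like 55% is simply the passing share across all tested designs; the exact number depends heavily on the chosen predictor, metric and cutoff.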

In 2022, Chinese biotech company Helixon released OmegaFold, adding to a field of protein structure prediction models that includes DeepMind’s AlphaFold, RoseTTAFold and Meta AI’s ESMFold. That raises the question of why RoseTTAFold was the model of choice here, rather than one of the other open-source protein structure prediction models that reportedly deliver more accurate results.

Earlier this year, Bengaluru-based algorithmic biologist Manoj Gopalakrishnan built Tapestry, a single-round, quantitative method for large-scale molecular testing that offers significant time and cost savings over conventional RT-PCR tests.

It will be interesting to see how this field evolves in the near future, as life sciences and biotech companies combine protein prediction models with the techniques behind image generation tools such as DALL-E 2 to design new protein structures, and in turn develop better drugs and medical solutions.

In an interaction with AIM, Junaid Bajwa, chief medical scientist at Microsoft Research, said that the journey from initial discovery to real molecules, and then taking those molecules into the real world, would be critical.

While big tech players such as Meta and Google-backed DeepMind are focusing on developing protein prediction models, Microsoft seems more focused on implementation: it has partnered with Novartis, Novo Nordisk and others to apply these advances to real-world scientific research, with an emphasis on impact.
