Protein researchers speak of the “folding problem”—the challenge of predicting ahead of time what shape a chain will take. Nature solves the folding problem easily, using the ultimate parallel-processing computer: the universe. In the real world, every particle interacts with every other particle simultaneously. But human-built computers, which make most calculations sequentially, struggle to simulate this process. Given a simulated protein—rendered onscreen as a rainbow-colored wad of ribbon, or as a bunch of grapes—a piece of software might attempt to calculate how different folds will affect the protein’s free energy. The idea is to fold the protein in a consistently downhill direction. But finding the steepest path on such complex terrain is tricky. Sometimes it’s not even clear which way is down. A computer might bring the folding to a stop when, in fact, there is further to go—as though the simulated golf ball has become trapped in a divot from which a real one might easily escape. The software must sometimes cheat a little: picking up the ball and moving it, to see if it wants to get rolling again.
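To make the golf-ball picture concrete, here is a minimal sketch in Python. It is not Rosetta's code: the one-dimensional `energy` landscape, the step size, and the `descend_with_kicks` helper are all invented for illustration. A greedy downhill walk stops in the first divot it finds; picking the ball up, moving it a random distance, and letting it roll again can find lower ground.

```python
import math
import random

def energy(x):
    """Toy 1-D 'free energy' (an assumption): a broad bowl with ripples
    that create several local minima, i.e. divots."""
    return 0.1 * x * x + math.sin(3.0 * x)

def descend(x, step=0.01, max_iters=10_000):
    """Greedy downhill walk: step left or right only while that lowers the energy."""
    for _ in range(max_iters):
        best = min((x - step, x, x + step), key=energy)
        if best == x:  # neither neighbour is lower: the ball has stopped
            break
        x = best
    return x

def descend_with_kicks(x, kicks=200, kick_size=2.0, seed=0):
    """Descend, then repeatedly pick the ball up, move it a random
    distance, and descend again, keeping the lowest resting place."""
    rng = random.Random(seed)
    best = descend(x)
    for _ in range(kicks):
        trial = descend(best + rng.uniform(-kick_size, kick_size))
        if energy(trial) < energy(best):
            best = trial
    return best

start = 4.0
stuck = descend(start)             # trapped in the nearest divot
freed = descend_with_kicks(start)  # allowed to "cheat a little"
print(f"plain descent: x={stuck:.2f}, energy={energy(stuck):.2f}")
print(f"with kicks:    x={freed:.2f}, energy={energy(freed):.2f}")
```

A real protein has thousands of degrees of freedom rather than one, which is why finding which way is down is so much harder in practice than in this toy.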
The most sophisticated program for modelling protein folding is called Rosetta. Baker and his graduate students started writing it in 1996; it looks like a video game crossed with a programming environment, with images of proteins filling some windows and complicated code scrolling in others. Rosetta is open source, and runs on a variety of platforms. It’s now used by hundreds of academic labs and companies around the world, all of whom contribute to the code, which is millions of lines long. Baker, who is not a top-shelf coder, doubts that any of his own code remains: in the early days, comments left next to his contributions would identify them as “crazy Baker stuff.” Still, Sarel Fleishman said, “David’s lab and David himself have been incredibly dominant in this field. Dominant not in the sense of fending people off—it’s actually the reverse. It’s about openness.”
Protein folding has obvious commercial applications, but Rosetta is mostly free. “One of the good choices early on was that no individual would ever make any money directly from it,” Baker told me. The funds generated from corporate licenses go into a pot guarded by a nonprofit called RosettaCommons; some of the money pays for RosettaCon, an annual summer gathering of protein folders traditionally held in August, in Leavenworth, Washington, a mountain town about two hours away from I.P.D. This year, the pandemic upended tradition, and the meeting was held virtually. Meanwhile, in April, a couple hundred researchers convened an early, online meeting, to discuss COVID-19. “A lot of us have been talking about the idea of feeling called to work on COVID during this time,” Rebecca Alford, who completed her Ph.D. at Johns Hopkins, in June, told me. The fact that so many protein designers use Rosetta has made impromptu collaboration easy. Alford said, “You can ask someone in California or in China, ‘What do I do with this piece of code?’ ”
Protein-folding software has two main components: a “sampling method” and an “energy function.” The sampler tries different starting places for the golf ball; the energy function aims to direct it downhill. From the beginning, Rosetta, drawing on Baker’s lab experiments, was good at both tasks. It successfully predicted protein folds. But it achieved its singular position in the field because of tweaks and additions made, over the years, by the larger community of researchers, which honed the software’s precision and extended its capabilities. “Every new generation of students is motivated to contribute,” Baker said. “They share in the progress and benefits—including a very luxurious, all-expenses meeting and reunion once a year.”
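The division of labor between the two components can be sketched in a few lines of Python. Everything here is a toy assumption rather than Rosetta's actual method: the "protein" is a short list of angles, the `energy` function is invented, and `sample_start`, `relax`, and `fold` are hypothetical names. The sampler proposes random starting shapes, the energy function scores them, and the lowest-scoring result across all starts is kept.

```python
import random

def energy(conf):
    """Toy energy function (an assumption): each angle, in degrees, prefers
    -60 or 60, and neighbouring angles prefer to match. Lower is better."""
    local = sum(min((a + 60) ** 2, (a - 60) ** 2) for a in conf)
    pair = sum((a - b) ** 2 for a, b in zip(conf, conf[1:]))
    return local + 0.1 * pair

def sample_start(n, rng):
    """Sampler: choose a random starting place for the golf ball."""
    return [rng.uniform(-180, 180) for _ in range(n)]

def relax(conf, rng, steps=2000, max_move=5.0):
    """Roll downhill: keep only random single-angle nudges that lower the energy."""
    conf = list(conf)
    e = energy(conf)
    for _ in range(steps):
        i = rng.randrange(len(conf))
        old = conf[i]
        conf[i] = old + rng.uniform(-max_move, max_move)
        new_e = energy(conf)
        if new_e < e:
            e = new_e
        else:
            conf[i] = old  # reject the uphill move
    return conf, e

def fold(n_angles=6, n_starts=20, seed=0):
    """Try many starting places and return the lowest-energy result found."""
    rng = random.Random(seed)
    runs = [relax(sample_start(n_angles, rng), rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[1])

best_conf, best_energy = fold()
print([round(a) for a in best_conf], round(best_energy, 1))
```

Keeping the sampler and the energy function separate is what lets each be improved on its own, which fits the account of a community refining the software piece by piece.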
In the nineteen-seventies, the pioneers of protein design worked by building physical models of their amino-acid chains. William DeGrado, a biochemist at the University of California, San Francisco, coined the term “de novo” protein design in the nineteen-eighties; he recalled, “I was told it was going to be impossible quite a bit.” Protein design is a two-way street: you must figure out how to predict a shape from a sequence and also find the right sequence for a desired shape. It’s a give-and-take, with the overarching goal of finding a shape that does something useful, such as binding, antibody-like, to a virus. A protein designer might start by taking natural proteins and tweaking them. She might also use a system of directed evolution, in which large collections of proteins are tested, selected for certain properties, and then mutated, over and over, until the right traits emerge. (Refining this process is what won Arnold her Nobel Prize.)
Thanks to improved computational tools, including Rosetta, and faster methods for making and testing proteins, de-novo design has begun to show real promise. “It’s amazing how much progress has been made, and how it’s just accelerating so rapidly,” DeGrado said. Baker agreed that progress was speeding up. “The fact that we’re spinning out a couple of companies a year is kind of remarkable,” he said. His lab’s work on COVID-19 has convinced him that the grail is almost within reach. “The hope is that the next time there’s an outbreak, within two days, we’ll have models of candidates,” he told me.
Broadly speaking, new advances in protein design have clustered in three main areas. The first is “binding”—the construction of proteins that adhere tightly to biological targets. In May, I spent a Friday night video-chatting with Inna Goreshnik, a research scientist at I.P.D., as she carried out part of an experiment with Longxing Cao, a postdoc. (I.P.D. occupies the top two floors of its building, and is home to around a hundred and thirty scientists, seventy of whom work in Baker’s lab.) Goreshnik stood at a lab bench in a striped sweater and face mask. “This is very stressful,” she said, as she carried out the calculations needed to prepare the samples. “I usually don’t have anyone watching me do math.”
Their target was SARS-CoV-2, the coronavirus that causes COVID-19. Earlier, Cao had identified a vulnerable spot on the virus’s spike protein—a kind of grappling hook on its outer shell which enables it to invade cells. His goal was to design “binder” proteins that would adhere to that particular spot on the spike, thereby disabling its function. Rosetta contained a precise model of the spike; Cao had written scripts that used that model to generate, de novo, binders that might work. It was as though, given the measurements of a hand, Rosetta were designing a glove. The program ended up suggesting nearly a hundred thousand possible binders, most between fifty-five and eighty-eight amino acids long. For a few thousand dollars, Cao hired a biotech company to produce DNA strands—synthetic genes—that could instruct cells to build those binders. He then introduced each synthetic gene, encoding a unique binder, into a different yeast cell, and, once those cells had manufactured the binders, added the viral spikes. To see if the binders had attached to the spikes, he ran the cells past a laser, one by one, looking for subtle signatures in their fluorescence. A few of the binders did pretty well.
This was the process’s first step. In the second, Cao subjected the most promising candidates to “site-saturation mutagenesis”—a directed-evolution technique. He swapped out the first amino acid of each candidate for a different one, creating nineteen alternate versions. He repeated this process for the second amino acid, then the third, and so on. Then he ordered another batch of DNA that could make these mutated proteins, and tested them. Certain single-site mutations worked better than others; he created a third set of proteins, combining the best ones. These proteins were what he and Goreshnik were about to produce. During our video chat, Goreshnik held up two small tubes containing white powder: the dried DNA strands. Cao raised a flask of yeast cells, into which the DNA would go.
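The bookkeeping behind site-saturation mutagenesis is simple enough to sketch in Python. This is an illustration only: the three-letter sequence and the scores are made up, and `combine_best` is a hypothetical helper that assumes the best single-site changes can simply be stacked, which the article does not spell out. With twenty standard amino acids, each position yields nineteen alternate versions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids

def site_saturation_variants(seq):
    """Yield (position, new_residue, mutated_sequence) for every single-site swap."""
    for i, original in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield i, aa, seq[:i] + aa + seq[i + 1:]

def combine_best(seq, scores):
    """Stack the best-scoring substitution at each position, if it beats the parent.

    `scores` maps (position, residue) to a measured gain over the parent.
    The values used below are invented, not experimental data.
    """
    best = {}
    for (pos, aa), gain in scores.items():
        if gain > 0 and gain > best.get(pos, (None, 0))[1]:
            best[pos] = (aa, gain)
    out = list(seq)
    for pos, (aa, _) in best.items():
        out[pos] = aa
    return "".join(out)

parent = "MKT"  # hypothetical three-residue candidate
variants = list(site_saturation_variants(parent))
print(len(variants))  # 19 variants per position, 3 positions

made_up_scores = {(0, "A"): 0.5, (0, "L"): 1.2, (2, "S"): -0.3}
print(combine_best(parent, made_up_scores))
```

For a binder of fifty-five to eighty-eight amino acids, the same enumeration gives roughly a thousand to seventeen hundred single-site variants per candidate, which is why the mutated proteins were ordered as another batch of DNA.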
For around three hours, Goreshnik mixed the DNA fragments with other chemicals, then ran them through a PCR machine, which multiplied and sewed them together. She purified the results, then multiplied and purified them again. “There’s lots of walking and a lot of pipetting,” she said. Eventually, she showed me a small container: “All that work, and at the end we get just thirty microlitres of liquid in a tube,” she said. Later that night, Cao would introduce the DNA to the yeast cells, which together would make the binding proteins over the course of the next twenty-four hours. Goreshnik and Cao hoped that, in addition to making proteins that bound to SARS-CoV-2, they could refine their process so that more of it could be done with Rosetta. “The final goal is just to order one design, and it works,” Cao said. Ideally, the de-novo protein wouldn’t just bind to its target strongly and specifically—it would do so in exactly the way predicted by the software.