A Brief Introduction to DNA Computing

DNA computing is the idea of performing computation with chemical reactions on biological molecules, rather than with silicon and electricity. We often hear about quantum computers, and there is a lot of discussion about whether they will actually work, whether they will crack RSA, and so on. Among alternative models of computing, DNA computers are often overlooked, but they’re easy to understand (none of the quantum weirdness) and have the potential to do massively parallel computations efficiently.

I first heard about DNA computers while doing my undergrad research project this term. I won’t bore you with the details, but it has to do with the theoretical aspects of DNA self-assembly, which, as we will see, is related to DNA computing.

The study of DNA computing is relatively new: the field was started by Leonard Adleman who published a breakthrough paper in Science in 1994. In this paper, he solved the directed Hamiltonian Path problem on 7 vertices using DNA reactions. This was the first time anything like this had been done. In this article, I will summarize this paper.

Operations on DNA

DNA is a complicated molecule with a lot of interesting properties, but we can view it as a string over a 4 letter alphabet (A, C, G, T). Each string has a Watson-Crick complement, where A is complement to T, and C is complement to G.
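To make the string view concrete, here is a minimal Python sketch of the complement operation. The function names are my own. One detail the simplified picture above glosses over: the two strands of a double helix run in opposite directions, so in practice the partner of a strand is usually written as its reverse complement, which the sketch also includes.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the base-by-base Watson-Crick complement of a strand."""
    return strand.translate(COMPLEMENT)

def reverse_complement(strand: str) -> str:
    """Return the complement read in the opposite direction."""
    return complement(strand)[::-1]

print(complement("ACGGT"))          # TGCCA
print(reverse_complement("ACGGT"))  # ACCGT
```

Taking the complement twice gives back the original string, which is what lets two complementary strands pair up symmetrically.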

Without delving too deep into chemistry, I’ll describe some of the operations we can do with DNA.

1. Synthesis. We can use a machine to create a bunch of single DNA strands of any string we like. The technical term for these is oligonucleotides, but they’re just short DNA pieces. One limitation is we can only make strands of 20-25 nucleotides with current lab techniques.

2. Amplify. Given a test tube with only a few strands of DNA, we can amplify them into millions of strands using a process called polymerase chain reaction (PCR).

3. Annealing. Given a test tube with a lot of single stranded DNA, cooling it will cause complementary strands to attach to each other to form double strands.

4. Sort by length. By applying an electric field across a gel, we can separate DNA strands by length, because shorter strands migrate through the gel faster than longer ones. This technique is called gel electrophoresis. If desired, we can extract only strands of a certain length.

5. Extract pattern. Given a test tube of DNA, we can extract only those that contain a given pattern as a substring. To do this, put the complement of the pattern string into the solution and cause it to anneal. Only strands that contain the pattern will anneal, and the rest can be washed away.

This list is by no means exhaustive, but gives a sample of what operations are possible.
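As a rough mental model, we can treat a test tube as a multiset of strings and each lab operation as a function on it. The sketch below is a toy abstraction of my own (the names `amplify`, `keep_length` and `extract` are made up), and it ignores everything that makes the real chemistry hard, such as errors, yields, and partial matches.

```python
from collections import Counter

def amplify(tube: Counter, factor: int) -> Counter:
    """Stand-in for PCR: multiply the number of copies of every strand."""
    return Counter({strand: count * factor for strand, count in tube.items()})

def keep_length(tube: Counter, length: int) -> Counter:
    """Stand-in for gel electrophoresis: keep only strands of one length."""
    return Counter({s: c for s, c in tube.items() if len(s) == length})

def extract(tube: Counter, pattern: str) -> Counter:
    """Stand-in for pattern extraction: keep strands containing the pattern."""
    return Counter({s: c for s, c in tube.items() if pattern in s})

# A tiny "test tube": strand -> number of copies.
tube = Counter({"ACGTAC": 2, "TTGACGTT": 1, "GGG": 5})
print(extract(tube, "ACGT"))
print(keep_length(tube, 3))
print(amplify(tube, 1000)["GGG"])
```

The important point is that each function acts on every strand in the tube at once, which is where the parallelism of DNA computing comes from.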

Solving Directed Hamiltonian Path with DNA

The Directed Hamiltonian Path problem asks: given a directed graph and two vertices s and t, does there exist a path from s to t that goes through every vertex exactly once?

For example, in this graph, if s=1 and t=3, then 1->4->2->3 is a directed Hamiltonian path.

This problem is related to the Travelling Salesman Problem, and is particularly interesting because it is NP-complete: no polynomial-time algorithm for it is known, and most researchers believe none exists. It would be really nice if DNA could solve it better than normal computers.

Here I’ll describe the process Leonard Adleman performed in 1994. He solved an instance of Directed Hamiltonian Path on 7 vertices, which is obviously trivial, and yet this took him 7 days of laboratory time. Early prototypes often tend to be laughably impractical.

Main Idea: we represent each vertex as a random string of 20 nucleotides, divided into two parts, each of 10 nucleotides. We represent a directed uv-edge by taking the second half of u and the first half of v, and taking the complement of all that.
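Here is a small Python sketch of this encoding, following the description above. The helper names, the fixed random seed, and the choice of vertices and edges are my own for illustration; they are not the sequences from the paper.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def random_vertex_strand(rng: random.Random, length: int = 20) -> str:
    """Pick a random strand of the given length to represent a vertex."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def edge_strand(u_strand: str, v_strand: str) -> str:
    """Complement of (second half of u) followed by (first half of v)."""
    half = len(u_strand) // 2
    return (u_strand[half:] + v_strand[:half]).translate(COMPLEMENT)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
vertices = {name: random_vertex_strand(rng) for name in (1, 2, 3, 4)}
edges = [(1, 4), (4, 2), (2, 3)]
edge_strands = {(u, v): edge_strand(vertices[u], vertices[v]) for u, v in edges}

for (u, v), strand in edge_strands.items():
    print(f"{u}->{v}: {strand}")
```

Because the strand for the edge uv is complementary to the end of u and the start of v, it can stick to both at once and hold the two vertex strands next to each other.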

The idea is that now, a directed path consists of vertex strands interleaved with edge strands, in a brick wall pattern, like this:

When we put all the vertex and edge strands into a test tube, they anneal very quickly: complementary halves stick together to form long double-stranded molecules, each encoding a path through the graph. If a Hamiltonian path exists, a molecule encoding it should form (and not just one, but millions of copies of it). However, the test tube also contains all kinds of strands that don’t represent Hamiltonian paths at all. We have to do a tricky sequence of chemical reactions to keep only the DNA strands representing valid solutions.

Step 1. Keep only paths that start on s and end on t. This is done by filtering only strands that start and end with a given sequence, and this is possible with a variation of PCR using primers.

Step 2. Sort the DNA by length, and only keep the ones that visit exactly n vertices. Since each vertex is encoded by a string of length 20, in our example we would filter for strands of length 80.

Step 3. For each vertex, perform an extract operation to filter only paths that visit this vertex. After doing this n times, we are left with paths that visit every vertex. This is the most time consuming step in the whole process.

Step 4. Any strands remaining at this point correspond to Hamiltonian paths, so we just amplify them with PCR, and detect whether any DNA remains in the test tube. If yes, there exists a directed Hamiltonian path from s to t in the graph.
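The four steps can be mimicked in software. The sketch below is only an analogy and rests on several assumptions of mine: paths are tuples of vertices rather than DNA strings, annealing is replaced by enumerating every walk up to a bounded length (real annealing is random), and the edge list is a made-up graph that happens to contain the path 1->4->2->3.

```python
def all_walks(edges, max_vertices):
    """Enumerate every walk with at most max_vertices vertices.

    Stand-in for annealing, which produces walks at random in the tube.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    vertices = sorted({x for edge in edges for x in edge})
    frontier = [(v,) for v in vertices]
    walks = list(frontier)
    for _ in range(max_vertices - 1):
        frontier = [w + (nxt,) for w in frontier
                    for nxt in adjacency.get(w[-1], [])]
        walks.extend(frontier)
    return walks

def hamiltonian_paths(edges, s, t):
    vertices = {x for edge in edges for x in edge}
    n = len(vertices)
    tube = all_walks(edges, n + 1)
    # Step 1: keep walks that start at s and end at t.
    tube = [w for w in tube if w[0] == s and w[-1] == t]
    # Step 2: keep walks that visit exactly n vertices.
    tube = [w for w in tube if len(w) == n]
    # Step 3: for each vertex, keep walks that contain it.
    for v in vertices:
        tube = [w for w in tube if v in w]
    # Step 4: anything left is a Hamiltonian path.
    return tube

edges = [(1, 4), (4, 2), (2, 3), (1, 2), (3, 4), (2, 1)]  # hypothetical graph
print(hamiltonian_paths(edges, 1, 3))  # [(1, 4, 2, 3)]
```

On a conventional computer the enumeration step is the expensive part, since the number of walks grows exponentially. In the test tube that work is done in parallel by the molecules, and the cost moves to the filtering steps.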

That’s it for the algorithm. Adleman went on to describe the incredible potential of DNA computers. A computer today can do about $10^9$ operations a second, but you can easily have $10^{20}$ DNA molecules in a test tube.

DNA Computing since 1994

Shortly after Adleman’s paper, researchers applied similar ideas to other difficult problems, like 3-SAT, the maximal clique problem, the shortest common superstring problem, and even breaking DES. These proposals were usually difficult to implement in the lab. For example, Richard Lipton proposed a procedure to solve 3-SAT in 1995, but only in 2002 did Adleman solve an instance of 3-SAT with 20 variables in the lab.

On the theoretical side, there was much progress in formalizing rules and trying to construct “universal” DNA computers. Several different models of DNA computing were proven Turing complete (actually my research adviser Lila Kari came up with one of them). It has been difficult to build these computers, because some of the enzymes required for some operations don’t exist yet.

There has been progress on the practical side as well. Since Adleman, researchers have looked into other models of using biological molecules for computation, like solving 3-SAT with hairpin formation or solving the knight’s tour problem with RNA instead of DNA.

In 2006, a simplified DNA computer was built that could detect whether a combination of enzymes was present, and release medicine only if all of them were (indicating that the patient had a disease). In 2013, researchers built the “transcriptor”: DNA versions of logic gates. One reason these are important is that transcriptors are reusable, whereas previously all reagents had to be thrown away after each operation.

Current Limitations of DNA Computing

Clearly, the method I described is very time consuming and labor intensive. Each operation takes hours of lab work. This is not really a fundamental problem, though: in the future we might use robots to automate these lab operations.

The biggest barrier to solving large instances is that right now, we can’t synthesize arbitrarily long strands of DNA (oligonucleotides). We can synthesize strands of 20-25 nucleotides with no problem, but as this number increases, the yield quickly becomes too low to be practical. The longest we can synthesize with current technology is a strand of length about 60. (Edit: Technology has improved since the papers I was looking at were written. According to my adviser, we can do 100 to a few thousand base pairs now in 2016).

Why do we need to synthesize long oligonucleotides? To represent larger problem instances, each vertex needs a unique encoding. If the encoding is too short, there will be a high probability of random sections overlapping by accident when they’re not supposed to, thereby ruining the experiment.
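To get a feel for why short encodings are risky, the Python sketch below draws random encodings of various lengths and reports the smallest Hamming distance between any two of them. This is a crude proxy that I am assuming for illustration: real unwanted annealing also depends on shifted and partial alignments and on thermodynamics, none of which is modelled here. The function names, the seed, and the count of 50 encodings are arbitrary.

```python
import random
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strands differ."""
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(codes) -> int:
    """Smallest Hamming distance between any two encodings."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))

def random_codes(count: int, length: int, seed: int = 0):
    """Draw `count` random strands of the given length."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length))
            for _ in range(count)]

# Longer encodings leave more room between any two vertices.
for length in (2, 5, 10, 20):
    codes = random_codes(50, length)
    print(length, min_pairwise_distance(codes))
```

With only 16 possible strands of length 2, any set of 17 or more encodings must contain a duplicate, so the minimum distance is forced to 0. Longer strands make such collisions far less likely, which is why bigger instances need longer oligonucleotides.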

One promising direction of research is DNA self-assembly, so instead of painstakingly building oligonucleotides one base at a time, we put short strands in a test tube and let them self-assemble into the structures we want. My URA project this term deals with what kind of patterns can be constructed with self-assembly.

Today, if you need to solve a Hamiltonian path problem, like finding the optimal way to play Pokemon Go, you would still use a conventional computer. But don’t forget that within 100 years, computers have turned from impractical contraptions into devices that everyone carries in their pockets. I’ll bet that DNA computers will do the same.

References

1. Adleman, Leonard. “Molecular computation of solutions to combinatorial problems”. Science, volume 266, 1994.
2. Lipton, Richard. “DNA solution of hard computational problems”. Science, volume 268, 1995.