Are programming competitions a good use of time?

Ten minutes remain in the contest, but you’re still a few points short of advancing. Armed with your mighty coding powers, you made short work of the first three problems, but problem 4 is proving a tough nut to crack. After four incorrect attempts, your time is running short, and you’re searching desperately for an off-by-one error, an edge case you haven’t considered.

You read the problem statement one more time, and at last, you find it. An integer overflow bug. With a wide grin, you fix the bug with a few quick keystrokes. You upload the source code file…


Accepted! You sit back, feeling both relieved and elated. With the addition of 25 points, your advancement to the next round is now guaranteed.

It’s not hard to see why programming contests are so appealing — this is programming distilled to its essence. No need to figure out how to integrate an unfamiliar API, or refactor code to be unit testable, or make sense of vague business requirements. In competitive programming, each problem has a self-contained, algorithmic solution, and an automated judge instantly determines if it’s correct or not.

Above: Programming contests contain the same types of problems as technical interviews at companies like Google.

Competitive programming promises even more glory. Win enough contests, and you get an interview at Facebook or Google, where they ask you… you guessed it… more algorithm coding questions!

By doing programming contests, you gain an intimate understanding of data structures and algorithms and their complexities. While your colleagues only vaguely know the difference between a depth-first and a breadth-first search, you develop a much deeper intuition. You will never forget that one contest where you used DFS instead of BFS, causing your solution to time out.

Unlike the real world, competitive programming offers an arena of pure meritocracy. As long as you solve difficult problems fast, you will surely rise through the ranks. In the Google Code Jam, thousands of programmers try their luck in the qualifying round, but this number is reduced to 3000 by Round 2, and 500 by Round 3. Then for the grand finale, the top 25 elite coders are flown in to compete on-site in the world finals.

Above: 25 of the world’s best compete in the Google Code Jam world finals.

I used to look up in awe at red coders (the highest rated users have red usernames). By the time I solved the first problem, they would have not only solved it in 10 minutes, but also solved 2-3 even harder ones. What was it like to think at that level, to possess that much coding wizardry?

And for some time, I strove to be like them. I studied dynamic programming and graph algorithms in my spare time, and practiced on SPOJ, Hackerrank, and Codeforces. I competed in my university’s ACM tryouts, and three times failed to qualify for the ACM team. So it goes.

In the last few years, I got to talk to a lot of competitive programmers, most of whom were far better than myself. Perhaps I was searching for some magical quality that gave them coding superpowers, but none was to be found. Instead, the key to a high rating was simply many years of dedication and hard work.

It’s illuminating to read the answers on this Quora question: “How does it feel to finally be red at TopCoder after years of hard work?” The short answer: nothing much really.

Above: Rating graph of Codeforces user netman. Getting to red takes years of practice.

Given the amount of time it takes to master competitive programming, one naturally wonders: is this really a good use of time? In a contest, you are ultimately solving problems that other people have solved already, so nothing new is being produced. Although solving a contest problem is satisfying, I find it a lot more rewarding to build projects or apps with my novel ideas.

Recently, Google found that being good at programming competitions is negatively correlated with being good at software engineering.

In the video, Peter Norvig notes that competitive programmers are “used to going really fast, cranking the answer out and moving to the next thing, but you do better if you’re more reflective and go slowly and get things right”.

Ironically, the very thing that makes programming contests so attractive is also their downfall. Contests focus on data structures and algorithms, which are just a small part of software engineering. Other skills, like UI design, databases, network architecture, and distributed systems, are not touched by programming contests.

Even if you only look at algorithmic problems, competitive programming is still not representative of reality. Due to limitations in automated judging, contest problems are limited to exact, deterministic algorithms that have a single correct answer. This rules out entire classes of randomized and approximate algorithms. Algorithms now rely more and more on data and machine learning, and less on combinatorial approaches, which further renders competitive programming less relevant.

Now, are programming contests useful? Yes, but only up to a point. Solving contest problems is an excellent way to familiarize yourself with a programming language and its data structures, as well as get better at converting procedural ideas to code. These are very useful skills for a coding interview. However, even the most difficult Facebook/Google interview questions are maybe around a Codeforces Div2 C (or Div1 A) difficulty, which is a long way from the hardest contest problems.

Above: Beyond a certain point, skills learned in programming contests are only useful for programming contests.

I would put the inflection point at about 1700 Codeforces rating (enough to solve Div2 A and B consistently). Beyond that, you continue to improve, but be aware that you’ll be studying things solely for contests that have little relevance anywhere else, for example, Fenwick trees, max flow algorithms, bitmask DP, and other increasingly obscure topics.

So far, I’ve been fairly critical of competitive programming, but rather than deride it as a waste of time, I think it’s best to view it as a sport. Like soccer or basketball, the function of sports in society is to inspire excellence, and above all, to have fun. Terry Tao wrote a similar article on math competitions; I’d agree with him.

My advice to you: do programming contests if you find them fun and you enjoy tackling hard problems. But don’t take it too seriously: it takes years of dedicated effort to get extremely good at it, dedication that very few people have. Unless you’re at or aiming to compete at the World Final level, you definitely shouldn’t be spending a significant amount of time getting better at contests. Your time is better spent studying machine learning, or statistics, or compilers, or distributed systems, or just about anything else in computer science.

I have accounts on Hackerrank and Codeforces if you want to follow me. As per my own advice, I’m no longer actively trying to get better, but I still do problems for fun occasionally.

Edit: This article has some interesting discussion on Reddit and Codeforces.

Side Project: Conversation Player

Here’s a side project I’ve been working on. It’s a widget that plays back conversations in real time. Here’s a video demo of it:

The idea is to share text message snippets in a more digestible manner. The usual way to distribute a chat log to a friend would be to send them a transcript of the messages, timestamps and all. The information is there, but it’s mentally taxing to read through, and it’s especially difficult to make sense of the timestamps.

We figured it would be much easier to watch the conversation unfold in real time. This way, you see the conversation one message at a time, instead of all of it at once.


In the future, this will be a widget that you can embed in a blog or website. The reader can toggle the playback speed and use the slider to jump anywhere in the conversation.

We also built a set of utilities to import conversations from Facebook Messenger and Skype into a JSON format that our app can understand. Right now, these utilities are a bit clumsy to work with, so we’re holding back on releasing the project.
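To make the playback idea concrete, here is a minimal sketch in Python. The actual JSON schema used by Conversation Player isn’t described here, so the `sender`, `timestamp`, and `text` fields and the `playback_delays` helper are illustrative guesses, not the real format:

```python
import json

# Hypothetical message schema: the real Conversation Player format is not
# public, so these field names are an illustrative guess.
transcript = json.loads("""
[
  {"sender": "alice", "timestamp": 0.0,  "text": "hey, are you around?"},
  {"sender": "bob",   "timestamp": 4.5,  "text": "yeah, what's up?"},
  {"sender": "alice", "timestamp": 12.0, "text": "want to grab lunch?"}
]
""")

def playback_delays(messages, speed=1.0):
    """Seconds to wait before showing each message, scaled by playback speed."""
    delays = []
    prev = messages[0]["timestamp"]
    for msg in messages:
        delays.append((msg["timestamp"] - prev) / speed)
        prev = msg["timestamp"]
    return delays

print(playback_delays(transcript, speed=1.5))  # [0.0, 3.0, 5.0]
```

A player widget would simply sleep for each delay before rendering the next message, which is what makes the gaps between messages legible in a way raw timestamps are not.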

Conversation Player is built using React. It’s my first time using React, so it was good coding practice. I also had to learn the whole modern Javascript setup with babel, browserify, webpack, and so on. Everything is buzzword compliant now. Yay!

Blog Styling Update

It’s been over 6 years since I started this blog, and today I felt it was time for a styling update. It’s not that anything is broken; rather, the internet is constantly evolving, so even a good website will inevitably become obsolete in 6 years.

To illustrate, here’s the old style, using the Contempt theme:


The problem is over half of the screen space is taken up by the gray bars at the side. The content has a fixed width of 800px or so, which is great for the 1280×1024 displays that were common in 2010, but looks pretty terrible on the larger displays today.

At first, I tried to tweak the CSS for the theme, but you need to get a premium subscription on WordPress to change the CSS (and it didn’t look that good anyway). So instead, I browsed through a bunch of themes and switched everything to the Hew theme:


This is more in line with the trend for blogs nowadays, with minimal clutter and text in large font in the center of the page. The widgets, which once occupied a third of the vertical space, are now hidden behind an options menu.

It’s a cosmetic change, but hopefully it will motivate me to blog more.

How to succeed in your first tech internship

Congratulations, you’ve just landed your first software engineer internship! You’ve passed a round or two of interviews, signed an offer letter, and you’re slated to start next month. What now? You might be a bit excited, a bit apprehensive, wondering what the startup life is like, are you even smart enough to do the work they give you…

I felt all these things when I started my first internship three years ago. Now, I’ve completed four internships and I’m halfway through my fifth one; I’m sort of a veteran intern by now. In these five internships, I’ve learned a good deal about what it takes to succeed in an internship, things that are not obvious to those just starting out. Hopefully by sharing this, others can avoid some of the mistakes I made.

Your first week at [startup]

Chances are that you’ve coded assignments for school, and maybe a few side projects for fun. Work is a bit different: you’re working with a massive codebase that you didn’t write, one that probably no single engineer in the company understands in full. Facing a codebase of this complexity, you might feel overwhelmed, struggling even to find the right file to start in. You feel uneasy that a small change is taking you hours, afraid that your boss thinks you’re underperforming.

Relax, you’re doing fine. If you got the job, it means they have faith in your abilities to learn and to succeed. I’ve talked to hundreds of Waterloo interns, and I’ve never heard of anyone getting dismissed for underperforming. The first few weeks will be rough as you come to terms with the codebase and technology stack, but trust me, it gets much, much easier afterwards.

Asking for help

As an intern, you’re not expected to know everything, and often you will be asking for help from more experienced, full-time engineers.

Before asking for help, you should spend a minute or so searching Google, or Stack Overflow, or the company wiki. Most general questions (not relating to company specific code) can be answered with Google, and you save everyone’s time this way.

When you do ask for help, be aware that they might be working on a completely different project, so they don’t have the same mental context as you. Rather than jump straight into the intricate technical details of your problem, you should describe at a high level what you’re trying to accomplish, and what you tried, and only then delve into the exact technical details.

An example of a poorly phrased question would be: “hey, how do I invalidate a FooBarWindow object if its parent is not visible?” You’re likely to get some confused stares — this might make perfect sense to you, but they’re wondering what FooBarWindow is and why you’re trying to invalidate it at all.

A better way to phrase it would be something like: “hey, I’m working on X feature, and I’m encountering a problem where the buttons stop working after you press the back button. After looking a bit, I discovered my component should have been invalidated when its parent is no longer visible, but that’s not happening…” This time, you’ve done a much better job of describing your problem.

It’s always helpful to take notes, so you never ask the same question again. How do you commit your code to Git? How do you deploy the app to stage? If you don’t write it down, you’re going to forget.

At the start, you’re going to be asking 5 questions an hour, which is okay. Soon you will find yourself needing to ask less and less, and eventually you’ll only ask a handful per day.

Taking charge of your own learning

Like it or not, software engineering is a rapidly shifting field, where a new Javascript framework comes out every six months. You have to be continuously learning things, or your skills will become obsolete. Learning is even more important when you’re an intern, still learning the ropes. Fortunately, a tech internship is a great opportunity to learn quickly.

Not all software engineers are equal — at some point, you get to choose what you want to do: frontend, backend, or full stack? Web, iOS, or Android? Become an expert in Django or Ruby on Rails? Depending on the company, you often get considerable say on what team you’re on, and what project you work on within your team. Use this as an opportunity to get paid to learn new, interesting stuff!

A good technology to learn should satisfy two criteria: it should be something you’re interested in, and it should be widely used in the industry. That is to say, it’s more useful to know a popular web framework than an internal company-specific framework that does the same thing.

When you get to pick what project to take next, it might be tempting to pick something familiar, where you already know how to do everything. But you learn a lot more by working on something new; in my experience, employers have always been accommodating to my desire to work on a variety of different things.

You will overhear people talk passionately, with phrases like, “oh, it’s running Nginx inside Docker and fetches the data from a Cassandra cluster…” If you’ve never heard of these technologies, this sentence would be nonsensical to you. It’s well worth the time to spend 10 minutes reading about each technology that you hear mentioned, not to become an expert, but just to have a passing understanding of what each of these things does. With a few minutes of research, you’d be able to answer: “when should you use Cassandra over MySQL?”

Learning is valuable even when it’s not immediately relevant to you. Occasionally, you’ll find yourself in meetings where you don’t have a clue what’s going on, say with business managers or projects you’re not involved in. Rather than zone out and browse Reddit for the duration of the meeting, listen in and learn as much as you can, and take notes if you begin to fall asleep! The human brain has near infinite capacity for learning new things, and at no point will it reach “capacity” like a hard drive.

Take responsibility and deliver results

A common misconception is that programmers are paid to write code. Wrong: as a programmer, your job is to deliver results and provide value to your company. Part of this job involves writing code, but a lot of the work is communicating with managers, designers, and other engineers to figure out what code to write.

When you’re assigned a project, you own it, and you’re in charge of any tasks required to push it through to completion. What if something is broken in an API owned by another team? You might be tempted to hand in your code and proclaim, “my code works fine, so my job here is done; I can show you that their API is broken, so it’s their fault.” No: if your feature is broken, then you need to fix it one way or another. So go ping the engineer responsible, schedule a meeting, do anything it takes to get your project completed.

Sometimes you run into problems that seem insurmountable, so complex that you feel compelled to put down your sword, give up, and tell yourself, “this is too hard for an intern.” This is a bad idea: you should never expect a full-time engineer to come in, take over, and bail you out of the situation. Your mentors are not superhuman — it’s not like they can instantly conjure a solution; they have to work through the problem one piece at a time, just like you. There’s no reason you can’t do the same.

The product you deliver is what ultimately matters, so don’t worry about secondary measures of productivity, like how many lines of code you commit, or how many story points you rack up on Jira. There’s an apocryphal tale of a programmer who disagreed with management measuring productivity by lines of code, and wrote “-2000” because he had made the code simpler. Likewise, you aren’t being judged if you come in 30 minutes after your manager does, or leave 30 minutes before he does, or just feel like taking a mid-day stroll in the park, as long as you’re consistently delivering quality features.

Many interns suffer from “intern mentality” and consider themselves fundamentally different from full-times in some way. This is an irrational belief — your skills are probably on par with those of a junior engineer (or will be in a few weeks). This means you should behave like any other full-time engineer (albeit minus interview and on-call duties); the only difference is you’re leaving in a few months. Don’t be afraid to contribute your insights and ideas and consider them less valuable because you’re “just an intern”.

Other tips

What should you learn to prepare for an internship if you have spare time? Learn Git! Git is a version control system used in most companies, and is both non-trivial to pick up, and used more or less the same way everywhere. Other stuff is less useful to pre-learn because they’re either easy to pick up, or can be used in lots of different ways so it’s more efficient to learn on the job.

Internships are a great way to travel places, if that interests you. I picked 5 internships in 4 different cities for this reason. Unlike school, you don’t have to think about work during weekends, which leaves you lots of time to travel to nearby destinations.

I’ve only talked about what happens during work. If your internship is in the USA, the Unofficial Waterloo USA Intern Guide was super helpful in answering all my logistical questions. Also, some of my friends have written about crafting a resume, and how to ace the coding interview.

I have a Youtube channel!

Here’s something I’ve been working on recently: a Youtube channel of my guitar covers. I’ve been playing guitar for a few years now (I started in first year university) and I thought it would be fun to record myself playing my favorite songs.

At time of writing, I have 11 videos. Here’s a few of them:

I’m going to upload more as I have time. Please subscribe!

A Brief Introduction to DNA Computing

DNA computing is the idea of using chemical reactions on biological molecules to perform computation, rather than silicon and electricity. We often hear about quantum computers, and there is a lot of discussion about whether it will actually work, or crack RSA, stuff like that. In the domain of alternative computers, DNA computers are often overlooked, but they’re easy to understand (none of the quantum weirdness), and have potential to do massively parallel computations efficiently.

I first heard about DNA computers when doing my undergrad research project this term. I won’t bore you with the details, but it has to do with the theoretical aspects of DNA self-assembly which we will see is related to DNA computing.

The study of DNA computing is relatively new: the field was started by Leonard Adleman who published a breakthrough paper in Science in 1994. In this paper, he solved the directed Hamiltonian Path problem on 7 vertices using DNA reactions. This was the first time anything like this had been done. In this article, I will summarize this paper.

Operations on DNA

DNA is a complicated molecule with a lot of interesting properties, but we can view it as a string over a 4 letter alphabet (A, C, G, T). Each string has a Watson-Crick complement, where A is complementary to T, and C is complementary to G.
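In this simplified string view (ignoring strand orientation, which the standard Watson-Crick convention also reverses), complementation is just a character map. A minimal sketch:

```python
# Simplified view from the text: a DNA strand is a string over {A, C, G, T},
# and annealing pairs A with T and C with G. (Strand orientation is ignored.)
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    return "".join(COMPLEMENT[base] for base in strand)

def anneals(s, t):
    """Two single strands form a double strand when every position is a
    complementary pair."""
    return len(s) == len(t) and complement(s) == t

print(complement("ACGT"))       # TGCA
print(anneals("ACGT", "TGCA"))  # True
```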

Without delving too deep into chemistry, I’ll describe some of the operations we can do with DNA.

1. Synthesis. We can use a machine to create a bunch of single DNA strands of any string we like. The technical term for these is oligonucleotides, but they’re just short DNA pieces. One limitation is we can only make strands of 20-25 nucleotides with current lab techniques.

2. Amplify. Given a test tube with only a few strands of DNA, we can amplify them into millions of strands using a process called polymerase chain reaction (PCR).

3. Annealing. Given a test tube with a lot of single stranded DNA, cooling it will cause complementary strands to attach to each other to form double strands.

4. Sort by length. By passing an electrical field through a solution, we can cause longer DNA strands to move to one side of the solution, a technique called gel electrophoresis. If desired, we can extract only strands of a certain length.

5. Extract pattern. Given a test tube of DNA, we can extract only those that contain a given pattern as a substring. To do this, put the complement of the pattern string into the solution and cause it to anneal. Only strands that contain the pattern will anneal, and the rest can be washed away.

This list is by no means exhaustive, but gives a sample of what operations are possible.

Solving Directed Hamiltonian Path with DNA

The Directed Hamiltonian Path problem asks: given a directed graph and two vertices s and t, does there exist a path from s to t that visits every vertex exactly once?

For example, in this graph, if s=1 and t=3, then 1->4->2->3 is a directed Hamiltonian path.

This problem is related to the Travelling Salesman Problem, and is particularly interesting because it is NP-complete, so conventional computers can’t solve it efficiently. It would be really nice if DNA could solve it better than normal computers.
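To see what "can't solve it efficiently" means, here is the obvious brute-force approach on a conventional computer. The original figure's graph isn't reproduced here, so the edge list below is an illustrative guess consistent with the example path 1->4->2->3:

```python
from itertools import permutations

def has_hamiltonian_path(n, edges, s, t):
    """Brute force: try every ordering of the n vertices. This is O(n!),
    which is exactly why an NP-complete problem invites exotic hardware."""
    edges = set(edges)
    for perm in permutations(range(1, n + 1)):
        if perm[0] == s and perm[-1] == t and all(
            (perm[i], perm[i + 1]) in edges for i in range(n - 1)
        ):
            return True
    return False

# A small graph consistent with the article's example (the figure is not
# reproduced here): 1 -> 4 -> 2 -> 3 is a directed Hamiltonian path.
edges = [(1, 4), (4, 2), (2, 3), (1, 2), (3, 4)]
print(has_hamiltonian_path(4, edges, s=1, t=3))  # True
```

Adleman's insight, described next, is that a test tube of DNA effectively tries all these orderings in parallel.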

Here I’ll describe the process Leonard Adleman performed in 1994. He solved an instance of Directed Hamiltonian Path on 7 vertices, which is obviously trivial, and yet this took him 7 days of laboratory time. Early prototypes often tend to be laughably impractical.

Main Idea: we represent each vertex as a random string of 20 nucleotides, divided into two parts, each of 10 nucleotides. We represent a directed uv-edge by taking the second half of u and the first half of v, and taking the complement of all that.

The idea is that now, a directed path consists of vertex strands interleaved with edge strands, in a brick wall pattern, like this:

When we put all the vertex and edge strands into a test tube, very quickly the solution will anneal (and not just one, but millions of copies of it). However, the test tube also contains all kinds of strands that don’t represent Hamiltonian paths at all. We have to do a tricky sequence of chemical reactions to filter out only the DNA strands representing valid solutions.
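The brick-wall pattern can be checked in simulation. This is a sketch of the encoding only, with random codewords standing in for the actual oligonucleotides Adleman synthesized:

```python
import random

random.seed(0)
BASES = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(s):
    return "".join(COMPLEMENT[b] for b in s)

# Each vertex gets a random 20-base codeword, split into two 10-base halves.
vertices = {v: "".join(random.choice(BASES) for _ in range(20))
            for v in [1, 2, 3, 4]}

def edge_strand(u, v):
    """Edge u->v: complement of (second half of u) + (first half of v)."""
    return complement(vertices[u][10:] + vertices[v][:10])

# A path's "top row" is the vertex codewords laid end to end; each edge
# strand sticks to the 20-base window straddling two consecutive vertices.
path = [1, 4, 2, 3]
top = "".join(vertices[v] for v in path)
for i, (u, v) in enumerate(zip(path, path[1:])):
    window = top[10 + 20 * i : 30 + 20 * i]
    assert complement(window) == edge_strand(u, v)
print("edge strands anneal in the brick-wall pattern")
```

Each edge strand overlaps the ends of two vertex strands, holding them together the way mortar holds two bricks in adjacent rows.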

Step 1. Keep only paths that start on s and end on t. This is done by filtering only strands that start and end with a given sequence, and this is possible with a variation of PCR using primers.

Step 2. Sort the DNA by length, and only keep the ones that visit exactly n vertices. Since each vertex is encoded by a string of length 20, in our example we would filter for strands of length 80.

Step 3. For each vertex, perform an extract operation to filter only paths that visit this vertex. After doing this n times, we are left with paths that visit every vertex. This is the most time consuming step in the whole process.

Step 4. Any strands remaining at this point correspond to Hamiltonian paths, so we just amplify them with PCR, and detect if any DNA remain in the test tube. If yes, there exists a directed Hamiltonian path from s to t in the graph.
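The four steps can be mimicked with a toy simulation: generate a "test tube" of random walks, then apply each filter in turn. The graph is again an illustrative stand-in for the one in the figure, and the random walks stand in for annealing:

```python
import random

random.seed(1)

# Illustrative graph consistent with the article's example path 1->4->2->3.
edges = {(1, 4), (4, 2), (2, 3), (1, 2), (3, 4)}
n, s, t = 4, 1, 3
succ = {u: [v for (a, v) in edges if a == u] for u in range(1, n + 1)}

def random_path(max_len=6):
    """Annealing stand-in: a random walk along edges, of random length."""
    path = [random.randint(1, n)]
    while len(path) < max_len and random.random() < 0.8:
        path.append(random.choice(succ[path[-1]]))
    return path

tube = [random_path() for _ in range(100_000)]

# Step 1: keep paths starting at s and ending at t (PCR with primers).
tube = [p for p in tube if p[0] == s and p[-1] == t]
# Step 2: keep paths visiting exactly n vertices (gel electrophoresis).
tube = [p for p in tube if len(p) == n]
# Step 3: for each vertex, keep paths that visit it (extract by pattern).
for v in range(1, n + 1):
    tube = [p for p in tube if v in p]
# Step 4: anything left is a Hamiltonian path.
print(tube[0] if tube else "no Hamiltonian path")  # [1, 4, 2, 3]
```

The point of the simulation is the shape of the algorithm: the tube starts with an enormous soup of mostly useless candidates, and each chemical step is a filter that discards the invalid ones.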

That’s it for the algorithm. Adleman went on to describe the incredible potential of DNA computers. A computer today can do about 10^9 operations a second, but you can easily have 10^20 DNA molecules in a test tube.

DNA Computing since 1994

Shortly after Adleman’s paper, researchers applied similar ideas to solve other difficult problems, like 3-SAT, the maximal clique problem, the shortest common superstring problem, and even breaking DES. These papers were usually difficult to implement in the lab: for example, Richard Lipton proposed a procedure to solve 3-SAT in 1995, but only in 2002 did Adleman solve an instance of 3-SAT with 20 variables in the lab.

On the theoretical side, there was much progress in formalizing rules and trying to construct “universal” DNA computers. Several different models of DNA computing were proven Turing complete (actually my research adviser Lila Kari came up with one of them). It has been difficult to build these computers, because some of the enzymes required for some operations don’t exist yet.

There has been progress on the practical side as well. Since Adleman, researchers have looked into other models of using biological molecules for computation, like solving 3-SAT with hairpin formation or solving the knight’s tour problem with RNA instead of DNA.

In 2006, a simplified DNA computer was built that had the ability to detect if a combination of enzymes was present, and only release medicine if they were all present (indicating that the patient had a disease). In 2013, researchers built the “transcriptor”: DNA versions of logic gates. One reason these are important is that transcriptors are reusable, whereas previously all reagents had to be thrown away after each operation.

Current Limitations of DNA Computing

Clearly, the method I described is very time consuming and labor intensive. Each operation takes hours of lab work. This is not really a fundamental problem; in the future we might use robots to automate these lab operations.

The biggest barrier to solving large instances is that right now, we can’t synthesize arbitrary long strands of DNA (oligonucleotides). We can synthesize strands of 20-25 nucleotides with no problem, but as this number increases, the yield quickly becomes too low to be practical. The longest we can synthesize with current technology is a strand of length about 60. (Edit: Technology has improved since the papers I was looking at were written. According to my adviser, we can do 100 to a few thousand base pairs now in 2016).

Why do we need to synthesize long oligonucleotides? To represent larger problem instances, each vertex needs a unique encoding. If the encoding is too short, there will be a high probability of random sections overlapping by accident when they’re not supposed to, thereby ruining the experiment.

One promising direction of research is DNA self-assembly, so instead of painstakingly building oligonucleotides one base at a time, we put short strands in a test tube and let them self-assemble into the structures we want. My URA project this term deals with what kind of patterns can be constructed with self-assembly.

Today, if you need to solve a Hamiltonian path problem, like finding the optimal way to play Pokemon Go, you would still use a conventional computer. But don’t forget that within 100 years, computers have turned from impractical contraptions into devices that everyone carries in their pockets. I’ll bet that DNA computers will do the same.

References

  1. Adleman, Leonard. “Molecular computation of solutions to combinatorial problems”. Science, volume 266, 1994.
  2. Lipton, Richard. “DNA solution of hard computational problems”. Science, volume 268, 1995.

CS488 Final Project: OpenGL Boat Game

Here’s something I’ve been working on for the past few weeks for one of my courses, CS488 – Intro to Computer Graphics. For the final project, you’re allowed to do any OpenGL or raytracing project, as long as it has 10 reasonable graphics related objectives. Here’s a video of mine:

A screenshot:

It’s a simple game where you control a boat and go around a lake collecting coins. Each time you collect a coin, a bomb spawns and follows you around. You die when you hit a bomb. Also, if two bombs collide, they both explode (although you can’t see that in the video).

Everything is implemented in bare-metal OpenGL, so none of those modern game engines or physics engines. It’s around 1000-ish lines of C++ (difficult to count because there’s a lot of donated code).

Edit (8/10/2016) – I received an Honorable Mention for this project!

Some thoughts about CS488

For those that haven’t heard about CS488, it’s one of the “big three” — fourth year CS courses with the heaviest workload and with large projects (the other two being Real-time and Compilers). It’s one of the hardest courses at Waterloo, but also probably the most rewarding and satisfying course I’ve taken.

There are four assignments, each walking you step by step through graphics techniques, like drawing a cube with OpenGL, or building a puppet with hierarchical modelling, or writing a simple ray tracer. Then there’s the final project, where you can choose to make something with OpenGL or extend your ray tracer. The class was split roughly 50/50: about half did OpenGL and the other half did a ray tracer. I personally feel that OpenGL gives you more room to be creative and make something unique, whereas ray tracing projects end up implementing a mix of standard algorithms.

The first two assignments weren’t too bad (I estimate it took me about 10 hours each), but some time during assignment 3 I realized I was spending a lot of time in the lab, so I got an hours tracking app on my phone to track exactly how much time I was spending working on this course. Assignments 3 and 4 each took me 15 hours. I spent 35 hours on my final project, over a period of 3 weeks. I chose relatively easy objectives that I was confident I could do well, which left time to polish the game and do a few extra objectives. I’m not sure what the average is for time spent on the final project, but it’s common to spend 50-100 hours. Bottom line: you can put in potentially unbounded amounts of time to try to get the gold medal, but the effort actually required to get a good grade is quite reasonable.

Now, the bad part about this course (obviously not the instructor’s fault) is that OpenGL is so incredibly difficult to work with. Even to draw a line on the screen, you have to deal with a lot of low-level concepts: vertex array objects, vertex buffer objects, uniform attributes to pass to shaders, stuff like that. It doesn’t help that when something goes wrong in a shader (which runs on the GPU), there’s no way to pass an error message back to the CPU so you can print out variables and debug it. It also doesn’t help that there are many incompatible OpenGL versions, and code you find in an online tutorial can be subtly broken for the version you’re using. On the other hand, working with OpenGL really makes you appreciate modern game engines like Unity, which take care of all the low-level stuff for you.