Facebook: Simon Says – How to set up Thrift on Windows

May 15, 2010

I’ve been doing some of the Facebook Engineering Puzzles recently, and in a previous blog post I described how to solve the problem User Bin Crash.

The problem User Bin Crash would be considered a batch problem. Such problem requires a program that takes some input, and produces some output. All the judge does is feed your program the input and check its output.

Thrift problems are a bit different. Such a problem would be an interactive problem, where the judge program has to interact with your program.

In the facebook puzzles, Simon Says, Rush Hour, Battleship, and Dinosaur island are interactive problems. All of the rest of the problems are batch.

Interactive problems are submitted in a different way from regular problems, as well. You, the programmer, are required to make a Thrift program, and instead of sending it to run on their server, you run your program on your own computer. The program connects to the facebook servers via Thrift. A successful program will return a special code which can be used to claim credit for a problem.

Because of the nature of interactive problems, they are somewhat difficult to set up. Thrift itself is rather new and its documentation lacking. When making my first Thrift submission, I found a few difficulties. No adequate tutorial exists on this topic, so this post may be of some use to future puzzlers.

Prerequisites

Simon Says is, after all, a simple test problem to help you get set up with Thrift.

This tutorial is relevant most if you are using the Java programming language, and are using Windows with Cygwin. If you’re using a different language, or a different platform, parts of this document may not apply to you.

Cygwin is required because the thrift compiler itself does not run natively on windows. Only the base cygwin is necessary to run the thrift compiler.

Step 1: Setting up the Thrift compiler

The official Thrift package comes only with the source code, and no linux nor cygwin binaries. As building anything from source is a tricky and error-prone process, we’re not going to compile Thrift from source (even though the official documentation recommends us to do so).

I wasn’t able to compile Thrift from source, but I found someone else’s cygwin Thrift binaries. I’ll put a link to it:

thrift.zip (2.5 mb)

To run this in Cygwin, extract thrift.exe to /cygwin/bin/.

This Thrift binary is of version 0.20, which is not backwards compatible with 0.10. The Thrift wiki provides cygwin binaries to 0.10, which isn’t very useful for us since the code generated by 0.10 can’t be used to solve Facebook problems.

Anyways, this compiler will likely be obsolete soon, and 0.3x will likely break backwards compatability again.

The Thrift compiler compiles your thrift source into source code into a language of your choice. The outputted source code contains much of the code handling networking, protocols, etc.

To run the thrift compiler, navigate to the directory containing simonsays.thrift (while in cygwin), and run,

thrift --gen java simonsays.thrift

You should now have a directory called gen-java. At this point we just need the stuff in gen-java, and we don’t need the thrift compiler anymore. We can also switch back from the cygwin shell.

Part 2: Setting up the Java libraries

To build and run a Thrift Java program, we need to link a few libraries. The first library is the Java thrift library, which I’ve named thrift.jar.

Another dependency is SLF4J, a logging framework required by the Thrift libraries. I’m not sure why it really needs this library, but we need this to compile the code. I’ve named this library slf4j.jar.

Here’s a download link to the two jar files:

javathrift.zip (189 kb)

Now we’re ready to begin coding. In a new directory, place the two jar files and the gen-java folder; create your source file in this folder as well.

Part 3: Coding the problem

The problem itself is very simple. On each round, you receive a list of colors via startTurn(), and send the colors back one by one in order using chooseTurn(). When you’re done, call endTurn() (which also tells you if there’s more rounds).

In the beginning, call registerClient() with your facebook email address; at the end, call winGame() to receive the special password.

Here’s my implementation (simonsays.java):

import org.apache.thrift.*;
import org.apache.thrift.transport.*;
import org.apache.thrift.protocol.*;
import java.util.*;

public class simonsays{
    public static void main(String[] args) throws Throwable {

        // Set up the thrift connection to the Facebook servers
        TTransport transport =
            new TSocket("thriftpuzzle.facebook.com", 9030);
        TProtocol protocol = new TBinaryProtocol(transport);
        SimonSays.Client client = new SimonSays.Client(protocol);
        transport.open();

        // Register client
        client.registerClient("bai.li.2005@gmail.com");

        boolean done = false;

        while(!done){

            // Retrieve color list
            List<Color> listColors = client.startTurn();

            // Some debug information
            System.out.println(listColors.size());

            // Play back colors list
            for(Color color : listColors){
                client.chooseColor(color);
            }

            // If we're done, endTurn() will return true.
            done = client.endTurn();
        }

        String pwd = client.winGame();
        System.out.println(pwd);
    }
}

Compiling the code:

javac -cp .;./thrift.jar;./slf4j.jar;./gen-java simonsays.java

Running the code:

java -cp .;./thrift.jar;./slf4j.jar;./gen-java simonsays

If you did everything correctly, the program should output the numbers 1 to 31 and finally a line like this:

simonsays(8d96b611989c0d38b0c47703a37466c5)

Part 4: Submitting the code

Be sure to replace my email with your own email if you wish to receive credit for the problem.

From the email you provided in registerClient(), send an email to puzzles@facebook.com, with the subject as the line outputted by the program.

If you want, you can also attach your source code, but that’s not needed. We’re done!


Facebook: User Bin Crash

April 26, 2010

I’ve started doing some of the Facebook engineering puzzles, which are programming challenges that Facebook supposedly uses to recruit people.

This is a little bit similar to Project Euler and various online judges like SPOJ, but it’s somewhat different too.

Instead of submitting solutions in a web form, solutions are sent via email, in the form of an attachment. After several hours, the automated grader would send you back a response: whether it was successful or not, and if it is, how long your program ran.

The Facebook system differs from SPOJ in that results are not available immediately, and also that an incorrect submission would return a generic error message (whether it produced incorrect output, crashed, ran out of time, failed to compile, whatever).

This made it a bit annoying to write solutions as it was difficult to figure out exactly what was wrong with the submission.

The problems are grouped into four groups in order of difficulty: Hors d’oeuvre, Snack, Meal, and Buffet. So far I’ve only solved the Hors d’oeuvre problems and one snack.

User Bin Crash

The problem goes something like this:

You’re on a plane, and somehow you have to dump some cargo or else it will crash. You know how much you need to dump (in pounds), and also an inventory of the items you can dump.

Each item in the inventory has a certain weight, and a certain value. By dumping a combination of the items, you have to dump at minimum the given weight, while minimizing the loss.

You can only dump integral amounts of an item (you can’t dump three quarters of a package), but you’re allowed to dump any amount of the item.

The program you write takes the parameters and outputs the minimum loss (value of items that are dumped).

An example

Suppose that the plane needs to dump 13 pounds. To simplify things, suppose that there are only two types of items we are allowed to dump.

Item A weighs 5 pounds, and costs 14.

Item B weighs 3 pounds, and costs 10.

The optimal solution is to dump two of item A, and one of item B. The cost for this is 2 \cdot 14 + 1 \cdot 10 or 38.

Any other combination either costs more than 38, or does not weigh at least 13 pounds.

A solution using dynamic programming

This problem is a variation of the fairly well known knapsack problem. A dynamic programming solution exists for that, and can be modified to work for this problem.

Dynamic programming is just a way to speed up recursion. So in order to form a dynamic programming algorithm, we should first make a recursive algorithm.

Working backwards, suppose that we have a weight W, and a set of items S = [e_1,e_2, \cdots e_n] such that S is the optimal set for weight W.

Now remove one item from the set:

S' = S - e

This new set S' must be the optimal set for W-w_e (I’m going to use the notation w_e to denote the weight of an item e)

The converse is not necessarily true, however. Just because you have an optimal set for some subproblem, adding any arbitrary element to your set does not make your new set optimal.

But adding an element to an optimal subset may create another optimal subset. This depends on which is smaller- the cost obtained by adding the element, and the cost obtained by not adding the element.

It’s probably better to express this idea recursively:

Hosted by imgur.com

Here F() is a function taking a list of items we can use, L, and the minimum weight, W, and returning the minimum cost. e is an element of L (for convenience, e is always the first element).

Obviously when W \leq 0, we don’t have to dump any more items.

The first path, F(L, W-w_e) + v_e is the path taken when we decide to dump e.

The second path, F(L-e, W) is the path taken when we don’t dump any more of e. In order to avoid recursing indefinitely, once we decide not to dump any more of something we never go back and dump more of it.

The code doesn’t handle all the edge cases. For example if we only have one possible item to dump, we are forced to dump it until the weight is reached; we can’t just decide not to dump any more of it and end up with an empty list.

If we use this code, the program will start unfolding the recursion:

Hosted by imgur.com

Being a recursive function, we can go further:

Hosted by imgur.com

Now having reached the end of the recursion, we can fill up the tree from bottom up:

Hosted by imgur.com

Whenever we reach a node that is the parent of the two nodes, we fill it up with the smaller of the two child nodes.

Now let’s transform this recursive function into a dynamic programming routine.

Instead of a function taking two parameters, we have the intermediate results stored in a two-dimensional array:
Hosted by imgur.com

Here the first row is the optimal solutions for each weight using only A, while the second row is by using A and B.

Filling up the first row is very obvious, as we basically only have one item to choose from:

Hosted by imgur.com

The first few cells of the second row is equally easy:

Hosted by imgur.com

However we now reach a point where it’s uncertain what we should put in the cell with the question mark. If we follow our pattern, it should be 20.

But the cell above it is 14. How can the optimal solution when we’re allowed to use both A and B be 20, if the optimal solution when we’re not allowed to use B is 14?

Instead, the cell should be 14:

Hosted by imgur.com

We continue this to fill up the entire array:

Hosted by imgur.com

The bottom right corner of this array is our answer.

The computational complexity of this algorithm is the size of the array, or O(nW). This complexity is not actually polynomial, but this algorithm is considered to run in pseudo-polynomial time.

This is because as W‘s length increases, the running time increases exponentially. If, in our example, W was actually 13000000 instead of 13, and A weighed 5000000 instead of 5, and B weighed 3000000 instead of 3, this algorithm might run out of space.

There is no real way around this problem. The knapsack problem is NP-complete to solve exactly.

However there’s something we can do about it. We can divide all weights by a common factor, and the result would be the same. In the example we can simply divide 13000000, 5000000, and 3000000 all by a million.

Needless to say, this would fail badly if W had been, for instance, 13000001.

The implementation

With the algorithm, it’s pretty straightforward to implement it.

Here’s my implementation in Java (don’t cheat though):

import java.io.*;
import java.util.*;

public class usrbincrash{
    public static void main(String[] args) throws Exception{

        BufferedReader in = new BufferedReader(
            new FileReader(args[0]));

        // Minimum weight to prevent crash
        int crashw = Integer.parseInt(in.readLine());

        // List containing weights of items
        List<Integer> itemW = new ArrayList<Integer>();

        // List containing values of items
        List<Integer> itemV = new ArrayList<Integer>();

        String parse;
        while( (parse = in.readLine()) != null){
            Scanner scn = new Scanner(parse);
            scn.next();

            itemW.add(scn.nextInt());
            itemV.add(scn.nextInt());
        }

        // Take the GCD's before starting the DP
        int gcd = crashw;
        for(int i : itemW) gcd = gcd(gcd, i);

        // Divide all weights by gcd
        crashw /= gcd;
        for(int i=0; i<itemW.size(); i++)
            itemW.set(i, itemW.get(i)/gcd);

        // Calculate optimal fit using dynamic programming
        int[][] dp = new int[itemW.size()][crashw+1];

        // First row of DP array done separately
        dp[0][0] = 0;
        for(int j=1; j<=crashw; j++){

            int aW = itemW.get(0);
            int aV = itemV.get(0);

            if(aW > j) dp[0][j] = aV;
            else dp[0][j] = aV + dp[0][j-aW];
        }

        // Filling up the rest of the DP array
        for(int i=1; i<dp.length; i++){

            dp[i][0] = 0;
            for(int j=1; j<=crashw; j++){

                int iW = itemW.get(i);
                int iV = itemV.get(i);

                // Cell directly up from current
                int imUp = dp[i-1][j];

                // Cell left of it by iW
                int imLeft = 0;
                if(iW > j) imLeft = iV;
                else imLeft = iV + dp[i][j-iW];

                // Smallest of the two
                dp[i][j] = imUp<imLeft? imUp: imLeft;
            }
        }

        System.out.println(dp[itemW.size()-1][crashw]);
    }

    // GCD using the Euclid algorithm
    static int gcd(int a, int b){
        if(b == 0) return a;
        return gcd(b, a%b);
    }
}

When submitting it, it’s necessary to use the -Xmx1024m option. Otherwise the program will run out of memory and fail. According to the robot, the longest test case took 2911.623 ms to run.


Follow

Get every new post delivered to your Inbox.

Join 69 other followers