A Sign That You’re Doing it Wrong

Lifehacker recently posted an article for Memory Restart, which restarts Firefox when its RAM consumption becomes too high.

To me, it’s a pretty clear sign that you’ve done something wrong if people are writing plugins for you browser to restart it, because it consumes too much RAM with some regularity.

 

Champagne Problem

I had an interview a few weeks ago, and I spent a little time preparing for it. I puzzled over some logic problems, and read some common interview questions, knowing that there would still likely be questions I hadn’t heard. Still, I did have one cache hit, so to speak.

One of my favorite questions, though, was one about pouring bottles of champagne: Suppose you are in charge of pouring the champagne for the midnight toast at a prestigious party with more VIPs than you can count. You have 10 waiters, who are wheeling in 1000 bottles of champagne and some bad news; exactly one of these bottles is poisoned, and if you drink even a single drop, you will become violently ill in one hour. If you serve tainted champagne to any guest, you and your employees will be fired on the spot.

Your waiters are sympathetic to this fact, and have thus volunteered to be taste-testers — a night home on the couch being miserable is better than being unemployed. Though, because of time constraints, you only have time for one testing before people get sick and you have to serve champagne.

The naive approach is to split the bottles up into groups of 100, and have each waiter try a drop from each. You would know from whichever waiter gets sick which group of 100 bottles has the bad bottle, but there would be 99 good bottles wasted, and of course, the wasted good bottles get deducted from your pay.

So, you want everyone to keep their jobs (and not serve tainted booze) while wasting as few good bottles as possible. How many bottles do you have to waste? Click below to reveal solution.

Show ▼

Tagged with:
 

Google Chat Reminds About Father’s Day

Google Chat subtly encouraged its users to call home for Father’s Day:

Tagged with:
 

JavaScript Unit Testing with QUnit

In the vein of habits I wish I had picked up in Software Engineering, I’ve been increasingly using unit tests. Working on a project recently with a friend of mine, it fell to me to pick out a unit testing library for our development.

My uninformed search led me to QUnit, from the same people that bring us jQuery, which we already use heavily. We found it immediately simple to use, expressive and powerful. Much of this particular site involves AJAX, and as such, we wanted to be able to test and make sure our interface to these resources is working, as well. Not to mention that most operations rely on information retrieved from remote resources.

To that end, the vast majority of our tests use their asyncTest function, which lets you perform any kind of asynchronous request, and then in your callback, you indicate to the system that your test has all its necessary information and can continue. For example,


asyncTest("Our Site's API", function() {
$.ajax({...
success: function(response) {
ok('data' in response, "We expect data in the response.");
}
}
}

One big ‘get’ in my mind is that it’s from the same group that produces a library we already use heavily, so they tend to be thought-out in similar ways. Plus, it runs in the browser, and has nice styling for the interface that make your unit tests look extra classy!

The QUnit site has a lot more examples and demos, but this concludes my shameless plug for a unit testing quite I’ve come to appreciate very much.

Tagged with:
 

Git in Four Commands

Git is not a complicated tool for most things. I still find it a little tricky to set up for multiple users, but even that’s pretty easy. It really caters to the use-case where you’re just starting a project, or are the sole developer, but just want to keep track of changes and versions, make branches, etc.

The first major point of git is that everyone has their own copy of the repository. When you commit changes, you commit them to your local copy of the repository. If you are working on a group project, there is a shared resource that can be pushed to and pulled from, but I actually like that it takes an extra command to do that — it forces me to make sure that it’s what I actually want to do. Now, to the bare-bones commands:

  1. git init — Wherever you’re writing your code, type “git init”. It creates an empty repository in that directory (there’s a magical hidden folder “.git” that gets created and knows things).
  2. git status — Git knows which files have changed since you last saved changes, and it will happily tell you which files are new and changed with this command.
  3. git add — When you change files and are at a point where it make sense to save changes to your code (a bug fix, a new feature, etc.), then tell git which files you want to put in this commit with “git add”. If committing a set of changes with git is like shooting a gun, then adding files to be committed is like loading that gun. Git knows which files have changed, but it can make sense to group changed files into different logical commits. For example, if you fix two bugs between commits, you might want to add the changes for each bug fix separately.
  4. git commit — When you have added a bunch of changed files to be committed, now you’re ready to actually commit those changes. Type “git commit -m ‘A short, meaningful summary of the changes that happened.’”

There is a lot more to git, and a million tutorials, that will explain things in more detail, but these are the commands I spend 95% of my time using, and enough to at least get you starting tracking changes for anything and everything. It doesn’t matter very much what version control you use, just use something.

Tagged with:
 

Unfuddle

For a recent project, we were looking for an issue tracker. Our biggest criteria was something that was free — we weren’t planning on doing anything complicated, and so available issue trackers were essentially equal in our eyes. We ended up going with Unfuddle, but it wasn’t until after signing up that we discovered its biggest strength: an API.

There are other issue trackers with an API, but this one seemed reasonable enough so we used it. The big get for us in building a web-based app, is that it lowers the barrier for reporting bugs. We set up our development server to include a bug-reporting button that collected a little report that then sent it to our server, which uses the Unfuddle API to add a ticket. It also records and submits a little information about the user’s browser, current session, etc. that might help in debugging their problem.

Not the most complicated or profound system in the world, but it does put user-reported (and tester-reported) bugs in the same place we check our to-dos and other issues. No checking a support email account we remember once a month. Just puts bugs right were we’re looking anyway. I hope to share a little bit of the backend code soon!

Tagged with:
 

Until recently, I bore the burden of a dirty little secret: I didn’t use unit testing. I don’t know if this has been other students’ experience, but my software engineering course could have been a lot better. Perhaps it’s a topic that requires students to determine their own level of involvement, but particularly, I am irked that these issues were not as developed as they could have been:

  1. Version control – svn, git, something! Anything, really! Just use something!
  2. Unit testing – sure we talked about it, but its utility isn’t clear until you use it in earnest
  3. Documentation – always felt more like a chore than a tool
  4. Issue tracking – Joel on Software said it best: “I was born with only two bug-storing-slots in my brain.”
  5. More best practices – Effective C++ (it’s worth it — get it and read it) alone was worth more than the course was.

I concede that these are things whose utility might be hard to convey to college students who may or may not be working on projects they really believe in. Still, very important.

One of the things I like about documentation is that you can refer to the documentation when uncertain about side-effects, input, etc. of functions and you don’t really feel like reading through code and trying to keep it in your brain-RAM. Not only that, but I usually refer to my own documentation to get an idea of the guarantee that a function was offering, to try to make sure that it’s making those same guarantees after I make changes. Not only this, but articulating functionality crystalizes it’s implications in your mind. When describing how something works to a friend, it often reminds you of issues you hadn’t considered. Writing documentation is the same in this respect.

When working with other people, while it’s nice to be able to talk face-to-face, the issues we think of that need changing go in one ear and out the other for me. Not that it wasn’t a good idea, but my brain-RAM is not infinite. Bugs, features, ideas, issues all get paged out of my head. Write them down instead, which is where issue tracking comes in. Design questions and decisions come out of issues all the time, and the trail of comments and discussion make a nice reference for documentation.

I find that when I’m fixing a bug, the functionality of that chunk of code gets reduced to that particular feature in my mind. So, if when I try using it, if that bug is fixed, then my job is done. But we’ve all experienced unintended side effects, where in fixing one thing, we break another. Having a list of features to check is a step up from trying to remember the things that need to work, but it’s much less than the power of unit testing. I’ve recently been using UnitTest++ for C++ applications, and QUnit from the jQuery people, and have been pleased to thrilled. It allows you to check functionality exhaustively and quickly on a whim. Even the task of writing the tests forces you to define the functional requirements of your code, and I find it helps to clarify a lot.

Version control lets you try crazy new approaches and track changes, isn’t that great? I was helping a friend last semester with some of his code, and he was trying to make code changes with copy, paste, and the undo buffer. In the end, he ended up introducing more bugs into the code than he was fixing. It’s intractable to keep these kinds of code changes in your brain-RAM. Use something, use anything. I have a qualm with svn because it was something I used so rarely, I could never remember how to set it up. Even a barrier as small as that was enough to prevent me from using version control at all. I love git because it’s so easy to set up, I use it for almost anything: from a report (if using LaTeX), to a weekend project, to my thesis. There are tons of tools out there, but just use something.

Python Lesson Learned

I have finally learned a lesson after running into this fact repeatedly: no matter what you are trying to do, python has a module for it. I’m reminded of two xkcd comics in particular.

Tagged with:
 

numerical_limits

I’m working on a C++ class that encapsulates changes to a “steerable” variable. It’s part of a system that allows trusted commands from a socket library to change the value of registered variables. For example, a client might:

int sInt(0);
float sFloat(0.3);
registerParam("someSteerableInteger", &sInt);
registerParam("someSteerableFloat", &sFloat);
...
// Library makes these calls after receiving remote commands:
setParam("someSteerableInteger", 5);
setParam("someSteerableFloat", 3.4);

There’s some intelligence in the class to ensure integrity of the data. For example, it will cast the value past into setParam(…) correctly, so that a call to setParam(“someSteerableInteger”, 5.3) results in the 5.3 cast as an integer and then used to set the integer’s value.

Some of the other intelligence needed is that the user might specify the minimum and maximum valid values the variable might take on. Or he or she might not. If not, it would be nice to have a convenient way of specifying a reasonable default minimum and maximum.

Enter numerical_limits. It’s a static templated class that can give you some insight into the built-in types. For example:

#include <iostream>
#include <limits>
using namespace std;
...
cout << "Min unsigned int: " << numerical_limits<unsigned int>::min() << endl;
cout << "Max float: " << numerical_limits<float>::max() << endl;

Tagged with:
 

Sum of Dice

I recently had an interview in which I was asked, “how many ways can 1000 dice make the sum 3000?” I was caught a little off-guard, and the phrasing made me wonder about a closed-form solution which, after some reading, I don’t believe exists. Still, a workable solution isn’t as easy as that.

Let F(n, k) denote the number of ways n dice can make the sum k. Note that the first die can be 1, 2, 3, 4, 5 or 6, so F(1000, 3000) = F(999, 2999) + F(999, 2998) + … + F(999, 2994). While this recurrence relation leads to an easy implementation, it suffers from two big problems: performance and maximum recursion depth. Without any memoization or optimization, this would lead to 61000 function calls, which is about 10780. That’s a big number. If you were able to evaluate one such function call every clock cycle on every computer in the world (about 7 x 1018 cycles/second), you would need about 10750 years, or 10650 lifetimes of the universe (about 10100 years) to solve it. Get ready to wait.

One optimization is to check if the arguments n and k make sense at each level, and prune accordingly. For example, n dice can only sum to between n and 6n, so if k is outside of that range, you don’t need to branch down any farther. Even using this and instantaneous branch evaluation, your code still wouldn’t complete in the expected lifetime of the universe.

Obviously, this is still a solvable problem (there are about 10757 ways 1000 dice can sum to 3000). A python implementation figured that out in 3.5 seconds. One way to help crack this egg is to use memoization. Consider the calls to F(999, 2999) and F(999, 2998). Expand these out to get:

  • F(999, 2999) = sum[F(998, k = 2998, 2997, 2996, 2995, 2994, 2993)]
  • F(999, 2998) = sum[F(998, k = 2997, 2996, 2995, 2994, 2993, 2992)]

You’ll notice that both of these expansions call F(998, k = 2997, 2996, 2995, 2994, 2993), each of which is a lot of work to evaluate. So, if once you determined each of these values once, you could use them in other places and save huge amounts of time.

One thing you’d need in order to implement memoization is a map (or dictionary in python) that stores key-value pairs. So once you figure out F(998, 2997), you could store it like myMap[(998, 2997)] = …; but this can add some complication. Every time to call F, you’d have to check if you’ve already evaluated the function for those arguments. And when you’ve figured out and stored a lot of these values, each search will take some time. It will still work and will get you the answer you need, but probably not as fast as you’d like, and not as fast as it could be. Using memoization for F(700, 2100) took 8.2 seconds in a python implementation, but a better approach took 1.5 seconds. The maximum recursion depth prevented it from using bigger numbers.

Algorithmically, the best you’ll be able to do (as far as I can tell) is O(n2), which is not bad in the scheme of things, and certainly not the combinatoric time we saw first. Enter dynamic programming. Particularly, we’ll use a bottom-up approach. We’ll build two arrays, each associated with a number of dice. The i‘th index in that array is the number of ways n dice can sum up to (n + i). So for 1 die, myArray[0] is the number of ways you can roll a 1, and for 4 dice, myArray[10] would be the number of ways you can roll 4 dice to get 14.

As I said, we’ll keep track of two arrays, say previous and current. First, initialize previous to be the number of ways 1 die can be rolled: {1, 1, 1, 1, 1, 1}, because 1 die can be rolled each number in only one way. Then, we’ll use that to fill the array “current” with the number of ways 2 dice can be rolled, based on the array previous. For example, current[0] = previous[0] = 1, because the only way to get a sum of n with n dice is to roll all 1′s. Then, current[1] = previous[0] + previous[1], because if we roll a 2 for the second dice, there are previous[0] ways to sum up to 1 with the remaining die, or if we roll a 1 for the second dice, there are previous[1] ways to sum up to 2 with the remaining die.

Each entry current[i] should be the sum of previous[i-5, i-4, i-3, i-2, i-1, i], taking the bounds of the previous array into consideration (this can actually be generalized to any m-sided dice by using previous[i-m+1, ..., i]). Once you’ve filled the array current, then you assign current to previous, and move on to the next number of dice, 3. Continue this until you have filled the array for the desired number of dice. If performance is a big issue, this can be implemented as a moving frame. For example, since current[i] uses previous[i-5, ..., i] and current[i+1] uses previous[i-4, ..., i+1], you can alternatively write current[i+1] = current[i] – previous[i-5] + previous[i+1], and use only two add/subtract operations instead of 5.

A bonus to this bottom-approach is that once finished, you actually have an array with the number of ways you can make any sum with the n dice. This makes it easy if you wish to find the number of ways n dice can form a sum up to or greater than a number.

Tagged with: