SIGGRAPH 2010

I arrived in Los Angeles yesterday to attend (and talk at) SIGGRAPH 2010. This is not the first time the conference has been here, but it is certainly my first time to this smoggy city.

Adjacent to the Los Angeles Convention Center is the Staples Center, where workers have been setting up for the X Games. Fortunately they won’t begin until July 29th, the last day of this conference, but in watching the setup I was imagining an enormous re-enactment of high school stereotypes. Tens of thousands of jocks right next door to tens of thousands of nerds.

Tagged with:
 

It’s a question that comes up often in conversation, and especially when meeting new people. The normal pleasantries of where one is from and what one does naturally lead there. “High-Performance Computing, eh? What’s that?”

I sometimes feel it a mission to dispel myths about supercomputing that the layperson might have. Pop culture is full of stern-looking authority figures leering at a screen, looking over the shoulder of an endearingly-disheveled nerd. Or I think Chuck represents this well:

So when answering this seemingly simple question about what exactly my job entails, I have to start at the bottom, explaining how processors aren’t getting any faster (and haven’t been for quite a while, relatively speaking). And how this fact necessitates a different way of looking at programming tasks, moving from “fast” or “deep” to “wide.” Supercomputers aren’t single small boxes in the middle of a vast room covered in billions of pixels, as Chuck would seem to suggest. No, nothing so glamorous. In fact, seeing a rack of IBM’s Blue Gene is more akin to 2001: A Space Odyssey – being confronted with a towering black monolith:

The BlueGene/P system at Argonne National Lab. The man in the picture is actually now an awesome system administrator for the KAUST Supercomputing Lab!

So modern supercomputers are not a single chip that can perform trillions of calculations per second, but are rather a set of relatively simple processors that each perform modestly. Though when working in concert, the results are astounding. Not long ago a supercomputer sustained calculations at a rate of 1 petaflop, or a million billion operations every second. If we were to compare that to a relatively modern desktop, a second on that computer is the equivalent of about four days of computation on your desktop. If the same code were to run on the computers used in the Apollo missions, it would take approximately 630 years (this is a rough approximation based on a figure of 20 microseconds per add).
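
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The only inputs are the figures quoted above (one petaflop, four days, 20 microseconds per add); the desktop rate is simply what the four-day figure implies, not a measurement of any particular machine.

```python
# Back-of-the-envelope check of the comparisons above.
PETAFLOP = 1e15                      # operations per second
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

# "Four days on a desktop" implies this desktop rate (derived, not measured).
desktop_rate = PETAFLOP / (4 * SECONDS_PER_DAY)

# Apollo-era figure from the text: roughly 20 microseconds per add.
apollo_rate = 1 / 20e-6              # adds per second
apollo_years = PETAFLOP / apollo_rate / SECONDS_PER_YEAR

print(f"implied desktop rate: {desktop_rate / 1e9:.1f} billion ops/s")
print(f"Apollo-era time for one petaflop-second: {apollo_years:.0f} years")
```

The desktop works out to a little under three billion operations per second, and the Apollo estimate lands in the low 630s of years, which is where the figure above comes from.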

The reason for the modest clock rates of each processor in modern devices is power. Intel successfully grew processor performance by increasing the rate at which operations were performed (among other advances), but at a great cost in power. For example, a chip on a BlueGene/P compute node runs at a mere 850MHz, though it’s impossible to use this number alone to compare performance. In fact, of the budget allocated for the purchase of a system like that, only half that money goes towards actual equipment. The rest goes towards the power of not only running it, but cooling the damn thing off.

Graphics cards have become an unlikely source of high-performance computing in the last ten years or so. GPU computing has seen many struggles, from being difficult to program and even harder to debug, to early cards not supporting floating-point calculations or certain types of loops. And yet NVIDIA now markets a graphics card with as many as 480 cores.

I recently happened upon this video of Jamie Hyneman and Adam Savage of MythBusters explaining the difference between a CPU (the brain of your computer, for the uninitiated) and a GPU (the part that handles much of the graphics). When presented with explosions, robots and paintballs, the difference really lights up (skip to 8 minutes in for the really good bit, but the whole thing is worth a watch):

Tagged with:
 

I’ve been working with a large shared system at my school, sometimes building packages for myself and sometimes for others. The one thing that’s almost certain across all such installations is that it’s difficult. Installing dependencies can be a very deep rabbit hole, and there are seemingly more configuration, build and source control systems than there are atoms in the universe.

On this particular shared system, our awesome sysadmins (these guys are really pretty great!) use a package called modules to set environment variables for use with various packages. Suppose you have several versions of a library that you’re working with. Let’s say some users need Python 2.4, and others 2.6 or 3.0. Instead of managing your path yourself, modules can help:

$> module load python-2.4
$> which python
/opt/share/python/2.4/ppc64/bin/python

One of the really great things about modules is that the way so-called modulefiles are written, not only can you load modules easily, but you can also unload them just as easily:

$> which python
/opt/share/python/2.4/ppc64/bin/python
$> module unload python-2.4
$> module load python-2.6
$> which python
/opt/share/python/2.6/ppc64/bin/python

It’s of course not limited to any particular environment variable. The modulefile can specify where the man pages are, what paths to include when using cmake, dynamic library path and whatever you’d like.
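
As a way of picturing what load and unload do, here is a toy model in Python. It is only a sketch of the idea (prepend a directory to a colon-separated variable, then take it back out), not how the modules package is actually implemented; the `load` and `unload` helpers and the starting PATH are made up for illustration.

```python
def load(env, var, path):
    """Return a copy of env with path prepended to the colon-separated var."""
    parts = [p for p in env.get(var, "").split(":") if p]
    return {**env, var: ":".join([path] + parts)}

def unload(env, var, path):
    """Return a copy of env with every occurrence of path removed from var."""
    parts = [p for p in env.get(var, "").split(":") if p and p != path]
    return {**env, var: ":".join(parts)}

# Hypothetical starting environment.
env = {"PATH": "/usr/bin:/bin"}
env = load(env, "PATH", "/opt/share/python/2.4/ppc64/bin")
env = unload(env, "PATH", "/opt/share/python/2.4/ppc64/bin")
env = load(env, "PATH", "/opt/share/python/2.6/ppc64/bin")
print(env["PATH"])
```

The same two operations apply to any other colon-separated variable, which is why a single modulefile can manage man pages, cmake paths and library paths together.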

Modulefiles are also extremely convenient places to store information on how you actually built the library. When you come back to it three months from now to build the next version of some code base, you won’t remember the complex arguments you had to pass into configure or cmake in order to get the damned thing to build. Build notes are an essential part of maintaining code, especially if other people will be using the libraries you’ve built.

Speaking of which, I’ve worked on other shared systems before where several people need to use the same library and end up building it separately. Frank and Steve both need libX, and have made their installations accessible to the other users on the system, but who wants to sully their .bash_profile by adding some long and ugly paths to their PATH? If Frank makes his modulefiles directory public, too, then you can just include that directory in MODULEPATH:

# in ~/.bash_profile
...
export MODULEPATH=$MODULEPATH:~/modules/:/home/frank/modules/
...

Perhaps it’s not the best thing since sliced bread, but I like that it affords me a way to bridge the gap between having a convention for where libraries are installed (say, if you use MacPorts, for example) and being able to easily set all your environment variables for easier, if not easy, compilation.

As an example of how it’s helped me, I’ve been compiling quite a few projects recently that rely on cmake. These projects also have a whole lot of dependencies, but my build process is now something along the lines of:

$> module load libxml2
$> module load vtk
$> module load osmesa
$> ...
$> cmake ~/ParaView

This, compared to:

CMAKE_INCLUDE_PATH=$CMAKE_INCLUDE_PATH:/opt/share/libxml2/2.7.7/ppc64/include:/opt/share/vtk/5.6.0/ppc64/include/:... \
CMAKE_LIBRARY_PATH=$CMAKE_LIBRARY_PATH:/opt/share/libxml2/2.7.7/ppc64/lib:/opt/share/vtk/5.6.0/ppc64/lib/:... \
cmake ~/ParaView

Computers aren’t perfect, but they are experts at remembering the details. Almost to a fault. Modules helps my interactions to be a little more equitable where I have to remember the important parts (the names of the modules I need, for example) and the computer can keep track of where the hell I put everything!

Tagged with:
 

Delicious Sushi

I’m not quite sure what the message here is.

Freshest dolphin in town?

Tagged with:
 

Socket to Me

I’m not a “network guy.” I still don’t know exactly what the subnet mask means, and I am often thankful that OS X is so willing to configure network settings for me automatically.

That said, recently I’ve been finding myself doing a lot of programming with sockets. They provide a low-level network interface to communicate between computers, and are used like other file descriptors. On the C side of things there’s a little more work than I’d like, and as such, I’ve found Python an invaluable tool.

In fact, I think any time you’re working with a new concept, technique or algorithm it’s extremely helpful to use a scripting language. Like others, Python offers an interactive session where you can develop code fragments by trial and error with each step, rather than trying to debug a chunk of code you’ve written with only a vague notion of what’s going on behind the scenes. It allows you to pause between steps and see the effects and results of each function.

Interestingly enough, another tool has come in extremely handy – netcat. It’s designed to print to stdout everything that it hears on the socket, and then it sends everything it receives on stdin through the socket. It allows you to examine some of the specifics of a protocol without worrying about the details of your own code or whether or not your code works. Netcat is tried and true, and will tell you exactly what’s happening.

This all came up in the context of WebSockets. They’re a part of the HTML5 spec and provide a JavaScript interface for real socket communication (there are of course some caveats, especially with respect to how to handle binary data). We’ve been using them for a project where we’d like the client to not need a special program to interact with a piece of software, and so instead implemented the protocol in JavaScript.

There was, however, some trouble at the outset. I had a bit of difficulty finding out why exactly the WebSocket client would seem to start to make a connection but then immediately complain about handshakes. What would have been much easier is to just open up netcat on the same port and have a conversation with the WebSocket itself.

# First off, I was running SimpleHTTPServer from a directory with a dummy html file
$> python -m SimpleHTTPServer 8888
# On the terminal, listen on port 35000
$> netcat -l 35000

And then try to make a connection from the JavaScript side

// From the JavaScript console on that dummy html page, in Chrome or Safari for example
ws = new WebSocket("ws://localhost:35000");
ws.onopen = function() { window.console.log("Hello!"); };
ws.onmessage = function(event) { window.console.log("Received " + event.data); };
ws.onclose = function() { window.console.log("Goodbye!"); };

I had figured (incorrectly) that WebSockets would work in essentially the same way that plain sockets do. Instead, this is what netcat shows the browser sending as soon as it connects:

GET / HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: localhost:35000
Origin: http://localhost:8888

It was only after seeing that this is what the WebSocket was sending that it became clear why no connection was being made: the browser wasn’t getting the rest of the handshake, which looks like this:

HTTP/1.1 101 Web Socket Protocol Handshake\r
Upgrade: WebSocket\r
Connection: Upgrade\r
WebSocket-Origin: http://localhost:8888\r
WebSocket-Location: ws://localhost:35000/\r
WebSocket-Protocol: sample\r\n\r\n

Oddly enough, I can’t seem to get netcat to play nice sending the response code, but it can easily-enough give us an indication of what was happening at first. Python can handle the rest for us:

$> python
>>> import socket
>>> s = socket.socket()
>>> s.bind(('', 35000))
>>> s.listen(1)
>>> client, info = s.accept()
>>> client.recv(1024)
'GET / HTTP/1.1\r\nUpgrade: WebSocket\r\nConnection: Upgrade\r\nHost: localhost:35000\r\nOrigin: http://localhost:8888\r\n\r\n'
>>> client.send('HTTP/1.1 101 Web Socket Protocol Handshake\r\nUpgrade: WebSocket\r\nConnection: Upgrade\r\nWebSocket-Origin: http://localhost:8888\r\nWebSocket-Location: ws://localhost:35000/\r\nWebSocket-Protocol: sample\r\n\r\n')
199
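
Typing that reply by hand gets old quickly, so here is a small helper that assembles it. This is a sketch in Python 3 (so it returns bytes, unlike the Python 2 session above), `build_handshake` is a name made up for illustration, and the header set is just the draft-era one shown in this post, not a general-purpose WebSocket implementation.

```python
def build_handshake(origin, location, protocol="sample"):
    """Assemble the handshake reply shown above as bytes."""
    lines = [
        "HTTP/1.1 101 Web Socket Protocol Handshake",
        "Upgrade: WebSocket",
        "Connection: Upgrade",
        "WebSocket-Origin: " + origin,
        "WebSocket-Location: " + location,
        "WebSocket-Protocol: " + protocol,
    ]
    # Each header line ends in CRLF; an extra blank line ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

reply = build_handshake("http://localhost:8888", "ws://localhost:35000/")
print(len(reply))  # 199, the same byte count client.send reported above
```

With the listening socket from the session above, `client.sendall(reply)` would take the place of the long literal.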

And notice that it’s not until the listening socket sends its response portion of the handshake that the JavaScript console will print the “hello” we told it to upon opening the connection. Now that we have the two talking, let’s actually send and receive a couple of messages:

// Again from the JavaScript console
ws.send("Hello from JavaScript!")

And then we receive the message in Python:

>>> client.recv(1024)
'\x00Hello from JavaScript!\xff'

And here we find another oddity of WebSockets. All messages seem to be prepended with \x00 and appended with \xff (though this is also mentioned in the specification). If we try to send a message from Python without these two extra characters, we’ll get nothing out on the JavaScript side (go ahead and give it a try!):

# We get nothing on the JavaScript side :-/
client.send('hello')
# Magic happens!
client.send('\x00hello\xff')
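
To avoid sprinkling those two bytes everywhere, the framing can be wrapped in a pair of helpers. This is a Python 3 sketch with made-up names (`frame`, `unframe`), and it only covers the \x00 ... \xff text framing observed here with these browsers; later revisions of the WebSocket protocol frame messages differently, so treat it as specific to this setup.

```python
def frame(text):
    """Wrap a text message in the \\x00 ... \\xff markers seen above."""
    return b"\x00" + text.encode("utf-8") + b"\xff"

def unframe(data):
    """Strip the markers from one complete frame and decode the text."""
    if not (data.startswith(b"\x00") and data.endswith(b"\xff")):
        raise ValueError("not a complete \\x00...\\xff frame")
    return data[1:-1].decode("utf-8")

print(frame("hello"))
print(unframe(b"\x00Hello from JavaScript!\xff"))
```

The byte 0xff never appears in valid UTF-8, which is presumably why it is safe to use as the end marker for text messages.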

Of course I’m sure there are other robust tools the real “networking guys” use to make their lives easier. But outside of pulling up Wireshark or trying to figure this out by writing C code, netcat and Python definitely saved the day.

Tagged with:
 

I’m living in New York, New York this summer while I work at IBM Research. From the outset, I was skeptical of the city. Any city, really. I grew up in Colorado, where the population of the entire state is less than a third of that of the NYC metro area.

The first week was a little rocky, but mostly because I was unfamiliar with my neighborhood and wasn’t sure where people went to do their grocery shopping, to eat, grab a drink and so forth. By Monday of the following week, I knew my commute to work like the back of my hand, blending in among the real New Yorkers with the disaffected forward-looking stare that says, “I just want to get where I’m going, pal.”

When I first arrived, I was on a red-eye flight and got in around 9. I had contemplated taking a taxi to my place, both wanting to have ridden in a New York City Yellowcab and not wanting to deal with public transit, but when I got there I felt like I ought to hit the ground running. Riding on the bus, I examined the faces and demeanors of all those around me, wondering to myself which best embodied the New Yorker. On the bus, off the bus, transfer to another bus in Harlem. The name of the neighborhood brought to mind poverty and violence, and frankly, as a wet-behind-the-ears exhausted-and-irritable honkey with luggage, I had no idea what to expect. Getting off the bus and walking to the transfer, I passed the undeniable odor of marijuana, urine, drunk and irate homeless people and so forth. Bear in mind this was about 10 am on a Sunday. But while waiting, I realized something sort of magical about the city — no one cares. If you don’t get in anyone’s way or make yourself particularly noticeable, but just wear that look of just wanting to get where you’re going, no one will notice you’re there.

I had imagined that it would be a city of all kinds (which, it really is) but also a city of all kinds of rude. This was a major misconception. Though most citizens would not return a “hello” from a stranger on the street, most will help with directions when asked and apologize when they bump into you. I had hoped that at some point during the summer I would accidentally bump into someone who would then call back at me “hey, I’m walkin’ here!” This seems unlikely to happen at this point.

There are some stereotypes that are true to TV life. Attractive, busy and exasperated professional women are in abundance, a la Liz Lemon of 30 Rock, for example. There is a certain level of dress that seems to be expected here, even on the street. Most men wear shirts and resort to jeans as their most casual and most women, at least this time of year, wear dresses, though I think that might be because of how outrageously hot it can get.

The New Yorker’s hatred of tourists is a uniting factor, and something that I began to understand almost immediately. It can be easy to get distracted by the enormous buildings and visual stimuli, but most inhabitants pass these things every day and are just on their way to work, or dinner, or a friend’s. I’m gaining a sense of what parts of the city to avoid for this reason – it’s very frustrating to get stuck behind a slow-walking tourist whose aloofness makes him meander windingly down the sidewalk, impossible to pass. Times Square is a death trap: three blocks of fanny-pack-wearing fathers trying to decide what to see next, keeping track of lagging children and generally getting in everyone’s way. But, let the tourists have Times Square.

Don’t take too long here. A friend who shall remain nameless visited me here, and used to the slower-paced life in Boulder, CO, she pondered what she wanted from a pizza place only after we had gotten to the front of the line. On another occasion, she flip-flopped on her order. To be fair, these places weren’t extremely busy, but it brought to light the fact that there’s an expectation here that you have been to a place before, know exactly what you want, and can complete your transaction in less than a minute. In some ways, this is a charm that I like; there are many places to eat here, but I’ve quickly developed preferences and can walk into my favorite pizza place and make my order like a regular.

My favorite things about New York couldn’t be experienced in a vacation here. When I first moved in, for example, I was so pleased with the view from my place. Not that it’s particularly incredible (I live on the Upper East Side and I don’t know what sort of reputation the neighborhood holds as far as views), but I like it all the same. I look out my window and I see dozens of different buildings that I would call skyscrapers, all designed differently, peppered with garden terraces and charming signs of age. The jagged horizon is somehow enchanting, and my curtains always stay open. At night the neighboring buildings provide a soft and diffuse light, and different patterns of lit windows.

I love walking down certain streets and being able to see down the avenues, the tiny separations between giant buildings. Properly positioned, you can sometimes see a mile or so before a hill obstructs the view. It reminds me of a project I saw to create a horizonless map of Manhattan.

The last few hours of sunlight in the day are perfect. People talk about the Colorado sky, or the sunsets we have, but I’ve never seen light quality quite like this. It’s a beautiful golden warmth every day without fail. It makes me want to curl up and take a nap, or stretch out on the lawn and enjoy the end of the day. In the park, this deliciousness is only compounded by the reservoirs, the heavy trees and residents playing frisbee, picnicking and taking walks along the paths. It’s a beautiful time, and there’s an odd sense of community to it. I’ve often wondered if such a place existed, where there isn’t any one group that’s out and enjoying the place, or even a tight-knit group of neighbors. But there is a dense packing of total strangers who can come to the same place and enjoy the grass and the outdoors.

Almost most of all, I love Central Park. I was excited when I found my place because it’s a mere three cross-town blocks from Central Park. I bike around there almost every day, and it’s almost always a treat. There are hordes of runners, cyclists, rollerbladers and even a few cross-country skiers. As a cyclist, you have to keep a watchful eye on the bipeds you’re passing as they sometimes have a tendency to step out in front of you. We largely ignore crosswalks, and only the few cars on the road observe them. There are some sections that are filled with horse-drawn carriages, bikeshaws and more adventurous tourists who decided to rent a bike and ride around the park, but you make do. Sometimes it’s actually quite a thrill to be riding as fast as you can and dodging these obstacles and having a little friendly battle with other riders. This, too, is a nice feature about riding here — no matter what your level, you can always find an equivalent cyclist for a little friendly competition and motivation.

Biking in the city is also a pretty big rush. In the morning hours, most of the usually-busy roads are ghost-towns, but Seventh Avenue at 7 in the evening is a sea of taxis. I was riding down one day to Penn Station in the evening, catching a train out to Long Island to see a friend. A bicyclist will make it down there faster than a taxi, but it’s not for the faint of heart. Sprint, brake, sprint, dodge pedestrian, coast, zip between cars, brake, sprint. There are no bike lanes, but rather only the spaces between cars. The upside of this game of Frogger is that you can zip out between any two lanes where there’s space. I’m not quite sure how cab drivers feel about us, but I imagine that if nothing else, they’re quite used to people with a death-wish.

Had I never spent more than several weeks here, I never would have discovered what I love about the city. These are the things one does and notices not when trying to visit the Met or Times Square or the Empire State Building, but only when you’ve enough time to be alright with not packing every waking hour or weekend with a trip to somewhere new and exciting. This is the meandering life, and enjoying it.

Tagged with:
 

Shared Objects

Shared objects are great. Imagine you have several programs running that all make use of the same library. If they are all statically linked against that library, then there will be several copies of essentially the same code in memory. Shared objects allow all these programs to reference the same code, held in memory in only one spot.

I’ve recently been spending a lot of time trying to get a few key libraries compiled on a cluster, where the idea is to run jobs based on these libraries in parallel. As memory is particularly limited, and there are several cores per node, I figured it would particularly make sense to compile everything as a shared object. Also, a subsequent library depended on it.

A lot of different packages configure, build and install themselves in a number of ways and often don’t adhere to conventions. Despite passing every flag known to man to the configuration tools, I was unable to get a particular piece to build as a shared object. But I happened upon a particularly useful code snippet to save the day:

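# Unpack the object files from the static archive into the current directory
ar -x mylib.a
# Relink them as a shared object (the objects generally need to have been built with -fPIC)
gcc -shared *.o -o mylib.so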

Colbert Report

One of the joys I was hoping to experience here in New York City was to go to a filming of The Colbert Report. Unfortunately, getting a ticket can be tricky. There are rarely tickets available, and so you have to sign up to receive notifications when they become available.

If you do not constantly sit on your email, your chances of going are greatly diminished. I’ve been trying for about a month, because as a computer scientist, I am often on my email and can usually respond pretty quickly (< 5 minutes). Today… was my lucky day. Sort of.

I responded within a minute of receiving a notification, and was pleased that today I had been fast enough. There were still 40 tickets left! By the time I had finished filling out the form, there were a mere 28 left. I’m sure that in a couple of minutes the whole thing had closed off. It would seem that Colbert’s a popular guy.

So, here’s the trouble… the filming is on August 23rd, about a week and a half after I leave the city, and mere days before returning to Saudi Arabia. Just… my… luck.

Tagged with:
 

Sockets, Endianness and Your BlueGene/P

Endianness is one of those things that, as a computer scientist, you learn about but rarely have to think about. It does occasionally come into play, and I recently had some frustration with it.

For the uninitiated, it refers to the order in which the bytes of a multi-byte value are stored in memory, and unsurprisingly, different processors have different philosophies regarding this.

The term itself comes from Gulliver’s Travels, in which two groups have strongly-held beliefs regarding which end is best for cracking a boiled egg. And as in the book, it can cause tension.

I’m working with a library in which a server and client communicate over sockets, and they tend to prepend their messages with the length of the message to follow. In the code, it’s stored as a four-byte integer, and for the most part, it works beautifully. However, we recently ported it to the BlueGene/P, which uses PowerPC chips. These, unlike the common Intel chip, are big-endian, and so client and server would hang, waiting for an exceptionally large message when only a small one was sent. For example, there are some common messages that are 329 bytes long. If you interpret the value 329 using the alternative endianness, you’ll expect 1224802304 bytes, or about 1.2 GB.
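
The 329 example is easy to reproduce with Python’s struct module. This is only an illustration of the misreading, not the library’s actual code: pack the length the way a little-endian machine would lay it out, then read the same four bytes back as big-endian.

```python
import struct

length = 329
wire = struct.pack("<I", length)         # little-endian layout, as on a typical Intel chip
misread = struct.unpack(">I", wire)[0]   # the same four bytes read as big-endian

print(wire.hex())   # 49010000
print(misread)      # 1224802304
```

The bytes themselves never change; only the order in which they are weighted does, which is why the receiver waits for over a gigabyte that never arrives.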

I’ll admit that it took a while to catch, but we eventually got it figured out. We don’t use sockets much, but couldn’t imagine that this was a problem that hadn’t been encountered before. After all, the internet uses sockets, has been around for a long time, and it’s not as if big-endian devices aren’t allowed on. It turns out that there is in fact a convention: multi-byte values sent over sockets ought to be in big-endian format (“network byte order”), and networking libraries provide you with functions to help out with this.

The functions htons, htonl and sometimes htonll convert two-byte, four-byte and eight-byte values, respectively, from host to network byte order. The inverse conversions are provided by ntohs, ntohl and ntohll.
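
Here is roughly what a length-prefixed exchange looks like once the length is always sent in network byte order. It is a Python sketch rather than the C library discussed above, and `send_msg`, `recv_exact` and `recv_msg` are names made up for illustration. Python’s struct format "!I" packs a four-byte unsigned integer in network byte order, playing the role that htonl and ntohl play in C.

```python
import socket
import struct

def send_msg(sock, payload):
    """Send payload prefixed with its length as a 4-byte big-endian integer."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes, since recv may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def recv_msg(sock):
    """Read the 4-byte length prefix, then the message body it announces."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

# A connected pair of sockets stands in for the client and server.
a, b = socket.socketpair()
send_msg(a, b"x" * 329)
received = recv_msg(b)
a.close()
b.close()
print(len(received))  # 329, regardless of either end's native byte order
```

Because both ends agree on the wire format, neither needs to know what kind of processor the other is running on.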

Lesson learned.

Tagged with:
 

Freediving

One of the greatest features of KAUST is its proximity to the Red Sea. In fact, when I look out of my apartment window, I see water in the harbor a mere 30 meters away. The campus beach is all of a 10-minute ride away. Countless afternoons friends and I have decided that we don’t need to go back to the office today and that our time might be better spent snorkeling on the reef.

Skin diving is great by virtue of its accessibility. Strictly speaking you just need goggles, but that’s easily augmented with a snorkel and fins – all things you can just throw in a backpack. And in fact, some people can hold their breath for relatively long periods of time. At least, long enough to go down, see something interesting and come back up. However, there are others that take it to a whole different level:


via WonderHowTo

While I’ll continue to work on holding my breath, I think a pony bottle or bailout bottle would be an interesting addition to all this.

Tagged with: