MPI_Datatype

It has been a while since I’ve had to work with MPI, but recently I had to learn a new trick with it. MPI provides a number of ways to convey data between processes, from broadcasts to scatters to all-gathers. But obviously you have to provide a certain amount of information about the structure of the data, not the least of which is the datatype.

MPI defines constants for the basic data types in C: MPI_CHAR, MPI_INT, etc., and for the most part these will suffice. But what if you want to scatter your own struct? This can be done through a number of utility functions, but the most versatile seems to be MPI_Type_struct(). (Newer versions of the standard deprecate it, and MPI-3.0 removes it, in favor of MPI_Type_create_struct(), which takes the same arguments.)

You let MPI know how many different blocks, or chunks of memory, there are, their lengths, their offsets and types, and then what variable to store the resulting handle in. So if we had a struct:


typedef struct {
int var;
char string[STRING_LENGTH];
double foo;
} bar;

We would first indicate that there are three blocks, of lengths 1, STRING_LENGTH and 1:

int count = 3;
int lengths[3] = {1, STRING_LENGTH, 1};

The offsets are the byte offsets of each member from the base address of the structure. For this example, the “var” variable is the first, and thus has offset 0. It is tempting to say that “string” sits at sizeof(int) and “foo” at sizeof(int) + STRING_LENGTH, but the compiler is free to insert padding between members (typically so that the double is properly aligned). It is safer to ask for the offsets with offsetof() from <stddef.h>:

MPI_Aint offsets[3] = {offsetof(bar, var), offsetof(bar, string), offsetof(bar, foo)};
MPI_Datatype types[3] = {MPI_INT, MPI_CHAR, MPI_DOUBLE};
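
To see why adding up sizes by hand is fragile, here is a quick sketch using Python’s ctypes module, which by default aligns structure fields the same way the platform’s C compiler does. STRING_LENGTH = 10 is an assumed value (the post never fixes one), and Bar is just a stand-in that mirrors the bar struct above:

```python
import ctypes

STRING_LENGTH = 10  # assumed value for illustration only


class Bar(ctypes.Structure):
    # Same member order and types as the C struct `bar`
    _fields_ = [
        ("var", ctypes.c_int),
        ("string", ctypes.c_char * STRING_LENGTH),
        ("foo", ctypes.c_double),
    ]


# Offset of `foo` computed by summing the sizes of the earlier members
naive_foo_offset = ctypes.sizeof(ctypes.c_int) + STRING_LENGTH
# Offset of `foo` as the platform actually lays the struct out
actual_foo_offset = Bar.foo.offset

print("naive offset of foo: ", naive_foo_offset)
print("actual offset of foo:", actual_foo_offset)
print("sizeof(bar):         ", ctypes.sizeof(Bar))
```

On a typical 64-bit Linux machine the two offsets differ, because the double is pushed out to an 8-byte boundary. That gap is what the standard offsetof() macro from <stddef.h> accounts for in C.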

To finish up, we ask MPI to fill in a handle we declare, which will hereafter refer to this struct for the purposes of MPI. Then, we commit it, and it’s ready for use!

MPI_Datatype barDatatype;
MPI_Type_struct(count, lengths, offsets, types, &barDatatype);
MPI_Type_commit(&barDatatype);

Now if you have an array of bar structs, you can use MPI_Scatterv with it and your new datatype, or MPI_Bcast for that matter. It’s business as usual, as if it were any of the built-in base datatypes in MPI.


Promises to my OS Prof

For me, operating systems was one of the most worthwhile courses in my undergraduate career. Not only did I gain insight into the black box that the operating system can be, but I also learned a bunch of skills that I have found invaluable since then.

Even in graduate school, I encounter computer scientists who have never used the command line. While it’s all well and good to use your IDE, it is absolutely crippling to not be able to do all the same magic from the command line. Not only that, but there is a wealth of tools accessible from your favorite shell, and bash scripting is a useful piece to keep in one’s toolbox. The command line was one thing with which I became better acquainted through operating systems; this was especially true in one project where we had to roll our own shell. This was particularly good because the shell gives access to a lot of system tools that I’ve since found I want to use in other programs: forking, dup’ing, interacting with environment variables, etc.

Other good experiences were working with threading libraries and even booting up EC2 instances and testing out code there. I am extremely grateful to then-professor Mike Colagrosso for making it such a worthwhile experience. Lately I’ve been working with code that reminds me of the code-quality promises he made us make:

ALWAYS save the return value of system calls. I try to avoid programming in C as much as possible (after all, C++ is usually an alternative, if not Python), but whenever I do I am constantly reminded of this one. The most common design for functions that I encounter in that kind of code is to accept the variable that is to be updated and to return, instead, a code indicating the success, failure, warnings, etc. of the call. In some cases, the return value is the only way you’ll have access to the resource just requested (for example in the case of fork(2)), but either way it’s the basic mechanism for getting feedback about how code has executed.
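
The same habit carries over to Python’s thin wrappers around system calls. Python reports outright failures by raising OSError rather than returning an error code, but the return value still matters: os.write, for example, returns the number of bytes it actually wrote, which can be fewer than you asked for. A minimal sketch (write_all is a hypothetical helper name, not something from the course):

```python
import os


def write_all(fd, data):
    """Write every byte of `data` to `fd`, looping on short writes."""
    total = 0
    while total < len(data):
        # Save the return value: it is the count of bytes actually written,
        # and it is the only way to know how much is still left to send.
        written = os.write(fd, data[total:])
        total += written
    return total


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    sent = write_all(write_end, b"hello, pipe")
    os.close(write_end)
    print(sent, os.read(read_end, 64))
    os.close(read_end)
```

Ignoring the value returned by os.write works most of the time, which is exactly what makes the bug hard to find when it does not.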

NEVER use strcpy(3). I’ve never used a buffer overflow in a clever way (though this is a someday project), or I should say on purpose. But I’ve dealt with enough instances of me being distracted and writing bad code that I’ve gotten to know gdb (the GNU Debugger) better than I would like. The strcpy(3) function relies on null character termination to stop copying memory from one string into another. If, however, a string isn’t null terminated or isn’t terminated before the length of the string buffer being copied into, then the operation will overrun the buffer. Instead, ALWAYS use strncpy(3). It allows you to specify the maximum length that should be copied (for instance, the length of the buffer being copied into) so as to avoid this embarrassing problem. Just remember that strncpy(3) does not add a terminating null when the source fills the whole length, so terminate the buffer yourself.
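
One caveat worth seeing in action: strncpy(3) stops at the length you give it, but it does not add a terminating null when the source is at least that long. Here is a small Python model of that behaviour. It imitates the C semantics rather than calling the real function, and strncpy_model and safe_copy are hypothetical names:

```python
def strncpy_model(dest, src, n):
    """Imitate C strncpy: write exactly n bytes into the bytearray dest."""
    assert n <= len(dest), "in C this would overrun the destination"
    # A C string ends at its first NUL byte
    end = src.find(b"\x00")
    text = src if end == -1 else src[:end]
    chunk = text[:n]
    # Short sources are padded with NULs; long ones are cut with NO terminator
    dest[:n] = chunk + b"\x00" * (n - len(chunk))


def safe_copy(dest, src):
    """Copy at most len(dest) - 1 bytes and always terminate the buffer."""
    strncpy_model(dest, src, len(dest) - 1)
    dest[-1] = 0


if __name__ == "__main__":
    buf = bytearray(4)
    strncpy_model(buf, b"abcdef", len(buf))
    print(bytes(buf))  # all four bytes are text, so there is no terminator
    safe_copy(buf, b"abcdef")
    print(bytes(buf))  # truncated, but terminated
```

In C the equivalent of safe_copy is to pass sizeof(buffer) - 1 as the length and then set the last byte of the buffer to '\0' yourself.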

Despite taking the class just over three years ago (it’s unsettling to realize it’s been so long), these few promises have stuck with me. So Mike, if you’re reading this, I’m doing my part!


A Bad First Instinct

It’s an itch. A compulsion. Sometimes when working with a new library I get a very strong impulse to do a very bad thing — reinvent the wheel. Or, in this case, re-write the wheel.

It’s easy to object to the way code is organized, and that can cause a certain amount of discomfort. Whether it’s the desire to go through and reformat code (braces belong on the same line as the if statement!) or the pain of hacking together relatively incompatible library designs, it can be unpleasant.

Something that I’ve been pushing myself to do, and slowly learning to do, is to be comfortable with that discomfort. To resist the urge to rewrite code, as it’s never just a matter of rewriting it, but also testing it, and so forth. I’m slowly beginning to accept the painful truth: no code is perfect.

That said, there are times when I do rewrite libraries. Sometimes you’re handed code from ten years ago that’s no longer applicable, or antiquated, or just plain ugly. In these scenarios it’s perfectly justified to make large sweeping structural changes (with the protection of your favorite version control, of course). Just, hold off. Wait, and try to use the library, and the first-pass issues will either grow into systemic problems or wither and recede into the cracks.


This last week saw SIGGRAPH 2010 in Los Angeles, sunny California. It was there that I gave my first talk at a real conference. One request that I had not anticipated was for the slides from the talk, and so I post them here now.

Pending some red tape resolution, I hope to post the live-working demos soon. Until then, I hope that this video from the original conference submission will whet your appetite!

For those particularly curious, feel free to contact me or refer to the abstract.


SIGGRAPH 2010

I arrived in Los Angeles yesterday to attend (and talk at) SIGGRAPH 2010. This is not the first time the conference has been here, but it is certainly my first time in this smoggy city.

Adjacent to the Los Angeles Convention Center is the Staples Center, where workers have been setting up for the X Games. Fortunately they won’t begin until July 29th, the last day of this conference, but in watching the setup I was imagining an enormous re-enactment of high school stereotypes. Tens of thousands of jocks right next door to tens of thousands of nerds.


It’s a question that comes up often in conversation, and especially when meeting new people. The normal pleasantries of where one is from and what one does naturally lead there. “High-Performance Computing, eh? What’s that?”

I sometimes feel it a mission to dispel myths about supercomputing that the layperson might have. Pop culture is full of stern-looking authority figures leering at a screen, looking over the shoulder of an endearingly-disheveled nerd. Or I think Chuck represents this well:

So when answering this seemingly simple question about what exactly my job entails, I have to start at the bottom, explaining how processors aren’t getting any faster (and haven’t been for quite a while, relatively speaking). And how this fact necessitates a different way of looking at programming tasks, moving from “fast” or “deep” to “wide.” Supercomputers aren’t single small boxes in the middle of a vast room covered in billions of pixels, as Chuck would seem to suggest. No, nothing so glamorous. In fact, seeing a rack of IBM’s Blue Gene is more akin to 2001: A Space Odyssey – being confronted with a towering black monolith:

The BlueGene/P system at Argonne National Lab. The man in the picture is actually now an awesome system administrator for the KAUST Supercomputing Lab!

So modern supercomputers are not a single chip that can perform trillions of calculations per second, but are rather a set of relatively simple processors that each perform modestly. Though when working in concert, the results are astounding. Not long ago a supercomputer sustained calculations at a rate of 1 petaflop, or a million billion operations every second. If we were to compare that to a relatively modern desktop, a second on that computer is the equivalent of about four days of computation on your desktop. If the same code were to run on the computers used in the Apollo missions, it would take approximately 630 years (this is a rough approximation based on a figure of 20 microseconds per add).
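
For anyone who wants to check the arithmetic, here is a rough sketch of it. The 20 microseconds per add is the figure quoted above; the desktop speed is not assumed but backed out from the four-day comparison:

```python
PETAFLOP = 1e15                  # operations per second
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

# Work done by the supercomputer in a single second
operations = PETAFLOP * 1.0

# If a desktop needs four days for that work, this is its implied speed
desktop_rate = operations / (4 * SECONDS_PER_DAY)

# Apollo-era machine at 20 microseconds per add
apollo_seconds = operations * 20e-6
apollo_years = apollo_seconds / SECONDS_PER_YEAR

print("implied desktop rate: %.1f billion operations per second" % (desktop_rate / 1e9))
print("Apollo-era runtime:   %.0f years" % apollo_years)
```

So the four-day figure corresponds to a desktop sustaining roughly 2.9 billion operations per second, and the Apollo estimate works out to a little over 630 years.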

The reason for the modest clock rates of each processor in modern devices is power. Intel successfully grew processor performance by increasing the rate at which operations were performed (among other advances), but at a great cost in power. For example, a chip on a BlueGene/P compute node runs at a mere 850MHz, though it’s impossible to use this number alone to compare performance. In fact, of the budget allocated for the purchase of a system like that, only half that money goes towards actual equipment. The rest goes towards the power of not only running it, but cooling the damn thing off.

Graphics cards have become an unlikely source of high-performance computing in the last ten years or so. They’ve seen many struggles, from being difficult to program and even harder to debug, to early cards not supporting floating-point calculations and not supporting certain types of loops. And yet NVIDIA now markets a graphics card with as many as 480 cores.

I recently happened upon this video of Jamie Hyneman and Adam Savage of MythBusters explaining the difference between a CPU (the brain of your computer, for the uninitiated) and a GPU (the part that handles much of the graphics). When presented with explosions, robots and paintballs, the difference really lights up (skip to 8 minutes in for the really good bit, but the whole thing is worth a watch):


I’ve been working with a large shared system at my school, sometimes building packages for myself and sometimes for others. The one thing that’s almost certain across all such installations is that it’s difficult. Installing dependencies can be a very deep rabbit hole, and there are seemingly more configuration, build and source control systems than there are atoms in the universe.

On this particular shared system, our awesome sysadmins (these guys are really pretty great!) use a package called modules to set environment variables for use with various packages. Suppose you have several versions of a library that you’re working with. Let’s say some users need Python 2.4, and others 2.6 and 3.0. Instead of managing your path yourself, modules can help:

$> module load python-2.4
$> which python
/opt/share/python/2.4/ppc64/bin/python

One of the really great things about modules is that the way so-called modulefiles are written, not only can you load modules easily, but you can also unload them just as easily:

$> which python
/opt/share/python/2.4/ppc64/bin/python
$> module unload python-2.4
$> module load python-2.6
$> which python
/opt/share/python/2.6/ppc64/bin/python
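
Conceptually, loading a module prepends entries to variables like PATH, and unloading removes those same entries again. Here is a deliberately simplified Python model of that idea. It is not how the modules package is actually implemented, and load and unload are hypothetical helper names:

```python
def load(path_value, entry):
    """Return a PATH-style string with `entry` added at the front."""
    parts = [p for p in path_value.split(":") if p]
    return ":".join([entry] + parts)


def unload(path_value, entry):
    """Return a PATH-style string with every copy of `entry` removed."""
    parts = [p for p in path_value.split(":") if p and p != entry]
    return ":".join(parts)


if __name__ == "__main__":
    path = "/usr/bin:/bin"
    python24 = "/opt/share/python/2.4/ppc64/bin"
    path = load(path, python24)
    print(path)      # the 2.4 directory is searched first
    path = unload(path, python24)
    print(path)      # back to where we started
```

Because the entry goes at the front, which python finds the module’s interpreter first, and because unloading removes exactly what loading added, switching versions leaves nothing behind.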

It’s of course not limited to any particular environment variable. The modulefile can specify where the man pages are, what paths to include when using cmake, the dynamic library path and whatever else you’d like.

Modulefiles are also extremely convenient places to store information on how you actually built the library. When you come back to it three months from now to build the next version of some code base, you won’t remember the complex arguments you had to pass into configure or cmake in order to get the damned thing to build. Build notes are an essential part of maintaining code, especially if other people will be using the libraries you’ve built.

Speaking of which, I’ve worked on other shared systems before where several people need to use the same library and end up building it separately. Frank and Steve both need libX, and have made their installations accessible to the other users on the system, but who wants to sully their .bash_profile by adding some long and ugly paths to their PATH? If Frank makes his modulefiles directory public, too, then you can just include that directory in MODULEPATH:

# in ~/.bash_profile
...
export MODULEPATH=~/modules/:/home/frank/modules/
...

Perhaps it’s not the best thing since sliced bread, but I like that it affords me a way to bridge the gap between having a convention for where libraries are installed (say, if you use MacPorts, for example) and being able to easily set all your environment variables for easier, if not easy, compilation.

As an example of how it’s helped me, I’ve been compiling quite a few projects recently that rely on cmake. These projects also have a whole lot of dependencies, but my build process is now something along the lines of:

$> module load libxml2
$> module load vtk
$> module load osmesa
$> ...
$> cmake ~/ParaView

This, compared to:

CMAKE_INCLUDE_PATH=$CMAKE_INCLUDE_PATH:/opt/share/libxml2/2.7.7/ppc64/include:/opt/share/vtk/5.6.0/ppc64/include/:...\
CMAKE_LIBRARY_PATH=$CMAKE_LIBRARY_PATH:/opt/share/libxml2/2.7.7/ppc64/lib:/opt/share/vtk/5.6.0/ppc64/lib/:...\
cmake ~/ParaView

Computers aren’t perfect, but they are experts at remembering the details. Almost to a fault. Modules helps my interactions to be a little more equitable where I have to remember the important parts (the names of the modules I need, for example) and the computer can keep track of where the hell I put everything!


Socket to Me

I’m not a “network guy.” I still don’t know what exactly the subnet mask means, and I am often thankful that OS X is so willing to configure network settings for me automatically.

That said, recently I’ve been finding myself doing a lot of programming with sockets. They provide a low-level network interface to communicate between computers, and are used like other file descriptors. On the C side of things there’s a little more work than I’d like, and as such, I’ve found Python an invaluable tool.

In fact, I think any time you’re working with a new concept, technique or algorithm it’s extremely helpful to use a scripting language. Like others, Python offers an interactive session where you can develop code fragments by trial and error with each step, rather than trying to debug a chunk of code you’ve written with only a vague notion of what’s going on behind the scenes. It allows you to pause between steps and see the effects and results of each function.

Interestingly enough, another tool has come in extremely handy – netcat. It’s designed to print to stdout everything that it hears on the socket, and then it sends everything it receives on stdin through the socket. It allows you to examine some of the specifics of a protocol without worrying about the details of your own code or whether or not your code works. Netcat is tried and true, and will tell you exactly what’s happening.

This all came up in the context of WebSockets. They’re a part of the HTML5 spec and provide a JavaScript interface for real socket communication (there are of course some caveats, especially with respect to how to handle binary data). We’ve been using them for a project where we’d like the client to not need a special program to interact with a piece of software, and so instead implemented the protocol in JavaScript.

There was, however, some trouble at the outset. I had a bit of difficulty finding out why exactly the WebSocket client would seem to start to make a connection but then immediately complain about handshakes. What would have been much easier is to just open up netcat on the same port and have a conversation with the WebSocket itself.

# First off, I was running SimpleHTTPServer from a directory with a dummy html file
$> python -m SimpleHTTPServer 8888
# On the terminal, listen on port 35000
$> netcat -l 35000

And then try to make a connection from the JavaScript side

# From a JavaScript terminal from that dummy html file, in Chrome or Safari for example
ws = new WebSocket("ws://localhost:35000");
ws.onopen = function() { window.console.log("Hello!"); };
ws.onmessage = function(event) { window.console.log("Received " + event.data); };
ws.onclose = function() { window.console.log("Goodbye!"); };

I had figured (incorrectly) that WebSockets would work in essentially the same way that plain sockets do. Instead, this is what netcat receives as soon as the browser tries to connect:

GET / HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: localhost:35000
Origin: http://localhost:8888

It was only after realizing that this is what the WebSocket was sending that it became clear why no connection was actually happening. The browser wasn’t getting the rest of the handshake:

HTTP/1.1 101 Web Socket Protocol Handshake\r
Upgrade: WebSocket\r
Connection: Upgrade\r
WebSocket-Origin: http://localhost:8888\r
WebSocket-Location: ws://localhost:35000/\r
WebSocket-Protocol: sample\r\n\r\n

Oddly enough, I can’t seem to get netcat to play nice sending the response code, but it can easily-enough give us an indication of what was happening at first. Python can handle the rest for us:

$> python
>>> import socket
>>> s = socket.socket()
>>> s.bind(('', 35000))
>>> s.listen(1)
>>> client, info = s.accept()
>>> client.recv(1024)
'GET / HTTP/1.1\r\nUpgrade: WebSocket\r\nConnection: Upgrade\r\nHost: localhost:35000\r\nOrigin: http://localhost:8888\r\n\r\n'
>>> client.send('HTTP/1.1 101 Web Socket Protocol Handshake\r\nUpgrade: WebSocket\r\nConnection: Upgrade\r\nWebSocket-Origin: http://localhost:8888\r\nWebSocket-Location: ws://localhost:35000/\r\nWebSocket-Protocol: sample\r\n\r\n')
199
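
The handshake reply above can also be built from the request instead of typed by hand. This is a minimal sketch, assuming the early draft handshake shown in this post (later versions of the WebSocket protocol use a different one) and Python 3 byte strings. build_handshake_response is a hypothetical helper name, and it leaves out the optional WebSocket-Protocol header:

```python
def build_handshake_response(request):
    """Build the draft-style handshake reply for a raw request (bytes)."""
    lines = request.decode("ascii").split("\r\n")
    # Request line looks like: GET / HTTP/1.1
    path = lines[0].split(" ")[1]

    # Collect "Name: value" header lines into a dictionary
    headers = {}
    for line in lines[1:]:
        if ": " in line:
            name, value = line.split(": ", 1)
            headers[name] = value

    response = [
        "HTTP/1.1 101 Web Socket Protocol Handshake",
        "Upgrade: WebSocket",
        "Connection: Upgrade",
        # Echo back where the page came from and where the socket lives
        "WebSocket-Origin: " + headers["Origin"],
        "WebSocket-Location: ws://" + headers["Host"] + path,
    ]
    # A blank line terminates the header block
    return ("\r\n".join(response) + "\r\n\r\n").encode("ascii")
```

In the session above this would be used as client.send(build_handshake_response(data)), where data is whatever client.recv(1024) returned.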

And notice that it’s not until the listening socket sends its response portion of the handshake that the JavaScript console will print the “Hello!” we told it to upon opening the connection. Now that we have the two talking, let’s actually send and receive a couple of messages:

# Again from the JavaScript console
ws.send("Hello from JavaScript!")

And then we receive the message in Python:

>>> client.recv(1024)
'\x00Hello from JavaScript!\xff'

And here we find another oddity of WebSockets. All messages seem to be prepended with \x00 and appended with \xff (though this is also mentioned in the specification). If we try to send a message from Python without these two extra characters, we’ll get nothing out on the JavaScript side (go ahead and give it a try!):

# We get nothing on the JavaScript side :-/
client.send('hello')
# Magic happens!
client.send('\x00hello\xff')
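
Wrapping and unwrapping messages by hand gets old quickly, so here is a minimal sketch of two helpers for the \x00 ... \xff framing seen above. It assumes Python 3 byte strings and the early draft framing described in this post (the finalized WebSocket protocol frames messages differently); frame and unframe are hypothetical names:

```python
def frame(text):
    """Wrap a text message in the \\x00 ... \\xff markers."""
    return b"\x00" + text.encode("utf-8") + b"\xff"


def unframe(buffer):
    """Split received bytes into complete messages plus leftover bytes."""
    messages = []
    while True:
        start = buffer.find(b"\x00")
        if start == -1:
            break
        end = buffer.find(b"\xff", start + 1)
        if end == -1:
            break  # message not complete yet; keep it for the next recv
        messages.append(buffer[start + 1:end].decode("utf-8"))
        buffer = buffer[end + 1:]
    return messages, buffer
```

With these, sending becomes client.send(frame("hello")), and anything left over from unframe can be prepended to the next chunk that recv returns. The byte \xff never occurs in valid UTF-8 text, which is what makes it usable as an end marker.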

Of course I’m sure there are other robust tools the real “networking guys” use to make their lives easier. But outside of pulling up Wireshark or trying to figure this out by writing C code, netcat and Python definitely saved the day.


The Pains of Nightly Builds

I’ve been working with WebKit a lot lately. Specifically, WebGL. It’s fantastic, and it has the potential to change a lot of important applications, but working with nightly builds is… well, frustrating.

I’ve been using a revision released in December. Some later revisions have weird behaviors and draw pixelated, but I’ve finally come across a problem with r52426 that I just can’t circumvent – floating-point textures. They just don’t seem to work properly. I had noticed the bug before, but it wasn’t absolutely crucial before. Rather, the real bug is that you can’t initialize floating-point textures from the JavaScript side.

They’ve fixed this in later versions, but many of these that have this bug fixed also have a problem with gl.viewport, which is pretty mission-critical. So now I’m sifting through the changelog, hoping to find a release somewhere in the middle that isn’t completely broken.

Update: Victory! If you’re going to be developing WebGL stuff on Mac, I’d recommend WebKit r53036. It seems to have everything I need (so far) fixed.


Tech Addiction

Several days ago, Gizmodo posted a quiz about technology addiction. I somewhat nervously / reluctantly looked through the questions, and was pleased to find that I fit into their “Coffee Fiend” category. Well, rather, I was pleased that I didn’t fall in any of their worse categories, despite working with computers for a living. There are even some places I could do with shaving down that would get me into their “Social Drinker” category; I see this as proof that one still can be in a tech field and not be a total tech fiend.

Some of their more amusing / revealing / interesting questions were:

2. Do you sometimes bring your laptop when you sit on the toilet?

Just for the record, I do not. But, I know people who do, and let’s face it – that’s why the iPhone was invented in the first place.

8. Have you ever changed vacation plans based on wi-fi availability?

Well, technically yes, I have done that. But it was deciding between two capsule hotels, and I was going to be there for a week for a tech conference. I think that’s justified.

10. If your house were on fire, would you run in to rescue your laptop?

I have sort of done this already. Fire alarms are a regular feature of KAUST, and one day we saw smoke coming from the building we were in as one went off. Ben and I looked at each other for 2 or 3 seconds, then both immediately grabbed our laptops and dashed out. In all fairness, most of my work is on there (I’m doing more off-site backups for the important stuff), and it’s the only computer I have in this country. What is life without computer?

35. Do you tweet or read blogs while watching movies at home?

Well, I do play Tetris. That’s my tech meditation time.
