relativistic observer: large projects

There are periods of time during a project when I don't even want to sleep. Others around me get very annoyed. But when I come out the other end, something magical can be seen. This is partly because I, thankfully, work in the realm of computer graphics. And partly because I'm a visual person who can imagine a visual result that others can appreciate.

And it's all in the demo.

There is no sleight of hand in a demo. Not when people are to be impressed. But sometimes people just don't see the value in what you have built. This is where you have to educate them, to show them the value, to connect it to something they can understand. You have to make all that obsessive development time mean something.

You need to become tolerable again.

I have talked about where ideas come from. About the different frames of mind we can be in. About how to foster creativity in the first place. But, once you get the idea and reason out how it can be implemented, there is a new phase that needs to be explored. How does this process unfold, this intense development? How does the large feature or the complex technique get implemented? How can we, as mere humans, even manage something like this? What tools do we use to do the seemingly impossible? What parts of our brains do we have to use to accomplish our goals?

Organization

The best method to tackle a large project is to get organized. I do this by taking notes, drawing pictures, and building tools.

I have found that some of the best notes to take are these:

new ideas or features that you would like to explore
problems that need to be resolved
places to look when updating to some new arrangement of the code

For most people, the note-taking process is a hassle. But you really need to start taking those notes to accomplish a project that is so big you can't keep it all in your head!

When drawing a picture, sometimes a flowchart is useful. Here we have the basic step in constructing a Laplacian pyramid. The objective is to decompose the step into smaller operations, a process known as top-down decomposition.

Here the basic step gets split into reduction, expansion, and difference substeps.

The reduction step converts an image into another image that is half the size in both width and height. Because it has only half the resolution, it cannot hold the highest-frequency information in the original image. The expansion step resizes the half-sized image back to full size. This image will, by definition, be blurrier than the original. The difference step computes the differences between the original full-sized image and the blurred full-sized image. These differences form the highest-frequency detail in the image.

Repeating this step on the half-sized image produces a quarter-sized image and a half-sized detail image.

So not only is the image decomposed into various frequency bands, but the process of decomposing the image has also been decomposed into steps!
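The basic step and its repetition can be sketched in Python. This is a minimal sketch under stated assumptions: grayscale images stored as lists of rows with even dimensions, a 2x2 box average standing in for the reduction filter (the classic Burt and Adelson pyramid uses a 5-tap Gaussian-like filter instead), and pixel replication standing in for expansion. All function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # rows of grayscale values


def reduce_half(img: Image) -> Image:
    """Reduction: average each 2x2 block, halving width and height.
    The box average is a crude low-pass filter (assumes even dimensions)."""
    h, w = len(img), len(img[0])
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even dimensions"
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def expand_double(small: Image) -> Image:
    """Expansion: resize back to full size by pixel replication."""
    out: Image = []
    for row in small:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def difference(a: Image, b: Image) -> Image:
    """Difference: original minus blurred, i.e. the highest-frequency detail."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def laplacian_step(img: Image) -> Tuple[Image, Image]:
    """One basic step: returns (half-sized image, full-sized detail image)."""
    small = reduce_half(img)
    detail = difference(img, expand_double(small))
    return small, detail


def build_pyramid(img: Image, levels: int) -> Tuple[List[Image], Image]:
    """Repeat the basic step on each successively smaller image."""
    details: List[Image] = []
    current = img
    for _ in range(levels):
        current, detail = laplacian_step(current)
        details.append(detail)
    return details, current


def collapse(details: List[Image], residual: Image) -> Image:
    """Invert the pyramid: expand and add the detail back, coarsest level first."""
    current = residual
    for detail in reversed(details):
        expanded = expand_double(current)
        current = [[e + d for e, d in zip(re, rd)] for re, rd in zip(expanded, detail)]
    return current


img = [[float(4 * y + x) for x in range(4)] for y in range(4)]
details, residual = build_pyramid(img, 2)
print([len(d) for d in details], residual)  # [4, 2] [[7.5]]
```

Because each detail image stores exactly what the expansion lost, `collapse(details, residual)` reproduces the original image (up to floating-point rounding), which is a handy self-check for the decomposition.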

Rational Processes

Using your rational mind is partly deduction and partly experience. For instance, when you implement a gradient operation, experience tells you that the center of a thin line has a zero gradient, while either side of the line has a non-zero gradient. As a practical demonstration of this, consider the Painter brush stroke. It was made with an airbrush at high opacity and a 2-pixel diameter: a typical thin line.

If you compute the gradient using a Sobel technique, each 3x3 neighborhood of the image is convolved with two 3x3 kernels. There are variations on this theme, but usually the kernels will look something like this:

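$$
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}
$$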

The first kernel is for computing gradients in the y direction (horizontally oriented edges) and the second kernel is for computing gradients in the x direction (vertically oriented edges).

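Convolution here means multiplying each element of the kernel by the corresponding pixel in the image neighborhood and summing the products. (Strictly speaking, this is correlation; true convolution flips the kernel first, which for these kernels only flips the sign of the result and leaves the gradient magnitude unchanged.)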

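You do this for both kernels, producing two sums, which you can think of as the x and y components of a vector at that pixel; across the whole image, these vectors form a vector field. The gradient magnitude is simply the length of that vector: the square root of the sum of the squares of the two sums.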

The result of this is a gradient like you see here. Notice that the center of the line has an empty space in it, corresponding to zero gradient.
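The gradient computation described above can be sketched in a few lines of Python. This is a minimal sketch, assuming a grayscale image stored as a list of rows and clamp-to-edge sampling at the borders (both assumptions, not details from the post). The soft line profile in the usage lines is a hypothetical stand-in for the airbrush stroke, not the actual Painter data.

```python
import math
from typing import List

Image = List[List[float]]  # rows of grayscale values

# The two Sobel kernels from the text: KY responds to horizontally oriented
# edges (y gradient), KX to vertically oriented edges (x gradient).
KY = [[1, 2, 1],
      [0, 0, 0],
      [-1, -2, -1]]
KX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]


def sample(img: Image, y: int, x: int) -> float:
    """Read a pixel, clamping coordinates to the image border."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def apply_kernel(img: Image, k: List[List[int]], y: int, x: int) -> float:
    """Multiply each kernel element by the matching pixel of the 3x3
    neighborhood centered on (y, x) and sum the products."""
    return sum(k[j][i] * sample(img, y + j - 1, x + i - 1)
               for j in range(3) for i in range(3))


def sobel_magnitude(img: Image) -> Image:
    """Gradient magnitude: length of the (gx, gy) vector at every pixel."""
    h, w = len(img), len(img[0])
    return [[math.hypot(apply_kernel(img, KX, y, x), apply_kernel(img, KY, y, x))
             for x in range(w)]
            for y in range(h)]


# Hypothetical soft vertical line, brightest in column 3.
line = [[0, 0, 0.5, 1, 0.5, 0, 0] for _ in range(5)]
print(sobel_magnitude(line)[2])  # [0.0, 2.0, 4.0, 0.0, 4.0, 2.0, 0.0]
```

The printed row shows the effect described in the text: strong gradient on both flanks of the line and a zero right at its center (column 3), because the pixels on either side of the center are equal and cancel.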

My rational mind already knows this through experience. So if I want to use the gradient as a mask and process the center pixels of the line, I will have to fill in the center of the gradient somehow, for instance with an annealing operation: a blur followed by an increase in the contrast or exposure of the gradient.
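The annealing idea can be sketched as follows. This is a minimal sketch under assumptions: the 3x3 box blur stands in for whatever blur you prefer, exposure is modeled as a simple multiply-and-clip, and the gain of 3.0 is a hypothetical value you would tune by eye. The input mask is a made-up normalized gradient with a hole in its center.

```python
from typing import List

Image = List[List[float]]  # rows of mask values in [0, 1]


def box_blur(img: Image) -> Image:
    """3x3 box blur with clamp-to-edge borders."""
    h, w = len(img), len(img[0])

    def px(y: int, x: int) -> float:
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sum(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
             for x in range(w)]
            for y in range(h)]


def anneal(mask: Image, gain: float = 3.0) -> Image:
    """Blur the mask, then boost its exposure by `gain` and clip to 1.0,
    so the zero-valued center of a thin line's gradient gets filled in."""
    return [[min(1.0, v * gain) for v in row] for row in box_blur(mask)]


# Hypothetical normalized gradient of a thin vertical line: note the hole at x = 3.
hole = [[0.0, 0.5, 1.0, 0.0, 1.0, 0.5, 0.0] for _ in range(5)]
filled = anneal(hole)
print(hole[2][3], filled[2][3])  # 0.0 1.0
```

The blur spreads the two flanking ridges into the hole, and the exposure boost pushes the now partially filled center back up to full strength.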

A rational mind combined with the ability to visualize is probably the quickest way to get image processing operations done. But there are times when visualizing is not enough. We must see the intermediate results and check that they are being produced correctly and effectively. This brings us to the next technique: building tools.

Building Tools For Visualizing and Debugging

Any process in image processing, no matter what it is, will have intermediate results. There will be a blurred buffer, morphology applied to something, a gradient, a vector field, some representation that needs to be visualized. And we may need to verify that each step is being accomplished correctly, or verify that the step is even doing what we imagined it would, and is thus useful in the process of finding a solution.

So we need to construct a tool to see the intermediate results, to study them, to inspect them, and to debug their construction when our idea of what they should look like does not match what we get.
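The core of such a tool can be quite small: run the stages in order and keep every intermediate buffer around so it can be looked at. Here is a minimal sketch, assuming grayscale buffers in [0, 1]; the `Testbed` class and its methods are hypothetical names, and plain-text PGM is chosen only because it is a trivially simple image format to write.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]  # rows of grayscale values in [0, 1]
Stage = Callable[[Image], Image]


class Testbed:
    """Runs named processing stages in order and keeps every intermediate
    buffer so each one can be inspected, compared, or shown in a demo."""

    def __init__(self) -> None:
        self.stages: List[Tuple[str, Stage]] = []
        self.results: Dict[str, Image] = {}

    def add_stage(self, name: str, fn: Stage) -> "Testbed":
        self.stages.append((name, fn))
        return self  # allow chaining

    def run(self, source: Image) -> Image:
        self.results = {"source": source}
        current = source
        for name, fn in self.stages:
            current = fn(current)
            self.results[name] = current  # keep the intermediate result
        return current

    def to_pgm(self, name: str) -> str:
        """Encode a stored buffer as an ASCII PGM (P2) image for any PGM viewer."""
        img = self.results[name]
        lines = ["P2", f"{len(img[0])} {len(img)}", "255"]
        for row in img:
            lines.append(" ".join(str(round(255 * min(1.0, max(0.0, v)))) for v in row))
        return "\n".join(lines) + "\n"


tb = Testbed().add_stage("invert", lambda img: [[1.0 - v for v in row] for row in img])
tb.run([[0.0, 1.0]])
print(tb.to_pgm("invert"))
```

Writing each entry of `tb.results` to its own `.pgm` file gives you a directory of every intermediate step to flip through when a result does not look the way you expected.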

I have done this time and time again on large projects, and it has enabled me to make much faster progress. And a tool like this becomes something more: your demo environment. Not only can you see what's happening, but others can as well.

For a demo to come off smoothly, your implementation also has to be fast. This means you will need to implement selective update, so that only what has changed gets recomputed, and you will need to optimize the code to make it go as fast as possible.
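One way to read "selective update" for a staged pipeline is: cache each stage's output and, when a parameter changes, re-run only that stage and the stages after it. The sketch below assumes that interpretation (another common meaning is recomputing only a changed region of the image, which is not shown). `SelectivePipeline` and its methods are hypothetical names; the stages operate on any value, and the usage lines use plain numbers to keep things short.

```python
from typing import Any, Callable, Dict, List, Tuple

# A stage is (name, function(input, params) -> output, initial params).
Stage = Tuple[str, Callable[[Any, Dict[str, Any]], Any], Dict[str, Any]]


class SelectivePipeline:
    """Caches each stage's output and re-runs only the stages at or after
    the earliest one whose input or parameters changed."""

    def __init__(self, source: Any, stages: List[Stage]) -> None:
        self.source = source
        self.names = [name for name, _, _ in stages]
        self.fns = [fn for _, fn, _ in stages]
        self.params = [dict(p) for _, _, p in stages]
        self.cache: List[Any] = [None] * len(stages)
        self.dirty_from = 0   # index of the first stage that must re-run
        self.evaluations = 0  # counts stage executions, for inspection

    def set_source(self, source: Any) -> None:
        self.source = source
        self.dirty_from = 0

    def set_param(self, stage: str, key: str, value: Any) -> None:
        i = self.names.index(stage)
        if self.params[i].get(key) != value:
            self.params[i][key] = value
            self.dirty_from = min(self.dirty_from, i)

    def output(self) -> Any:
        start = self.dirty_from
        current = self.source if start == 0 else self.cache[start - 1]
        for i in range(start, len(self.fns)):
            current = self.fns[i](current, self.params[i])
            self.cache[i] = current
            self.evaluations += 1
        self.dirty_from = len(self.fns)
        return current


p = SelectivePipeline(1, [("blur", lambda x, q: x + q["r"], {"r": 1}),
                          ("gain", lambda x, q: x * q["g"], {"g": 2})])
p.output()               # runs both stages
p.set_param("gain", "g", 3)
print(p.output(), p.evaluations)  # 6 3  (only "gain" re-ran)
```

Dragging a slider that controls a late stage then costs only that stage, which is what keeps an interactive demo responsive.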

It doesn't matter what kind of project you are working on. You will always need to demo to justify your continued work. You will need to show progress. You will need to convince people that it can be done.

Building a tool (a testbed with demo capability) is the best way to accomplish this!

Choosing the Best System to Build On

When constructing an image processing tool that involves steps, intermediate results, complex staging, or heavy computation, you need to choose a system to build it all on top of. For my purposes, I am considering a Macintosh as my system platform. But there are APIs and methodology that apply to any task.

Core Image is a good API for image processing when your result is constructed one pixel at a time. It lets you use a GPU or a multi-core CPU to get the job done, and it makes constructing a pass over your data a simple thing. This is highly desirable when you have a lot of passes to construct. Core Image kernels are pretty easy to write. A kernel can reference any number of source images, but it produces only one pixel of the destination image per invocation. This model works well for blurs, color operations, compositing operations, and even transitions. You can build Core Image filters on top of your operations, and their parameters can be entire images as well as settings for your operations.
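The "many inputs, one output pixel" model can be illustrated without Core Image itself. The Python sketch below only mimics the shape of the model and is not the Core Image API: real Core Image kernels are written in the Core Image Kernel Language (or, on newer systems, Metal) and wrapped in classes such as `CIKernel`. The names `run_pixel_kernel` and `dissolve` are hypothetical.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # rows of grayscale values
# A per-pixel kernel receives the output coordinate and ALL source images,
# and returns exactly one destination value.
PixelKernel = Callable[[int, int, Sequence[Image]], float]


def run_pixel_kernel(kernel: PixelKernel, sources: Sequence[Image],
                     width: int, height: int) -> Image:
    """Evaluate the kernel once per destination pixel (the gather model)."""
    return [[kernel(x, y, sources) for x in range(width)] for y in range(height)]


def dissolve(t: float) -> PixelKernel:
    """A transition kernel: blend two sources; t = 0 gives the first, t = 1 the second."""
    def kernel(x: int, y: int, sources: Sequence[Image]) -> float:
        a, b = sources
        return (1.0 - t) * a[y][x] + t * b[y][x]
    return kernel


black = [[0.0, 0.0], [0.0, 0.0]]
white = [[1.0, 1.0], [1.0, 1.0]]
print(run_pixel_kernel(dissolve(0.25), [black, white], 2, 2))  # [[0.25, 0.25], [0.25, 0.25]]
```

Because each destination pixel is computed independently from reads of the sources, a framework is free to evaluate all of them in parallel, which is what makes this model map so well onto a GPU.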

OpenGL is a good system for doing computation and presenting the result inside a texture on screen. When that texture must be transformed in 3D, as when it is mapped onto a 3D object, this is the ideal API for the task. OpenGL may also be used for computing results on 2D flats that are presented using an orthographic projection. The computation can use almost any OpenGL operation, or it can use a fragment program. This is conceptually the same as Core Image, so there is not much value in going the OpenGL route unless textures are going to be transformed in 3D.

OpenCL is a good system for doing arbitrary computation using the GPU and the CPU. You can use multiple output buffers as well as multiple input buffers, which makes some simulation operations easier. Scatter and gather to and from planar color formats are also much more natural. For instance, conversion of RGB to YCC, where the Y is kept separate from the CbCr information, can be supported very easily: one RGB image goes in, and two images come out, one holding Y and the other CbCr.
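The data layout of that conversion can be sketched in Python: a single pass reads one interleaved RGB buffer and writes two separate output buffers. In OpenCL this would be one kernel writing to two global buffers; the Python here only shows the scatter pattern. The JFIF (full-range BT.601) coefficients are an assumption, since the post does not say which YCC variant is meant, and chroma subsampling is not shown. The function name is hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]  # 8-bit-scale values, 0..255


def rgb_to_planar_ycc(pixels: List[RGB]) -> Tuple[List[float], List[float]]:
    """One input buffer in, two output buffers out: a Y plane and an
    interleaved CbCr plane (JFIF / full-range BT.601 coefficients assumed)."""
    y_plane: List[float] = []
    cbcr_plane: List[float] = []
    for r, g, b in pixels:
        y_plane.append(0.299 * r + 0.587 * g + 0.114 * b)
        cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
        cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
        cbcr_plane.extend((cb, cr))  # scatter: chroma goes to its own buffer
    return y_plane, cbcr_plane


y, cbcr = rgb_to_planar_ycc([(0, 0, 0), (255, 255, 255)])
print(len(y), len(cbcr))  # 2 4
```

With a single-output model, the same conversion would need two passes over the RGB input (one producing Y, one producing CbCr), which is exactly the awkwardness multiple output buffers remove.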

Multi-core CPU computation is another good method to get things done fast. Here you can use Grand Central Dispatch to easily queue your computation on multiple CPUs. It has never been easier.
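Grand Central Dispatch itself is a C API (for example, `dispatch_apply` runs a block a given number of times, spreading the iterations across the available cores). As a hedged Python stand-in, the sketch below shows the same banding pattern: split the image into horizontal strips, hand each strip to a worker, and stitch the results back in order. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Image = List[List[float]]


def process_band(job: Tuple[Callable[[float], float], Image]) -> Image:
    """Apply a per-pixel function to one horizontal band of rows."""
    fn, rows = job
    return [[fn(v) for v in row] for row in rows]


def parallel_map_pixels(img: Image, fn: Callable[[float], float],
                        bands: int = 4, executor_cls=ThreadPoolExecutor) -> Image:
    """Split the image into `bands` horizontal strips, process them on a pool
    of workers, and stitch the results back together in order."""
    step = max(1, -(-len(img) // bands))  # ceil(rows / bands)
    jobs = [(fn, img[i:i + step]) for i in range(0, len(img), step)]
    with executor_cls(max_workers=bands) as pool:
        # pool.map preserves input order, so bands come back in sequence.
        return [row for band in pool.map(process_band, jobs) for row in band]


img = [[float(x + 10 * y) for x in range(3)] for y in range(5)]
print(parallel_map_pixels(img, lambda v: v * 2.0, bands=2) ==
      [[v * 2.0 for v in row] for row in img])  # True
```

With the default `ThreadPoolExecutor`, CPython's global interpreter lock means pure-Python pixel loops will not actually run faster; passing `executor_cls=ProcessPoolExecutor` (with a top-level, picklable `fn` and the usual `if __name__ == "__main__":` guard) gives real multi-core parallelism. In C, the band loop body would go into the block passed to `dispatch_apply`.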

The Dangers of Obsession

You can get buried in a project. It can overcome you. This can have a very confusing effect. Unless you disentangle yourself from it for a while and take a step back, you run the risk of becoming irrevocably lost.

Back in my Caltech days, there were those people who were interested in Dungeons and Dragons (D&D). This sometimes resulted in people becoming obsessed with the rule systems and the immersive game-play.

And sometimes people just got lost. First they forgot to shower, neglecting their basic cleanliness. Then they showed the effects of malnutrition from living on an endless supply of Coke and little white powdered-sugar donuts. They started talking about fifth-level clerics and trolls. They always carried those little clear twelve- and twenty-sided dice around with them. And one day they didn't come to class. And never appeared again.

These were good, perhaps weak-willed people who were casualties of war. The war against obsession.

Yet I also saw people get obsessed with technical and scientific matters. These were called grad students. They would work on their theses obsessively, disappearing into a dark cave until they came out with something hard and shiny like a diamond. Obsession, it seems, had its value.

Buried in Complexity

You can add more and more to a program over a period of many months. This is called add-on programming. And it can lead to another problem: complexity. A haphazard programmer can continue to kludge up a piece of code using branching and questionable data structures. This can lead to spaghetti code: twisty passages all alike.

The only solution to this problem is rethinking: the code must be rewritten. There is no other way if it is to be modified in the future, and if you have been adding more and more to it, future modification is a virtual certainty. At this point it is time to develop the right control structures and data structures to render the solution in the most effective and extensible way.
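One small example of trading branching for a data structure: instead of a growing `if`/`elif` chain that picks an operation by name, keep a table that maps names to implementations. This is a hypothetical sketch (the operation names and `apply_op` are illustrations, not anything from the post); the point is that adding an operation becomes adding an entry rather than another branch.

```python
from typing import Callable, Dict, List

Image = List[List[float]]
Operation = Callable[[Image, float], Image]

# One table maps operation names to implementations; adding an operation
# means adding an entry, not another branch in a growing if/elif chain.
OPERATIONS: Dict[str, Operation] = {}


def register(name: str) -> Callable[[Operation], Operation]:
    def wrap(fn: Operation) -> Operation:
        OPERATIONS[name] = fn
        return fn
    return wrap


@register("exposure")
def exposure(img: Image, amount: float) -> Image:
    return [[v * amount for v in row] for row in img]


@register("offset")
def offset(img: Image, amount: float) -> Image:
    return [[v + amount for v in row] for row in img]


def apply_op(name: str, img: Image, amount: float) -> Image:
    """Look up and run an operation, failing loudly on unknown names."""
    try:
        op = OPERATIONS[name]
    except KeyError:
        raise ValueError(f"unknown operation: {name!r}") from None
    return op(img, amount)


print(apply_op("exposure", [[0.25, 0.5]], 2.0))  # [[0.5, 1.0]]
```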

Immersive Programming

At some point you will need to debug what you have created and make it work. This requires total immersion. The better you have organized your code, the easier it will be to understand the processes it uses and thus to figure out which steps are correct and which are incorrect. This is the process of debugging.

It's like putting your head into the code and visiting codeland.

One thing is sure: you had better have your head on straight when you debug a large project the first time. This is when your organization and your rethinking of control and data structures will pay off.

Sometimes when debugging a project it becomes clear that there is a logic flaw in the code. This can be a small one, like an off-by-one error, or some statements that are out of order.

Or it can be a very large problem indeed. One with huge ramifications for the code.

My advice is to fix it before going any further, no matter how sweeping the implied changes are.

To Sum It All Up

Once you have been through fifty or so large projects, you begin to see patterns much more clearly. Perhaps you can profit from some of the patterns I have found, and some of the cautionary tales.

All I know is that I mostly had to learn these things the hard way.

Sigh.


Saturday, March 3, 2012

Intense Development