relativistic observer: Intense Development

Saturday, March 3, 2012

Intense Development

There are periods of time during a project when I don't even want to sleep. Others around me get very annoyed. But when I come out the other end, something magical can be seen. This is partly because I, thankfully, work in the realm of computer graphics. And partly because I'm a visual person who can imagine a visual result that others can appreciate.

And it's all in the demo.

There is no sleight of hand in a demo. Not when people are to be impressed. But sometimes people just don't get the value in what you construct. This is where you have to educate them, to show them the value, to connect it to something they can understand. You have to make all that obsessive development time mean something.

You need to become tolerable again.

I have talked about where ideas come from. About the different frames of mind we can be in. About how to foster creativity in the first place. But, once you get the idea and reason out how it can be implemented, there is a new phase that needs to be explored. How does this process unfold, this intense development? How does the large feature or the complex technique get implemented? How can we, as mere humans, even manage something like this? What tools do we use to do the seemingly impossible? What parts of our brains do we have to use to accomplish our goals?

Organization

The best method to tackle a large project is to get organized. I do this by taking notes, drawing pictures, and building tools.

I have found that some of the best notes to take are these:

new ideas or features that you would like to explore
problems that need to be resolved
places to look when updating to some new arrangement of the code

For most people, the note-taking process is a hassle. But you really need to start taking those notes to accomplish a project that is so big you can't keep it all in your head!

When drawing a picture, sometimes a flowchart is useful. Here we have the basic step in constructing a Laplacian pyramid. The objective is to decompose the step into smaller operations, a process known as top-down decomposition.

Here the basic step gets split into reduction, expansion, and difference substeps.

The reduction step is the process of converting an image into another image that is half the size in both width and height, and which therefore does not contain any of the highest-frequency information in the original image. The expansion step is the process of resizing the half-sized image back to full size. This image will be blurrier than the original by definition. The difference step is the process of determining the differences between the original full-sized image and the blurred full-sized image. These differences form the highest-frequency detail in the image.

This step can be repeated to create a quarter-sized image and a half-sized detail image.

So not only is the image decomposed into various frequency bands, but the process of decomposing the image has also been decomposed into steps!
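The reduction, expansion, and difference substeps can be sketched in a few lines of plain Python. This is only a minimal illustration under some loud assumptions: reduction here is a 2x2 box average and expansion is simple pixel replication (a real implementation would more likely use a smoother filter), image dimensions are assumed to be even at every level, and all of the function names are made up for this example.

```python
def reduce_half(img):
    """Reduction: halve width and height by averaging each 2x2 block (assumed box filter)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def expand_double(img):
    """Expansion: return to full size by repeating each pixel into a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def difference(a, b):
    """Difference: per-pixel a - b, which is the detail lost by reducing and expanding."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def pyramid_step(img):
    """One basic step: returns the half-sized image and the full-sized detail image."""
    small = reduce_half(img)
    blurred = expand_double(small)
    return small, difference(img, blurred)

def build_pyramid(img, levels):
    """Repeat the basic step; each pass works on the previous half-sized image."""
    details, current = [], img
    for _ in range(levels):
        current, detail = pyramid_step(current)
        details.append(detail)
    return details, current

def reconstruct(details, residual):
    """Undo the decomposition: expand, then add the detail back, coarsest level first."""
    current = residual
    for detail in reversed(details):
        up = expand_double(current)
        current = [[u + d for u, d in zip(ru, rd)] for ru, rd in zip(up, detail)]
    return current

# A small 4x4 test image with arbitrary values.
image = [[float((x * 3 + y * 5) % 7) for x in range(4)] for y in range(4)]
details, residual = build_pyramid(image, 2)
restored = reconstruct(details, residual)
```

With two levels on a 4x4 image, `details` holds a 4x4 and a 2x2 detail image and `residual` is 1x1. Adding the details back onto the expanded residual recovers the original, which is a handy check that each substep is doing what you imagined.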

Rational Processes

Using your rational mind is partly deduction, and partly experience. For instance, when you implement a gradient operation, experience tells you that the center of a line has a zero gradient, and either side of the line has a non-zero gradient. As a practical demonstration of this, consider the Painter brush stroke. It is from an airbrush at high opacity with a 2 pixel diameter: a typical thin line.

If you compute the gradient using a Sobel technique, each 3x3 neighborhood of the image is convolved with two 3x3 kernels. There are variations on this theme, but usually the kernels will look something like this:

     1  2  1            -1  0  1
     0  0  0    and     -2  0  2
    -1 -2 -1            -1  0  1

The first kernel is for computing gradients in the y direction (horizontally-oriented edges) and the second kernel is for computing gradients in the x direction (vertically-oriented edges).

Convolution means multiplying each element of the kernel with the corresponding pixel in the actual neighborhood in the image and forming a sum of the products.

You do that for both kernels, producing two sums, which you can imagine to be the x and y value of a vector field. The gradient is simply the magnitude of that vector.

The result of this is a gradient like you see here. Notice that the center of the line has an empty space in it, corresponding to a zero edge.
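To see the empty center numerically, here is a minimal plain-Python sketch of the two-kernel computation. The test image is an assumption made for illustration: a soft vertical line with the cross-section 0, 0.5, 1, 0.5, 0, not the actual Painter stroke. Border pixels are simply left at zero, and the function names are invented for this example. The products are summed without flipping the kernel, which only changes the sign of each sum and so does not affect the magnitude.

```python
import math

KERNEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]   # responds to horizontally-oriented edges
KERNEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertically-oriented edges

def weighted_sum(img, kernel, y, x):
    """Multiply the 3x3 neighborhood centered on (y, x) by the kernel and sum the products."""
    return sum(kernel[j][i] * img[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

def sobel_magnitude(img):
    """Gradient magnitude for interior pixels; border pixels are left at zero."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = weighted_sum(img, KERNEL_X, y, x)
            gy = weighted_sum(img, KERNEL_Y, y, x)
            out[y][x] = math.hypot(gx, gy)   # length of the vector (gx, gy)
    return out

# Assumed test image: a soft vertical line, the same cross-section on every row.
profile = [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
image = [list(profile) for _ in range(5)]

gradient = sobel_magnitude(image)
middle_row = gradient[2]
print(middle_row)   # [0.0, 2.0, 4.0, 0.0, 4.0, 2.0, 0.0]
```

The zero at index 3 is the center of the line: its left and right neighbors are equal, so the x sum cancels, while both flanks of the line respond strongly.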

My rational mind already knows this through experience. So this means that if I want to use the gradient as a mask, and process the center pixels of the line, I will have to do something to fill in the center of the gradient. Like an annealing operation (a blur followed by an increase of the contrast or exposure of the gradient).
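A rough sketch of that fill-in step, working on a single cross-section for brevity. Everything specific here is an assumption: the sample values are made up, the blur is a small 1-2-1 filter with clamped edges, and the contrast increase is a plain gain of 2 clipped to 1.0. A real mask would be processed in 2D with a blur radius and gain chosen by eye.

```python
def blur_121(row):
    """Blur with a 1-2-1 kernel, repeating the edge samples at the ends."""
    n = len(row)
    return [(row[max(i - 1, 0)] + 2.0 * row[i] + row[min(i + 1, n - 1)]) / 4.0
            for i in range(n)]

def boost(row, gain):
    """Increase contrast by a simple gain, clipping the result at 1.0."""
    return [min(1.0, v * gain) for v in row]

def anneal(row, gain=2.0):
    """Blur followed by a contrast increase, so the hole in the middle closes up."""
    return boost(blur_121(row), gain)

# Made-up gradient cross-section of a thin line: two flanks with a zero center.
gradient_row = [0.0, 0.5, 1.0, 0.0, 1.0, 0.5, 0.0]
mask = anneal(gradient_row)
print(mask)   # [0.25, 1.0, 1.0, 1.0, 1.0, 1.0, 0.25]
```

The center sample goes from 0.0 to 1.0, so the mask now covers the center pixels of the line as well as its flanks.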

A rational mind mixed with the ability to visualize is probably the best way to get image processing operations done the quickest. But there are times when visualizing is not enough. We must see the intermediate results and check that they are being produced correctly and effectively. This brings us to the next technique: building tools.

Building Tools For Visualizing and Debugging

Any process in image processing, no matter what it is, will have intermediate results. There will be a blurred buffer, morphology applied to something, a gradient, a vector field, some representation that needs to be visualized. And we may need to verify that each step is being accomplished correctly, or verify that the step is even doing what we imagined it would, and is thus useful in the process of finding a solution.

So we need to construct a tool to see the intermediate results, to study them, to inspect them, and to debug their construction when your idea of what they should look like does not match what you get.

I have done this time and time again with large projects I have worked on, and it has enabled me to make much faster progress on a large project. And with a tool such as this, it becomes another thing: your demo environment. Not only can you see what's happening, but others can as well.

In order for a demo to come off smoothly, your implementation has to be fast as well. This means that you will need to implement selective update, and also you will need to make it go as fast as possible through optimization.

It doesn't matter what kind of project you are working on. You will always need to demo to justify your continued work. You will need to show progress. You will need to convince people that it can be done.

Tool construction (a testbed with demo capability) is your best tool to accomplish this!

Choosing the Best System to Build On

When constructing an image processing tool that involves steps, intermediate results, complex staging, or heavy computation, you need to choose a system to build it all on top of. For my purposes, I am considering a Macintosh as my system platform. But there are APIs and methodology that apply to any task.

Core Image is a good API for image processing when your result is constructed one pixel at a time. It lets you use a GPU or a multi-core CPU to get the job done, and it can make constructing a pass on your data a simple thing. This is highly desirable when you have a lot of passes to construct. Core Image kernels are pretty easy to construct. You can reference any number of source images, but you may produce only one pixel in the destination image. Conceptually, this works pretty easily for blurs, color operations, compositing operations, and even transitions. You can build Core Image filters on top of your operations; their parameters are entire images, along with the settings for your operations.

OpenGL is a good system for doing computation and presenting that computation inside a texture on screen. When this texture is transformed in 3D, as in "onto a 3D object", then this is the ideal API to accomplish the task. OpenGL may also be used for computing results on 2D flats that are presented using an orthographic projection. The computation can occur using almost any OpenGL operation or it can occur using a fragment program. This is conceptually the same as Core Image, so there is not much value in going the OpenGL route unless textures are going to be transformed in 3D.

OpenCL is a good system for doing arbitrary computation using the GPU and the CPU. You can support multiple output buffers as well as multiple input buffers. This means that some simulation operations are easier. Also, things like scatter and gather to and from planar color formats are much more natural. For instance, conversion of RGB to YCC where the Y is kept separate from the CbCr information can be supported very easily. One RGB image goes in, and two images come out: one Y and the other CbCr.
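The data flow of that RGB-to-YCC example, one input and two outputs, can be shown with a small CPU-side sketch. This is plain Python rather than an OpenCL kernel, and the coefficients are an assumption: the full-range BT.601 (JFIF-style) constants with Cb and Cr centered on 128. Substitute whatever your format actually calls for. The function name is invented for the example.

```python
def rgb_to_y_and_cbcr(rgb):
    """One interleaved RGB image in; two images out: a Y plane and a CbCr plane."""
    y_plane, cbcr_plane = [], []
    for row in rgb:
        y_row, c_row = [], []
        for r, g, b in row:
            # Assumed full-range BT.601 coefficients for 8-bit values.
            y_row.append(0.299 * r + 0.587 * g + 0.114 * b)
            cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
            cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
            c_row.append((cb, cr))
        y_plane.append(y_row)
        cbcr_plane.append(c_row)
    return y_plane, cbcr_plane

# A 2x2 test image: white, black, mid gray, and pure red.
rgb = [[(255, 255, 255), (0, 0, 0)],
       [(128, 128, 128), (255, 0, 0)]]
y_plane, cbcr_plane = rgb_to_y_and_cbcr(rgb)
```

Neutral pixels (white, black, gray) come out with Cb and Cr at about 128, so all of their information lives in the Y plane. In a compute kernel, the two appends in the inner loop would become writes to two separate output buffers.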

Multi-core CPU computation is another good method to get things done fast. Here you can use Grand Central Dispatch to easily queue your computation on multiple CPUs. It has never been easier.
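The general pattern is to cut the image into independent bands and queue each band as its own unit of work. The sketch below shows that pattern in Python using the standard library's `ThreadPoolExecutor`. It is not Grand Central Dispatch, only an analogous illustration of the queueing idea. The band height, the worker count, and the per-band operation (a simple invert) are all arbitrary choices for the example. Note also that Python threads will not actually speed up pure-Python arithmetic like this, so treat it as a picture of the structure and not as a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

def invert_band(band):
    """The per-band work: any operation that needs only the rows it is given."""
    return [[1.0 - v for v in row] for row in band]

def split_into_bands(img, band_height):
    """Cut the image into horizontal bands of at most band_height rows."""
    return [img[y:y + band_height] for y in range(0, len(img), band_height)]

def process_in_parallel(img, band_height=2, workers=4):
    """Queue every band on a worker pool, then stitch the results back together."""
    bands = split_into_bands(img, band_height)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(invert_band, bands))   # map returns results in band order
    return [row for band in results for row in band]

image = [[(x + y) / 10.0 for x in range(4)] for y in range(5)]
result = process_in_parallel(image)
```

This works cleanly because each band reads and writes only its own rows. An operation with a neighborhood, like a blur, would need each band to also read a few rows of overlap from its neighbors.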

The Dangers of Obsession

You can get buried in a project. It can overcome you. This can have a very confusing effect. Unless you disentangle yourself from it for a while and take a step back, you run the risk of becoming irrevocably lost.

Back in my Caltech days, there were those people who were interested in Dungeons and Dragons (D&D). This sometimes resulted in people becoming obsessed with the rule systems and the immersive game-play.

And sometimes people just got lost. First they forgot to shower, neglecting their basic cleanliness. Then they showed the effects of malnutrition: the endless supply of Coke and little white powdered-sugar donuts. They started talking about fifth-level clerics and trolls. They always carried those little clear twelve- and twenty-sided dice around with them. And one day they didn't come to class. And never appeared again.

These were good, perhaps weak-willed people who were casualties of war. The war against obsession.

Yet I also saw people get obsessed in technical and scientific matters. These were called grad students. They would work on their thesis obsessively, disappearing into a dark cave until they came out with something hard and shiny like a diamond. I observed that obsession had its value, it seems.

Buried in Complexity

You can add more and more to a program over a period of many months. This is called add-on programming. And it can lead to another problem: complexity. A haphazard programmer can continue to kludge up a piece of code using branching and questionable data structures. This can lead to spaghetti code: twisty passages all alike.

The only solution to this problem is rethinking it: it must be rewritten. There is no other way if it is to be modified in the future. If you were adding more and more stuff to it, then this is a virtual certainty. At this point it is time to develop the right control structures and data structures to render the solution in the most effective and extensible way.

Immersive Programming

At some point you will need to debug what you have created and make it work. This requires total immersion. The better you have organized your code, the easier it will be to understand the processes it uses and thus to figure out which steps are correct and which are incorrect. This is the process of debugging.

It's like putting your head into the code and visiting codeland.

One thing is sure: you better have your head on straight when you debug a large project the first time. This will be when your organization and rethinking of control and data structures will pay off.

Sometimes when debugging a project it becomes clear that there is a logic flaw in the code. This can be a small one, like an off-by-one error, or some statements that are out of order.

Or it can be a very large problem indeed. One with huge ramifications for the code.

My advice is to fix it before going any further, no matter how sweeping the implied changes are.

To Sum It All Up

Once you have been through fifty or so large projects, you begin to see patterns much more clearly. Perhaps you can profit from some of the patterns I have found, and some of the cautionary tales.

All I know is that I mostly had to learn these things the hard way.

Sigh.

17 comments:

shelby, March 3, 2012 at 1:24 AM
"the center of a line has a zero gradient, and either side of the line has a non-zero gradient"

I was momentarily confused, until I realized from the context of convolution, that you meant 'edge' (and not end points) where you wrote 'side'.

I hope it is not too overbearing if I relate my similar experience (although I am not nearly as accomplished as you are).

Indeed I agree, it requires total immersion to be one of the great programmers. Also agree that one has to walk that fine line between burnout and productive obsession. You are spot on, that when the diet starts falling apart, that is a warning sign. I recently found that having a fulltime maid improved my sustainable interval.

I was doing 14-18 hour x 7 days from July to Dec. 2011. I mitigate my obsession with sports, mostly running. Apparently you have your artistic outlets (music, etc). I recently exhibited symptoms of neuropathy (progressive nerve damage in feet, hands, and early symptoms in face), which goes into near remission when I am not on the computer and outside doing sports for a few weeks.

I don't want to quit programming, so I am searching for balance. Recently trying to purchase a campus on a hilltop to have fresh air and view to relax the eyes and incite me to take breaks.

Mark, March 3, 2012 at 3:18 PM
gradients, lines, edges, sides. different nomenclature. I view "ends" as what happens at the start and end of the stroke. A line divides a plane into two sides, so I naturally used that word.

A gradient function is for finding edges, though, and I totally get your confusion. A line then has two edges. But I view traveling along a line as dividing the plane into a left side and a right side, when moving in a specific direction along a stroke.

This notion of left and right side comes from the scalar cross product.

If a vector goes from point 0 (x0, y0) to point 1 (x1, y1), it can be expressed as (vx, vy) = (x1-x0, y1-y0). An arbitrary point p (x, y) can be seen as lying on one side or the other of the line. Define (ux, uy) = (x-x0, y-y0). The point p lies to the left of the line if the scalar cross product is positive.

to_left = vx*uy - ux*vy > 0

Obsessive coding:

Yes I have been through the full project development gamut several times, over the years. In college, when I was working on a project at Calma, I would work over night and survive on Coke (the kind in a bottle) and late night dinners at a local restaurant, like El Faro (Mexican). Tom liked to go to the Burger Pit (and that was a low point in most of our dining pleasure).

But yes, diet does begin to suffer. Nowadays it's Hot Pockets for programmers, or stuff from the Junk Food Machine. At Apple, they try to stuff the Junk Food Machine with healthy alternatives, but people do still have access to things like Famous Amos cookies and Snickers bars.

What they do is to provide free dinners to developers who work over the weekend and at night. This encourages a more sensible kind of obsession and keeps the programmers healthy, if you can call obsession healthy.

But, hey, somebody's got to do the programming around here! ;-)

For you, neuropathy may be a sign of diabetes (which can happen to anyone) or perhaps repetitive stress injury (in the case of Carpal Tunnel Syndrome). But to have neuropathy on feet is more likely to be some kind of diabetes-related problem. (but that's not certain in any sense of the word).

I'd see a doctor, of course, when this starts to happen. And check my blood sugar Just In Case.

Searching for balance is a good thing, though balance in the strictest sense leads to stagnation. Really, you are seeking a way to continue moving forwards. To move forwards, you do have to stop and smell the roses from time to time. This is necessary. And it's the kind of balance you do need, I think.

There is a need to center as well. This can help to release anxieties that can naturally block your ability to move forwards. But, at the same time, anxiety and the release from it can also be a sustaining energy for creativity.

I use music as a way to release unwanted anxieties. To good effect, I think.

Anonymous, October 9, 2017 at 7:15 AM
Wow! some beautiful colored pieces there mate! Those aren't chinese are they?