
Sunday, January 20, 2013

Hard Problems, Part 2

Some problems are hard nuts to crack. We have discussed this one before. Other problems are simply statically complex. As we will see, when things get hairy, there are still choices to make and techniques which work every time.

A statically complex problem has so many cases that it becomes daunting just to imagine a solution. This comes to mind because, of course, I am working on one of these problems right now. But I can't discuss it. Instead, we will look at another such problem and what it took to solve it.

In the 1980s I was working on a three-dimensional problem: rendering solid fractals. First Benoit Mandelbrot and then Loren Carpenter made famous the construction of mountains using fractal techniques. These were so-called plane-to-line functions, where the domain was a plane and the range was a line.

How to make nice fractals

In plainer English, you could create a mountain by making the height (z) a function of x and y. So, x and y are subdivided, and each time you subdivide, you add some random amount of z that depends upon the size of the x,y patch. In this way your mountain has a predictable fractal dimension. But now I would create such a thing by using Fourier transforms in the frequency domain: create a circular spot, transform it from the spatial domain to the frequency domain, randomize its phase, then transform it back into the spatial domain. This is shown in my blog post Textures, Part 2.
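To make the subdivision idea concrete, here is a minimal sketch of midpoint displacement in one dimension (a cross-section of terrain rather than a full patch). The names and scale factors are illustrative assumptions, not the code I wrote back then:

#include <stdlib.h>

/* Fill z[lo..hi] given its endpoints. Each midpoint gets a random
   offset proportional to the span size, which is what gives the
   result its fractal character. */
void midpoint_displace(float *z, int lo, int hi, float scale)
{
    int mid = (lo + hi) / 2;
    if (mid == lo || mid == hi)
        return;                 /* span too small to subdivide */
    z[mid] = 0.5f * (z[lo] + z[hi])
           + scale * ((float)rand() / RAND_MAX - 0.5f);
    midpoint_displace(z, lo, mid, 0.5f * scale);
    midpoint_displace(z, mid, hi, 0.5f * scale);
}

Halving the offset at each level fixes one particular roughness; scaling by a different factor per level is how you would tune the fractal dimension.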

Once this technique is demonstrated, it becomes tempting to implement it. Which, of course, I did! I remember going to SIGGRAPH in 1980 and bringing a poster of some fractal mountains, created at Calma on a Versatec raster plotter. Each facet was shaded using dither patterns, which I coded specifically for the poster. I presented it to Benoit Mandelbrot.

I'm sure Mandelbrot couldn't have cared less about my rather amateurish fractal forgeries. There were so many people there, after all. But I was undaunted and continued to play with fractals.

And so here comes the problem: what if you make a space-to-line function?

A space-to-line function is one where w can be evaluated as a function of x, y, and z. Imagine that every point in space has a w value, and that w varies fractally. This construction can make clouds and rocks and all sorts of things!

So I decided to create a renderer that could display such a thing. The first way it could be displayed was simply to facet the zeroset of the surface, where w = 0. In a plane-to-line function, the zeroset z = 0 is simply a coastline. In a space-to-line function, the zeroset w = 0 is an actual surface of the object.

The problem can then be refined to this: how can you render the zeroset of the space-to-line function?

My answer to the problem was an early implementation of what is now called Marching Cubes. The technique was later invented independently by Lorensen and Cline and published in the 1987 SIGGRAPH proceedings. But I was using it in 1984. I won't say I invented it. Actually, it's a rather obvious solution.

My solution to the problem was to split the volume into small cubes. Evaluating the function w at each corner of a cube gives a value that is either less than or equal to zero (a 0 bit) or greater than zero (a 1 bit). Since each cube has 8 corners, this was conveniently an 8-bit number.

Thus there were 256 cases to consider. That's a lot of cases, hence it was statically complex.
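As a sketch of the classification step (the function pointer and names here are mine, for illustration, not the original code), the 8-bit case number can be computed like this:

/* Classify a cube by the sign of w at its 8 corners. Bit k is 1
   when w > 0 at corner k, where corner k is offset from (x, y, z)
   by the low three bits of k. */
unsigned cube_case(float (*w)(float, float, float),
                   float x, float y, float z, float size)
{
    unsigned k, index = 0;
    for (k = 0; k < 8; k++) {
        float cx = x + (float)( k       & 1) * size;
        float cy = y + (float)((k >> 1) & 1) * size;
        float cz = z + (float)((k >> 2) & 1) * size;
        if (w(cx, cy, cz) > 0.0f)
            index |= 1u << k;
    }
    return index;   /* one of the 256 cases */
}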

This is the first time I ever wrote a program to write the program to solve the problem. The program that generated the code for marching cubes was cloudtst.c, written in C on February 6, 1984, designed to run on a DEC VAX-11/780.

When a problem is this complex, it's hard to get every case right and consistent with every other case. The way to make sure each case is correct is to have every case generated by a program.

I categorized the eight bits into topological cases. And then I used the program to decide, for one of the 256 cases, which topological case it fell into. Then I mapped the solution for the topological case onto the 8-bit case.

Actually, the way I framed the problem was even more clever, since I allowed myself the luxury of subdividing a cube into 8 sub-cubes and faceting them. This meant that I could recursively subdivide the volume I was rendering, only subdividing it where it needed to be subdivided. This would minimize the total number of facets output and thus cut down on the time to render the fractal volume.

Code that writes code

It sounds a bit recursive, doesn't it? But there are several advantages to writing code that writes code. I will discuss them now.

The first and most obvious thing is this: if a program writes the code, then you won't have to. This is advantageous because there might be a lot of code to write. With a lot of complicated cases. And a program can make sure that each case is correct. So you won't have to. All you have to make sure of is that the program that analyzes the cases will get the right answer in each case.

The second advantage happens when you are generating a lower-level language. The program can see to all the details. This is useful when optimizing something. Lower-level languages are inherently more efficient because they are closer to the machine they run on. There is no hidden complexity. Each instruction is atomic. If you have to break each operation into smaller operations, the program can do it slavishly for you. This is the reason that people write compilers.

The third and most important advantage is bug fixing. To fix bugs, even those occurring in several cases, you only have to change the program that writes the code, not all the cases. This turns out to be characteristic of the code-writing-code process: each bug in the generated code can be attacked one by one in the generating code, and eventually a perfectly correct program is the result, without tediously fixing each case in the generated code by hand. Actually, many cases in the generated code can come from one part of the generating program, so a single bug fix there can fix several bugs in the generated program. That is the power of this approach.

What language should you use? You have two choices to make. First, you choose a language to write the generating program in. This is usually a high-level language, one that you can work in with extreme freedom and ease. This can be something as utilitarian as C or as rich in data structures as C++. It can also be Python, a common choice nowadays. The second choice is the language of the program you are generating. This should generally be chosen for its efficiency. Sometimes you do not have a choice. For instance, I have written code that generates programs in various forms of assembler, in C, in fragment shaders, and in OpenCL.
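To make the idea concrete, a generating program can be as small as a C program that prints C. This toy example (not the cloudtst.c generator, which analyzed topological cases) emits a 256-entry table, and every entry is correct by construction:

#include <stdio.h>

/* Toy code generator: emits a C table giving the number of 1 bits
   in each of the 256 cube cases. */
int main(void)
{
    printf("/* generated file -- do not edit */\n");
    printf("static const unsigned char cornerCount[256] = {\n");
    for (int i = 0; i < 256; i++) {
        int n = 0;
        for (int b = i; b != 0; b >>= 1)
            n += b & 1;
        printf("%4d,%s", n, (i % 8 == 7) ? "\n" : "");
    }
    printf("};\n");
    return 0;
}

Fix a bug in the loop above and all 256 entries are regenerated correctly, which is the whole point.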



Saturday, October 13, 2012

How Old Is Your Software?

Let's look at software vulnerability. What kinds of software are the most vulnerable?

Well, duh! The oldest, most crufty kinds of course! Whenever you add onto software year after year, you unwittingly create opportunities for exploitation. We say that our data are secure, yet we do not test software in anywhere near the rigorous fashion it requires!

This leaves us with highly functional yet completely vulnerable software. And the users don't even realize it. Business users, corporate users, individual users, you.

Which Software is the Most Vulnerable?

Means: to become a hacker, a person only needs to be connected to the Internet and have a computer capable of being programmed. That describes basically every person on the planet in all but the least developed nations. So let's just say there is a large pool of possible hackers.

Motive: to be vulnerable, you also have to be hiding something desirable, interesting, or perhaps embarrassing. In other words: valuable, even to someone who just needs some street cred. What holds this kind of data? Your computer, your hard disk, and your databases: managed by operating systems, by software that routinely gets installed or updated, and by things like distributed database servers that protect huge amounts of data. For more motives for hacking, see my first blog post on Hackers.

Opportunity: So, let's look at software that has enjoyed release after release year after year. These releases are generally done for the purposes of:
  • increasing their feature set
  • making them faster
  • fixing their security holes
So let's examine systems which do this. Operating systems, like Windows, Mac OS X, iOS, and Android, certainly are updated quite often. System software for supporting desirable things like video is updated often as well, like Adobe's Flash. So are things like Adobe's suite of programs, the Creative Suite. In business, Oracle's SQL database server is updated quite often also, to add features and, more often, to patch vulnerabilities. Programming platforms like Java get updated a lot also. Even GNU, the Free Software Foundation's operating system, which declares proudly that GNU's Not Unix (though it is identical to it in every way I can see), is updated quite often.

These are the most vulnerable software systems on the planet, merely because they are updated so often. And because so many people and businesses use them.

What Makes These Vulnerabilities?

The best positive marketing driver is the first one: increasing the feature set. To do this, it is often necessary to allow other developers to add to the feature set. We see this in nearly every OS platform in history: supporting applications, allowing plug-ins, enabling programmability.

Being able to program something is highly desirable. It is also exactly what causes the vulnerabilities.

In 1984, I bought my first Macintosh. Actually it was an original 128K Mac. And the first thing I did was to take it apart, with a long Torx screwdriver and some splints to crack open the shell. My business partner in Fractal Software, Tom Hedges, was doing the exact same thing in the very same room. We both came to the conclusion that it needed a real hard drive, which was an interesting hardware task. We also came to the conclusion that we wanted to program it.

I wanted to create a new application.

We met an Apple person, Owen Densmore, at SIGGRAPH that year, and he put us in touch with a key developer, Bill Duvall, who had built the Consulair C system with a text editor. Owen gave us the external terminal debugging capability, called TermBugA, that we could use to debug our applications. He put us in touch with Steve Jasik, who authored MacNosy and had disassembled the entire ROMs of the Mac. We built our first apps for the Mac within a couple of weeks and began our development career.

This is the old school method. The very ability to program a device has a name now: pwn. This means "owning it" but it also has a whiff of programmability to it.

If a device is a computer of any kind, then the desire to program it freely is a natural consequence of these old school ways.

But those ways must change.

How Are The Vulnerabilities Exploited?

The goal is to become a privileged user on the computer. This will enable the hacker to install their programs, get access to whatever data is available without restriction, and basically to take over the computer. Once this is done, then malware can be installed. Things that log your keystrokes. Or watch you through your webcam. Or check which web sites you use, remembering whatever passwords you use to access them.

This enables them to steal your identity or your money. Or you can be blackmailed with whatever incriminating data is present. In other words, criminal activity that exploits you, your business, or your customers.

But overwhelmingly, your computer can become something that is not under your control, used as a base for expansion, for virus propagation, or as a machine to support DDoS attacks.

How do they get control of your computer? Often it is with a very small bug.

Now, software above a certain size always has bugs in it, and that's the problem in a nutshell.

The kind of bugs that hackers look for are primarily buffer overrun bugs. Because nearly all machines are von Neumann machines, data is stored in the same place as code. This means that all the hacker needs to do is insert their code into your system and transfer control to it.

A buffer overrun bug allows them to do this because, by definition, once a buffer (a fixed-size place in memory to store data) is overrun then the program has lost control of what is going into memory. With a little cleverness, after overrunning the buffer, the data will go someplace that is a tender spot. This can cause another bug to happen or it can be a spot where program control will end up soon enough in the future.
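The classic shape of such a bug in C looks like this; this is a deliberately unsafe illustration, not code from any particular product:

#include <stdio.h>
#include <string.h>

void greet(const char *name)
{
    char buffer[16];    /* a fixed-size place in memory to store data */

    /* BUG: no length check. A name longer than 15 characters overruns
       the buffer and tramples the stack beyond it, including the
       function's return address. */
    strcpy(buffer, name);
    printf("hello, %s\n", buffer);
}

A careful attacker chooses the overflowing bytes so that the trampled return address points back into data the attacker supplied.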

And voilà, the hacker is running their own native code on your computer.

Their next trick is to become a superuser. This is sometimes referred to as becoming root. These terms come from UNIX, which is the basis for many operating systems, like Mac OS X and Linux.

This can be done several ways, but the most effective way is apparently to masquerade as a routine install of familiar software. Like Photoshop, Flash, a Windows Service Pack, etc.

But the process of taking over a computer, which is what a rootkit accomplishes, is often a several-step process.

Perhaps the computer becomes a bot, simply running jobs for the hacker: sending email spam at random times, using the computer's position in the network to attack other local computers, making the computer be part of a Distributed Denial of Service (DDoS) attack.

Perhaps the hacker only wants to get the data in that computer. The easiest way is to gain superuser access, and then you have the privileges to access all the files. Maybe the hacker just wants to watch the user and gain information like bank account numbers and passwords.

Sometimes the hacker just wants to get access to databases. The databases contain information that might be sensitive, like credit card information or telephone numbers. Since these databases are generally SQL servers, a specific kind of attack is used: the SQL injection attack.

Poorly written SQL can have statements in it that evaluate a string and execute it, rather than running code with pre-specified bind variables. It is these strings that make SQL vulnerable to being co-opted by a hacker, who can modify the SQL program simply by changing its parameters. When the string gets changed to SQL code of the hacker's choice, it can be executed, and the hacker can, for instance, extract all of the database records, instead of the usual case where only the records on a certain date may be accessed. Or the hacker can change the fields that get extracted to all the fields instead of a small number of them.
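In C, against a hypothetical database API (the db_ functions here are placeholders, not a real client library), the difference looks like this:

#include <stdio.h>

/* Hypothetical database calls, for illustration only. */
extern void db_execute(const char *sql);                      /* raw SQL  */
extern void db_execute_bound(const char *sql, const char *v); /* binds :1 */

void fetch_orders(const char *userDate)
{
    char query[256];

    /* Vulnerable: user input becomes part of the SQL text itself.
       A userDate of "' OR '1'='1" turns this into a query that
       matches every record. */
    snprintf(query, sizeof query,
             "SELECT * FROM orders WHERE date = '%s'", userDate);
    db_execute(query);

    /* Safer: a pre-specified bind variable keeps the data out of
       the code, so it can never be executed. */
    db_execute_bound("SELECT * FROM orders WHERE date = :1", userDate);
}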

How Do We Combat This?

It is easy to say there is no way to fight system vulnerabilities, but that would be wrong.

The strongest way to stop it is curation. One form of curation is the ability of a supervisor to prevent malware from being installed on a system. When a system allows plug-ins and applications, these must be curated and examined for malware and for the backdoors and errors that allow malware to take hold. And they must be limited in their scope to prevent conscription of the operating system and the applications that run them.

In the case of Apple, curation means examining every App built for its platform for malware or even the whiff of impropriety. And this is a really good thing in itself, because it means that far less malware attacks iOS than Android.

In the case of SQL injection attacks, rewrite your SQL to not use executed strings.

But general practices need to be followed religiously. Make sure your passwords are not guessable. Use firewalls to prevent unintended connections. Beware phishing attacks.


Sunday, March 18, 2012

Overly-ambitious Projects

Sometimes we underestimate how difficult a task is. Like nearly all of the time. But this is not why we choose overly-ambitious ways of solving problems. So let's talk about why.

I have tackled several overly-large projects, and sometimes I have chosen to split them up along natural lines. But not all large projects are easily split up by top-down decomposition.

Let's first look at a project that can be split up: Overhaul Painter's user interface (UI). First off, why did we do it?

Painter's interface was getting too complex, and needed organization. It also needed to be easier and more accessible. More items in the interface simply called for a better way of organizing it. So this was a redesign borne out of necessity.

But by then with the features of Painter 2.0 and also Painter/X2 in one program, it looked like a big task. We needed to split it up.

This could be divided up, a little bit. In the development time frame we had at our disposal, John Derry and I sought to split it up between design and implementation, just so we could get to the independent implementation part of the job. It sounded like an easy task: just basic planning. But it wasn't easy at all.

Redesigning the Painter 3 Interface

First we examined the items that needed to be in the interface. Then we organized them according to function and sometimes according to property. We worked out some basic areas: choosing brushes to use, choosing materials to paint with or design with, adjusting brush properties, and an "other" category. You see, things like selections (which had just been renamed paths, from the original term friskets), layers (which were called floaters, by the unfortunate consequence that they had come from floating selections), and scripts had no obvious organization.

We were looking at a five-color-icon organization for these palettes, based on the problem of icon hell: too many icons made for a totally indecipherable interface. I think all interface designers go through this.

We knew colors could be distracting, so we laid out the palettes in predominantly dark gray, with a texture to them. We wanted things to be three-dimensional and basically understandable because they were analogs of the real thing. It is possible that color added to the complexity of the user interface and detracted from its accessibility. The modern version of Painter (12) uses icons that are grayscale instead. The texture that backs the palettes is a precursor of skeuomorphic design.

You see the Materials palette, with colors, papers, grads (graduations), sets (color sets), and weaves. OK, the weaves were less likely to be used, for sure. But the color picker redesign was clean, effective, and artist-centric. No problem understanding it at all. Actually, it became an iconic symbol of Painter.

The brushes were actually pretty easy. We figured five brushes was a pretty common number of brushes to work with. And may even be overkill. But these were brush categories, not variants. So this was a mistake, I think.

You see the Brushes palette, but this is an early mock-up with the pop-ups not yet functional. Our idea of putting the method up front was probably wrong as well.

But the topmost pop-up was the brush variant, and this was certainly a good thing to have here.

As time went on, it was the individual customized brush variants that the artist actually used and often switched between, and that should have been the list to choose from. Painter 12 gets this right, of course. Good job, Corel!

The brush controls turned out to have a lot of sections. But in Painter 3, we had an idea of having five icons per palette, so we worked the properties into ten areas. Five were in the Brush Controls palette. This became a fairly good split, with size, spacing, bristle, randomness (now called jitter), and nozzle (for the image hose) sections.

But there were many more controls, and these got organized into the Advanced Controls palette. This contained sections for the rake brush, the color well (which affected mixing of paints in the canvas, and pick-up of color into the brush), looks (brush looks, which were a customized variant plus a paper texture), sliders (which allowed for many features to be controlled by the expression contained in your brush stroke), and water (for the water color brushes).

So far so good? No. We had already made a lot of bad decisions, I think. The next palette was the Objects palette. Here you can see the sections for paths (selections), a path list (for storing paths), floaters (this was the prototype for the portfolio, a place where you can store layers for reuse within a project), the floater list (like a layers palette), and scripts.

Painter was by now recording everything you did. So you could play it back, and also have a backup of your work. Of course, it got recorded to your Painter folder, in a Scripts subfolder. No - we weren't spying on you. We called these Sessions in Painter 3.

Materials (in the Materials palette) were an interesting case to the interface. Papers, Grads, Sets, and Weaves were all materials.

The Papers section shows a drawer, a user interface element drawn from Dabbler and created by John and me. The current paper is shown, and some controls are present to adjust it for use.

So materials used a drawer with material swatches in them. The drawer face acts like a Recent Items list. So the most common items will be present to use again and again. When you open the drawer, you can access more, and also go to change the library. Eventually the Materials palette got bigger, with patterns and nozzles making it a bit more useful. But I think that was Painter 4.

The final palette was the Controls palette. And for this palette there was a separate pane for each tool. So you could think of this as a mini-inspector for tools. Useful features were located here. For the brush, the opacity and grain sliders were present. Also freehand vs. straight-line radio buttons were there so you could easily make straight lines. The color swatches were useful for switching between the foreground and background color.

It was really a whole redesign.

This took most of the time. Then individual elements were designed by the pixel-meister (John), and I designed the clever way of putting it all together, called the compiled interface file.

So finally, we were able to split up the work. But in so doing, we had bitten off more than we could chew in the original design and decision work.

In Painter 4, we did redesign it a bit, and it was more streamlined. But the redesign was also motivated by having more things in the program.

Once we had it planned out and split up, the implementation took at least a month of grueling work by both of us.

Multiple Undo

One project that was way too ambitious was the implementation of multiple levels of undo for Painter.

This project was so large that every single part of Painter had to be redesigned and retooled so that each operation could be undone. This was particularly difficult in Shapes, the vector illustration layers built into Painter in version 4, which became a graph-theoretic problem of unprecedented proportions. This was because of grouping, ungrouping, and the operations of cutting up and joining shapes. Making holes. It just went on and on.

There wasn't any clean way to split this project up. But I did implement a set of primitives (kind of a library and an API) for building undo into each operation.

At one point, I sat down with another programmer, Priscilla Shih (now Priscilla Shih Cinque), and we went over huge areas of the program that needed to be retooled for undo.

It was kind of hopeless, we realized. Now there were two programmers overwhelmed by the problem. And I did a particularly crappy job of describing it. Perhaps it was just too incoherent. In those days I liked to keep a huge amount of detail in my head, and I made a gigantic list of primitives for each of us to tackle.

In the end, multiple undo took two revisions of Painter to implement properly (versions 4 and 5), and eventually it became possible to simply implement undo on each new feature we added. I'm sure the legacy of multiple undo sticks to Painter with every revision that gets done.

In a future post, I will discuss the dangers of add-on programming. The issue of undo will return as a cautionary tale of something that should be designed in from the very start.

Tuesday, March 13, 2012

Hard Problems

I like to solve hard problems, which is good since that's my job. This involves significant analysis, and complex problem-solving time and time again.

I have been sharing some of the details of the complex problem solving required to produce features in Painter, but for once I will discuss a tiny, almost lost bit of work I did at my current job. This involves a problem I have been seeking to solve for at least ten years: the lofting problem.

You might be asking yourself how I can talk about it, since it involves my current work. The answer is that this work is now in the public domain, since the process I am about to describe is detailed quite explicitly in US Patents 7,227,551 and 7,460,129. It is also not particularly secret or unavailable since it can be had by using Core Image, part of Mac OS X, as I will indicate.

The Lofting Problem

A Text Mask to be Shaded
If you have a bit of masked information, like text, it is classic design to create a version of the text that is 3D. I'm not talking about extruded letters, which are kind of passé. I'm talking about surface texture that makes the letters look puffy. And stand out. Here you see some masked text. Optima extra bold.

While working on something quite different, another employee, Kok Chen, and I came across something truly amazing.

We solved the lofting problem.

The Wrong Answer
Now, to introduce you to this hard problem, let me show you what happens when you try to create puffy text in Painter.

Here is the result. To do this, I used Apply Surface Texture using Image Luminance, and cranked up the Softness slider.

But all this can do is soften the edge. It can't make the softness sharper near the edge, which is required to get a really classy rendering.
The Right Answer!
Using the lofting problem solution we thought of, you get a much better rendering. The corners have a shine that goes right into them, because the curvature of the surface is continuously approaching a sharp point at the corner. Yet remains pleasingly rounded in the center.

Think of a piece of rubber or flexible cloth tacked down to a hard surface at all points along its edge, and then puffed full of air from underneath: lofted.

I think you get the point. The difference is like night and day! Before I implemented this in Core Image, this was not an easy effect to achieve, and probably couldn't even be done without huge amounts of airbrushing.

In Core Image, you might use the Color Invert, Mask To Alpha, Height Field From Mask, and Shaded Material effects to achieve this. And you might have to use a shiny ball rendering as the environment map, like I did. By the way, you can mock up most of this in Core Image Fun House, an application that lives in the Developer directory. Core Image is a great tool to build an imaging application. Just look at Pixelmator!

But How Was This Solved?

Exactly! This was a very interesting problem in constraints. You see, the area outside the text is constrained to be at a height of zero. The area inside the text is initialized to a height of 1. This is the first image in sequence.

Now blur the text with an 8-pixel radius. It creates softness everywhere, and puts some of the softness outside the text area, because blur actually is something that causes shades to bleed a bit. This is the second image in sequence.

Then you clamp the part outside the text to be zero height again. This recreates a hard edge, and creates the third image in sequence. I have recreated this process in Painter. The clamping was done by pasting in the original mask, choosing a compositing layer method of darken, and dropping the layer, collapsing it back onto the canvas.

Next I take that same image and blur it by a 4-pixel radius. This produces the fourth image in sequence.

I clamp it back to the limits of the original text by pasting, choosing a darken layer method, and dropping, producing the fifth image in sequence.

Notice after the clamp step, the edge remains hard but the inside is becoming more accommodating to the edge, yet remains smooth in its interior.

With the next images in the sequence, I blur with a 2-pixel radius, clamp to the edge of the original mask, then blur with a one pixel radius and clamp to the hard edge of the original mask again to get the final lofted mask.

The last image in sequence is the shaded result, using Apply Surface Texture in Painter.

I should confess that these images are done at a larger scale (3X) and then downsampled for your viewing pleasure. Also, Painter only keeps 8 bits per channel, so the result was a bit coarse, and had to be cleaned up using the softness slider in the Apply Surface Texture effect.

In Core Image, we use more than 8 bits per component and thus have plenty of headroom to create this cool effect at much higher resolution and with fewer artifacts.

It becomes clear that the last several frames of this sequence look practically the same. This is because the blurs are smaller and the effect of the clamping against the edge of the original mask is less prominent. Yet these last steps become very important when the result is shaded as a height field, because the derivatives matter.
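The whole construction boils down to a blur-and-clamp loop. Here is a minimal sketch, assuming a blur_inplace primitive; the clamp is a per-pixel min, which is exactly the darken composite in my Painter recreation:

/* Loft a mask: h is the height field and mask is the original hard
   mask, both w*hgt pixels with values in 0..1. */
extern void blur_inplace(float *image, int w, int hgt, float radius);

void loft(float *h, const float *mask, int w, int hgt)
{
    int i, n = w * hgt;

    for (i = 0; i < n; i++)         /* 1 inside the text, 0 outside */
        h[i] = mask[i];

    for (float radius = 8.0f; radius >= 1.0f; radius *= 0.5f) {
        blur_inplace(h, w, hgt, radius);    /* soften */
        for (i = 0; i < n; i++)             /* clamp (darken composite) */
            if (h[i] > mask[i])
                h[i] = mask[i];
    }
}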

Smart People and Hard Problems

It should seem obvious, but it really does take smart people to solve hard problems. And a lot of work. And trial and error. So I'm going to tender a bit of worship to the real brains that helped shepherd me on my way through life and career. To whom I owe so much. And I will explain their influences on me.

The first was Paul Gootherts. Paul is an extra smart guy, and, during high school, he taught me the basics of computer programming. He, in turn, got it from his dad, Jerome Gootherts. Paul was a competent programmer at 16 (or earlier, I suspect), and I owe my livelihood to him and his wisdom. He also conveyed to me something that, I think, his uncle told him: if you are good at one thing, you can find work. But if you can be good at two things, and apply what you know and cross-pollinate the two fields, then you can be a real success. I applied this to Painter, and to almost all of my life. And perhaps someday I'll apply it to music as well. Paul helped me with problems in computation and number theory, and his superb competence led me to develop some of my first algorithms in collaboration with him.

The second was Derrick Lehmer, professor emeritus at UC Berkeley. He taught me that, though education is powerful, knowledge and how you apply it is even more powerful. And it's what you do with your life that is going to make the difference. This has guided me in far-reaching ways I can only begin to guess at. Professor Lehmer ("Don't call me Doctor; that's just an honorary degree") helped me with understanding continued fractions and their application to number theory and showed me that prime numbers and factorization are likely to remain the holy grail of mathematical pursuits. Oh, and his wife Emma (Trotskaia) Lehmer, who was also present, provided many hours of interesting conversation on our common subject: factoring repunits (numbers consisting of only ones). I must credit a few of my more clever number theoretic ideas to her.

Other smart people have also provided a huge amount of inspiration. Tom Hedges had a sharp eye and an insightful mind. He taught me that no problem was beyond solving and he showed me that persistence and a quick mind can help you get to the root of a problem. Even when nobody else can solve it. Tom helped me with Painter and validated so much that I did. Without Tom, it would have been hard to ship Painter at all!

I have probably never met as smart a person as Ben Weiss. When the merger between MetaTools and Fractal Design took place, I became aware of his talents and saw firsthand how smart this guy is. He can think of things that I might never think of. He confirmed to me that the power of mathematical analysis is central to solving some of the really knotty problems in computer graphics. And he proceeded to famously solve the problem of making the median filter actually useful to computer graphics: a real breakthrough. I've seen Ben at work since MetaCreations, and he is still just as clever.

While Bob Lansdon introduced me to Fourier-domain representation and convolutions, it was really Kok Chen that helped me understand intuitively the truly useful nature of frequency-domain computations. He introduced me to deconvolution and also showed me that there were whole areas of mathematics that I still needed to tap to solve even harder problems. Kok retired a few years back and I sincerely hope he is doing well up north! Kok basically put up with me when I came to Apple and helped me time and time again to solve the early problems that I was constantly being fed by Peter Graffagnino. When it came time to work with the GPU and later with demosaicing, Kok Chen consistently proved to be indispensable in his guidance.

There are plenty of other people whom I will always refer to as smarter than myself in various areas, but they will have to go unnamed until future posts.

Sunday, March 4, 2012

The Things We Throw Away

Upon finishing the development section of a project, I remarked that I had written about three times as much code as finally got shipped. Why is that?

Non-Linear Development

You always hear about FTW moments, when somebody writes some code and it works the first time they run it. Or when somebody is under pressure and he or she comes through in a pinch in record time. Isn't this the common case?

No. That might as well be a Hollywood screenwriter glamorizing the world of hackers. This perception works against us in so many ways. Don't get me wrong: I have had many FTW moments. When an attempt goes right and succeeds way past my expectations. When a piece of code that I have written, a non-trivial section of code, works without error the first time. We can relish those moments, for sure. But they are blue moons, people. Lightning strikes. And at most they are tiny little victories.

Development is a non-linear process.

This may not be the case when it comes to writing a small subroutine, or a library function. But when it comes to the difficult, nearly intractable projects, like the ones I prefer to work on, rewriting and rethinking is an essential part of the game. The scope of such projects lends itself to research, which definitely keeps work interesting. By research, I sometimes mean looking for other solutions to the problem. Or related solutions.

I have said that when a problem is deconstructed, it can sometimes come to resemble other problems. We can use alternative approaches, and then reconstruct to provide completely new solutions to the problem that may simply have never been thought of before. This is a bit of a messy way to stay creative. But it has worked time and time again.

But it doesn't work every time I try it.

And this is one reason why development is non-linear: if one method doesn't work, then we have to try another until we do find one that does. This means taking suggestions from others. And trying them out. This means looking at related fields. Turning the problem inside-out, or looking at it backwards: tracing backwards from the solution.

This technique is used when someone else has demonstrated a solution to a problem. The thought goes like this: if they succeeded at solving it, then they would have to have done this. And to do that, they would have to have figured out how to do this other thing. Generally it is a chain of certain reasoning.

But when you are solving a problem for the first time, you are not guaranteed that this technique will lead to a solution. And so development becomes non-linear again. Because you can try to find a solution this way, but you may not succeed.

Hard Problems and Some Ways of Looking At Them

It simply comes down to this: some problems are so hard that you end up grasping at straws to figure out a solution to them. And my least favorite thing is when people say "oh, that should be easy" without even thinking about the problem. And also when they forget that it is not enough to solve the problem, but you must also make it possible for others to easily use what you have created. And that the solution you create has to live in an ecosystem of other processes, as part of a workflow that makes sense and doesn't use too many resources.

The best thing to do is to first imagine what a good solution to the problem would look like. This means knowing how it should be used and in what context it would be most useful. Finally, trying for an interface that has the least number of controls. Here the theory is that the most useful feature is one that people don't actually have to use, because it's applied automatically. This means your implementation must determine what cases there are, and then make an attempt to handle all of them.

The common cases are the first ones that must be addressed. And usually this can lead you to a simpler form of organization that helps you to classify and solve the harder, less common cases. I have learned time and time again that first gearing the solution to the most difficult cases leads to more complex solutions. Instead, try to factor the more complex cases into your simpler model.

An Example

Boolean polygon operations are an example of a complex task that seems to have an easy solution. It all depends upon your imagination as to how easy the operation actually appears. But first, what is this problem?

Diagram 1: polygon
It is the task of computing a union or an intersection between two polygons. Each polygon consists of a finite number of points in the plane, connected sequentially by line segments.

So, we make an informal data structure for a polygon based on the geometry that represents it:

point: float x, y
polygon: int nPoints, array of point

We have used two definitions to make the polygon. The first is a point, represented by an x and a y in the plane. The second is the polygon, represented by a count (of points) and an array (of points).

Diagram 2: two polygons
Now we can see the problem as Boolean operations between polygons. Boolean operations include "and", "or", and "not". You can think of "and" as intersection. You can think of "or" as union. You can think of "not" as negation.

When you apply these to polygons, we can define the operations as applying to the area inside the polygon. Might this be complicated by negation? No, we can factor this in by simply making the definition that negating a polygon is the same as making its outside to be its inside, and its inside to be its outside.

Diagram 3: the intersection
of two polygons
Ironically, Boolean operations are easy to do with pixels (bitmasks), but hard to do with an outline representation.

The intersection between two polygons can be seen to be the area that is inside both polygons. In this case, to represent the intersection as a polygon requires that we compute intersection points.

But this highlights another assumption we need to make: the result of one of these operations is itself a polygon.

Diagram 4: the union
of two polygons
The union between two polygons can correspondingly be seen to be the area that is inside either polygon.

So far, to the layman, the union and intersection are operations that are familiar to people who know Venn diagrams.

The notion of inside and outside may seem complicated in itself, but it may easily be formalized by using fairly simple analytic geometry operations. We can define the inside of a polygon to be the area that lies to the left of an edge when the edges of a polygon are traced in counterclockwise order.

It turns out that this definition can actually be used to define what is inside and what is outside the polygon. We'll get to that in a bit.

Diagram 5: two concave polygons
But polygons are not all convex. And this leads to the first situation that causes us to throw out previous assumptions.

We see two concave polygons here that touch, so they have a well-defined union and intersection.

The intersection and the union have special properties, both of which may not be allowed for in our definitions. For instance: the definition of a polygon is a closed list of points. And the definition that the result of the Boolean operation is a polygon.

A single polygon.

This is all well and good until we see the results of intersection and union.

Diagram 6: the intersection
of two concave polygons
Here we have the intersection of the two concave polygons. The result is actually a set of two disjoint polygons.

So this forces us to revise our assumption of the result of a Boolean operation being a polygon. The result is obviously more correctly stated to be one or more polygons. Of course, if the two polygons don't actually touch, then we should update our assumption of the result of a Boolean operation between polygons to include zero or more polygons.

And this, I think, is no big deal, since it involves only the result. But wait, what if we wish to operate on a result? With polygons A, B, and C, we might want to produce (A intersect B) union C. In the case where a result becomes more than one polygon, can the operation possibly be simply iterated over polygons to produce the result? No. If A intersect B produces two polygons, then the union might bridge them and still produce only one result. There is no way to get this result by simply iterating over polygons. However, another solution does exist: if A intersect B produces the two polygons, D and E, then we can evaluate (D union C) union E to produce the result. But there's no guarantee that D union C won't produce two polygons also, and so we must revise our definition of a polygon.

We define a polygon now to consist of one or more outlines, each consisting of an array of points.

point: float x, y
outline: int nPoints, array of point
polygon: int nOutlines, array of outline

Diagram 7: the union of
two concave polygons
But wait, we haven't considered the union of the two concave polygons yet.

Look! The result is also two polygons, but one polygon is a hole in the other.

And can't a hole also contain an island inside it? It looks like we are going to need a recursive definition for polygon, folks!

We will consider an area to be an outline of the edge of a polygon, and a hole to be another area (by definition negative) inside it:

point: float x, y
outline: int nPoints, array of point
area: outline, int nHoles, array of area
polygon: int nAreas, array of area

To facilitate the determination of what's inside and what's outside, we will define outlines to be oriented counterclockwise for a positive area, and clockwise for a negative area (hole).
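In C, that final definition might look like this; a sketch of the data structures only, not my original code:

typedef struct { float x, y; } Point;

typedef struct {            /* one closed loop of points */
    int nPoints;
    Point *points;          /* counterclockwise for area, clockwise for hole */
} Outline;

typedef struct Area {       /* an outline plus the holes inside it */
    Outline outline;
    int nHoles;
    struct Area *holes;     /* a hole can contain islands, hence recursive */
} Area;

typedef struct {            /* a polygon: zero or more areas */
    int nAreas;
    Area *areas;
} Polygon;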

So we had to throw out some assumptions, and also some definitions (actually our first definition!) once we realized that we had to represent holes and multiple outlines.

The Operation Itself

Now we get to the Boolean operation itself. How are we going to implement it? It seems like we can use a pretty simple rule, considering the union operation as our guide. The set of points comprising the union of polygons A and B appears to be the set of points of polygon A that are outside of polygon B taken together with the set of points on polygon B that are outside of polygon A. Well, plus the intersection points.

So we invent a rule: pick a point on one polygon that is outside of the other polygon. Trace forwards along it until you come to an intersection point. Then switch to the other polygon. Trace forwards along that polygon. When you come to the start point, you close the output polygon.

This rule comes from a simple underlying assumption: at an intersection point, edges cross. At such a point, one polygon crosses the border of the other, either going from the inside to the outside, or from the outside to the inside.

This rule gets us one of the polygons of the union shown in diagram 7, but how do we get the other outline? When we output a point to the output polygon, we must mark the point in the source polygon as used. Then we simply look for an unused point in one of the polygons that is outside the other polygon. And start another trace from there.

The process stops when there are no more unused points that qualify.

How does intersection work? We may use De Morgan's law:

intersection(A, B) = negation(union(negation(A), negation(B)))

Practically, this means we need to reverse both polygons, compute their union, and reverse the result. When we reverse a polygon, its inside and outside become swapped topologically.
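In code, intersection then becomes three calls, assuming union and reversal primitives with these hypothetical signatures:

extern Polygon polygon_union(Polygon a, Polygon b);  /* the tracing rule   */
extern Polygon polygon_reverse(Polygon p);           /* swap inside/outside */

Polygon polygon_intersect(Polygon a, Polygon b)
{
    /* De Morgan: intersect = not(union(not A, not B)) */
    return polygon_reverse(polygon_union(polygon_reverse(a),
                                         polygon_reverse(b)));
}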

This also gives us the way to chop one polygon out of another.

Flies In the Ointment

Diagram 8: self-
intersecting polygons
There are a couple of things that will mess up this neat solution. The first is a self-intersecting polygon.

If we use the winding number rule to define what is inside the polygons, then it probably is useful to first convert these cases to a case that isn't self-intersecting. For the bow-tie, this will still result in two polygons that touch each other at a common point. For the flying-V, this will result in one easy (but concave) polygon.

Also, polygons that touch themselves from the outside or from the inside are a form of self-intersecting polygon, and must be resolved to a form that is less ambiguous.

The other problem case that needs to be considered is this: what happens when the edges of the two polygons become collinear?

I have concocted an example. For readability, I have separated the lines of the two polygons. But just imagine for a moment that they are exactly coincident at edges and points.

A solution to the intersection between these two shapes can easily lead to a polygon that touches itself from the inside. It would be nice to prevent this kind of result from being produced. Or, more simply, we can just resolve it into a non-self-intersecting polygon on output.

The Example Applied

In Painter 4, I implemented the mosaic feature. This capability defined each tile as a polygon. Each polygon was initially a 4-edge quadrilateral. But there was an additional problem of grout. The tiles could never really get closer together than a certain grout distance.

The grout distance was implemented by creating an outset polygon for every existing polygon. So a polygon was outset by the grout distance. Any new tile being placed into the mosaic was chopped by the outsets of the existing tiles so the new tile could come no closer to them than the grout distance. This required the implementation of intersection and negation.

Once a tile was chopped by one neighbor, it might be chopped by another neighbor, etc.

The good thing was that the tile polygons, at least initially, could not self-intersect. This cut down on some computation. Also, they were initially only four points. This cut down on intersection searching.

What I Haven't Actually Described

I haven't actually described the point-inside-polygon test, the point-to-left-of-line test, or the process of finding all intersections between two polygons. There exist fairly simple processes for all of them.

I also haven't described what happens to the algorithm when a polygon's edges are collinear. This can happen if three successive points fall on a line. Suffice it to say that we must remove the center point.

It's also nice to know how to orient a polygon counterclockwise. The simplest method is to determine its area using trapezoidal integration and arrange it so a positive area implies that the polygon is counterclockwise and a negative area implies that the polygon is clockwise.
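A sketch of that computation on one outline, reusing the structures above; it returns twice the signed area, which is all you need for the sign test:

/* Twice the signed area of an outline, by trapezoidal integration.
   Positive means counterclockwise, negative means clockwise. */
float outline_signed_area2(const Outline *o)
{
    float sum = 0.0f;
    int i, j;

    for (i = 0, j = o->nPoints - 1; i < o->nPoints; j = i++) {
        /* trapezoid between the edge (j -> i) and the x axis */
        sum += (o->points[j].x - o->points[i].x)
             * (o->points[j].y + o->points[i].y);
    }
    return sum;
}

If the sum comes out negative, reverse the point order to make the outline counterclockwise.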

To Sum It All Up

All I know is that you can spend quite a bit of time getting this programming solution right, and along the way you might change your assumptions or your internal data structures several times before coming to a fruitful conclusion.

Like I said, non-linear development.

Saturday, March 3, 2012

Intense Development

There are periods of time during a project when I don't even want to sleep. Others around me get very annoyed. But when I come out the other end, something magical can be seen. This is partly because I, thankfully, work in the realm of computer graphics. And partly because I'm a visual person who can imagine a visual result that others can appreciate.

And it's all in the demo.

There is no sleight of hand in a demo. Not when people are to be impressed. But sometimes people just don't get the value in what you construct. This is where you have to educate them, to show them the value, to connect it to something they can understand. You have to make all that obsessive development time mean something.

You need to become tolerable again.

I have talked about where ideas come from. About the different frames of mind we can be in. About how to foster creativity in the first place. But, once you get the idea and reason out how it can be implemented, there is a new phase that needs to be explored. How does this process unfold, this intense development? How does the large feature or the complex technique get implemented? How can we, as mere humans, even manage something like this? What tools do we use to do the seemingly impossible? What parts of our brains do we have to use to accomplish our goals?

Organization

The best method to tackle a large project is to get organized. I do this by taking notes, drawing pictures, and building tools.

I have found that some of the best notes to take are these:

  • new ideas or features that you would like to explore
  • problems that need to be resolved
  • places to look when updating to some new arrangement of the code
For most people, the note-taking process is a hassle. But you really need to start taking those notes to accomplish a project that is so big you can't keep it all in your head!

When drawing a picture, sometimes a flowchart is useful. Here we have the basic step in constructing a Laplacian pyramid. The objective is to decompose the step into smaller operations, a process known as top-down decomposition.

Here the basic step gets split into reduction, expansion, and difference substeps.

The reduction step is the process of converting an image into another image that is half the size in both width and height. And one which thus does not contain any of the highest-frequency information in the original image. The expansion step is the process of resizing the half-sized image back into full size. This image will be blurrier than the original by definition. The difference step is the process of determining the differences between the original full-sized image and the blurred full-sized image. These differences form the highest frequency detail in the image.

This step can be repeated to create a quarter-sized image and a half-sized detail image.

So not only is the image decomposed into various frequency bands, but the process of decomposing the image has also been decomposed into steps!
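As a sketch, the basic step translates almost directly into code, assuming reduce and expand primitives:

/* One Laplacian pyramid step: decompose full (w x h) into a half-sized
   base image and a full-sized detail image holding the highest
   frequencies. */
extern void image_reduce(const float *src, int w, int h, float *dst); /* w/2 x h/2 */
extern void image_expand(const float *src, int w, int h, float *dst); /* 2w x 2h   */

void pyramid_step(const float *full, int w, int h,
                  float *half, float *blurred, float *detail)
{
    int i, n = w * h;

    image_reduce(full, w, h, half);            /* reduction  */
    image_expand(half, w / 2, h / 2, blurred); /* expansion  */
    for (i = 0; i < n; i++)                    /* difference */
        detail[i] = full[i] - blurred[i];
}

Repeating the step on the half-sized image produces the next band down, just as described.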


Rational Processes

Using your rational mind is partly deduction, and partly experience. For instance, when you implement a gradient operation, experience tells you that the center of a line has a zero gradient, and either side of the line has a non-zero gradient. As a practical demonstration of this, consider the Painter brush stroke. It is from an airbrush at high opacity with a 2 pixel diameter: a typical thin line.

If you compute the gradient using a Sobel technique, each 3x3 neighborhood of the image is convolved with two 3x3 kernels. There are variations on this theme, but usually the kernels will look something like this:

 1  2  1          -1  0  1
 0  0  0   and    -2  0  2
-1 -2 -1          -1  0  1

The first kernel is for computing gradients in the y direction (horizontally-oriented edges) and the second kernel is for computing gradients in the x direction (vertically-oriented edges).

Convolution means multiplying each element of the kernel with the corresponding pixel in the actual neighborhood in the image and forming a sum of the products.

You do that for both kernels, producing two sums, which you can imagine to be the x and y value of a vector field. The gradient is simply the magnitude of that vector.
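For one pixel, the whole computation is just a few lines. This sketch assumes a single-channel, row-major image, and that the caller stays a pixel away from the border:

#include <math.h>

float sobel_magnitude(const float *img, int width, int x, int y)
{
    const float *p = img + y * width + x;
    /* first kernel: top row minus bottom row */
    float gy = (p[-width-1] + 2.0f*p[-width] + p[-width+1])
             - (p[ width-1] + 2.0f*p[ width] + p[ width+1]);
    /* second kernel: right column minus left column */
    float gx = (p[-width+1] + 2.0f*p[1]  + p[ width+1])
             - (p[-width-1] + 2.0f*p[-1] + p[ width-1]);
    return sqrtf(gx * gx + gy * gy);    /* magnitude of the vector */
}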

The result of this is a gradient like you see here. Notice that the center of the line has an empty space in it, corresponding to a zero edge.

My rational mind already knows this through experience. So this means that if I want to use the gradient as a mask, and process the center pixels of the line, I will have to do something to fill in the center of the gradient. Like an annealing operation (a blur followed by an increase of the contrast or exposure of the gradient).

A rational mind mixed with the ability to visualize is probably the best way to get image processing operations done the quickest. But there are times when visualizing is not enough. We must see the intermediate results and check that they are being produced correctly and effectively. This brings us to the next technique: building tools.

Building Tools For Visualizing and Debugging

Any process in image processing, no matter what it is, will have intermediate results. There will be a blurred buffer, morphology applied to something, a gradient, a vector field, some representation that needs to be visualized. And we may need to verify that each step is being accomplished correctly, or verify that the step is even doing what we imagined it would, and is thus useful in the process of finding a solution.

So we need to construct a tool to see the intermediate results, to study them, to inspect them, and to debug their construction when your idea of what they should look like does not match what you get.

I have done this time and time again with large projects I have worked on, and it has enabled me to make much faster progress on a large project. And with a tool such as this, it becomes another thing: your demo environment. Not only can you see what's happening, but others can as well.

In order for a demo to come off smoothly, your implementation has to be fast as well. This means that you will need to implement selective update, and also you will need to make it go as fast as possible through optimization.

It doesn't matter what kind of project you are working on. You will always need to demo to justify your continued work. You will need to show progress. You will need to convince people that it can be done.

Tool construction (a testbed with demo capability) is your best tool to accomplish this!

Choosing the Best System to Build On

When constructing an image processing tool that involves steps, intermediate results, complex staging, or heavy computation, you need to choose a system to build it all on top of. For my purposes, I am considering a Macintosh as my system platform. But there are APIs and methodology that apply to any task.

Core Image is a good API for image processing, when your result is constructed one pixel at a time. It can allow you to utilize a GPU or a multi-core CPU to get the job done, and it can render the task of constructing a pass on your data into a simple thing. This is highly desirable when you have a lot of passes to construct. Core Image kernels are pretty easy to construct. You can reference any number of source images, but you may produce only one pixel in the destination image. This works pretty easily for blurs, color operations, compositing operations, and even transitions. You can build Core Image filters on top of your operations; their parameters are entire images, plus the settings for your operations.

OpenGL is a good system for doing computation and presenting that computation inside a texture on screen. When this texture is transformed in 3D, as in being mapped onto a 3D object, this is the ideal API to accomplish the task. OpenGL may also be used for computing results on 2D flats that are presented using an orthographic projection. The computation can occur using almost any OpenGL operation or it can occur using a fragment program. This is conceptually the same as Core Image, so there is not much value in going the OpenGL route unless textures are going to be transformed in 3D.

OpenCL is a good system for doing arbitrary computation using the GPU and the CPU. You can support multiple output buffers as well as multiple input buffers. This means that some simulation operations are easier. Also, things like scatter and gather to and from planar color formats are much more natural. For instance, conversion of RGB to YCC, where the Y is kept separate from the CbCr information, can be supported very easily: one RGB image as input; two images, one Y and the other CbCr, as output.

Multi-core CPU computation is another good method to get things done fast. Here you can use Grand Central Dispatch to easily queue your computation across multiple CPU cores. It has never been easier.
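How easy? A minimal sketch: fanning the rows of an image out across cores with dispatch_apply. Here process_row is a hypothetical per-row worker; the only requirement is that rows be independent of one another:

    #include <dispatch/dispatch.h>
    #include <stddef.h>

    void process_row(float *row, size_t width);   /* hypothetical per-row worker */

    /* Process the rows of an image in parallel; one block invocation per row. */
    void process_image(float *pixels, size_t width, size_t height)
    {
        dispatch_queue_t q =
            dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

        dispatch_apply(height, q, ^(size_t row) {
            process_row(pixels + row * width, width);
        });
    }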


The Dangers of Obsession

You can get buried in a project. It can overcome you. This can leave you very confused, and unless you disentangle yourself from it for a while and take a step back, you run the risk of becoming irrevocably lost.

Back in my Caltech days, there were those people who were interested in Dungeons and Dragons (D&D). This sometimes resulted in people becoming obsessed with the rule systems and the immersive game-play.

And sometimes people just got lost. First they forgot to shower, neglecting their basic cleanliness. Then they showed the effects of malnutrition, living on an endless supply of Coke and little white powdered-sugar donuts. They started talking about fifth-level clerics and trolls. They always carried those little clear twelve- and twenty-sided dice around with them. And one day they didn't come to class. And never appeared again.

These were good, perhaps weak-willed people who were casualties of war. The war against obsession.

Yet I also saw people get obsessed with technical and scientific matters. These were called grad students. They would work on their theses obsessively, disappearing into a dark cave until they came out with something hard and shiny like a diamond. Obsession had its value, it seems.

Buried in Complexity

You can add more and more to a program over a period of many months. This is called add-on programming. And it can lead to another problem: complexity. A haphazard programmer can continue to kludge up a piece of code using branching and questionable data structures. This can lead to spaghetti code: twisty passages all alike.

The only solution to this problem is rethinking it: the code must be rewritten. There is no other way if it is to be modified in the future, and if you have been adding more and more stuff to it, future modification is a virtual certainty. At this point it is time to develop the right control structures and data structures to render the solution in the most effective and extensible way.

Immersive Programming

At some point you will need to debug what you have created and make it work. This requires total immersion. The better you have organized your code, the easier it will be to understand the processes it uses and thus to figure out which steps are correct and which are incorrect. This is the process of debugging.

It's like putting your head into the code and visiting codeland.

One thing is sure: you better have your head on straight when you debug a large project the first time. This will be when your organization and rethinking of control and data structures will pay off.

Sometimes when debugging a project it becomes clear that there is a logic flaw in the code. This can be a small one, like an off-by-one error, or some statements that are out of order.

Or it can be a very large problem indeed. One with huge ramifications for the code.

My advice is to fix it before going any further, no matter how sweeping the implied changes are.

To Sum It All Up

Once you have been through fifty or so large projects, you begin to see patterns much more clearly. Perhaps you can profit from some of the patterns I have found, and some of the cautionary tales.

All I know is that I mostly had to learn these things the hard way.

Sigh.

Wednesday, February 22, 2012

Respecting the User's Input

I had the privilege of working at a Computer-Aided Design (CAD) company, Calma, in the late 70s, and thus I experienced first-hand the early years of computer graphics. And it was at Calma that I learned a principle that has served me well ever since: respect the user's input.

The users of the Calma workstations used a CalComp Talos table digitizer. You would tape your sheet of VLSI artwork down on the digitizer and then you could digitize the traces (conduits of electrons on the surface of the chip you were designing) using a puck. Digitizing was a tedious process, and eventually it was eliminated by allowing designers to build their designs directly on the CAD system in the first place. Which is obvious, of course. But computers were really primitive back then, and so this was a necessary stepping stone.

It was considered massively incorrect to require the operator (what the user was called back then) to digitize a point more than once. The problem occurred so often that the operators had a name for it: double digitizing. This was clear to me because, when I wrote something, I was also its first user when testing it. Double digitizing was incredibly bad because it required extra work. And it didn't respect the user's input.

As time went on, I became the author of the graphic editor (GED) on the Calma GDS-II system. You may not recognize this system, but it was used to design the Motorola 68000, so you probably know its effects. GED fulfilled the promise of digitizing the geometry within the system, although it wasn't unique, and we all considered this requirement to be obvious. When I wrote it, I paid particular attention to caching the user's input so I wouldn't require anything at all to be entered twice. Not even a closure point. Because by then, tablets were being used (though with wire-connected styluses) and it was often hard to hit the same point twice. So I invented a snapping distance specifically so you only had to get close to the starting point to specify closure.
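The test itself is simple. A sketch in C, with hypothetical names; comparing squared distances avoids the square root:

    #include <stdbool.h>

    /* Does the digitized point (x, y) land within snapDist of the
       polygon's starting point (x0, y0)? If so, treat it as closure. */
    bool snaps_closed(double x, double y,
                      double x0, double y0, double snapDist)
    {
        double dx = x - x0, dy = y - y0;
        return dx * dx + dy * dy <= snapDist * snapDist;
    }

When the test succeeds, the entered point is replaced with the exact starting point, and the polygon closes with no extra digitizing.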

Constraint-based systems were still in their infancy in 1978.

Programming

As I did more programming, I became aware of another corollary to the principle of respecting the user's input. It was called the Perlin principle of modular programming: we hide the similarities and highlight the differences. Sometimes the similarities are simply encapsulated in classes or modules. Usually the differences become parameters to the modules, or are wrapped up in the way you call the modules or use the classes. Here the user is the programmer, and their input is respected because they won't have to write anything twice.

There is another way to respect the programmer's input: transparency. When an API (application programming interface) is complicated, it becomes less transparent and the possibility of bugs increases. On the other hand, when an interface is clean and well-defined, the programmer doesn't have to learn too many things or understand the assumptions made by the writer of the API. All these things mean fewer problems and fewer chances for bugs, because the user's model of the code is very close to the way the code actually works. In this way, the programmer's intent is better respected.
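In C, the simplest form of hiding the similarities is writing the traversal once and passing the differences in as a function pointer. A small hypothetical sketch:

    /* The similarity (visiting every pixel) is written once;
       the difference (what to do at each pixel) is the parameter. */
    typedef unsigned char Pixel;
    typedef Pixel (*PixelOp)(Pixel);

    void for_each_pixel(Pixel *img, int n, PixelOp op)
    {
        for (int i = 0; i < n; i++)
            img[i] = op(img[i]);
    }

    Pixel invert(Pixel p) { return (Pixel)(255 - p); }
    Pixel darken(Pixel p) { return (Pixel)(p / 2); }

    /* for_each_pixel(img, n, invert); -- the difference is just the argument */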

MIDI

Later on, in the late 1980s, the user input became more complex. In 1988, I built a program for recording and playing back MIDI data from keyboards and drum machines. I called it MIDI '88, and it's a project that has really never seen the light of day, except in my studio. Well, I did show it to Todd Rundgren once. And, of course, John Derry.

To get the MIDI data, I had to interface to a driver and write an input spooler. This spooler, implemented as a ring buffer, kept all the data around without dropping samples or events. In this way, I was respecting my own input at the keyboard, allowing me to record all of it. I recorded the events with a millisecond-accurate timestamp. And this was the beginning of my quest to accurately preserve the user's input and faithfully reproduce the user's gestures, allowing the style of the user to come through.
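A minimal sketch of such a spooler in C, with hypothetical names: the driver puts events in at interrupt time, the application takes them out later, and a power-of-two size makes the wraparound a cheap mask.

    #include <stdint.h>
    #include <stdbool.h>

    #define SPOOL_SIZE 1024              /* must be a power of two */

    typedef struct {
        uint32_t time;                   /* millisecond timestamp  */
        uint8_t  status, data1, data2;   /* raw MIDI event bytes   */
    } MidiEvent;

    static MidiEvent spool[SPOOL_SIZE];
    static volatile unsigned head, tail; /* head: writer, tail: reader */

    bool spool_put(MidiEvent e)          /* called at interrupt time */
    {
        if (head - tail >= SPOOL_SIZE)
            return false;                /* full: nothing is overwritten */
        spool[head & (SPOOL_SIZE - 1)] = e;
        head++;
        return true;
    }

    bool spool_get(MidiEvent *e)         /* called by the application */
    {
        if (tail == head)
            return false;                /* empty */
        *e = spool[tail & (SPOOL_SIZE - 1)];
        tail++;
        return true;
    }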

Tablets

When I first got a Wacom tablet in 1990, I sought to do exactly the same thing: create an input spooler for the tablet data, and timestamp the data accurately. But with the Wacom, there was also pressure data that needed to be captured. And then there was tilt and bearing information.

The input spooler captured all these things.

But I soon learned that the data I got from the tablet wasn't perfect. It didn't really accurately capture the user's input at all. One source of the error was temporal aliasing. So I learned that input data often had to be massaged.

Some tablets had very few samples per second and others had quite a few. But if you assumed the data from these tablets was regularly spaced, you could get kinks and ugly straight lines in your brush strokes. So I invented a limited FIFO that smoothed out these time-and-space irregularities. And I had to pay special attention to the tight turns in the path of the stylus. Changing the extrema of a brush stroke was highly undesirable, since it made common actions like cross-hatching very hard. And sketching simply looked wrong if too much processing was done on the user's input. Usually, but not always, an extremum of the stylus' path was a place where velocity was at a minimum.
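The skeleton of such a limited FIFO is a short moving average over the last few samples. A sketch in C, with hypothetical names; in practice the window would be shortened near direction reversals so the extrema aren't clipped, and that logic is omitted here:

    #define TAPS 4

    typedef struct { float x, y, p; } Sample;   /* position + pressure */

    typedef struct {
        Sample buf[TAPS];
        int    count, next;
    } Smoother;

    Sample smooth(Smoother *s, Sample in)
    {
        s->buf[s->next] = in;                    /* overwrite oldest slot */
        s->next = (s->next + 1) % TAPS;
        if (s->count < TAPS) s->count++;

        Sample out = { 0, 0, 0 };                /* average what we have */
        for (int i = 0; i < s->count; i++) {
            out.x += s->buf[i].x;
            out.y += s->buf[i].y;
            out.p += s->buf[i].p;
        }
        out.x /= s->count; out.y /= s->count; out.p /= s->count;
        return out;
    }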

But, conversely, when the stylus was moving fast, less control was being exerted on the actual position of the brush. So I could afford to use parametric cubic interpolation to create more points in fast-moving sections of the brush stroke. This was a good thing, because there were fewer points per inch in fast-moving sections due to the fixed sampling interval: when your hand moves the stylus faster, the sample points are spaced farther apart.
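One common parametric cubic for this purpose is the Catmull-Rom spline, which interpolates between the middle two of four successive samples. A one-dimensional sketch in C, applied separately to x, y, and pressure; whether this is the exact cubic used in Painter is an assumption:

    /* Interpolate between samples p1 and p2, using neighbors p0 and p3,
       at parameter t in [0, 1]. At t = 0 it returns p1; at t = 1, p2. */
    float catmull_rom(float p0, float p1, float p2, float p3, float t)
    {
        float t2 = t * t, t3 = t2 * t;
        return 0.5f * ((2.0f * p1) +
                       (-p0 + p2) * t +
                       (2.0f * p0 - 5.0f * p1 + 4.0f * p2 - p3) * t2 +
                       (-p0 + 3.0f * p1 - 3.0f * p2 + p3) * t3);
    }

Evaluating it at a handful of t values between each pair of widely spaced samples fills in the fast-moving sections of the stroke.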

All this made for smoother scribbling in Painter.

When John Derry came to Fractal Design in 1992, his enthusiasm for mark-making meshed quite well with my desire to accurately capture the input. For these reasons, Painter became a very good tool indeed for artists.

We perfected the motto with respect to mark-making: to accurately capture the nuances of human expression and faithfully reproduce them, allowing the artist's style to come through.

It is this statement that I stumble through towards the end of the video in my early post Creativity and Painter, Part 1. Ah, the early days.