relativistic observer: codebreaking

Showing posts with label codebreaking. Show all posts

Wednesday, June 26, 2013

Weaponized Computation

Ever since the early 20th century when primitive analog computers were built to help compute solutions for naval gunnery fire control and increasing bomb accuracy, computing machinery has been used for weaponry. This trend continues to accelerate into the 21st century and has become an international competition.

Once upon a time

I had an early gift for mathematics and understanding three-dimensional form. When I was 16 or so, I helped my dad understand and then solve specific problems in spherical trigonometry. It eventually became clear to me that I was helping him verify circuitry specifically designed for suborbital mechanics: inertial guidance around the earth. Later I found out in those years he was working on the Poseidon SLBM for Lockheed, so, without completely understanding it, I was actually working on weaponized computation.

This is the period of my life where I learned about the geoid: the specific shape of the earth, largely an oblate ellipsoid. The exact shape depends upon gravitation, and thus mass concentrations (mascons). Lately the gravitational envelope of the moon caused by mascons has been an issue for the Lunar Orbiters.

At that point in history, rocket science was quite detailed and contained several specialized areas of knowledge. Many of which were helped by increasingly complex calculations. But there have been other fields that couldn't have advanced, where specific problems couldn't be solved, without the advances in computation. Ironically, some basic advances in computation we enjoy today owe these problems for their very existence. Consider this amazing article that details the first 25 years or so of the supercomputing initiatives at Lawrence Livermore National Laboratory.

Bombs

Throughout our computing history, computation has been harnessed to aid our defense by helping us create ever more powerful weapons. During the Manhattan Project at Los Alamos, Stanley Frankel and Eldred Nelson organized the T5 hand-computing group, a calculator farm populated with Marchant, Friden, and Monroe calculators and the wives of the physicists entering data on them. This group was arranged into an array to provide one of the first parallel computation designs, using Frankel's elegant breakdown of the computation into simpler, more robust calculations. Richard Feynman, a future Nobel prize winner, actually learned to fix the mechanical calculators so the computation could go on unabated by the huge time-sync of having to send them back to the factory for repair.

I was fortunate enough to be able to talk with Feynman when I was at Caltech, and we discussed group T-5, quantum theory, and how my old friend Derrick Lehmer was blacklisted for having a Russian wife. He told me that Stanley Frankel was also blacklisted. Also, I found 20-digit Friden calculators particularly useful for my computational purposes when I was a junior in High School.

The hunger for computation continued when Edward Teller began his work on the Super, a bomb organized around thermonuclear fusion. This lead John von Neumann, when he became aware of the ENIAC project, to suggest that the complex computations required to properly understand thermonuclear fusion could be carried out on one of the world's first electronic computers.

Codebreaking

In the history of warfare, codebreaking has proven itself to be of primary strategic importance. It turns out that this problem is perfectly suited to solution using computers.

One of the most important first steps in this area was taken at Bletchley Park in Britain during World War II. There, in 1939, Alan Turing constructed the Bombe. This was an early electromechanical computer and it was specifically designed to break the cipher and daily settings used in the German Enigma machine.

This effort required huge amounts of work and resulted in the discovery of several key strategic bits of information that turned the tide of the war against the Nazis.

The mathematical analysis of codes and encoded information is actually the science of decryption. The work on this is never-ending. At the National Security Agency's Multiprogram Research Facility in Oak Ridge, Tennessee, hundreds of scientists and mathematicians work to construct faster and faster computers for cryptanalytic analysis. And of course there are other special projects.

That seems like it would be an interesting place to work. Except there's no sign on the door. Well, this is to be expected since security is literally their middle name!

And the NSA's passion for modeling people has recently been highlighted by Edward Snowden's leaks of a slide set concerning the NSA's metadata-colecting priorities. And those slides could look so much better!

Passwords

In the modern day, hackers have become a huge problem for national and corporate security. This is partly because, recently, many advances in password cracking have occurred.

The first and most important advance was when RockYou.com was hacked with an SQL injection attack and 32 million (14.3 million unique) passwords were posted online. With a corpus like this, password crackers suddenly were able to substantially hone their playbooks to target the keyspaces that contain the most likely passwords.

A keyspace can be something like "a series of up to 8 digits" or "a word of up to seven characters in length followed by some digits" or even "a capitalized word from the dictionary with stylish letter substitutions". It was surprising how many of the RockYou password list could be compressed into keyspaces that restricted the search space considerably. And that made it possible to crack passwords much faster.

Popular fads like the stylish substitution of "i" by "1" or "e" by "3" were revealed to be exceptionally common.

Another advance in password cracking comes because passwords are usually not sent in plaintext form. Instead, a hashing function is used to obfuscate them. Perhaps they are only stored in hashed form. So, in 1980 a clever computer security professor named Martin Hellman published a technique that vastly sped up the process of password cracking. All you need to do is keep a table of the hash codes around for a keyspace. Then, when you get the hash code, you just look it up in the table.

But the advent of super-fast computers means that it is possible to compute billions of cryptographic hashes per second, allowing the password cracker to iterate through an entire keyspace in minutes to hours.

This is enabled by the original design of the hashing functions, like SHA, DES, and MD5, all commonly used hashing functions. They were all designed to be exceptionally efficient (and therefore quick) to compute.

So password crackers have written GPU-enabled parallel computation of the hashing functions. These run on exceptionally fast GPUs like the AMD Radeon series and the nVidia Tesla series.

To combat these, companies have started sending their passwords through thousands of iterations of the hashing function, which dramatically increases the time required to crack passwords. But really this only means that more computation is required to crack them.

The Internet

Many attacks on internet infrastructure and on targeted sites depend upon massively parallel capabilities. In particular, hackers often use Distributed Denial of Service (DDoS) attacks to bring down perceived opponents. Hackers often use an array of thousands of computers, called a botnet, to access a web site simultaneously, overloading the site's capabilities.

Distributed computing is an emerging technology that depends directly on the Internet. Various problems can be split into clean pieces and solved by independent computation. These include peaceful projects such as the spatial analysis of the shape of proteins (folding@home), the search for direct gravitational wave emissions from spinning neutron stars (Einstein@home), the analysis of radio telescope data for extraterrestrial signals (SETI@home), and the search for ever larger Mersenne prime numbers (GIMPS).

But not only have hackers been using distributed computing for attacks, they have also been using the capability for password cracking. Distributed computing is well suited to cryptanalysis also.

Exascale weapons

Recently it has been discussed that high-performance computing has become a strategic weapon. This is not surprising at all given how much computing gets devoted to the task of password cracking. Now the speculation is, with China's Tianhe-2 supercomputer, that weaponized computing is poised to move up to the exascale. The Tianhe-2 supercomputer is capable of 33.86 petaflops, less than a factor of 30 from the exascale. Most believe that exascale computing will arrive around 2018.

High-performance computing (HPC) has continually been used for weapons research. A high percentage of the most powerful supercomputers over the past decade are to be found at Livermore, Los Alamos, and Oak Ridge.

Whereas HPC has traditionally been aimed at floating-point operations (where real numbers are modeled and used for the bulk of the computation) the focus of password cracking is integer operations. For this reason, GPUs are typically preferred because modern general-purpose GPUs are capable of integer operations and they are massively parallel. The AMD 7990, for instance, has 4096 shaders. A shader is a scalar arithmetic unit that can be programmed to perform a variety of integer or floating-point operations. Because a GPU comes on a single card, this represents an incredibly dense ability to compute. The AMD 7990 achieves 7.78 teraflops but uses 135W of power.

So it's not out of the question to amass a system with thousands of GPUs to achieve exascale computing capability.

I feel it is ironic that China has built their fastest computer using Intel Xeon Phi processors. With 6 cores in each, the Xeon Phi packs about 1.2 teraflops of compute power per chip! And it is a lower power product than other Xeon processors, at about 4.25 gigaflops/watt. The AMD Radeon 7990, on the other hand, has been measured at 20.75 gigaflops/watt. This is because shaders are much scaled down from a full CPU.

What is the purpose?

Taking a step back, I think a few questions should be asked about computation in general. What should computation be used for? Why does it exist? Why did we invent it?

If you stand back and think about it, computation has only one purpose. This is to extend human capabilities; it allows us to do things we could not do before. It stands right next to other machines and artifices of mankind. Cars were developed to provide personal transportation, to allow us to go places quicker than we could go using our own two feet. Looms were invented so we could make cloth much faster and more efficiently than using a hand process, like knitting. Telescopes were invented so we could see farther than we could with our own two eyes.

Similarly, computation exists so we can extend the capabilities of our own brains. Working out a problem with pencil and paper can only go so far. When the problems get large, then we need help. We needed help when it came to cracking the Enigma cipher. We needed help when it came to computing the cross-section of Uranium. Computation was instantly weaponized as a product of necessity and the requirements of survival. But defense somehow crossed over into offensive capabilities.

With the Enigma, we were behind and trying to catch up. With the A-bomb, we were trying to get there before they did. Do our motivations always have to be about survival?

And where is it leading?

It's good that computation has come out from under the veil of weapons research. But the ramifications for society are huge. Since the mobile revolution, we solve problems that can occur to any of us in real life, and build an app for it. So computation continues to extend our capabilities in a way that fulfills some need. Computation has become commonplace and workaday.

When I see a kid learn to multiply by memorizing a table of products, I begin to wonder whether these capabilities are really needed, given the ubiquity of computation we can hold in our hands. Many things taught in school seem useless, like cursive writing. Why memorize historical dates when we can just look it up in Wikipedia? It's better to learn why something happened then when.

More and more, I feel that we should be teaching kids how to access and understand the knowledge that is always at their fingertips. And when so much of their lives is spent looking at an iPad, I feel that kids should be taught social interaction and be given more time to play, exercising their bodies.

It is because knowledge is so easy to access that teaching priorities must change. There should be more emphasis on the understanding of basic concepts and less emphasis on memorization. In the future, much of our memories and histories are going to be kept in the cloud.

Fundamentally, it becomes increasingly important to teach creativity. Because access to knowledge is not enough. We must also learn what to do with the knowledge and how to make advancements. The best advancements are made by standing on the shoulders of others. But without understanding how things interrelate, without basic reasoning skills, the access to knowledge is pointless.

Saturday, June 30, 2012

Hackers, Part 4: Flame

Remember World War II? Well I don't, because I wasn't alive then! But seriously there is a story or two from WWII that caught my attention a few years ago. In particular, the story of Bletchley Park, of the Enigma cipher and of the mathematicians that broke the code. This heroic story was repeated in several places simultaneously, like the Hawaii-based group that helped break the Japanese Naval cipher JN-25, resulting in a decisive victory at the Battle of Midway. And they were in turn aided by Dutch and British groups.

It was the code breakers at Bletchley Park that pioneered progress in computers, with Claude Shannon and Alan Turing. Often only a nation-state has vast resources and is willing to do the research and gather the best people to make that progress. A similar thing happened at Los Alamos with the Manhattan Project, only on a much larger scale, and in a different field.

With hacking, a similar thing happens. Though individual hackers are very resourceful, the majority of their capabilities builds on the shoulders of others. The zero-day exploits are available on the web. The tools for hacking are available on warez sites. Capture one virus and disassemble it, then modify it. No, individuals rarely are the sharp point on the spear of progress in the really hard problems. They may make discoveries, but not usually the breakthroughs. It has been the nation-state that usually makes that progress and funds the research. The US has a very secret organization based in Fort Meade, Maryland that does this research in signals gathering and code breaking, called the NSA. While long ago this used to jokingly called No Such Agency, today it is simply known as the National Security Agency.

An Impressive Attack

With the Flame virus, the successor to the Stuxnet virus, a very interesting thing happened. The virus posed as a Windows update to be installed, and contained a rogue Microsoft certificate authority. To create this, the virus' creators had to mount a successful attack on the venerable MD5 hash algorithm. This attack allowed them to generate a collision, a file that generates the same hash code as the original plain text.

Such an attack is somewhat time-consuming, and depends upon generating a prefix (called a chosen prefix) that two files have in common. Then the rest of the two files (their suffixes) are adjusted so that they generate the same hash code. This is only part of the attack. Then it becomes clear that, to forge a certificate authority, it is necessary to guess the prefix of the certificate (which Microsoft has probably made it easy to do by generating them in sequential order) and then it is just a matter of having the right amount of computer time to perform a suffix search.

This could be months of computer time, or years, depending on how sophisticated the suffix-generation algorithm is.

This sounds like a world-class attack, not really possible without the resources of a nation state. In the case of Flame, this nation state is the United States. And thus it is highly likely that the NSA has something to do with Flame.

When I heard this, actually I was thinking way to go US. Why? Because I was tired of hearing of all the cyberwar attacks from China and Russia. I was tired of thinking that we were way behind in the US. It looks like both Stuxnet and Flame were the joint product of the US and Israel. If we are on the attack, then we are also on the defensive and that's a good thing.

But there is an inherent danger in the technology of Stuxnet and Flame: it becomes public.

One of the main techniques of the individual hacker, as I mentioned before, is the modification of an existing virus to create a new one. This has already been done with Stuxnet, and soon with Flame. This will cause a serious acceleration of hackers' capabilities. Even in other nation-states.

In particular, it is possible that MD5 is now completely insecure, which will be a real problem for business.

Of course, the other possibility was that Microsoft actually helped the agency responsible for this hack. And actually, I think it may be even more likely that this is true than it might possibly be true that a serious breach of MD5 has occurred. Hmm.

Which one it is remains to be seen.

And you thought that was the interesting part? Well, there are plenty of interesting parts to the Flame virus. In particular, its goals.

Goals

This Flame virus (also known as Skywiper) is intended to infect machines in Iran and gather intelligence. Which it does by hijacking Windows 7 server. And it did this by forging the authority certificate so it could masquerade as a certified Microsoft update to Windows 7 server. Flame has been in the wild since October 2010.

How It Functions

This impressive virus, contained in an executable called Flamer.A commandeers machines on the network and installs various modules for intelligence gathering. They are organized into at least 39 modules, many of them written in LUA. Another incredible analysis of Flamer.A. The known and understood modules are listed below. It makes interesting reading for any student of computer security.

Autorun_infector

This creates the autorun.inf file. This spoofs sutorun.ini, which causes an insertable medium to automatically run. This is commonly used in installers to make it totally automatic.

Beetlejuice

This component uses a bluetooth card, if one exists, in the infected machine to discover any bluetooth devices like phones and other gadgets. Turns the computer into a discoverable bluetooth device so other devices will interact with it.

Boost

Compiles a list of files that appear to be of interest to Flame's creators. This module leaks whole files, like CAD (.dwg) and pictures (.jpg).

Boot_DLL_loader

This is a configuration module, and it contains the list of modules that can be run on this particular infected computer.

Flask

This module extracts local information from the computer that profiles it and its user. Stuff like the names and serial numbers of the volumes, the name of the computer, a list of applications installed, open TCP/IP connections, DNS servers used, files and history from Internet Explorer, contact lists, and even whether the user has a mobile phone. The data is assembled and encrypted using RC4 and also an additional base64 algorithm of unspecified nature. The product data is sent over HTTP in a compressed form.

Jimmy

Looks for documents with extensions like .doc, .docx, .xls, .ppt, etc. and assembles and encrypts them for delivery.

Euphoria

This creates a special desktop.ini and target.lnk file, useful as a clever way of launching Flame automatically when the machine starts up.

Frog

This component actively infects computers within the local network. It uses backdoor accounts named "HelpAssistant", created by Limbo.

Gadget

This component is the one that acts like a legal Windows update server.

Gator

This component connects with the command and control server. In other words, it reports back to its masters. It sends all the collected data back. The data is stored in a database named StorageProducts. The product is the leaked data, of course. In Flame's sophisticated approach, data is graded by desirability. Documents (collected by Jimmy) have highest desirability, CAD drawing files are in the middle, and JPEG files (collected by Boost) are at the bottom. If the database gets filled with leaked pictures, they will get thrown out and replaced by more valuable documents.

In restricted networks, a clever technique is used. When the virus spreads, a message is kept which indicates which computers can connect with the command and control server. The data transmission then happens via USB sticks, which get infected by the Euphoria component. When a computer sees a USB thumb drive, and it can connect with the command and control server, then it reads and sends the data collected on the restricted network computer.

All server communication is done in encrypted form so it can't be detected easily.

In an amazing twist, this module can also download new modules from the command and control server, which keeps the virus current, particularly when new threats are noticed or when bugs have been found and fixed.

Headache

This module contains a configuration that customizes the particular personality of the attack against the infected computer and its network.

Infectmedia

This component decides which is the best method for infecting media, such as USB thumb drives, with Flame for the purposes of propagation. This includes the possibility of using the Autorun mechanism, or the Euphoria mechanism. Also, the stolen data (the contents of a StorageProducts database) that is stored on the USB drive is in a file called dot ("."). This particular name looks like the current directory to Windows and this simple trick ensures that it can't be opened or displayed!

Limbo

This creates new accounts in the other machines in the network with the innocuous name "HelpAssistant" if possible and if the right privileges are available to the module. These become backdoors.

Microbe

This component records audio from built-in microphones. It examines all the multimedia devices and selects the appropriate recording device.

Munch

This component provides the binary certificate of a Windows server. An HTTP server which responds to /view.php and /wpad.dat (Web Proxy Autodiscovery) requests. So this basically helps to fool the DNS search for a Windows update server.

Rear Window

This is a spying component.

Many spying capabilities have been detected in Flame. For example, it installed keystroke recording malware, took pictures with the computers' webcams, accessed machines' microphones to intercept Skype conversations, make screen captures, and it even used Bluetooth to access local cellphones and extract contacts!

Security

This module detects processes and programs that might be harmful to Flame. This is used to pause Flame when the processes are around, to avoid detection of things like a wholesale directory search.

Snack

This module pays close attention to the network traffic. It logs NetBIOS Name Service (NBNS) packets, which helps the virus to determine which computers can be spread to. Sometimes this module only runs when Munch is run.

Spotter

This contains all the scanning modules. Network scanning, file system scanning, multimedia device scanning, etc.

Suicide

This component removes the virus from the infected computer when the command and control server gives the word. Flame maintains a stealthy profile by cleaning up after itself.

Telemetry

This is the keystroke logging component.

Transport

This contains all the ability to replicate the virus. Copying the files, packaging them into an auto-installing file, etc. The ability to change filename and extension of each transported file is a clever part of this module.

Weasel

This module prepares a list of all the files on the infected computer. It is careful to pause whenever a process runs that might be looking for a suspicious search of the entire computer's file system, as determined by Security.