relativistic observer: cipher


Saturday, June 30, 2012

Hackers, Part 4: Flame

Remember World War II? Well I don't, because I wasn't alive then! But seriously there is a story or two from WWII that caught my attention a few years ago. In particular, the story of Bletchley Park, of the Enigma cipher and of the mathematicians that broke the code. This heroic story was repeated in several places simultaneously, like the Hawaii-based group that helped break the Japanese Naval cipher JN-25, resulting in a decisive victory at the Battle of Midway. And they were in turn aided by Dutch and British groups.

It was the code breakers at Bletchley Park, Alan Turing among them, who pioneered progress in computers (Claude Shannon did related wartime cryptographic work at Bell Labs in the US). Often only a nation-state has vast resources and is willing to do the research and gather the best people to make that progress. A similar thing happened at Los Alamos with the Manhattan Project, only on a much larger scale, and in a different field.

With hacking, a similar thing happens. Though individual hackers are very resourceful, the majority of their capabilities are built on the shoulders of others. The zero-day exploits are available on the web. The tools for hacking are available on warez sites. Capture one virus and disassemble it, then modify it. No, individuals rarely are the sharp point on the spear of progress in the really hard problems. They may make discoveries, but not usually the breakthroughs. It has been the nation-state that usually makes that progress and funds the research. The US has a very secret organization based in Fort Meade, Maryland that does this research in signals gathering and code breaking, called the NSA. While long ago this used to be jokingly called No Such Agency, today it is simply known as the National Security Agency.

An Impressive Attack

With the Flame virus, the successor to the Stuxnet virus, a very interesting thing happened. The virus posed as a Windows update to be installed, and carried a rogue certificate that appeared to come from Microsoft's certificate authority. To create this, the virus' creators had to mount a successful attack on the venerable MD5 hash algorithm. This attack allowed them to generate a collision: a second file that produces the same hash code as the original.

Such an attack is somewhat time-consuming. In a chosen-prefix collision, the attacker picks the beginnings (the prefixes) of two different files, and then the rest of each file (its suffix) is computed so that both files generate the same hash code. This is only part of the attack. To forge a certificate, it is also necessary to guess the prefix of the legitimate certificate (which Microsoft has probably made easy by generating certificates in sequential order), and then it is just a matter of having the right amount of computer time to perform a suffix search.

This could be months of computer time, or years, depending on how sophisticated the suffix-generation algorithm is.
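To make the idea of a collision concrete, here is a minimal sketch in Python. It is not the chosen-prefix attack used against real certificates. It only searches for two different messages whose MD5 digests agree in their first three bytes, an artificial shortcut (my assumption, chosen so the search finishes in well under a second). The helper names are hypothetical.

```python
import hashlib

def truncated_md5(data: bytes, nbytes: int = 3) -> bytes:
    """Return only the first `nbytes` bytes of the MD5 digest (a toy, weakened hash)."""
    return hashlib.md5(data).digest()[:nbytes]

def find_truncated_collision(nbytes: int = 3):
    """Hash distinct messages until two of them share the same truncated digest."""
    seen = {}      # truncated digest -> first message that produced it
    counter = 0
    while True:
        msg = f"message-{counter}".encode()
        tag = truncated_md5(msg, nbytes)
        if tag in seen:
            # Every message is distinct, so this is a genuine collision
            # on the truncated hash (not on full MD5).
            return seen[tag], msg
        seen[tag] = msg
        counter += 1

first, second = find_truncated_collision()
print(first, second, truncated_md5(first).hex())
```

Each extra byte kept from the digest makes this naive search far more expensive, which gives some feel for why colliding the full 128-bit hash takes a cleverer method and serious computer time.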

This sounds like a world-class attack, not really possible without the resources of a nation state. In the case of Flame, this nation state is the United States. And thus it is highly likely that the NSA has something to do with Flame.

When I heard this, actually I was thinking "way to go, US." Why? Because I was tired of hearing of all the cyberwar attacks from China and Russia. I was tired of thinking that we were way behind in the US. It looks like both Stuxnet and Flame were the joint product of the US and Israel. If we are on the attack, then we are also on the defensive and that's a good thing.

But there is an inherent danger in the technology of Stuxnet and Flame: it becomes public.

One of the main techniques of the individual hacker, as I mentioned before, is the modification of an existing virus to create a new one. This has already been done with Stuxnet, and soon it will be done with Flame. This will cause a serious acceleration of hackers' capabilities, even in other nation-states.

In particular, it is possible that MD5 is now completely insecure, which will be a real problem for business.

Of course, the other possibility is that Microsoft actually helped the agency responsible for this hack. And actually, I think this may be even more likely than a serious breach of MD5. Hmm.

Which one it is remains to be seen.

And you thought that was the interesting part? Well, there are plenty of interesting parts to the Flame virus. In particular, its goals.

Goals

This Flame virus (also known as Skywiper) is intended to infect machines in Iran and gather intelligence, which it does by hijacking Windows 7 server. It did this by forging the authority certificate so it could masquerade as a certified Microsoft update to Windows 7 server. Flame has been in the wild since October 2010.

How It Functions

This impressive virus, contained in an executable called Flamer.A, commandeers machines on the network and installs various components for intelligence gathering. They are organized into at least 39 modules, many of them written in Lua. Other detailed analyses of Flamer.A have been published. The known and understood modules are listed below. They make interesting reading for any student of computer security.

Autorun_infector

This creates a spoofed autorun.inf file, which causes an insertable medium to run automatically. The same mechanism is commonly used by installers to make installation totally automatic.

Beetlejuice

This component uses a Bluetooth card in the infected machine, if one exists, to discover any Bluetooth devices like phones and other gadgets. It also turns the computer into a discoverable Bluetooth device so other devices will interact with it.

Boost

Compiles a list of files that appear to be of interest to Flame's creators. This module leaks whole files, like CAD (.dwg) and pictures (.jpg).

Boot_DLL_loader

This is a configuration module, and it contains the list of modules that can be run on this particular infected computer.

Flask

This module extracts local information from the computer that profiles it and its user. Stuff like the names and serial numbers of the volumes, the name of the computer, a list of applications installed, open TCP/IP connections, DNS servers used, files and history from Internet Explorer, contact lists, and even whether the user has a mobile phone. The data is assembled and encrypted using RC4, and also encoded with an additional base64-style scheme of unspecified nature. The product data is sent over HTTP in a compressed form.

Jimmy

Looks for documents with extensions like .doc, .docx, .xls, .ppt, etc. and assembles and encrypts them for delivery.

Euphoria

This creates a special desktop.ini and target.lnk file, useful as a clever way of launching Flame automatically when the machine starts up.

Frog

This component actively infects computers within the local network. It uses backdoor accounts named "HelpAssistant", created by Limbo.

Gadget

This component is the one that acts like a legal Windows update server.

Gator

This component connects with the command and control server. In other words, it reports back to its masters. It sends all the collected data back. The data is stored in a database named StorageProducts. The product is the leaked data, of course. In Flame's sophisticated approach, data is graded by desirability. Documents (collected by Jimmy) have highest desirability, CAD drawing files are in the middle, and JPEG files (collected by Boost) are at the bottom. If the database gets filled with leaked pictures, they will get thrown out and replaced by more valuable documents.

In restricted networks, a clever technique is used. When the virus spreads, a message is kept which indicates which computers can connect with the command and control server. The data transmission then happens via USB sticks, which get infected by the Euphoria component. When a computer sees a USB thumb drive, and it can connect with the command and control server, then it reads and sends the data collected on the restricted network computer.

All server communication is done in encrypted form so it can't be detected easily.

In an amazing twist, this module can also download new modules from the command and control server, which keeps the virus current, particularly when new threats are noticed or when bugs have been found and fixed.

Headache

This module contains a configuration that customizes the particular personality of the attack against the infected computer and its network.

Infectmedia

This component decides which is the best method for infecting media, such as USB thumb drives, with Flame for the purposes of propagation. This includes the possibility of using the Autorun mechanism, or the Euphoria mechanism. Also, the stolen data (the contents of a StorageProducts database) that is stored on the USB drive is in a file called dot ("."). This particular name looks like the current directory to Windows and this simple trick ensures that it can't be opened or displayed!

Limbo

This creates new accounts in the other machines in the network with the innocuous name "HelpAssistant" if possible and if the right privileges are available to the module. These become backdoors.

Microbe

This component records audio from built-in microphones. It examines all the multimedia devices and selects the appropriate recording device.

Munch

This component provides the binary certificate of a Windows server. It runs an HTTP server which responds to /view.php and /wpad.dat (Web Proxy Autodiscovery) requests. So this basically helps to fool the DNS search for a Windows update server.

Rear Window

This is a spying component.

Many spying capabilities have been detected in Flame. For example, it installed keystroke recording malware, took pictures with the computers' webcams, accessed machines' microphones to intercept Skype conversations, made screen captures, and even used Bluetooth to access local cellphones and extract contacts!

Security

This module detects processes and programs that might be harmful to Flame. This is used to pause Flame when the processes are around, to avoid detection of things like a wholesale directory search.

Snack

This module pays close attention to the network traffic. It logs NetBIOS Name Service (NBNS) packets, which helps the virus to determine which computers can be spread to. Sometimes this module only runs when Munch is run.

Spotter

This contains all the scanning modules. Network scanning, file system scanning, multimedia device scanning, etc.

Suicide

This component removes the virus from the infected computer when the command and control server gives the word. Flame maintains a stealthy profile by cleaning up after itself.

Telemetry

This is the keystroke logging component.

Transport

This contains all the ability to replicate the virus. Copying the files, packaging them into an auto-installing file, etc. The ability to change filename and extension of each transported file is a clever part of this module.

Weasel

This module prepares a list of all the files on the infected computer. It is careful to pause whenever a process is running that might notice a suspicious search of the entire computer's file system, as determined by Security.

Tuesday, May 15, 2012

Cryptography, Part 1

Cryptosystems and their application are among the most important methods of security. They are the basis for security. But in the past they have been broken, notably in times of war, when necessity was at its most dire. For each post in this series, I will concentrate a bit on history and also a bit on the systems used in the modern day.

How They Work

The most obvious form of cryptography is simply the encryption of a message by a sender, sending the message in its encrypted form, and the subsequent decryption of that message by the receiver. In its original form, the message is called plaintext and the encrypted form of the message is called ciphertext. This kind of encryption has been used for thousands of years, though the methods of encryption have been getting better and better.

Letter Substitution Ciphers

Early forms of encryption were simple letter-substitution ciphers. The Caesar cipher was quite simple: just treat the letters of the alphabet as though they were a circular group and rotate the wheel. If we rotate by one, then TEMPUS FUGIT becomes UFNQVT GVHJU. This appears to be quite unreadable at first glance. But once you know the method, there are only 25 possibilities to try. Well, there should be 26, but that would include the case where the wheel was not turned. In this case the ciphertext is exactly the same as the plaintext, and so we ignore it.
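As a minimal sketch of the wheel rotation (the helper name `caesar` is my own, and only uppercase letters are handled), the shift and the brute-force search over the 25 non-trivial rotations might look like this in Python:

```python
import string

ALPHABET = string.ascii_uppercase  # the 26-letter "wheel"

def caesar(text: str, shift: int) -> str:
    """Rotate each uppercase letter by `shift` places; leave everything else alone."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # blanks and punctuation pass through unchanged
    return "".join(out)

ciphertext = caesar("TEMPUS FUGIT", 1)

# Breaking it by hand: undo every one of the 25 possible rotations
# and look for the candidate that reads as real text.
candidates = [caesar(ciphertext, -shift) for shift in range(1, 26)]
for shift, guess in enumerate(candidates, start=1):
    print(shift, guess)
```

Decryption is just encryption with the opposite shift, which is why the same function serves both purposes.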

A graphical example of letter substitution is Polybius' square. Here, a letter is substituted by two numerical digits, a row and a column. This makes TEMPUS FUGIT into 44 15 32 35 45 43 21 45 22 24 44. Note that the blank, or word separator, is not encoded. Yet this substitution cipher is really just an early attempt at making an ASCII representation of the characters.

If you can't encode a blank, the phrase NOW IN can be decoded as NO WIN. This is a potential misread. So the better ciphers allow for more than 25 letters, as we will see. Well, the very fact that I and J share a square seems to imply yet another kind of ambiguity would arise from the use of this cipher.
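Here is a small Python sketch of the square, assuming the usual 5x5 layout in which I and J share a cell (that layout reproduces the digits given above). The function names are hypothetical. It also shows the NOW IN / NO WIN ambiguity: with no way to encode a blank, both phrases produce the same digits.

```python
# Rows of the square; J is folded into I, so 25 cells cover 26 letters.
SQUARE = ["ABCDE", "FGHIK", "LMNOP", "QRSTU", "VWXYZ"]

def polybius_encode(text: str) -> str:
    """Replace each letter with its row and column digits; blanks are dropped."""
    codes = []
    for ch in text.upper():
        if ch == "J":
            ch = "I"  # I and J share a square
        for row_number, row in enumerate(SQUARE, start=1):
            column = row.find(ch)
            if column != -1:
                codes.append(f"{row_number}{column + 1}")
                break
    return " ".join(codes)

def polybius_decode(codes: str) -> str:
    """Turn digit pairs back into letters (word breaks cannot be recovered)."""
    return "".join(SQUARE[int(pair[0]) - 1][int(pair[1]) - 1] for pair in codes.split())

print(polybius_encode("TEMPUS FUGIT"))
print(polybius_encode("NOW IN") == polybius_encode("NO WIN"))
```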

Nonetheless, letter substitution ciphers fall prey to cryptanalysis, the science of breaking a code. To break the code, all you need is a long message. The letters of a message follow a characteristic and very uneven frequency distribution: in English, E and T are by far the most common letters, while Q and Z are rare. So we can use frequency analysis to determine likely decodings and pretty soon we have cracked the code.

How does this work? First off, we analyze the frequency of occurrence of the ciphertext letters. Then we match that up to the frequency distribution of typical plaintext. This will give us a few likely substitutions to try.

Well, actually one more thing might be needed in practice: a list of letter pairs. Some letter pairs will be commonly-occurring and others will not occur at all. We can use this to automatically determine whether a prospective substitution is valid.
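The frequency-matching step can be sketched in Python for the simplest case, a Caesar rotation. This is an illustration under my own assumptions, not a general substitution solver: the letter percentages are approximate textbook values, the chi-squared score is one common way to compare distributions, and the function names are hypothetical. The letter-pair check is left out.

```python
from collections import Counter
import string

# Approximate English letter frequencies, in percent (rounded textbook values).
ENGLISH_FREQ = {
    "E": 12.7, "T": 9.1, "A": 8.2, "O": 7.5, "I": 7.0, "N": 6.7, "S": 6.3,
    "H": 6.1, "R": 6.0, "D": 4.3, "L": 4.0, "C": 2.8, "U": 2.8, "M": 2.4,
    "W": 2.4, "F": 2.2, "G": 2.0, "Y": 2.0, "P": 1.9, "B": 1.5, "V": 1.0,
    "K": 0.8, "J": 0.15, "X": 0.15, "Q": 0.10, "Z": 0.07,
}

def shift_text(text: str, shift: int) -> str:
    """Rotate uppercase letters by `shift`; other characters are unchanged."""
    letters = string.ascii_uppercase
    return "".join(
        letters[(letters.index(ch) + shift) % 26] if ch in letters else ch
        for ch in text
    )

def chi_squared(text: str) -> float:
    """Lower scores mean the letter counts look more like typical English."""
    counts = Counter(ch for ch in text if ch in ENGLISH_FREQ)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    score = 0.0
    for letter, percent in ENGLISH_FREQ.items():
        expected = total * percent / 100
        score += (counts[letter] - expected) ** 2 / expected
    return score

def crack_caesar(ciphertext: str):
    """Try every rotation and keep the one whose letters look most English."""
    best = min(range(26), key=lambda s: chi_squared(shift_text(ciphertext, -s)))
    return best, shift_text(ciphertext, -best)

plaintext = ("THE MOST OBVIOUS FORM OF CRYPTOGRAPHY IS SIMPLY THE ENCRYPTION OF A "
             "MESSAGE BY A SENDER SENDING THE MESSAGE IN ITS ENCRYPTED FORM AND THE "
             "SUBSEQUENT DECRYPTION OF THAT MESSAGE BY THE RECEIVER")
secret = shift_text(plaintext, 7)
print(crack_caesar(secret))
```

The longer the message, the more reliable the match. On very short ciphertexts the scores of several rotations can be close, which is where a letter-pair list helps.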

So, you see, a simple letter substitution cipher is quite insecure.

So it wasn't very long, in the scheme of things, before this cipher was improved on. As it turns out, simply scrambling the letters in the Polybius square is not enough to make it more difficult to break. This just turns it into another letter-substitution cipher.

Codes During World War I

So, what can be done to make it harder to crack? During WW I, the Germans fixed the Polybius square in two ways. First, they used a scrambled alphabet. Also they used ADFGX as the row and column numbers instead of 12345. This really only made it a bit more visually confusing, since it is still a substitution cipher.

Here you see the result of modifying the Polybius square, using letters for the row and column, and scrambling the alphabet. This is a permutation. Each message could change the code book by using a different scramble. But there was more to the key than this, as you will see.

The next step is to substitute for the letters of the message, in this case MOVE GUNS WEST is converted to AA DX XF AX XG GD GX DA FA AX DA FG.

Then we lay the encoded result into the same 5X5. Note that an X is added at the end. If the message is more than twelve letters, we do this potentially multiple times, into multiple 5X5 arrays. It is important to pad the end of the message with random text (not just X's), or it may be easier to analyze!

Then we put a 5-letter word at the top, this is the next part of the key. And this is what makes the cipher so interesting. It creates a second permutation, on the columns of the text. What we do is to sort the letters of the word, and move the appropriate columns in the array as the letters move. So this means your word can't contain the same letter twice, like TWEET, nor can it already be sorted, like ABCDE.

Once we sort the columns, we get a modified array of text. The last thing to do is to read it out in columns to produce the ciphertext.
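The column-sorting step can be sketched in Python as follows. The input is the substituted text for MOVE GUNS WEST from above, plus the padding X. The key word GRANT is a hypothetical stand-in of my own (five distinct letters, not already sorted), not the word used in the original illustration, and the function names are likewise invented.

```python
def transpose_columns(symbols: str, keyword: str) -> str:
    """Lay symbols into rows under the keyword, sort the columns by keyword
    letter, then read the result out column by column."""
    width = len(keyword)
    if len(set(keyword)) != width:
        raise ValueError("keyword letters must all be different")
    if len(symbols) % width != 0:
        raise ValueError("pad the message so it fills the array")
    rows = [symbols[i:i + width] for i in range(0, len(symbols), width)]
    # Column indices in the order the sorted keyword letters put them.
    order = sorted(range(width), key=lambda i: keyword[i])
    return "".join("".join(row[col] for row in rows) for col in order)

def untranspose_columns(ciphertext: str, keyword: str) -> str:
    """Reverse the step above, given the same keyword."""
    width = len(keyword)
    height = len(ciphertext) // width
    order = sorted(range(width), key=lambda i: keyword[i])
    columns = [""] * width
    for position, col in enumerate(order):
        columns[col] = ciphertext[position * height:(position + 1) * height]
    return "".join(columns[col][row] for row in range(height) for col in range(width))

substituted = "AADXXFAXXGGDGXDAFAAXDAFGX"  # 24 symbols plus one padding X
scrambled = transpose_columns(substituted, "GRANT")
print(scrambled)
print(untranspose_columns(scrambled, "GRANT") == substituted)
```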

Although this method is better than simple substitution, it is vulnerable in several ways. First, there are only 120 (5 factorial) possible sorting orders (permutations) for 5 letters. If we try them all, then there will be one ordering that gives a better frequency distribution than all the others. Even if this is not so, you can try all the orderings with likely frequency distributions, and break them using known substitution cipher attack schemes. A poor fellow named Lieutenant Georges Painvin did this by hand in 1918 and successfully broke the German code (even after they had added another row and column to their array!). It nearly drove him crazy too.

Here is the cipher text for the original message. The reason it is longer is that the result is essentially in base 5, which takes roughly twice the space in symbols vs. base 26.

What the Germans wanted was a system where they could freely transmit the message in the clear (in ciphertext form) but not have it decoded by an interloper, in their case the French. To make this work, the sender and receiver both must know the same key. This is called a shared secret in cryptography. A system where one key is used to both encrypt and decrypt the message is called a symmetric-key cryptosystem.

The advent of computers really did change cryptography. But it also simultaneously changed cryptanalysis. This is where cooler, and more mathematically-oriented, heads prevailed and systems were developed that were extremely hard to crack, even using modern computers.

Public-Key Cryptography

A fellow named William Stanley Jevons figured out that one-way functions could be applied to cryptography in 1874. This was exploited by Rivest, Shamir, and Adleman at MIT in 1977 to create the RSA algorithm.

The basic idea is that there are two keys. One, the public key, is used to encrypt the plaintext, and another, the private key, is used to decrypt it. The keys are related mathematically, but computationally it is very difficult to extract the private key from the public key.

The technique for relating the public and private key pair in RSA rests on factorization. It's really quite clever. The public key contains the product of two large (and I mean large) prime numbers, along with a public exponent. The private key is derived from the prime numbers themselves, which are kept secret. What makes it work is this: it is relatively easy to determine if a large number is a prime. However, when a large number is the product of two big primes, it is very hard to factor it back into those primes.
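A toy Python sketch of textbook RSA shows how the pieces fit together. The primes here are tiny so the arithmetic can be followed by hand. Real keys use primes hundreds of digits long plus padding schemes that this sketch omits, and all the variable names are my own. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
# Two small primes, kept secret. Anyone who can factor n recovers them.
p, q = 61, 53
n = p * q                    # the modulus, part of the public key
phi = (p - 1) * (q - 1)      # computable only if you know p and q

e = 17                       # public exponent, chosen coprime to phi
d = pow(e, -1, phi)          # private exponent: e * d leaves remainder 1 mod phi

message = 65                 # a message encoded as a number smaller than n
ciphertext = pow(message, e, n)     # encrypt with the public key (n, e)
recovered = pow(ciphertext, d, n)   # decrypt with the private key d

print(n, d, ciphertext, recovered)
```

With numbers this small, factoring n by trial division is instant, which is exactly why the primes in real keys have to be enormous.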

There are many wrinkles to public-key cryptography. For instance, the protocol for key revocation or replacement is one. Timestamps can be added for additional limits on the spread and validity of the privilege of decoding.

Authentication

The main reason for the private key is, of course, the authentication of the intended receiver. But can an interloper do something to compromise the message? Absolutely. Modifying the ciphertext when it is en route from the sender to the receiver is one way to compromise the message. This gives rise to authentication schemes.

When it comes to security, it is important to have three bits of knowledge: The first is that the message is being received by its intended recipient. If you are sending a message to an ally, you would like to prevent your enemy from getting it. The second is to verify that the message did, in fact, come from the origin that is advertised for the message. If your enemy sends you a message that says it comes from your friend, this can be used to deceive you. The third is to know who had the message along the way. This is akin to the chain of custody in forensics. The point is this: can you trust the message?
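One simple way to detect a message that was modified en route is a message authentication code. The Python sketch below uses HMAC with SHA-256 from the standard library. Note that this relies on a shared secret key, so it is a different technique from a public-key signature. The key and the messages are made-up examples.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag that only holders of `key` can produce."""
    return hmac.new(key, message, hashlib.sha256).digest()

def is_authentic(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

shared_key = b"example shared secret"   # hypothetical key known to both parties
original = b"MOVE GUNS WEST"
tag = make_tag(shared_key, original)

tampered = b"MOVE GUNS EAST"            # what an interloper might substitute
print(is_authentic(shared_key, original, tag))
print(is_authentic(shared_key, tampered, tag))
```

The tag answers the second question above (did the message come from someone holding the key, unaltered?). It does nothing to keep the message secret, so in practice it is combined with encryption.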

We now use digital signatures to authenticate messages. More on this in a future installment.