This is an attempt of a mid-level overview of how Tor combines public-key encryption and symmetric-key cryptography to allow it to function. I should make it clear that, I’m not an expert and these ideas really are just a compendium of things I’ve read here and there, and how I believe it to function. I’d be more than happy to edit the post if someone has a better idea and wants to correct the article.

# High-level overview

First a high-level overview. I’m not going to talk at all about what Tor is, the history of Tor or what it’s used for – there are plenty of other resources where you can read about that. You should know roughly how Tor operates (connecting nodes in a circuit between your computer and it's ultimate destination) and other high-level things about it before continuing.

This image was directly pilfered from Wikipedia. What this image represents is the layers of the “onion”. The data-packet is wrapped in successive layers of encryption, and each time the message traverses a node in the circuit, the outer layer is decrypted (or “peeled” to keep inline with onion nomenclature and analogy). The key feature of this is that no single node knows both the origin of the packet (client’s computer) and destination of the packet (e.g. some web-server). A single node knows only the address of the previous node and the following node. Of course if the packet is ever to reach its ultimate destination, the IP of this destination must be there somewhere in the bundle, but the key is that is it buried under the layers of the onion.

Of course the final node, or “exit node”, could see the contents of the message itself if that message wasn’t also encrypted (for example if the client does not establish a HTTPS connection with the webserver), so it’s important to keep that in mind when using Tor and ensure HTTPS everywhere.

Another feature is that no node knows whether the previous node was the client or whether it was just another node in the chain like itself.

The nodes that form a circuit or chain are mandated by what is known as the directory authority nodes. These also provide the client with the public key for each of the nodes in the chain, but more about that later.

# Asymmetric key cryptography

One of the pre-requisites to understanding how Tor works is understanding how asymmetric-key cryptography works. A full account would be beyond the scope of this article, but the basic idea is that we have a pair of keys different from each other – a public key and private key. As the names suggest, the public key is available to the world and the private key must be kept secret.

Using the public key anyone can encrypt a given message (or given block of data), and then only the holder of the private key can decrypt it.

Contrarily, the private key could also be used to encrypt some msg that the public key could decrypt. This is a feature of the mathematics. However given that the public key is public, this kind of usage would be pointless for encryption purposes. However when it is used in this manner we call it "signing" as it is useful to prove the authenticity of the message if only the originator has the private key.

One feature of asymmetric-key encryption is that it is computationally much slower than symmetric-key encryption.

Some examples of asymmetric key algorithms are RSA and ECC. Here is a nice guide to the maths of RSA and another here

# Symmetric key cryptography

This is where both parties have the same (secret) key for both encrypting and decrypting messages.

One of the main challenges is how to share such a secret between two physically distant parties without a third-party, who is listening in, also obtaining this secret key. Diffie-Hellman (DH) key exchange provides a solution to this.

I'm going to include an example here (again stolen from Wikipedia) just to make this concrete so we can talk more concretely when it comes to Tor by referring back to this.

The simplest and the original implementation of the protocol uses the multiplicative group of integers modulo p, where p is prime, and g is a primitive root modulo p. These two values are chosen in this way to ensure that the resulting shared secret can take on any value from 1 to p–1.

Here is an example of the protocol:

1) Alice and Bob agree to use a modulusp = 23 and baseg = 9 (which is a primitive root modulo 23). Both of these things are public info that Eve can also note down.

2) Alice chooses a secret integer a =4, then sends Bob A = g^a mod p which for the values chosen means A= 6. Note Eve can see this, but if we're using big enough numbers then because of the discrete log problem then Eve cannot simply invert to get a.

3) Likewise, Bob chooses a secret integer b = 3, then sends Alice B = g^b mod p, which for these values gives B = 16.

4) Alice computes s = B^a mod p, which gives s = 9

5) Bob computes  s = A^b mod p, which gives s = 9 also.

6) Alice and Bob now share a secret (the number 9).

They got the same number because of exponentiation is commutative under mod.Mathematically: g^(ab) equiv g^(ba) (under modp).

Note that only ab, and (g^ab mod p = g^ba mod p) are kept secret. All the other values – pg,A, B – are sent in the clear. Once Alice and Bob compute the shared secret they can use it as an encryption key, known only to them, for sending messages across the same open communications channel.

Take a look also at the video

# Tor Steps :

## Step 1:

The client has the public key of node 1 (N1) the first node in the circuit; it has obtained this from the directory server. Since we already have the public key of N1 we are able to send messages to N1 that N1, and only N1, can decrypt and read. You may wonder if since RSA private keys can be used to “encrypt” a message, which the public key can decrypt, can’t N1 also send us back encrypted messages? Well yes, mathematically speaking, but they wouldn’t be secure because the public key is well, public. Using the keys that way around therefore is only good for signing. We would like to establish a session with N1. In other words, establish a shared secret between us (CLIENT) and N1 so we may use symmetric-key-exchange crypto.
To do this we will use Diffie-Hellman (DH) techniques. Taking the (toy) example from the previous section, we agree (publically) to use modulus p=23 and base g=9. Then we (CLIENT) pick a secret integer a=4. We then compute A=(g^a)modp = (9^4)mod23.

So far just like regular DH, but this time we use PKN1 (N1’s public key) to encrypt A, and we send encPKN1(A) to N1. Just like in regular DH, N1 now picks his random integer b, and computes B=(g^b)modp. It sends this to us in plain text as the image above shows, no envelope. This however is not an issue because of the initial request being encrypted with the public key, as I hope the following aside will make clear. The rest now proceeds as in regular DH exchange. N1 now computes the secret s = A^b, and CLIENT now computes the secret s=B^a.

### Aside on why the public key encryption is needed at all.

You may wonder why we need to encrypt our message to N1 with the public key at all, since isn’t DH supposed to be immune to eavesdroppers and a way to establish a secret even when someone is watching? The caveat however is that this is only true when the eavesdropper is passive and does nothing but listen; if we have an active eavesdropper, then MITM attacks could thwart our initial establishment of a secret without using the public key to encrypt the CLIENT→N1 component of the setup.

You can imagine Eve sat in the middle of CLIENT AND N1 as in the image

She intercepts A=g^a(modp), and instead of simply reading it passively then forwarding it on to N1, she decides to choose her own secret, c, and actually forwards A’=g^c(modp). N1 has no idea that this came from Eve and not the client. Now N1 will compute s’=(A’)^b, a different secret. Also, when Eve received B she will compute the same s’=(B)^c. In other words it is now Eve who has used DH techniques to establish a secret with N1, but N1 thinks this is a secret between himself and CLIENT. In the exact same way, Eve can pick another random integer on the return journey, d, to compute B’=g^d(modp) and pass this back to the client instead of B. Eve can then compute s’’=A^d and the client will compute the same s’’=(B’)a. In other words with an active snooper, DH would be vulnerable to MITM style attacks without using public key encryption for at least one leg of the journey. The client would think it has a session key with N1 and N1 would think is has a session key with the client, but really we’d have 2 sessions, CLIENT<→ EVE and EVE<→N1. Not good! Also check out this video explaining an active DH attack in more detail.

But what about the return trip being in plain-text? It’s true that active Eve could intercept and modify this data too right? Yes, but it would do her little good. N1 would have still received the intended A=g^a(modp) from the client and using it N1 will compute a shared secret s=A^b, where b is N1’s random private number. Now if Eve had tampered with the return packet B=(g^b)modp and made it say B’=(g^d)modp, then when CLIENT tried to compute the same shared secret that N1 holds he’d do (B’)^a=(g^da)modp, which is not the same as (g^ab)modp. In this manner, CLIENT and N1 would end up with different secrets, and they wouldn’t be able to communicate at all. Eve has therefore the ability to disrupt the communications in this manner, but not the ability to snoop on them.

I imagine one more layer of protection that public key encryption is giving us is that N1 will sign the return message with its private key, which the CLIENT (or of course anyone else), using the public key, can verify the message came from N1 not someone else.

## Step 2

Now CLIENT and N1 have established a session (they have a shared secret) and can efficiently communicate with the faster methods of symmetric-key exchange and dispense with the slow and weighty computational expense of public-key crypto. This is the first layer of our onion!

What we’d like to do next is extend the circuit to the second node (N2) that the directory server had chosen for us.

Again we have N2’s public key (PKN2) and we can use it to encrypt a message that N2, and only N2, can read. This is important, not even N1 can read that message nor Eve.

We once again begin the DH dance. We choose our random number, a, (different from the first time we did this with N1!), and we need to find a way to send our A=g^amodp to N2. We only want N2 to be able to read this so once again we encrypt it using the public key: encPKN2(g^amodp), and then we use our shared secret (SS1) with N1 to further wrap it, i.e. SS1(encPKN2(g^amodp)). This means Eve has 2 layers of encryption between her and the juicy data now.

When N1 receives this data, it can unwrap the SS1 layer using the shared secret we established, and find under it encPKN2(A). Note it cannot see what A is itself, so one more precaution against it doing a MITM attack in the manner described in the earlier aside. N1 will just see that the packet should be forwarded to N2 and do so. N2 will receive this packet, encPKN2(A), and it will decrypt it using its private key to obtain A. It will pick a private random number b , and compute the session secret s2=A^b and B=g^bmodp, the latter which it will send back to N1. It may sign the message with B using it’s private key to authenticate the messages origin, however Eve or anyone else could use the public key to read that message, but just like explained earlier, this does not matter.

N1 will encypt B with SS1 before sending it back to CLIENT (not that it really matters as the N2→N1 was in clear text anyway).

CLIENT can now decrypt using SS1 to get B, and compute s2=B^a.

Now the CLIENT and N2 have a shared secret that N1 (nor anyone else) doesn’t know.

## Step 3

Hopefully it’s obvious how this could be extended to the third node and so on if desired. The client would send SS1(SS2(encPK3(g^amodp))). N1 would peel SS1 and forward to N2. N2 would peel the remaining SS2 and forward encPK3(g^amodp) to N3 who could decrpyt it using it's private key PK3. In this way CLIENT and N3 would end up establishing a secret, s3, that both N1 and N2 did not know nor anyone else.

## Step 4

By now we have extended the circuit to include a given number of nodes (probably at least three). We, the CLIENT, have established shared secrets with multiple nodes, N1, N2, N3 by leveraging a combination of public key cryptography and DH key exchange techniques, and now we can communicate with each using symmetric key exchange with these secrets in a manner than only that particular node can read.

If we want to send a message to some web-server, then what we can do, therefore, is take our message (and message header that will contain the destination IP address of the message, e.g. IP for google.com), then we first encrypt it (wrap it) using SS3 (the secret with the exit node node N3) Next we wrap it with SS2 (the secret with exit node N2) and add an intermediary header containing the IP address of N3. Next we wrap it with SS1 (the secret with exit node N1) and add an intermediary header containing the IP address of N2. In this way we have SS1(SS2(SS3(msg)))….and the layers of encryption are stacked up like an onion.

N1 gets this onion and strips off the outer later SS1. It can see the header that tells it to pass it N2’s IP and obviously know the IP of the previous node, but it doesn’t know that the previous node was the originator and not just another node in the circuit. Nor can is break the other 2 layers of encryption remaining to read the message or lower-level forwarding information. It forwards accordingly SS2(SS3(msg)) to N2. N2 now strips away SS2 leaving SS3(msg). Note again it can’t read the msg. It can read the header to tell it to forward the package to N3 and it knows the previous node N1 but nothing about the client.

Finally, N3 gets SS3(msg). It strips the final layer, to show the message. Note that if the message itself isn’t encrypted (between the client and web-server) then N3 can read the message in plain-text. This is why it's still important to use HTTPS with Tor.

On the return trip with the message from the webserver, rmsg, the exit node wraps it in SS3(rmsg) and passes to N2, which wraps in SS2(SS3(msg)) before passing to N1, which wraps as SS1(SS2(SS3(rmsg))) before passing back to the client. The client has all these secrets so can unwrap the full onion to get rmsg.

Note that the exit could read the response from the server, but N2 and N1 cannot because it has again been wrapped in SS3 and SS2 successively at each hop.

Note the return message, doesn’t contain any information in the header (or headers) that can be used to route it back to the client (not even encrypted ones), it’s simply a case of each node forwarding the data to the place that made the request to it initially, and them playing pass the pacel parcel back (I believe anyway).

Currently unrated