1.1 Overview of Freenet
Freenet is a peer-to-peer network application that aims to permit the publication, replication, and retrieval of data while protecting the anonymity of both authors and readers. Freenet operates as a network of identical nodes that collectively pool their storage space to store data files, and cooperate to route requests to the most likely physical location of data. Files are referred to in a location-independent manner, and are dynamically replicated in locations near requestors and deleted from locations where there is no interest.
Freenet claims that it is infeasible to discover the true origin or destination of a file passing through the network, and difficult for a node operator to determine, or be held responsible for, the actual physical contents of his own node. This paper examines Freenet to determine whether these claims can be supported.
1.2 Freenet Security Goals
1.3 Organization of the paper
Section 2 summarizes the Freenet protocol, and Section 3 describes attacks to which the current protocol is vulnerable, including eavesdropping, man-in-the-middle, node discovery, traffic analysis, routing, and denial-of-service attacks. Each section provides enough Freenet detail to understand the attacks it describes. Section 4 concludes with advice for Freenet designers and areas for future work.
2. Freenet Protocol
2.1 Overview
The Freenet protocol is packet-oriented and uses self-contained messages. Each message includes a transaction ID so that nodes can track the state of inserts and requests. Node addresses consist of a transport method plus a transport-specific identifier (such as an IP address and port number), e.g. tcp/192.168.1.1:19114.
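The address format above can be illustrated with a small parser. This is a minimal Python sketch, not Freenet's own code: it assumes the address is split at the first "/" into a transport name and a transport-specific identifier, and that a "tcp" identifier has the form host:port. The names NodeAddress and parse_address are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NodeAddress:
    transport: str   # e.g. "tcp"
    identifier: str  # transport-specific part, e.g. "192.168.1.1:19114"

    def tcp_endpoint(self) -> Tuple[str, int]:
        """Split a tcp identifier into (host, port); other transports are rejected."""
        if self.transport != "tcp":
            raise ValueError(f"not a tcp address: {self.transport!r}")
        host, sep, port = self.identifier.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"bad host:port in {self.identifier!r}")
        return host, int(port)


def parse_address(text: str) -> NodeAddress:
    """Split '<transport>/<identifier>' at the first slash."""
    transport, sep, identifier = text.partition("/")
    if not sep or not transport or not identifier:
        raise ValueError(f"malformed node address: {text!r}")
    return NodeAddress(transport, identifier)


addr = parse_address("tcp/192.168.1.1:19114")
print(addr.transport, addr.tcp_endpoint())  # tcp ('192.168.1.1', 19114)
```

Keeping the identifier opaque until a transport-specific method interprets it mirrors the idea that only the transport layer needs to understand it.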
A Freenet transaction begins with a Request.Handshake message from one node to another, specifying the desired return address of the sending node. If the remote node is active and responding to requests, it will reply with a Reply.Handshake specifying the protocol version number. Handshakes are remembered for a few hours, and subsequent transactions between the same nodes during this time may omit this step.
All messages contain a randomly generated 64-bit transaction ID, a hops-to-live counter, and a depth counter. Hops-to-live is set by the originator of a message and is decremented at each hop to prevent messages from being forwarded indefinitely. To reduce the information an attacker can obtain from the hops-to-live value, messages do not automatically terminate when hops-to-live reaches 1 but are forwarded with finite probability (again with hops-to-live 1). Depth is incremented at each hop and is used by a replying node to set hops-to-live high enough to reach the requestor. Requestors should initialize depth to a small random value to obscure their location. As with hops-to-live, a depth of 1 is not automatically incremented but is passed on unchanged with finite probability.
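The per-hop handling of hops-to-live and depth can be sketched as follows. This is an illustrative Python model, not Freenet's implementation: the probabilities p_continue and p_hold and the initial-depth range max_start are hypothetical parameters, since the protocol description above only says "finite probability" and "small random value".

```python
import random
from typing import Optional


def forward_hops_to_live(htl: int, p_continue: float, rng: random.Random) -> Optional[int]:
    """HTL to put on the forwarded message, or None if the message stops here."""
    if htl > 1:
        return htl - 1  # ordinary decrement at each hop
    # At HTL 1 the message is not dropped automatically: it is forwarded
    # again with HTL 1 with probability p_continue.
    return 1 if rng.random() < p_continue else None


def forward_depth(depth: int, p_hold: float, rng: random.Random) -> int:
    """Depth to put on the forwarded message."""
    if depth == 1 and rng.random() < p_hold:
        return 1  # a depth of 1 is sometimes passed on unchanged
    return depth + 1


def initial_depth(rng: random.Random, max_start: int = 5) -> int:
    """Requestors start depth at a small random value (max_start is hypothetical)."""
    return rng.randint(1, max_start)


rng = random.Random(2000)
htl, depth, hops = 5, initial_depth(rng), 0
while True:
    next_htl = forward_hops_to_live(htl, p_continue=0.5, rng=rng)
    if next_htl is None:
        break  # this node does not forward the message
    htl, depth, hops = next_htl, forward_depth(depth, 0.5, rng), hops + 1
print(f"forwarded {hops} times; final depth {depth}")
```

Because the path length past HTL 1 and the starting depth are both random, an observer who sees HTL=1 or a small depth cannot conclude with certainty how far the message is from its end points. Section 3 argues that this randomness still narrows the possibilities considerably.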
To request data, the sending node sends a Request.Data message specifying a transaction ID, initial hops-to-live and depth, and a search key. The remote node checks its datastore for the key; if it is not found, the node looks up the key in its routing table that is nearest to the requested key and forwards the request to the corresponding node. If the request succeeds, the remote node replies with a Send.Data message containing the requested data and the address of the node that supplied it (which may be faked). The node caches the data in its own datastore and creates a new routing table entry associating that file with the corresponding data source address.
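The request-handling logic just described can be sketched as a small node model. This Python sketch assumes keys are hex strings and that "nearest" means the smallest absolute numeric difference; the paper does not specify Freenet's actual closeness metric, so key_distance, Node, and its method names are all hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple


def key_distance(a: str, b: str) -> int:
    """Hypothetical closeness metric: |a - b| with keys read as hex integers."""
    return abs(int(a, 16) - int(b, 16))


class Node:
    def __init__(self, address: str) -> None:
        self.address = address
        self.datastore: Dict[str, bytes] = {}    # key -> cached data
        self.routing_table: Dict[str, str] = {}  # key -> address of its data source

    def next_hop(self, key: str, exclude: FrozenSet[str] = frozenset()) -> Optional[str]:
        """Address associated with the routing-table key nearest to `key`."""
        candidates = [
            (key_distance(known, key), addr)
            for known, addr in self.routing_table.items()
            if addr not in exclude
        ]
        return min(candidates)[1] if candidates else None

    def handle_request(self, key: str) -> Tuple[str, object]:
        if key in self.datastore:
            return "data", self.datastore[key]  # answer from the local datastore
        hop = self.next_hop(key)
        return ("forward", hop) if hop else ("not_found", None)

    def on_send_data(self, key: str, data: bytes, data_source: str) -> None:
        """On a successful reply: cache the data and remember its claimed source."""
        self.datastore[key] = data
        self.routing_table[key] = data_source


node = Node("tcp/10.0.0.1:19114")
node.routing_table = {"10": "tcp/10.0.0.2:19114", "f0": "tcp/10.0.0.3:19114"}
print(node.handle_request("20"))  # ('forward', 'tcp/10.0.0.2:19114')
node.on_send_data("20", b"hello", "tcp/10.0.0.2:19114")
print(node.handle_request("20"))  # ('data', b'hello')
```

Note that on_send_data trusts whatever data source the reply claims, which is the property the routing attacks in Section 3 exploit.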
To insert data, the sending node sends a Request.Insert message specifying a randomly-generated transaction ID, an initial hops-to-live and depth, and a proposed key. The remote node will check its datastore for the key and if not found, it will forward the insert to the node that has the nearest key in its routing table.
If the insert ultimately results in a key collision, the remote node replies with either a Send.Data message containing the existing data or a Reply.NotFound (if the existing data was not actually found, but routing table references to it were). If the insert does not encounter a collision but runs out of nodes to contact while hops-to-live is still nonzero, the remote node replies with a Request.Continue; in this case, Request.Continue is a failure result meaning that fewer nodes could be contacted than were asked for. These messages are passed back upstream as in the request case, and each terminates the transaction and releases any resources held. However, if hops-to-live expires without the insert encountering a collision, the remote node replies with a Reply.Insert, indicating that the insert can go ahead. The sending node passes the Reply.Insert upstream and waits for its predecessor to send a Send.Insert containing the data. When it receives the data, it stores it locally and forwards the Send.Insert downstream, concluding the transaction. In this way, the data is propagated along the path established by the initial query and stored in each node along the way.
2.2 Protocol documentation
The first obvious weakness of Freenet is its lack of protocol documentation. The intent of the protocol is published in various papers [2,3], but the actual specification of the protocol is spread out across several incomplete and inaccurate web documents [4,5,6], and some details can only be gleaned from Freenet newsgroups. Currently the best source of protocol documentation is the source code, but several diverging development efforts and rapidly changing code make this a moving target. To avoid the security pitfalls that have plagued many closed design efforts [7,8] and to discover flaws early in the design process, Freenet designers should publish their protocol as soon as possible and make it available to the widest possible audience.*
*One troubling fact is that the designers have tentatively agreed with Linux Journal to publish the complete protocol specification only on Freenet (not on the web or in newsgroups), in order to grant the magazine exclusive first publication. This effectively limits access, for a period of time, to those who already use Freenet, who know that the document will be published there, and who know the search key needed to find it (i.e., a small set of Freenet developers).
3. Attacks
3.1 Eavesdropping
In this section we assume that a hypothetical attacker, Eve, can observe connections between nodes but cannot change, remove, or add messages. There is no protection against Eve between the user and the first node contacted, so users are encouraged to connect first to a node running on their own machine. Nodes currently perform a standard Diffie-Hellman key exchange to agree on the key used to encrypt communications. Thus, messages between nodes are encrypted and are safe against Eve.
However, simply by setting up a node, Eve can "eavesdrop" on any messages that are routed through her. All message fields are human-readable except the Data field (the actual document), which is often encrypted.
For example, here is a sample message that contains a plain text document:
Reply.Data
UniqueID=C24300FB7BEA06E3
Depth=10
HopsToLive=54
Source=tcp/127.0.0.1:2386
DataSource=tcp/192.235.53.175:5822
Storable.InfoLength=0
DataLength=131
Data
'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves
And the mome raths outgrabe.
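To make concrete how little effort this eavesdropping takes, here is a minimal Python sketch of a parser that a node operator could use to log these fields. It assumes the message is laid out one field per line, with a bare "Data" line separating the header from the payload; the function name parse_message is hypothetical, and only the first line of the sample payload is included.

```python
from typing import Dict, Tuple

SAMPLE = "\n".join([
    "Reply.Data",
    "UniqueID=C24300FB7BEA06E3",
    "Depth=10",
    "HopsToLive=54",
    "Source=tcp/127.0.0.1:2386",
    "DataSource=tcp/192.235.53.175:5822",
    "Storable.InfoLength=0",
    "DataLength=131",
    "Data",
    "'Twas brillig, and the slithy toves",
])


def parse_message(raw: str) -> Tuple[str, Dict[str, str], str]:
    """Return (message type, header fields, payload) for a line-oriented message."""
    lines = raw.split("\n")
    msg_type, fields = lines[0].strip(), {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "Data":  # header ends; payload follows
            return msg_type, fields, "\n".join(lines[i + 1:])
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"unexpected header line: {line!r}")
        fields[name.strip()] = value.strip()
    return msg_type, fields, ""  # message carried no Data section


msg_type, fields, payload = parse_message(SAMPLE)
# What a passive node operator can log for every message routed through her:
print(msg_type, fields["UniqueID"], fields["HopsToLive"], fields["DataSource"])
# Reply.Data C24300FB7BEA06E3 54 tcp/192.235.53.175:5822
```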
Currently, documents can be encrypted in two ways:
1) Keyword hashes. The document creator starts with a readable name (such as /music/mp3/artist/song) and uses it to generate a DSS public/private key pair: the private key is the hash of the name, k = hash(readable_name), and the public key is y = g^k mod p.
Conceptually, y is the Freenet "search key" used to route inserts and requests for a document. In practice, hash(y) is used as the search key and the full value of y is included in the metadata; this is equivalent but yields a shorter search key.
This method obscures the search key which is being requested or inserted, but includes a proof that the original creator of this data packet knew the readable_name which hashes to the routing key.
2) Content hashes. These are generated by taking the SHA-1 hash of the encrypted data, so the search key is hash(encrypted data). Content hashes are used to verify that a document matches its search key and to check its integrity. They also reduce redundancy, since two identical documents have the same content hash and will collide on insert (unlike keyword hashes, which allow different documents to share the same keyword hash).
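The two key types can be sketched side by side. This Python sketch uses SHA-1 as the paper states, but substitutes toy group parameters (the Mersenne prime 2^127 - 1 and generator 3) for a real DSS group, and does not reduce k modulo a subgroup order as DSS would; TOY_P, TOY_G and the function names are assumptions for illustration only.

```python
import hashlib

# Toy group parameters for illustration only; a real DSS group uses a large
# prime p and a subgroup generator g chosen according to the DSS standard.
TOY_P = 2**127 - 1
TOY_G = 3


def keyword_search_key(readable_name: str) -> str:
    """k = hash(name), y = g^k mod p, search key = hash(y)."""
    k = int.from_bytes(hashlib.sha1(readable_name.encode("utf-8")).digest(), "big")
    y = pow(TOY_G, k, TOY_P)
    y_bytes = y.to_bytes((TOY_P.bit_length() + 7) // 8, "big")
    return hashlib.sha1(y_bytes).hexdigest()


def content_search_key(encrypted_data: bytes) -> str:
    """Search key = hash(encrypted data)."""
    return hashlib.sha1(encrypted_data).hexdigest()


def verify_content(search_key: str, encrypted_data: bytes) -> bool:
    """Any node can check that returned data matches the requested content key."""
    return content_search_key(encrypted_data) == search_key


ksk = keyword_search_key("/music/mp3/artist/song")
chk = content_search_key(b"...encrypted document bytes...")
print(ksk, chk, verify_content(chk, b"tampered"))
```

The asymmetry matters for later attacks: a content key can be checked against the returned data, whereas a keyword key only proves that the inserter knew the readable name.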
Since routing depends on knowledge of the search key, key anonymity is not possible. The use of hashes as keys provides some obscurity against casual eavesdropping, but it is vulnerable to dictionary attacks. Thus, by setting up a node and monitoring messages, Eve can log which nodes request which keys, which nodes reply with what data, which nodes insert which keys, and which nodes reply with RequestFailed or TimedOut for which keys. This information can greatly aid the traffic analysis attempts discussed later in the paper.
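The dictionary attack on keyword hashes follows directly from the fact that the derivation is public and deterministic. The Python sketch below reuses an assumed keyword-key derivation (SHA-1 plus toy group parameters, not a real DSS group) and a hypothetical list of guessed names; an attacker simply recomputes the key for each guess and compares it with the key observed in a request.

```python
import hashlib
from typing import Iterable, Optional

TOY_P = 2**127 - 1  # toy group parameters (assumption, not a real DSS group)
TOY_G = 3


def keyword_search_key(readable_name: str) -> str:
    k = int.from_bytes(hashlib.sha1(readable_name.encode("utf-8")).digest(), "big")
    y = pow(TOY_G, k, TOY_P)
    return hashlib.sha1(y.to_bytes(16, "big")).hexdigest()


def dictionary_attack(observed_key: str, candidate_names: Iterable[str]) -> Optional[str]:
    """Recover the readable name behind an observed keyword-hash search key."""
    for name in candidate_names:
        if keyword_search_key(name) == observed_key:  # anyone can recompute this
            return name
    return None


# Guesses built from a common naming pattern (hypothetical word lists).
artists = ["artist", "someband"]
songs = ["song", "hit"]
guesses = (f"/music/mp3/{a}/{s}" for a in artists for s in songs)
observed = keyword_search_key("/music/mp3/artist/song")  # key seen in a Request.Data
print(dictionary_attack(observed, guesses))  # /music/mp3/artist/song
```

The cost of the attack is one hash-and-exponentiation per guess, so human-chosen names drawn from predictable hierarchies offer little protection.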
3.2. Man in the Middle
Without authentication, the encrypted connections between nodes are vulnerable to active attacks; Mallory can interfere with the key agreement protocol and insert himself as a "man in the middle", convincing Alice that he is Bob, and Bob that he is Alice [9].
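Why an unauthenticated Diffie-Hellman exchange falls to Mallory can be shown in a few lines. This Python sketch uses toy parameters (a 127-bit Mersenne prime and generator 3, far too small for real use) and models Mallory as simply replacing each public value in transit with his own; the function names are hypothetical.

```python
import secrets
from typing import Dict, Tuple

# Toy parameters for illustration only; real deployments use standardized
# groups of 2048 bits or more.
P = 2**127 - 1
G = 3


def keypair() -> Tuple[int, int]:
    private = secrets.randbelow(P - 3) + 2  # private exponent in [2, P-2]
    return private, pow(G, private, P)      # (private, public = G^private mod P)


def shared_secret(private: int, peer_public: int) -> int:
    return pow(peer_public, private, P)


def run_mitm() -> Dict[str, object]:
    a_priv, a_pub = keypair()
    b_priv, b_pub = keypair()
    m_priv, m_pub = keypair()

    # Honest exchange: both sides derive the same secret.
    honest_ok = shared_secret(a_priv, b_pub) == shared_secret(b_priv, a_pub)

    # Active attack: Mallory replaces each public value in transit with m_pub.
    return {
        "honest_ok": honest_ok,
        "alice_key": shared_secret(a_priv, m_pub),  # Alice thinks this is shared with Bob
        "bob_key": shared_secret(b_priv, m_pub),    # Bob thinks this is shared with Alice
        "mallory_with_alice": shared_secret(m_priv, a_pub),
        "mallory_with_bob": shared_secret(m_priv, b_pub),
    }


result = run_mitm()
print(result["alice_key"] == result["mallory_with_alice"])
```

Mallory ends up sharing one key with Alice and another with Bob, so he can decrypt each message, read or alter it, and re-encrypt it for the other side. Nothing in the exchange lets Alice distinguish m_pub from b_pub, which is why authentication is required.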
Mallory can force Alice to hold objectionable material by requesting it through her (causing her to cache a copy on her node). He can also make it seem like Alice is inserting particular documents into Freenet.
One grave security threat is an "all sides" man-in-the-middle attack, in which Mallory inserts himself into all of Alice's connections to Freenet. This gives Mallory full control over Alice's communications and greater control over her datastore: he controls which requests leave Alice's node and which replies enter it. One of Mallory's nodes can request a document through Alice, have it supplied by an upstream node (also under Mallory's control), and thereby force it to be copied onto Alice's node. Because he can make it appear that the document doesn't exist elsewhere (by destroying his own copies of it and by making requests for it through other nodes fail), he can raise suspicion that Alice is the author. Similarly, he can insert documents into Freenet that appear to originate from Alice. He can also monitor all of Alice's traffic, though he cannot determine what she is inserting or requesting without knowing the search keys involved. As discussed previously, he can easily discover search keys with a dictionary attack if they are keyword hashes.
The only way to defeat Mallory is for nodes to authenticate one another. This can be achieved by obtaining a node's public key through a secure channel (e.g., from a trusted party such as a certificate authority, or through a web of trust like PGP's) and validating its fingerprint. Many have suggested using Freenet itself as a key server; however, this approach leaves open the problem of how to validate keys, since anyone can insert a key claiming to belong to any node. Setting up a working web of trust to perform these validations has proven to be a major stumbling block.
One interim solution Freenet could adopt to raise the bar against Mallory is to let nodes include their public key (or its fingerprint) with their node address. When a node forwards another node's address (for example, in the DataSource field of a reply), it can include the corresponding public key fingerprint (e.g., tcp/123.456.789.10:19114/<public key fingerprint>).
A field could be added to the Request.Handshake message to request the public key if Alice does not already have Bob's key, and Bob would return his public key in the Reply.Handshake. Alice can then check that the public key sent back by Bob matches the fingerprint. Alice can also check for consistency when she receives a reference to a node she already knows about by ensuring that the public key field is unchanged. She can also ask her neighbors about Bob to see whether they agree on his IP/key mapping. If a sufficient number of neighbors agree, Alice can trust Bob; if they do not, she rejects his connections.
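The three checks proposed above (fingerprint match, consistency with earlier references, and neighbor agreement) can be sketched as follows. This is a hypothetical Python sketch of the proposal, not an existing Freenet feature: it assumes the fingerprint is the SHA-1 hex digest of the public key, that references take the form address/fingerprint, and that the quorum size is a local policy choice.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def fingerprint(public_key: bytes) -> str:
    """Assumed fingerprint format: SHA-1 hex digest of the raw public key."""
    return hashlib.sha1(public_key).hexdigest()


def split_reference(ref: str) -> Tuple[str, str]:
    """Split 'tcp/host:port/<fingerprint>' into (address, fingerprint)."""
    address, sep, fp = ref.rpartition("/")
    if not sep or "/" not in address:
        raise ValueError(f"reference has no fingerprint: {ref!r}")
    return address, fp


def key_matches_reference(ref: str, public_key: bytes) -> bool:
    """Check 1: the key returned in the handshake matches the fingerprint."""
    return fingerprint(public_key) == split_reference(ref)[1]


def check_consistency(known: Dict[str, str], ref: str) -> bool:
    """Check 2: a new reference must agree with any fingerprint seen before."""
    address, fp = split_reference(ref)
    return known.setdefault(address, fp) == fp


def neighbors_agree(expected_fp: str, neighbor_reports: List[str], quorum: int) -> bool:
    """Check 3: at least `quorum` neighbors report the same fingerprint."""
    return Counter(neighbor_reports)[expected_fp] >= quorum


bob_key = b"bob-public-key"
ref = "tcp/192.0.2.5:19114/" + fingerprint(bob_key)
known: Dict[str, str] = {}
print(key_matches_reference(ref, bob_key), check_consistency(known, ref))  # True True
```

As the following paragraphs explain, these checks only help if at least some of Alice's references arrive over connections Mallory does not control.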
In this scenario, Mallory can still arrange for Alice to receive a bogus public key for Bob. However, if Mallory wanted to provide a bogus key, he could just as easily have provided a bogus IP address. (In some ways a bogus IP address is an easier attack to mount, because Mallory no longer has to deal with the usual IP-level MITM issues of intercepting IP packets in real time, altering them, and handling fragmentation, ordering, retransmission, etc.) Either way, Mallory can supply a public key of his own that authenticates correctly, so the attack is not prevented.
One advantage of this solution, however, is that it increases the chance that Alice can detect Mallory. To avoid detection, Mallory must not only be present from the beginning of the negotiation; he must also detect and alter every Freenet reference that Alice receives throughout her use of Freenet. If Alice starts using Freenet before Mallory arrives, she will have some valid node addresses and valid public keys. Mallory can do nothing about the connections she makes with those nodes, and they in turn will provide her with a valid public key with every node address she receives. Mallory cannot substitute bogus keys in references from Alice's prior contacts, because those references are sent over encrypted connections he cannot read.
The challenge for Mallory, then, is to corrupt the nodes Alice is talking to and somehow get them to send her new node addresses with bad keys. This is risky if he does not know which nodes she already knows about: if he sends her a bad key for a node for which she previously received a valid key, she may detect the attack.
If Mallory fails to completely surround Alice, she has a chance of detecting him even if she starts with only one valid address. If Mallory completely surrounds Alice, the only way he can be detected is through out-of-band fingerprint checking. Paranoid users must still rely on trusted parties to obtain node addresses and keys through secure channels.
3.3 Node Discovery
Mallory can save himself a lot of work if he can get his address widely distributed as a good starting point into the network. If he succeeds, those who start nodes may find themselves connected to one or more nodes under his control. Mallory can then accomplish everything the MITM discussed in the last section can do.
Freenet nodes learn about other nodes in three ways:
1) The Freenet project runs a PHP script on its website that allows Freenet nodes to find out about each other [10]. Requesting the script's URL returns a list of addresses in Freenet nodes.config format. A node can obtain the addresses of other nodes, or add its own address to the list by passing two variables (via GET or POST), "ipaddr" and "port", which are assumed to be the IP address and listening port of a Freenet node. The list is restricted to a maximum of 20 addresses, with old addresses being replaced by new ones. The script actually ignores the "ipaddr" variable and instead uses the IP address the request appears to come from, which adds minimal authentication to incoming information. Some clients allow use of this service to be configured (in the .freenetrc file, users can specify not to use the service, to read addresses only, to read and send addresses, or to send their address only), while others do not.
The PHP node discovery mechanism is a security hazard for three reasons:
2) Nodes also learn about other nodes by receiving a message with data in it whose DataSource field points at another node (either a DataInsert or a DataReply message). Nodes can then establish new connections to these nodes if they wish. There is no way to verify that the address in the DataSource field is valid; it could point at a malicious node. (Any node along the insert or reply path can claim itself, or any other node, as the DataSource.)
3) Nodes also know about nodes that users add to their nodes.config file themselves. Users can therefore choose to ignore DataSource fields and talk only to known, trusted nodes. However, if everyone chose to talk only to known nodes, Freenet would not work: a fundamental part of Freenet's adaptive mechanism is the "path compression" that results from nodes using the DataSource field. The necessary path compression may still be achieved as long as a significant portion of the nodes do not restrict themselves in this way.
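The announcement list behind item 1 can be modeled in a few lines to show why a 20-entry, newest-wins list is easy to capture. This is a toy Python model of the behavior described above, not the actual PHP script: the class name is hypothetical, and the rule that re-announcing an address refreshes its position is an assumption.

```python
from collections import OrderedDict
from typing import List


class InformList:
    """Toy model of the announcement list kept by inform.php (not its actual code)."""

    MAX_ENTRIES = 20

    def __init__(self) -> None:
        self._entries: "OrderedDict[str, None]" = OrderedDict()  # oldest first

    def announce(self, request_source_ip: str, claimed_ipaddr: str, port: int) -> str:
        # claimed_ipaddr is deliberately unused: the address the HTTP request
        # appears to come from is used instead.
        address = f"tcp/{request_source_ip}:{port}"
        self._entries.pop(address, None)  # re-announcing refreshes an entry
        self._entries[address] = None
        while len(self._entries) > self.MAX_ENTRIES:
            self._entries.popitem(last=False)  # evict the oldest address
        return address

    def addresses(self) -> List[str]:
        return list(self._entries)


board = InformList()
board.announce("198.51.100.7", "203.0.113.1", 19114)  # an honest node
for i in range(20):                                    # Mallory's hosts
    board.announce(f"192.0.2.{i}", "ignored", 19114)
print(any("198.51.100.7" in a for a in board.addresses()))  # False
```

Using the observed source address stops trivial spoofing, but an attacker who controls 20 source addresses can still displace every honest entry with a burst of announcements.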
Consider an attacker who controls hundreds of nodes. Each of those nodes can ensure that every address reference it gives out points to another node in that corrupt set. Anyone who makes requests to these nodes may then find, over time, that more and more of their references point at the corrupt nodes. Even if another mechanism eventually replaces inform.php, a well-funded attacker will still be able to widely advertise nodes he controls as good starting points. Therefore, users should trust only nodes run by people they trust, and should authenticate them over a secure channel.
To find nodes that have not intentionally announced themselves, Mallory can create a Freenet node that port-scans using a Freenet Request.Handshake message. All nodes I have examined automatically respond to this message with a Reply.Handshake. Also, since the initial exchanges are not in binary format, a service provider can easily find out whether any of its users are running Freenet nodes by searching its traffic for textual messages (e.g., "Freenet v1.0 DH KeyExchange") instead of having to scan ports.
3.4 Traffic Analysis
Mallory can monitor the network by setting up nodes in strategic locations and by monitoring encrypted traffic between nodes. We first examine how, by setting up a node, Mallory can obtain valuable information by observing the HTL and Depth fields and search key closeness, and by making requests to other nodes.
Remember that messages do not automatically terminate after hops to live reaches 1 but are forwarded with some probability (with hops to live again 1). Depth is incremented at each hop and is used by a replying node to set the hops to live high enough to reach a requestor. Requestors can initialize depth to a small random value to obscure their location and a depth of 1 is not automatically incremented but passed unchanged with finite probability.
However, even with this added randomness, HTL and Depth still narrow the set of candidates for the original requestor. Some have suggested that it may be possible to do away with the HTL field, because reply messages inherit the Message ID of the request they answer. Rather than depending on HTL, nodes could use Message IDs to route replies back in the direction the requests came from.
However, even removing both the Depth and HTL fields will not entirely eliminate this problem. Freenet is designed to forward search requests so that they close in on the search key in question with each additional hop. By examining how closely the requested search key matches the keys it normally serves, a node can get some indication of how early it is in the forwarding chain. In the simplest case, if a node receives a request for a search key unlike anything it has ever served, its operator can guess that the request originated from a directly connected node (or that the node is very early in the chain).
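The closeness heuristic can be sketched as a simple threshold test. This Python sketch is illustrative only: it assumes keys are compared as hex integers and that the operator picks a distance threshold by hand; a real observer would calibrate the threshold from the keys the node has actually served.

```python
from typing import List


def nearest_served_distance(requested: str, served_keys: List[str]) -> int:
    """Distance from the requested key to the closest key this node has served."""
    target = int(requested, 16)
    return min(abs(target - int(k, 16)) for k in served_keys)


def guess_early_in_chain(requested: str, served_keys: List[str], threshold: int) -> bool:
    """True if the request is far from everything this node usually serves,
    suggesting the requestor is a direct neighbor or only a few hops away."""
    if not served_keys:
        return True
    return nearest_served_distance(requested, served_keys) > threshold


served = ["1000", "1010", "1020"]  # keys this node has specialized in (toy values)
print(guess_early_in_chain("1005", served, threshold=0x100))  # False
print(guess_early_in_chain("ffff", served, threshold=0x100))  # True
```

A request that has already traveled several hops should have been steered toward keys like those the node holds; one that is far away probably has not been routed much yet.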
If Mallory makes a request through a node and receives a reply without observing any outgoing messages, he knows that node was either storing or originated the file. The success of a large number of requests for related files and the timing of replies may provide grounds for suspicion that those files were being stored previously on a node. Because intermediary nodes do nothing to alter messages, Mallory can gain many clues just by sending strategic messages through nodes and by observing the resulting size, ordering and latency of messages that enter and leave particular nodes.
Freenet faces a number of technical challenges to providing strong anonymity in the face of traffic analysis [11]. One obvious fix is to allow nodes to alter the order in which messages are sent and their latency, to pad messages to a constant size or split large messages, and to send a constant stream of traffic (real traffic plus dummy traffic during lulls).
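Constant-size padding, splitting, and dummy traffic can be sketched with fixed-size cells. In this Python sketch the 1024-byte cell size and 2-byte length header are hypothetical choices, not part of the Freenet protocol. Each cell records how many of its bytes are real payload, a zero-length cell serves as cover traffic, and in practice cells would also be encrypted so that the padding and header are not visible to an observer.

```python
from typing import List

CELL_SIZE = 1024  # hypothetical fixed cell size in bytes
HEADER = 2        # big-endian length of the real payload in this cell
PAYLOAD = CELL_SIZE - HEADER


def to_cells(message: bytes) -> List[bytes]:
    """Split a message into equal-sized cells, zero-padding the last one."""
    chunks = [message[i:i + PAYLOAD] for i in range(0, len(message), PAYLOAD)] or [b""]
    return [len(c).to_bytes(HEADER, "big") + c + bytes(PAYLOAD - len(c)) for c in chunks]


def dummy_cell() -> bytes:
    """Cover traffic for lulls: a cell whose payload length is zero."""
    return bytes(CELL_SIZE)


def from_cells(cells: List[bytes]) -> bytes:
    """Reassemble a message, skipping padding and dummy cells."""
    out = bytearray()
    for cell in cells:
        length = int.from_bytes(cell[:HEADER], "big")
        out += cell[HEADER:HEADER + length]
    return bytes(out)


cells = to_cells(b"x" * 3000)
print(len(cells), {len(c) for c in cells})  # 3 {1024}
```

Every cell on the wire has the same size, so message sizes are hidden up to a multiple of the cell size. Combining this with reordering and a steady stream of dummy cells addresses the ordering and timing clues described above.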
Another solution is to employ onion routing, an anonymous communication method which allows one to set up an anonymous connection through several nodes [12]. One drawback to this for Freenet is that it requires the client to know the node addresses (and keys) ahead of time. This will make the startup node discovery problem worse, because a client needs to know not just one node, but several nodes.
A second drawback is that onion routing may significantly damage the overall efficiency of Freenet. Onion routing nodes do not know the contents of the messages they are forwarding. These messages therefore add communication and computational load to the node without allowing for local caching, which is the key to Freenet's presumed efficiency. A compromise, then, is to let some subset of nodes act as onion routers. For example, a node could choose an onion route for the first few hops, after which the message would be injected into the normal Freenet network and routed as usual.
Another alternative for privacy sensitive users is to access Freenet via existing anonymous communication networks [13]. In this way Freenet could focus on local caching and automatic load balancing, and other gateways into Freenet could provide a robust service for hiding identities.
3.5. Routing attacks
Designers hope that routing will improve over time because of the behavior of request and insert messages.
Mallory can do a number of simple things to subvert the efficiency of routing. He can reply to requests with bogus data (using content hashes instead of keyword hashes would prevent this). He can set the DataSource of every message that passes through him to point back to himself. He can also provide a different DataSource for every document he serves, forcing nodes to constantly update their routing tables and discouraging documents from clustering properly.
Mallory can also steer traffic for a target file toward himself by creating many bogus files that hash very close to the target file through exhaustive search. If he inserts them into the network making himself the datasource, this will tend to lure requests and inserts for the target file toward himself (affecting routing and also giving him the ability to detect more information about the nodes downloading the target file).
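The exhaustive search for decoy keys can be sketched as brute-force hashing. This Python sketch assumes content-hash keys (SHA-1 of the content) and treats "very close" as sharing the target key's leading prefix_bits bits, which is an assumption since the paper does not fix Freenet's closeness metric. The decoy contents are arbitrary placeholder strings.

```python
import hashlib
from typing import List, Tuple

KEY_BITS = 160  # SHA-1 output size


def sha1_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def shares_prefix(a: int, b: int, prefix_bits: int) -> bool:
    """Hypothetical closeness test: the top `prefix_bits` bits are identical."""
    shift = KEY_BITS - prefix_bits
    return (a >> shift) == (b >> shift)


def find_decoys(target_key: int, prefix_bits: int, wanted: int,
                max_tries: int = 1_000_000) -> List[Tuple[bytes, int]]:
    """Brute-force contents whose content-hash key lands next to the target's."""
    decoys: List[Tuple[bytes, int]] = []
    for i in range(max_tries):
        content = f"decoy-{i}".encode()
        key = sha1_int(content)
        if key != target_key and shares_prefix(key, target_key, prefix_bits):
            decoys.append((content, key))
            if len(decoys) == wanted:
                break
    return decoys


target = sha1_int(b"target file")
for content, key in find_decoys(target, prefix_bits=12, wanted=3):
    print(content.decode(), f"{key:040x}"[:6], f"{target:040x}"[:6])
```

Each decoy costs about 2^prefix_bits hash evaluations on average, so every extra bit of closeness roughly doubles the work. Even fairly close decoys remain cheap for a motivated attacker.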
3.6 Denial of Service Attacks
A very simple denial-of-service attack is for an attacker to insert a large number of bogus files into the network in an attempt to fill its storage capacity. One proposal to counter this attack is to divide the datastore into two sections, one for new inserts and one for established files. New inserts would only replace other new inserts, so a flood of them could not displace existing files. However, an attacker may be able to legitimize his bogus files by requesting them from strategic locations where they will be cached on as many nodes as possible. This scheme might also make it difficult for genuine new inserts to survive long enough to be requested by others and become established.
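The split-datastore proposal, and its weakness, can be sketched with two LRU areas. In this Python sketch the capacities and the rule that a file becomes "established" after promote_after requests are hypothetical, since the proposal above does not say how promotion works.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class SplitDatastore:
    """Datastore split into a 'new inserts' area and an 'established' area.

    Each area evicts its own least recently used entries, so a flood of
    inserts can only displace other new inserts. A file is promoted after
    `promote_after` requests (a hypothetical rule for what 'established' means).
    """

    def __init__(self, new_capacity: int, established_capacity: int, promote_after: int) -> None:
        self.new_capacity = new_capacity
        self.established_capacity = established_capacity
        self.promote_after = promote_after
        self.new: "OrderedDict[str, Tuple[bytes, int]]" = OrderedDict()  # key -> (data, hits)
        self.established: "OrderedDict[str, bytes]" = OrderedDict()

    def insert(self, key: str, data: bytes) -> None:
        self.new[key] = (data, 0)
        self.new.move_to_end(key)
        while len(self.new) > self.new_capacity:
            self.new.popitem(last=False)  # evicts only other new inserts

    def request(self, key: str) -> Optional[bytes]:
        if key in self.established:
            self.established.move_to_end(key)
            return self.established[key]
        if key not in self.new:
            return None
        data, hits = self.new.pop(key)
        if hits + 1 >= self.promote_after:  # enough interest: promote
            self.established[key] = data
            while len(self.established) > self.established_capacity:
                self.established.popitem(last=False)
        else:
            self.new[key] = (data, hits + 1)
        return data


store = SplitDatastore(new_capacity=2, established_capacity=2, promote_after=2)
store.insert("real", b"genuine")
store.request("real")
store.request("real")
for i in range(10):
    store.insert(f"bogus-{i}", b"junk")  # flood of inserts
print("real" in store.established)       # True: the flood alone cannot evict it
for key in ("bogus-9", "bogus-8"):       # attacker requests his own bogus files
    store.request(key)
    store.request(key)
print("real" in store.established)       # False: "legitimized" bogus files pushed it out
```

The insert flood alone fails, but a few extra requests are enough to promote the bogus files into the protected area, which is the weakness described above.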
The designers argue that an attacker cannot replace a target file by inserting a bogus file whose search key collides with the target's search key, and that such an attempt is likely to spread the real file further, since the original file is propagated back on collision. This is true, but the designers do not consider the case where Mallory inserts bogus files (with the same search key as the target file) into many nodes that are disconnected from the network. When these nodes rejoin the network, there may be many more corrupt copies of the file on the network than real ones. This attack is not feasible if content hashes are used as search keys instead of keyword hashes.
Even if Mallory does not know the location of a node storing a target file, he can attempt to shut that node down by launching a distributed denial-of-service attack. First, using exhaustive search, he can find a (bogus) search key that is closer to the target's search key than any other key on Freenet. Then he can launch thousands of requests for that key using thousands of nodes. Because the data cannot be found, many of the requests will find their way to the nodes where the targeted data clusters, shutting them down. This may not work for popular documents that are cached throughout the network, but it will work for less popular documents that cluster on specialized nodes.
To make DDoS attacks easier to launch, nothing prevents Mallory from using any number of ports on a single host to pose as an extremely large number of nodes. Finally, nothing prevents Mallory from indefinitely suppressing all messages that pass through him, from generating false DataReplies to DataRequests (e.g., always sending back spam), from generating false RequestFailed or TimedOut messages in response to DataRequests, or from sending messages containing extremely large amounts of data.
4. Conclusion
The attacks detailed above illustrate that it is currently possible to defeat all of the security claims the designers have made about Freenet. Of course, the system is a work in progress and many enhancements remain to be implemented. Freenet designers are strongly encouraged to publish their protocol documentation and future enhancements widely to encourage security reviews. The highest priority should be given to authenticating connections between nodes, and more thought must be given to mechanisms for node discovery. To defeat traffic analysis, designers can implement a limited form of onion routing or concentrate on building gateways to existing anonymous communication networks designed with these goals in mind.
In the near future, I plan to evaluate various proposals to split large files on Freenet (to defeat traffic analysis) and proposals to enable secure updating of documents. I am also interested in developing a mechanism that will allow nodes to vote on the validity of the information returned for requests, to defeat the insertion of malicious files. The analysis in this paper raised many more vulnerabilities than it found solutions for, which suggests that much research remains to be done in securing peer-to-peer communications.
References
[1] More details and documents about Freenet can be found on its homepage, http://freenet.sourceforge.net/
[2] Ian Clarke, "A Distributed Decentralised Information Storage and Retrieval System", Division of Informatics, University of Edinburgh, dissertation, 1999, available at http://freenet.sourceforge.net/freenet.pdf
[3] Clarke, Sandberg, Wiley, Hong, "Freenet: A Distributed Anonymous Information Storage and Retrieval System", ICSI Workshop on Design Issues in Anonymity and Unobservability, July 25-26, Berkeley, California, available at http://freenet.sourceforge.net/icsi.ps
[4] "Freenet Protocol 1.0 Specification" (incomplete), available at http://freenet.sourceforge.net/index.php?page=protocol
[5] "Freenet 0.3 Protocol Additions", available at http://freenet.sourceforge.net/index.php?page=r3proto
[6] "Freenet Cryptographic Layer", available at http://freenet.sourceforge.net/index.php?page=fncrypto
[7] Ross Anderson, "Why Cryptosystems Fail", available at http://www.cl.cam.ac.uk/ftp/users/rja14/wcf.ps.gz
[8] Ross Anderson, "GSM hack: operator flunks the challenge", note to Risks Digest, available at http://catless.ncl.ac.uk/Risks/19.48.html#subj5
[9] Bruce Schneier, Applied Cryptography, Chapter 22
[10] Freenet inform.php script, see http://freenet.sourceforge.net/inform0.3.php
[11] Lance Cottrell, "Mixmaster & Remailer Attacks", available at http://www.obscura.com/%7Eloki/remailer-essay.html
[12] Onion Routing research project, NRL, http://www.onion-router.net/
[13] Freedom product from Zero Knowledge, http://www.zeroknowledge.com/