Gnutella NG Monaco Proposal 14/04/2000
Comments to Lambla@bouygtel.com
The Monaco proposal details the ideas of the authors for the next generation of the protocol.
The goals of the system are, in no particular order, to:
- Reduce the general overload of the distributed network
- Keep slow connections from bringing down the distributed network
- Provide an extensibility mechanism to allow future enhancements
- Make the network scalable
Message: a message is the binary header and the payload.
Payload: the content of a message.
Function: byte defining the type of a message in the binary header.
Servent: a Gnutella protocol-compliant agent.
The bytes are represented in the form 0xnn.
Function 0x00: PING
Function 0x01: PONG
Function 0x80: QUERY
Function 0x81: QUERY REPLY
Function 0x40: PUSH
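As a small illustration, the function bytes listed above could be held in an enumeration so that a parser can name the type of a message. This is only a sketch: the names `Function` and `function_name` are my own and are not part of the protocol.

```python
from enum import IntEnum


class Function(IntEnum):
    """Function bytes as listed above (the class name is illustrative)."""
    PING = 0x00
    PONG = 0x01
    PUSH = 0x40
    QUERY = 0x80
    QUERY_REPLY = 0x81


def function_name(byte: int) -> str:
    """Return a readable name for a function byte, or mark it as unknown."""
    try:
        return Function(byte).name
    except ValueError:
        # Unknown bytes are reported rather than rejected, which leaves
        # room for future extensions.
        return f"UNKNOWN(0x{byte:02x})"
```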
What about the “old” protocol?
I have talked with people who want to replace the current protocol completely, either because they want an ASCII protocol or because they want to build on IRC. Either choice would break the old protocol and discard all the work already done.
I think that, for this new protocol to succeed, it should build on what already exists.
Proposed changes to Protocol:
Each servent is flagged with its speed class (SC). Several SCs should be defined, for example:
DSL / Cable
This is just an example, but 4 or 5 SCs should be defined. Each servent decides its own SC from its connection.
Let’s call the connecting servent Alice, and the host Bob. When Alice connects to Bob, she checks his SC. If it is equal to or higher than her own SC, she keeps the connection as it is and accepts incoming connections.
A third servent named Caroline connects to Alice. Alice accepts the connection and checks Caroline’s SC. As Caroline is on a cable modem and Alice is only a modem user, Alice decides that Caroline shouldn’t be connected to her and should find another, faster host. All these communications take place only between Caroline and Alice, and are not broadcast to the distributed network.
Alice sends Caroline the address of her own host, here Bob, in a new message with a TTL of 1, which will be named REDIRECTED CONNECTION and contains Bob’s SC flag. Caroline then decides to connect to Bob, as he has an ADSL connection.
That way, connecting to the distributed network is still very easy, as anyone can connect through anyone, but a kind of hierarchy emerges that automatically creates a “backbone”.
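The decision Alice makes when Caroline connects could be sketched as follows. Everything named here is an assumption made for illustration: the three speed classes and their ordering (modem, DSL/cable, T1), the `RedirectedConnection` fields, and the `on_incoming` function. The proposal does not fix a message layout.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class SpeedClass(IntEnum):
    """Hypothetical speed classes; a higher value means a faster link."""
    MODEM = 1
    DSL_CABLE = 2
    T1 = 3


@dataclass(frozen=True)
class RedirectedConnection:
    """Hypothetical content of a REDIRECTED CONNECTION message."""
    host: str            # address of the sender's own host
    host_sc: SpeedClass  # SC flag of that host
    ttl: int = 1         # never travels further than the direct neighbour


def on_incoming(own_sc: SpeedClass, own_host: str, own_host_sc: SpeedClass,
                peer_sc: SpeedClass) -> Optional[RedirectedConnection]:
    """Redirect a peer that is faster than us to our own host, else keep it."""
    if peer_sc > own_sc:
        return RedirectedConnection(own_host, own_host_sc)
    return None
```

With Alice on a modem, hosted by Bob on DSL, a call such as `on_incoming(SpeedClass.MODEM, "bob", SpeedClass.DSL_CABLE, SpeedClass.DSL_CABLE)` returns a redirect pointing Caroline at Bob.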
So we have the following:
So we define a modem-SC servent as one that connects to a faster host and accepts all incoming connections, but only to redirect them to servents on faster connections.
Now the network should redistribute the load between connections. Let’s take the following example:
Now why should Bob host everyone (Alice, Daphné, Eloïse and François), while Caroline hosts only Damien?
Here is the interesting part of the RC (Redirect Connection) message. Bob and Caroline both know how many modem users each of them hosts, through a communication process detailed in the next paragraph. Bob will choose one user, let’s say François, and send him an RC. François will then automatically connect to Caroline, thereby sharing the load of servents across hosts of the same SC.
This leads us to a new message, which I’ll name the SC-wide message (SCW). Let’s say Alice has a new modem user hosted: she will send an SCW saying so to Bob, so that Bob knows she now has 3 modem users and, if Bob gets a fourth modem user connected, he won’t try to redirect it to Alice.
Now, will there be too many SCWs in the distributed network? As each servent in an SC will host just its share of users, roughly equal to the others, there won’t be too many of them, since the hierarchy suppresses most of the redundant connections.
So what does the distributed network look like with the 3 SCs I proposed, with say 3 T1 connections, 9 cable/ADSL connections, and 32 modem users?
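The load sharing described above can be sketched as a host that records the user counts announced by its peers through SCW messages and redirects one of its own users when a peer is clearly less loaded. The `Host` class, its method names, and the rule "redirect only when the gap is at least two users" are assumptions of this sketch, not something the proposal specifies.

```python
from typing import Dict, List, Optional, Tuple


class Host:
    """A servent hosting modem users (illustrative names throughout)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.modem_users: List[str] = []
        # Peer of the same SC -> number of modem users it last announced.
        self.peer_counts: Dict[str, int] = {}

    def on_scw(self, peer: str, count: int) -> None:
        """Record the user count a peer announced in an SCW message."""
        self.peer_counts[peer] = count

    def pick_redirect(self) -> Optional[Tuple[str, str]]:
        """Return (user, target peer) for an RC, or None if load is balanced."""
        if not self.modem_users or not self.peer_counts:
            return None
        target = min(self.peer_counts, key=lambda p: self.peer_counts[p])
        # Assumed rule: moving a user only helps if the gap is 2 or more.
        if len(self.modem_users) - self.peer_counts[target] < 2:
            return None
        return self.modem_users[-1], target
```

With Bob hosting Alice, Daphné, Eloïse and François, and Caroline having announced one user, `pick_redirect()` selects François and sends him to Caroline, as in the example.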
Now the network is hierarchical, scalable and yet distributed. But it still needs improvements.
First of all, messages should begin or end with control bytes, so that a parser can know when the message begins and when it ends in the socket stream.
We may then provide, at the end of each message, a hash for transmission error detection, so that invalid messages can be dropped.
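One possible framing, given only as a sketch, is shown below. The control byte values, the 4-byte length field, and the use of CRC-32 as the "hash" are all assumptions made here. The length field is added because a binary payload may itself contain the control bytes.

```python
import struct
import zlib
from typing import Optional

START = b"\x02"  # assumed start-of-message control byte
END = b"\x03"    # assumed end-of-message control byte
OVERHEAD = 10    # 1 start + 4 length + 4 checksum + 1 end


def frame(message: bytes) -> bytes:
    """Wrap a message as START | length | message | CRC-32 | END."""
    return (START + struct.pack("<I", len(message)) + message
            + struct.pack("<I", zlib.crc32(message)) + END)


def unframe(data: bytes) -> Optional[bytes]:
    """Return the message inside a frame, or None if the frame is invalid."""
    if len(data) < OVERHEAD or data[:1] != START or data[-1:] != END:
        return None
    (length,) = struct.unpack("<I", data[1:5])
    if len(data) != length + OVERHEAD:
        return None
    message = data[5:5 + length]
    (checksum,) = struct.unpack("<I", data[5 + length:9 + length])
    if zlib.crc32(message) != checksum:
        return None  # transmission error: drop the message
    return message
```

A CRC only catches accidental corruption. If protection against deliberate tampering were wanted, a cryptographic hash would be needed instead.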
With the hierarchy system I proposed before, there’s no need for the client to try finding other hosts to connect to, as they are advertised by the host on connection if needed. Now if this feature is needed, we should delete the concept of requesting the information by a Ping to get an answer, as the number of messages grows exponentially with the number of servents connected to the network.
There should only be an extended PONG message broadcast over the network with the same information as today, sent every n minutes, and clients should delete references to servers that haven’t broadcast it for n+1 minutes.
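The expiry rule could be kept in a small table like the one below. This is a sketch: `HostTable` and its methods are illustrative names, and timestamps are passed in as plain seconds so the behaviour is easy to check.

```python
from typing import Dict, List


class HostTable:
    """Known hosts, refreshed by periodic PONG broadcasts (illustrative)."""

    def __init__(self, interval_minutes: float) -> None:
        # A host is dropped once it has been silent for n+1 minutes.
        self.timeout = (interval_minutes + 1) * 60.0
        self.last_seen: Dict[str, float] = {}

    def on_pong(self, host: str, now: float) -> None:
        """Record that a PONG from this host was seen at time `now`."""
        self.last_seen[host] = now

    def expire(self, now: float) -> List[str]:
        """Remove and return the hosts whose last PONG is too old."""
        stale = [h for h, t in self.last_seen.items()
                 if now - t > self.timeout]
        for host in stale:
            del self.last_seen[host]
        return stale
```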
The query/query response exchange shouldn’t work as it does today. While the query should be broadcast to the distributed network (DN), the query response should be routed back to its point of origin. So here is my idea of how it should work:
Alice sends a query message, which is broadcast across the DN with its GUID. Each servent remembers which servent sent it a given GUID. Now let’s say a distant computer finds the file Alice needs. It should create a Query Response with a new GUID and add a reply-to flag in the message carrying the GUID of the original message. Each servent receiving the query response then sends it back to the servent from which it received the original message. That way, the query response is no longer broadcast unnecessarily over the whole DN.
Now why shouldn’t the query response have only the original GUID? Obviously, each GUID should be globally unique, so that servents can drop messages with an already-seen GUID without having to analyze them. Analyzing the function is always slower than just checking the 16-byte GUID at the start of the header. And anyway, it’s easier to code :-)
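The routing idea can be tried out with a small in-memory simulation, sketched below. The `Servent` class and its methods are illustrative, messages are plain method calls rather than packets, and TTL handling is left out for brevity.

```python
import uuid


class Servent:
    """In-memory servent used to illustrate reply routing (hypothetical)."""

    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbours = []
        self.routes = {}    # query GUID -> neighbour it came from (None: ours)
        self.seen = set()   # every GUID already handled, queries and replies
        self.hits = []      # replies that reached us as the originator
        self.forwarded = 0  # replies relayed towards the originator

    def connect(self, other):
        self.neighbours.append(other)
        other.neighbours.append(self)

    def query(self, wanted):
        guid = uuid.uuid4().bytes  # 16-byte GUID
        self.seen.add(guid)
        self.routes[guid] = None
        for n in self.neighbours:
            n.on_query(guid, wanted, self)
        return guid

    def on_query(self, guid, wanted, sender):
        if guid in self.seen:
            return  # already-seen GUID: drop without further analysis
        self.seen.add(guid)
        self.routes[guid] = sender
        if wanted in self.files:
            # The reply gets its own GUID and refers to the query's GUID.
            sender.on_reply(uuid.uuid4().bytes, guid, (self.name, wanted))
        for n in self.neighbours:
            if n is not sender:
                n.on_query(guid, wanted, self)

    def on_reply(self, guid, reply_to, hit):
        if guid in self.seen or reply_to not in self.routes:
            return
        self.seen.add(guid)
        back = self.routes[reply_to]
        if back is None:
            self.hits.append(hit)  # we sent the original query
        else:
            self.forwarded += 1
            back.on_reply(guid, reply_to, hit)
```

In a network where Alice is connected to Bob, and Bob to both Caroline and Damien, a reply from Caroline passes through Bob only. Damien sees the query but never the reply.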
File downloading could finally be enhanced by providing a hash (MD5, CRC or other) of the file in the query response, so that when several servents have the same file, it can be downloaded efficiently in multiple parts. A hash is the only practical way to check that two files are the same without transferring them.
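As a sketch of how the hash could be used, the code below groups query responses by file hash and splits a download into one byte range per source. MD5 is picked only because it is named above. The shape of a hit as `(servent, filename, hash)` and the function names are assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def file_hash(data: bytes) -> str:
    """Hex MD5 digest of a file's content."""
    return hashlib.md5(data).hexdigest()


def group_sources(hits: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Map each file hash to the servents offering that exact content."""
    sources = defaultdict(list)
    for servent, _filename, digest in hits:
        # File names are ignored on purpose: only the hash identifies a file.
        sources[digest].append(servent)
    return dict(sources)


def plan_ranges(size: int, servents: List[str]) -> List[Tuple[str, int, int]]:
    """Split bytes [0, size) into contiguous (servent, start, end) ranges."""
    if not servents or size <= 0:
        return []
    chunk = -(-size // len(servents))  # ceiling division
    return [(s, i * chunk, min(size, (i + 1) * chunk))
            for i, s in enumerate(servents) if i * chunk < size]
```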
Yet to be discussed, I don’t really know much about this topic anyway…
Even with these systems, we could optimize the bandwidth requirement by proxying search requests from the modem, and even proxy the PONG messages.
Each servent hosting a modem servent will receive from it the list of shared files and cache it. When a query arrives at the host, the host answers on behalf of the modem servent.
In the same way, the files of the modem servent will be presented in the PONG message of the host servent as if they were part of the same file system. Modem users, as they accept incoming connections only to redirect them elsewhere, don’t need to be known on the network.
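The proxying described above might look like the following sketch, in which a host caches the file lists of its modem servents and answers queries for them. `ProxyHost`, its methods, and the simple substring match are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple


class ProxyHost:
    """Host servent answering queries for its modem servents (illustrative)."""

    def __init__(self, own_files: Iterable[str]) -> None:
        self.own_files: Set[str] = set(own_files)
        self.cached: Dict[str, Set[str]] = {}  # modem servent -> shared files

    def on_file_list(self, modem: str, files: Iterable[str]) -> None:
        """Cache the list of files a hosted modem servent shares."""
        self.cached[modem] = set(files)

    def on_disconnect(self, modem: str) -> None:
        """Forget a modem servent's files once it has gone."""
        self.cached.pop(modem, None)

    def answer_query(self, keyword: str) -> List[Tuple[str, str]]:
        """Return (owner, filename) hits without contacting any modem."""
        hits = [("self", f) for f in sorted(self.own_files) if keyword in f]
        for modem in sorted(self.cached):
            hits += [(modem, f) for f in sorted(self.cached[modem])
                     if keyword in f]
        return hits

    def advertised_file_count(self) -> int:
        """File count for the PONG: own files plus all cached modem files."""
        return len(self.own_files) + sum(len(f) for f in self.cached.values())
```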