Gnutella NG Monaco Proposal 14/04/2000
Comments to Lambla@bouygtel.com
The Monaco proposal details the ideas of the authors for the
next generation of the protocol.
The goals of the system are to (in no particular order):
- Reduce the overall load on the distributed network
- Keep slow connections from bringing down the distributed
network
- Provide an extensibility mechanism to allow future
enhancements
- Make the network scalable
Message: a message is the binary header and the payload.
Payload: the content of a message.
Function: byte defining the type of a message in the binary
header.
Servent: a Gnutella protocol-compliant agent.
Bytes are written in the form 0xnn.
Message names:
Function 0x00: PING
Function 0x01: PONG
Function 0x80: QUERY
Function 0x81: QUERY REPLY
Function 0x40: PUSH
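The function bytes listed above can be written down as a small lookup table. This is only a sketch: the byte values come from the list, while the names Function and function_name are made up for the illustration.

```python
from enum import IntEnum


class Function(IntEnum):
    """Function byte values, taken from the message-name list above."""
    PING = 0x00
    PONG = 0x01
    PUSH = 0x40
    QUERY = 0x80
    QUERY_REPLY = 0x81


def function_name(byte: int) -> str:
    """Return the message name for a function byte (hypothetical helper)."""
    try:
        return Function(byte).name.replace("_", " ")
    except ValueError:
        # Unknown values are reported rather than rejected, which leaves
        # room for the extensibility goal stated earlier.
        return f"UNKNOWN (0x{byte:02x})"
```

For instance, function_name(0x81) gives "QUERY REPLY", and a byte that is not in the list is reported as unknown.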
What about the “old” protocol?
I talked with some people who want to change the current
protocol completely, either because they want to use ASCII or because they
want to use IRC, which would mean breaking the old protocol and all the work
done so far.
I think that, for this new protocol to succeed, we should
build it on what has already been done.
Proposed changes to the protocol:
Each servent is flagged with its speed class (SC). Several
SCs should be defined, something like this:
Modem
DSL / Cable
T1
This is just an example, but 4 or 5 SCs should be defined.
Each servent determines its own SC from its connection.
Let’s call the connecting servent Alice, and the host
Bob. When Alice connects to Bob, she checks his SC. If it
is equal to or higher than her own SC, she keeps the connection as it is and
accepts connections.
A third servent named Caroline connects to Alice.
Alice accepts the connection and checks Caroline’s SC.
As Caroline is on a cable modem, and as Alice is only a
modem user, Alice decides that Caroline shouldn’t stay connected to her and
should find another, faster host. All of this communication takes place
only between Caroline and Alice, so nothing is broadcast to
the distributed network.
Alice sends Caroline the address of her own host,
here Bob, in a new message with a TTL of 1, which will be named REDIRECTED
CONNECTION and which contains Bob’s SC flag. Caroline then decides to
connect to Bob, as he has an ADSL connection.
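The Alice, Bob and Caroline exchange can be sketched as follows. The three speed classes are the example ones from above. The redirect rule (send a newcomer to my own host when the newcomer is faster than I am) is my reading of the example rather than a fixed rule, and all names in the code are made up for the illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class SpeedClass(IntEnum):
    """The three example speed classes; a higher value means a faster link."""
    MODEM = 1
    DSL_CABLE = 2
    T1 = 3


@dataclass
class Servent:
    name: str
    sc: SpeedClass
    host: Optional["Servent"] = None  # the faster servent we are attached to


def keep_outgoing(me: Servent, host: Servent) -> bool:
    """Alice keeps her connection to Bob if his SC is equal or higher."""
    return host.sc >= me.sc


def redirect_target(me: Servent, newcomer: Servent) -> Optional[Servent]:
    """Return the servent to name in a REDIRECTED CONNECTION, or None.

    Assumed rule: redirect only when the newcomer is faster than me
    and I have a host of my own to point it to.
    """
    if newcomer.sc > me.sc and me.host is not None:
        return me.host
    return None
```

With Bob and Caroline on DSL/cable and Alice on a modem attached to Bob, redirect_target(alice, caroline) returns Bob, and nothing in the exchange has to leave the Alice/Caroline connection.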
That way, connecting to the distributed network is still
very easy, as anyone can connect through anyone, but a kind of hierarchy
emerges, which automatically creates a “backbone”.
So we have the following:
So we define a modem-SC servent as one that connects to a faster host and accepts all incoming connections, but only to redirect them to other servents on faster connections.
Now the network should redistribute the load between connections. Let’s take the following example:
Now why should Bob host everyone (Alice, Daphné, Eloïse and
François), while Caroline hosts only Damien?
Here is the interesting part of the RC (Redirect Connection)
message. Bob and Caroline both know how many modem users each of them hosts,
through a communication process detailed in the next paragraph. Bob will now
choose one user, let’s say François, and send him an RC. François will then
automatically connect to Caroline, thereby sharing the load of servents across
hosts of the same SC.
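Bob's choice could look like the sketch below. Which user gets redirected is left open above ("let's say François"), so picking the most recently connected user is an assumption, as is the rule of only redirecting when the move actually narrows the gap. The function name pick_redirect is made up.

```python
from typing import Dict, List, Optional, Tuple


def pick_redirect(my_users: List[str],
                  peer_loads: Dict[str, int]) -> Optional[Tuple[str, str]]:
    """Return (user, target_host) for one RC message, or None if balanced.

    my_users   -- users this host serves, oldest connection first
    peer_loads -- number of users hosted by each peer of the same SC
    """
    if not my_users or not peer_loads:
        return None
    # Send the user to the least loaded peer of the same SC.
    target = min(peer_loads, key=peer_loads.get)
    # Moving one user only helps if the gap is at least 2 (assumed rule);
    # otherwise the two hosts would just swap places.
    if len(my_users) - peer_loads[target] < 2:
        return None
    # Assumption: redirect the most recently connected user.
    return my_users[-1], target
```

With Bob hosting Alice, Daphné, Eloïse and François, and Caroline hosting one user, the function picks François and sends him to Caroline.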
This leads us to a new message, which I’ll name the SC-wide
message (SCW). Let’s say Alice gets a new modem user to host: she will send an
SCW to Bob saying so, so that Bob knows she now has 3 modem users and, if a
fourth modem user connects to him, he won’t try to redirect it to
Alice.
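A host could keep track of the SCWs it receives in a small table. The message fields and the rule "redirect only if the peer would still host fewer users than I do" are assumptions chosen to match the Alice and Bob example; nothing here is a defined wire format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ScwMessage:
    """Hypothetical SCW content: who sent it and how many modems it hosts."""
    sender: str
    modem_users: int


@dataclass
class PeerLoads:
    """Latest modem-user count announced by each peer of the same SC."""
    counts: Dict[str, int] = field(default_factory=dict)

    def apply(self, msg: ScwMessage) -> None:
        # A newer SCW simply replaces the previous count for that peer.
        self.counts[msg.sender] = msg.modem_users

    def should_redirect_to(self, peer: str, my_count: int) -> bool:
        """True if handing one user to peer leaves it with fewer than me."""
        if peer not in self.counts:
            return False  # never redirect to a peer we know nothing about
        return self.counts[peer] + 1 < my_count
```

Once Bob has applied Alice's SCW announcing 3 modem users, should_redirect_to("Alice", 4) is False, so a fourth user arriving at Bob stays with Bob.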
Now, will there be too many SCWs in the distributed network?
As each servent in an SC will have just its share of users, roughly equal to
the others’, there won’t be too many of them, since the hierarchy will suppress
most of the redundant connections.
So what does the distributed network look like with the 3 SCs
I proposed, with say 3 T1 connections, 9 cable/ADSL connections, and 32 modem
users?
Now the network is hierarchical, scalable and yet
distributed. But it still needs improvements.
First of all, messages should begin or end with control
bytes, so that a parser can tell where a message begins and where it ends in
the socket stream.
We may then add a hash at the end of each message to
detect transmission errors, so that invalid messages can be dropped.
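One possible framing is sketched below. Everything specific in it is an assumption: the two marker bytes, the 4-byte length field, the size limit, and the choice of CRC-32 as the "hash". The length field is there because the marker bytes can also occur inside a payload.

```python
import struct
import zlib

START = b"\xAA\x55"   # hypothetical control bytes marking a frame start
MAX_LEN = 64 * 1024   # hypothetical upper bound on a message size


def encode_frame(message: bytes) -> bytes:
    """Wrap a message as START + length + message + CRC-32 (big-endian)."""
    return (START + struct.pack(">I", len(message)) + message
            + struct.pack(">I", zlib.crc32(message)))


def decode_frames(buffer: bytearray) -> list:
    """Pull every complete, valid message out of buffer.

    Parsed or unusable bytes are removed from buffer; an incomplete
    trailing frame is left in place until more data arrives.
    """
    messages = []
    while True:
        start = buffer.find(START)
        if start < 0:
            # Keep the last byte: it may be the first half of a marker.
            del buffer[:max(0, len(buffer) - 1)]
            break
        del buffer[:start]            # drop noise before the marker
        if len(buffer) < 6:
            break                     # length field not received yet
        (length,) = struct.unpack_from(">I", buffer, 2)
        if length > MAX_LEN:
            del buffer[:2]            # implausible length: resynchronise
            continue
        end = 6 + length + 4
        if len(buffer) < end:
            break                     # wait for the rest of the frame
        message = bytes(buffer[6:6 + length])
        (crc,) = struct.unpack_from(">I", buffer, 6 + length)
        if zlib.crc32(message) == crc:
            messages.append(message)
            del buffer[:end]
        else:
            del buffer[:2]            # corrupt frame: skip its marker
    return messages
```

A receiver would append each chunk read from the socket to the same bytearray and call decode_frames after every read. A frame with a bad checksum is dropped and the parser looks for the next marker.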
With the hierarchy system I proposed above, there’s no need
for a client to look for other hosts to connect to, as they are advertised
by the host on connection if needed. If this feature is still needed, we should
drop the concept of requesting the information with a Ping to get an answer, as
the number of messages grows exponentially with the number of servents
connected to the network.
There should only be an extended PONG message, broadcast
over the network with the same information as today and sent every n
minutes; clients should delete references to servers that haven’t broadcast
one for n+1 minutes.
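The n and n+1 minute rule could be kept in a table like the one below. The class and method names are made up, and the clock is passed in only so that the behaviour can be tried without waiting.

```python
import time


class HostCache:
    """Remembers servers by the time of their last extended PONG."""

    def __init__(self, interval_minutes: float, clock=time.monotonic):
        # A server is forgotten after n+1 minutes without a PONG.
        self.timeout = (interval_minutes + 1) * 60.0
        self.clock = clock
        self.last_seen = {}   # server address -> time of last PONG (seconds)

    def on_pong(self, address: str) -> None:
        """Record that a PONG from this server has just been received."""
        self.last_seen[address] = self.clock()

    def live_hosts(self) -> list:
        """Drop expired servers and return the remaining addresses, sorted."""
        now = self.clock()
        expired = [a for a, t in self.last_seen.items()
                   if now - t > self.timeout]
        for address in expired:
            del self.last_seen[address]
        return sorted(self.last_seen)
```

With n = 5, a server whose last PONG is more than 6 minutes old disappears from live_hosts(), while one heard from a minute ago stays.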
The query/query response shouldn’t work the way it does today.
While the query should be broadcast to the distributed network (DN), the query
response should be routed back to its point of origin. So here is my idea of
how it should work:
Alice sends a query message, which is broadcast across the
DN with its GUID. Each servent remembers which servent it received a given
GUID from. Now let’s say a distant computer finds the file Alice needs.
It should create a Query Response with a new GUID and add a reply-to flag to
the message, carrying the GUID of the original message. Each servent receiving
the query response then sends it back to the servent from which it received
the original message. That way, the query response is no longer broadcast
unnecessarily across the whole DN.
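Here is a small in-memory simulation of that routing. It is a sketch only: servents call each other directly instead of using sockets, the TTL is left out, and every class, method and field name is made up for the illustration.

```python
import uuid


class Servent:
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbours = []
        self.routes = {}            # query GUID -> neighbour it came from
        self.seen_replies = set()   # reply GUIDs already handled
        self.hits = []              # answers to our own queries
        self.replies_handled = 0    # how many replies passed through us

    def connect(self, other):
        self.neighbours.append(other)
        other.neighbours.append(self)

    def send_query(self, wanted):
        guid = uuid.uuid4().bytes   # 16-byte GUID
        self.routes[guid] = None    # None marks a query we originated
        for n in self.neighbours:
            n.on_query(guid, wanted, self)
        return guid

    def on_query(self, guid, wanted, sender):
        if guid in self.routes:
            return                  # already-seen GUID: drop without parsing
        self.routes[guid] = sender  # remember the way back
        if wanted in self.files:
            reply_guid = uuid.uuid4().bytes   # the reply gets its own GUID
            self.seen_replies.add(reply_guid)
            sender.on_reply(reply_guid, guid, (self.name, wanted))
        for n in self.neighbours:   # the query itself is still broadcast
            if n is not sender:
                n.on_query(guid, wanted, self)

    def on_reply(self, reply_guid, reply_to, hit):
        if reply_guid in self.seen_replies or reply_to not in self.routes:
            return                  # duplicate, or reply to an unknown query
        self.seen_replies.add(reply_guid)
        self.replies_handled += 1
        back = self.routes[reply_to]
        if back is None:
            self.hits.append(hit)   # we asked: deliver locally
        else:
            back.on_reply(reply_guid, reply_to, hit)  # one hop back only
```

If Alice is connected to Bob, and Bob to both Caroline (who has the file) and Damien, the reply travels Caroline, Bob, Alice. Damien sees the query but never the reply.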
Now why shouldn’t the query response carry only the original
GUID? Obviously, each GUID should be globally unique, so that servents can drop
messages with an already-seen GUID without having to analyze them. Analyzing
the function always takes longer than just checking the 16-byte GUID at the
start of the header. And anyway, it’s easier to code :-)
File downloading could finally be enhanced by providing a
hash (MD5, CRC or other) of the file in the query response, so that when
several servents have the same file, it can be downloaded efficiently in
multiple parts. The hash is the only way to be sure two files are the same.
This is yet to be discussed; I don’t really know much about this
topic anyway…
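To give the discussion something concrete, here is a sketch of how hashes in query responses could be used. MD5 is picked from the options named above. The tuple layout of a hit, the grouping by (hash, size), and the even split into byte ranges are all assumptions.

```python
import hashlib
from collections import defaultdict


def file_md5(data: bytes) -> str:
    """Hex MD5 digest of a file's content."""
    return hashlib.md5(data).hexdigest()


def group_sources(hits):
    """Group servents that offer the same content.

    hits is an iterable of (servent, filename, size, md5_hex) tuples;
    the file name is ignored, since only the hash identifies the content.
    """
    groups = defaultdict(list)
    for servent, _name, size, digest in hits:
        groups[(digest, size)].append(servent)
    return dict(groups)


def plan_ranges(size: int, sources: list) -> list:
    """Split [0, size) into one contiguous byte range per source."""
    if size <= 0 or not sources:
        return []
    chunk = -(-size // len(sources))   # ceiling division
    return [(source, start, min(start + chunk, size))
            for source, start in zip(sources, range(0, size, chunk))]
```

Two servents offering the same bytes under different file names end up in the same group, and plan_ranges(10, ["a", "b", "c"]) assigns bytes 0-4, 4-8 and 8-10 (end exclusive) to the three sources.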
Even with these systems, we could further reduce the bandwidth
requirement by proxying search requests on behalf of the modem, and even by
proxying the PONG messages.
Each servent hosting a modem will receive the modem’s list
of shared files and cache it. When a query then arrives at the host, the host
answers on behalf of the modem servent.
In the same way, the files of the modem servent will be presented in the host servent’s PONG message as if they belonged to the same file system. Modem users, as they don’t accept incoming connections except to redirect them to another servent, don’t need to be known on the network.
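The proxying could be sketched like this. The substring match stands in for whatever query matching is really used, and the class, its methods and the "self" owner label are made up for the illustration.

```python
class HostServent:
    """A host that answers queries for the modem servents attached to it."""

    def __init__(self, own_files):
        self.own_files = set(own_files)
        self.proxied = {}   # modem address -> set of file names it shares

    def register_modem(self, address, files):
        """Cache the file list a modem servent sends after connecting."""
        self.proxied[address] = set(files)

    def drop_modem(self, address):
        """Forget a modem servent's files when it disconnects."""
        self.proxied.pop(address, None)

    def answer_query(self, keyword):
        """Return (owner, filename) hits; the modems are never contacted."""
        kw = keyword.lower()
        hits = [("self", f) for f in sorted(self.own_files)
                if kw in f.lower()]
        for address in sorted(self.proxied):
            hits += [(address, f) for f in sorted(self.proxied[address])
                     if kw in f.lower()]
        return hits

    def shared_file_count(self):
        """File count for the host's PONG, modem files included."""
        return len(self.own_files) + sum(
            len(files) for files in self.proxied.values())
```

The query never travels down the modem link: the host answers from its cache, and its PONG advertises the combined file count.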