Gnutella routing, Notes, Jukka Santala, donwulff@nic.fi, 14 May 2000
Revised May 15th: HTML:ized, expanded LEVEL 1 description.
Revised May 16th: Added rest of the levels. Hopefully understandable.

Routing, ah, what a fun thing. I've particularly liked the ideas set forth in Stormrose's "moods" proposal; however, as it stands, Gnutella clients are missing even elementary routing etiquette or recommendations. This document tries to briefly fix that omission, relying on the experience and ideas I've gleaned so far. It's going to be a living document, and as it stands there's no empirical evidence that anything suggested herein is better than anything else, especially since much depends on how many others implement the same behaviour. Thus the "Levels" in this document don't denote the particular greatness or "goodness" of an approach, but rather its complexity and perceived importance. They are also the recommended order of implementation, as each level relies in some way on the earlier ones.

LEVEL 0 - Disconnection etiquette

Gnutella clients use an interesting method of dynamic routing. Sadly, one of the effects of this methodology is that when new links between servents are formed and old ones abandoned, old routes no longer work, and messages traveling along them are wasted, contributing to a lack of general robustness and an increase in traffic. In fact, it can be shown that at some point GnutellaNet will reach a size where the sheer number of link disconnections will make the network useless under current practices. With the recommendations in this document the number of automatic disconnects rises even higher, and things could quickly come to a screeching halt if nothing were done.

There is, however, a rather simple solution to these problems that does not require maintaining links indefinitely. Namely, by ceasing to forward any broadcast (or randomly routed) data to a link up to a minute before disconnecting it, a servent can ensure that at least most of the data to be routed over it arrives safely. The other end of the link MAY, if it detects a sudden drop of all broadcast messages while routed replies from beyond the link keep coming in, decide to initiate a disconnection sequence of its own and stop forwarding broadcast messages from that end. However, the disconnecting servent SHOULD send a content-request message (see associated proposal) disabling all broadcast messages. Of course, if no broadcast messages have been sent over the link, or received and forwarded yet, there's no need to maintain this system.
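
To make the idea concrete, here's a rough Python sketch of the drain-then-close sequence. The Link class, its sock object and the route_table lookup are stand-ins I made up for illustration, the one-minute figure is the same arbitrary value as above, and the content-request message from the associated proposal is not modeled here.

    import time

    DRAIN_SECONDS = 60  # stop broadcasts roughly this long before closing

    class Link:
        def __init__(self, sock):
            self.sock = sock            # hypothetical socket-like object
            self.drain_deadline = None  # set once a disconnect is scheduled

        def schedule_disconnect(self):
            # LEVEL 0: stop feeding broadcasts now, close the link later
            self.drain_deadline = time.time() + DRAIN_SECONDS

        def is_draining(self):
            return self.drain_deadline is not None

    def forward_broadcast(links, message, source):
        # Broadcasts (PINGs/QUERYs) are withheld from draining links, so
        # replies already routed back over them are not wasted.
        for link in links:
            if link is not source and not link.is_draining():
                link.sock.send(message)

    def forward_routed(route_table, message):
        # Routed replies (PONGs/QUERYHITs/PUSHes) still flow over a draining
        # link right up until it is finally closed.
        link = route_table.get(message.guid)
        if link is not None:
            link.sock.send(message)

    def reap_links(links):
        # Call periodically; closes links whose drain period has expired.
        now = time.time()
        for link in list(links):
            if link.is_draining() and now >= link.drain_deadline:
                link.sock.close()
                links.remove(link)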

This method, of course, is only possible for predicted disconnections. This is why servents should strive to make as many link disconnections as possible predictable, and follow this recommendation for them. If the reason for the disconnect is bandwidth or redundancy, just dropping broadcast traffic (especially if honored by the other side as well) can help the situation until the connection can be properly closed. Of course, the bandwidth situation could also improve during this time and the disconnection sequence be interrupted. Shutting down the client, when possible, should be done with a waiting delay if only because of uploads, although to avoid user frustration a quick escape might have to be provided. As for disconnections due to bad messages, the threshold is already arbitrary, and unless the servent starts seriously misbehaving there should be no reason not to wait an extra minute.

LEVEL 1 - Redundancy elimination

While redundancy in GnutellaNet links is undoubtedly a good thing, there are kinds of overly redundant connections that only add to the bandwidth use and total lag on the network. The ability to detect these and disconnect them (using the method described in LEVEL 0) will significantly improve GnutellaNet performance and even robustness, as useless links get replaced with better ones. Unfortunately the detection of detrimentally redundant links is not always entirely automatic or easy. However, there are some methods that can be recommended so far.

The first and perhaps simplest sign of a non-contributing link is when you receive your own broadcast message back from another link. Unfortunately, this can theoretically only happen if the message sent directly from you is still traveling while a copy of the same message arrives to them through another route, so it should not be a frequent occurrence. However, when it does occur, it is a good signal that the link in question is not only redundant, but also inefficiently slow or lagged. The condition could also be caused by the TTL bug; check the TTL and Hops fields for unreasonably high values. It could also be just due to temporary congestion; you may want to see if it repeats, or if there are other signs, before initiating disconnection.
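
As a rough sketch of that check: my_guids would hold the GUIDs of the broadcasts this servent originated recently, while the message and link objects, their counter fields, and the thresholds are all made up for illustration (schedule_disconnect is meant in the LEVEL 0 sense).

    MAX_SANE_HOPS = 50   # beyond this, suspect the TTL bug rather than the link

    def check_own_message_returned(message, link, my_guids):
        if message.guid not in my_guids:
            return
        if message.ttl > MAX_SANE_HOPS or message.hops > MAX_SANE_HOPS:
            link.ttl_bug_suspected = True   # don't blame redundancy for this
        else:
            link.own_echo_count += 1        # redundant *and* slow
            if link.own_echo_count >= 3:    # wait for it to repeat a few times
                link.schedule_disconnect()  # LEVEL 0 drain, then close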

Another method suggested to me is based on detection of short loops, and is good advice. If you receive a duplicate broadcast message from somebody else whose Hops value differs by only one or two from the copy you already received, this indicates there are two routes from them that differ only slightly at your end, and you might want to drop either one. However, if the longer one comes in significantly before the shorter one, you might want to consider dropping the lower-Hops one if you prefer speed of responses over how wide the search goes. Remember that duplicates are dropped regardless of TTL/Hops, so if you kept both, the faster route's TTL/Hops would be likely to stand, and the other one be rather useless.
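
A sketch of the same logic, assuming a seen table that remembers the link, Hops count and arrival time of the first copy of each Message-ID; the names and the "significantly before" threshold are my own inventions, and the caller decides what to do with the returned candidate.

    import time

    def check_short_loop(message, link, seen, prefer_speed=False):
        first = seen.get(message.guid)
        if first is None:
            seen[message.guid] = (link, message.hops, time.time())
            return None                       # first copy, nothing to compare
        first_link, first_hops, first_time = first
        if link is first_link or abs(message.hops - first_hops) > 2:
            return None                       # not a short loop
        # Two routes from the same origin differ by only a hop or two at our
        # end; by default drop the link that delivered the longer route.
        drop = link if message.hops >= first_hops else first_link
        if (prefer_speed and first_hops > message.hops
                and time.time() - first_time > 1.0):
            # The longer route arrived significantly earlier; if response
            # speed matters more than search width, drop the shorter one.
            drop = link
        return drop                           # candidate for a LEVEL 0 drop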

Links which constantly send you duplicate messages and no originals are suspect too, but keep in mind that this by itself is no proof of redundancy, simply of slowness. You might want to consider sending them a content-request message (see associated proposal) asking them to stop sending broadcast traffic to you. However, if after a significant time you get no routed messages from them either, it is likely they're redundant, or at least so lagged that the link is useless. Be aware that if many messages coming from that link have lower Hops than the duplicates (but see above, also!), the link could be able to reach hosts others can't, so you might want to wait longer before requesting a stop to broadcasts, wait for a routed reply to arrive, or just keep the link.
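
Roughly, the bookkeeping could look like the following; the per-link counters, the thresholds and the send_content_request stub (standing in for the message from the associated proposal) are all assumptions of mine.

    def note_broadcast(link, is_duplicate, lower_hops_than_first=False):
        # Call once per broadcast message received over the link.
        if is_duplicate:
            link.dup_count += 1
            if lower_hops_than_first:
                link.shortcut_count += 1   # may reach hosts the others can't
        else:
            link.orig_count += 1

    def review_link(link, routed_seen_recently, send_content_request):
        only_duplicates = link.dup_count >= 50 and link.orig_count == 0
        if not only_duplicates or link.shortcut_count > link.dup_count // 2:
            return                         # keep it, or give it more time
        if routed_seen_recently:
            # slow but alive: ask it to stop sending broadcast traffic
            send_content_request(link, disable_broadcasts=True)
        else:
            link.schedule_disconnect()     # likely redundant or hopelessly lagged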

LEVEL 2 - Furthest shore

At LEVEL 2 the consideration is simply to have each new link placed as far as possible from the existing, old links. The use of other criteria is also encouraged. For this purpose a "host-catcher", such as already exists in practically every Gnutella client implementation, is used, but with some changes. First of all, the distance to each host, taken from the Hops field, should be recorded along with the other information about the host. Instead of its own PING/PONG interaction, a servent SHOULD rely on listening to the PONGs and query REPLIES it is relaying to other servents. In fact, to reduce bandwidth usage, sending your own PING messages is strongly discouraged. PUSH requests also carry IPs, but should only be sent by firewalled machines, so you might want to ignore them or even mark those hosts as invalid.
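
A minimal host-catcher update along those lines could look like this; the descriptor field names (kind, ip, port, hops) and the "PONG"/"QUERYHIT"/"PUSH" labels are just whatever your parser happens to call them.

    def catch_host(catcher, message):
        # catcher maps (ip, port) to the lowest Hops distance seen so far.
        if message.kind == "PUSH":
            return                       # firewalled host; ignore (or mark invalid)
        if message.kind not in ("PONG", "QUERYHIT"):
            return                       # only relayed replies carry useful IPs
        key = (message.ip, message.port)
        old = catcher.get(key)
        if old is None or message.hops < old:
            catcher[key] = message.hops  # record the shortest known path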

There are quite a few considerations for this pure Hops model of routing, or link selection. Foremost of these is that all the Hops distances recorded for hosts may be invalidated every time a new link is created. Until a majority of clients support intelligent link selection, though, incoming links are less likely to affect the current Hops counts. And while losing existing links affects Hops counts as well, that should only mean an increase or no change, especially if the disconnect was due to link redundancy. A successful LEVEL 2 based outgoing link, however, should invalidate all accumulated Hops information, or at least demote it to a secondary criterion.

Another issue is the collection of Hops values. When duplicate messages are received, the lowest Hops value should be recorded, as this is your system's shortest path to the remote host. However, due to routing changes in the rest of the network during long periods without creating new links, Hops distances may sometimes also increase. Keeping track of this would unfortunately seem to require maintaining a minimum Hops value for each Message-ID, which may get impractical. Automatically resetting the distances to all hosts at regular intervals could help eliminate stale entries and no longer active hosts. The Hops values and hosts should also probably be caught before any Hops/TTL validation or message dropping, but it is possible that many bad-message detection checks should be applied first, to avoid linking to a host with a disproportionate amount of bad messages.

For the actual choice of the target of an outgoing link there exists a number of considerations as well. Hops values beyond about 50 are likely the result of the famous TTL bug, and in any case the link-selection algorithm may want to avoid ridiculously high Hops counts. Especially if, as is often the case, more than one new link is required at once, it is easy to run out of hosts with a valid Hops count due to the invalidation logic. For this reason it is recommended to keep the entries with invalidated Hops counts in the table and just select one at random when running out, hoping the LEVEL 1 process will weed out poor choices. Also, to avoid hammering high-Hops hosts, a host should be marked "used" when the connection is lost and still kept in the table as a reminder that it has already been used. It could be put back into the "invalidated" state after quite a while. Finally, when there is a tie on Hops count, the number of messages recently routed from that host could be used as a tie-breaker, or even weighted into the whole link selection process.

A sample implementation of LEVEL 2 would store the MINIMUM Hops received for every host (identified by IP and port) unless the stored value is negative, in which case it would be left alone. Hosts from which it receives a PUSH request it would set to Hops -1. To select a candidate for a new link, it would find the host with the highest Hops value that is not currently in use. In case of a tie, it would select one of the entries with the same Hops count at random. If the new link is successful, all Hops counts that aren't negative would be set to 0. Any failed link, whether new or one that had already been up for a while, would have its host's Hops count set to -1. This way the link-selection algorithm would always select one of the farthest known hosts first, or a random new host if no valid Hops values are known, and would try some earlier used hosts again if it ran totally out of hosts.
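
In Python, that sample implementation might look roughly like this; the table simply maps (IP, port) to a Hops count, with negative values meaning failed or used and 0 meaning invalidated, and in_use is the set of hosts currently connected.

    import random

    def record_hops(table, host, hops):
        old = table.get(host)
        if old is not None and old < 0:
            return                        # negative entries are left alone
        if old is None or hops < old:
            table[host] = hops            # keep the MINIMUM Hops seen

    def record_push(table, host):
        table[host] = -1                  # firewalled host; treat as unusable

    def pick_new_link(table, in_use):
        candidates = [h for h in table if h not in in_use]
        if not candidates:
            return None
        best = max(table[h] for h in candidates)
        return random.choice([h for h in candidates if table[h] == best])

    def on_link_success(table):
        # A new far-away link shifts our routing position, so every old
        # non-negative distance becomes stale.
        for host, hops in table.items():
            if hops >= 0:
                table[host] = 0

    def on_link_failure(table, host):
        table[host] = -1                  # remember this host has been used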

Possible improvements: In long-running servents, negative Hops counts could have 1 added to them every once in a while to make them eventually reach the invalidated (0) state and become eligible for a new connection attempt. Additionally, when a host with a negative current Hops count is re-used, one could be subtracted from it to prevent it from being re-used a third time. As an additional consideration for selecting a new link, the number of messages received from a host (at least for query replies; PONG traffic may be too random to be of interest here) could be used in order to optimize for bandwidth. This could be used as the tie-breaker for the highest Hops count, or even be factored into the original link selection if you want to aggressively help the GnutellaNet routing. In that case you would want to reset (or divide) the count at some interval in case the host has changed. It could also make sense to maintain the Hops counts without the newest link until yet another one is started, in case the last link is broken. The link through which the lowest Hops value was acquired could also be saved, and the respective Hops counts invalidated or marked for update when that link is lost - but consider that this can also improve the Hops count to that host through the other links.

LEVEL 3 - Routing madness

There are a few problems with the method described in LEVEL 2. A common concern I keep hearing from servent implementers is "But what if I only have one IP to connect to? Then I won't be routing traffic between two hosts, and I won't receive any new IPs." Strictly speaking, this is true. However, what if you had no hosts to connect to? The simple solution is to get more than one host at a time. If you perform any queries of your own at all, you will also receive lots of IPs for hosts of specific interest to you. The worst problem, however, is that the IPs you get in query replies (which you might be preferring over PONGs, and in any case LEVEL 2 seeks to reduce PING/PONG traffic) are predominantly for hosts with lots of file traffic that may give poor link performance or not accept new links at all. In fact, a pure router with no file traffic would be invisible to you without PINGs.

The solution is to not totally relinquish the PING/PONG protocol. But to reduce traffic, any PING sent should have a TTL of 2 (or 1 as it leaves your system) so that it will only be passed on to the uplinks of that link. PINGs should probably also only be sent to one link at a time, preferably at connection time, although you might also want to recheck the connections every 10 minutes or so. You might also want to cut every PING request going through your host to a maximum TTL of 2. This is theoretically enough to discover every host on the network, although it does require connecting around to each host to find their direct uplinks, which isn't recommended. Unfortunately not all hosts let you connect to them at all, which is why some parts of the network could be out of reach. The LEVEL 2 method would let you reach most of these, and LEVEL 4 puts forth recommendations to solve the problem.
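
Sketched out, both rules amount to something like this; the dictionary-style descriptors, the make_guid callable and the per-link send method are placeholders for whatever message representation you use.

    def send_neighbour_ping(link, make_guid):
        # Our own PINGs go out with TTL 2, so they reach only the uplinks of
        # this one link (TTL 1 as it leaves us, 0 beyond the neighbours).
        ping = {"kind": "PING", "guid": make_guid(), "ttl": 2, "hops": 0}
        link.send(ping)

    def forward_ping(ping, links, source):
        # PINGs passing through are capped to a TTL of 2 as well, keeping
        # PING/PONG traffic local; decrement/increment as usual before relaying.
        ping["ttl"] = min(ping["ttl"], 2) - 1
        ping["hops"] += 1
        if ping["ttl"] <= 0:
            return                        # has gone as far as we allow
        for link in links:
            if link is not source:
                link.send(ping)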

One thing TTL 2 PINGs do allow you to do is to hop your connection around to "practically the same routing location" even when you lose the connection to one host. In other words, if you lose your connection to one host, you can reconnect to one of its direct uplinks in an effort to still reach many of the same hosts. In fact, sometimes this will even allow you to catch some of the reply messages you'd otherwise have lost. This is most useful when you're trying to connect to a file-serving "leaf" host with only one uplink, but it drops you due to being busy. Unfortunately, if you lost the link due to the target host going down, its uplinks may no longer all be connected to each other, and the new routing location may be significantly worse than the last one. In that case LEVEL 1 will take care of it, but recall that the rest of the network is likely to re-route itself for you as well.
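
A sketch of the "hop sideways" reconnection, assuming we cached, per link, the host addresses its TTL 2 PING revealed; the uplink_cache and the connect callable are illustrative only.

    import random

    def reconnect_near(lost_link, uplink_cache, connect):
        # uplink_cache maps each of our links to the (ip, port) pairs learned
        # from the TTL 2 PING sent over it, i.e. that host's own direct links.
        candidates = list(uplink_cache.get(lost_link, []))
        random.shuffle(candidates)
        for host in candidates:
            if connect(host):             # hypothetical connection attempt
                return host               # roughly the same routing location
        return None                       # give up and fall back to LEVEL 2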

In the context of LEVEL 2 behaviour, LEVEL 3 should only be used when you lose an existing non-redundant link, and in itself it should probably not cause the Hops invalidation mentioned in LEVEL 2, since the routing position should stay the same with only one hop's change one way or the other. LEVEL 2 should still be used for establishing completely new links, particularly when dropping old ones due to detected detrimental redundancy as in LEVEL 1, or for timed disconnects done just to change routing. Remember that LEVEL 0 should be applied in all cases.

LEVEL 4 - Beam me up, Scotty

LEVEL 4 is still very much an open book. Our main concern is that current servents, when capable, generally just refuse new incoming links if they intend to be some sort of leaf server. Instead, they should always accept the connection (unless clear DoS activity is detected), send out PONGs for their own immediate links as a kind of "try one of these instead" protocol, and then crudely disconnect the link before either party can pass any broadcast traffic (in accordance with LEVEL 0).

Notice I mentioned sending out the immediate links at once, without accepting any broadcast traffic. The only way to do this is if the host already knows its uplinks... and it does. In fact, if it follows the LEVEL 3 protocol itself, it knows all the PONG replies for its direct uplinks (and the hosts behind those, but it can safely ignore them) and can thus easily reproduce them from memory to send to those trying to connect.
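
In sketch form, the polite refusal could be as simple as this, replaying the PONGs cached from the leaf's own TTL 2 PINGs; the incoming_link object and the cached_pongs list are placeholders.

    def refuse_politely(incoming_link, cached_pongs):
        # Accept the connection, immediately offer our own direct links as
        # alternatives, then close before any broadcast traffic is exchanged.
        for pong in cached_pongs:
            incoming_link.send(pong)
        incoming_link.close()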

I've written a recommendation dealing with GnutellaNet protocol extensions. This protocol would let a servent communicate a few of the parameters mentioned in this document, and is relevant here. First of all, a servent could request to receive no PINGs with Hops higher than a specified value, and thus avoid even getting those requests, helping to reduce bandwidth consumption a lot. A leaf client could send an empty content-request message to signal that it won't have any of that, before continuing with the PONGs and disconnection. The max-Hops value for query REPLIES and PONGs should be left high to allow host-catching from them.