Ed Tomlinson | 1 Oct 2003 01:22

Re: Re: Some suggestions for NG routing

On September 30, 2003 12:43 pm, Ian Clarke wrote:
> On Tue, Sep 30, 2003 at 08:12:31AM -0400, Ed Tomlinson wrote:
> > 1.  The NG weight calculation contains 4 terms,  wSF, wTF, wDNF,
> > wSuccess. The last two are conditional on pSF (and pConn which is not
> > used now).  When a node is busy its pSF quickly reaches .99 or larger. 
> > This tends to damp out any wDNF/wSuccess info so, in effect when we are
> > routing to busy nodes, we route randomly.  If we remove the
> > (1-pSF)*(1-pConn) factor from these terms we would try the busy nodes in
> > order of specialization.
>
> But that term is there for a reason - namely to discourage routing to
> busy nodes in a manner that tries to balance the risk that the node will
> QR with the benefits of routing to that node.  Getting rid of that term
> will eliminate NGR's load balancing.

Yesterday I tracked down and fixed a bug that was producing very, very large
values for tSF.  With that fixed, this hack may not be needed...  And yes, it
would bias the weights.

One point here.  A case could be made that wTF should be using the (1-pSF)
term too.  I suspect that most transfer failures occur during the receipt of data.
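
As a rough illustration of the damping described in point 1, the sketch
below assumes the weight has the shape wSF + wTF + (1-pSF)*(1-pConn)*(wDNF +
wSuccess).  That shape, the class name and all of the numbers are assumptions
made for illustration; this is not the NGRouting code.  It only shows that
when two nodes share a pSF of 0.99, whatever difference their wDNF/wSuccess
terms carry is scaled down by about a factor of 100.

```java
// Illustrative only: the formula shape and every number here are assumptions,
// not the actual NGRouting code.
final class WeightDampingSketch {
    // Assumed shape: wDNF and wSuccess only count if the request gets past
    // search failure (pSF) and connection failure (pConn).
    static double weight(double pSF, double pConn,
                         double wSF, double wTF, double wDNF, double wSuccess) {
        double reach = (1.0 - pSF) * (1.0 - pConn);
        return wSF + wTF + reach * (wDNF + wSuccess);
    }

    // Weight difference between a poorly specialized and a well specialized
    // node that share the same pSF, wSF and wTF (made-up values).
    static double spread(double pSF) {
        double wellSpecialized = weight(pSF, 0.0, 1000.0, 100.0, 200.0, 500.0);
        double poorlySpecialized = weight(pSF, 0.0, 1000.0, 100.0, 2000.0, 5000.0);
        return poorlySpecialized - wellSpecialized;
    }

    public static void main(String[] args) {
        System.out.println("idle nodes (pSF=0.00): spread = " + spread(0.0));
        System.out.println("busy nodes (pSF=0.99): spread = " + spread(0.99));
    }
}
```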

> > 2. When estimating weights, if pDNF is zero, wDNF can be less than zero. 
> > I suggest that, when the estimator for a node returns zero for pDNF (it
> > has no info at this time) we would do better to use pLegitDNF, i.e. add
> > if(pDNF==0) pDNF=pLegitDNF;
>
> Yes, I think there is a general argument to be made that we should use
> global averages when we don't have enough node-specific data for a given
> estimate.
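
A minimal sketch of the fallback discussed in point 2 follows.  The first
method is the one-line fix as quoted; the second is one possible reading of
the "use global averages" generalization, and its sample-count threshold and
parameter names are hypothetical rather than anything in the estimator code.

```java
// Sketch only: names other than pDNF and pLegitDNF are hypothetical.
final class GlobalFallbackSketch {
    // The fix proposed in point 2: treat a zero estimate as "no info yet"
    // and substitute pLegitDNF.
    static double effectivePDNF(double pDNF, double pLegitDNF) {
        return (pDNF == 0.0) ? pLegitDNF : pDNF;
    }

    // One way to generalize it: fall back to a global average whenever the
    // node has fewer than minSamples reports behind its own estimate.
    static double estimateOrGlobal(double nodeEstimate, int nodeSamples,
                                   int minSamples, double globalAverage) {
        return (nodeSamples < minSamples) ? globalAverage : nodeEstimate;
    }

    public static void main(String[] args) {
        System.out.println(effectivePDNF(0.0, 0.3));
        System.out.println(estimateOrGlobal(0.9, 2, 10, 0.3));
    }
}
```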

Ed Tomlinson | 1 Oct 2003 01:26

Re: Sanity check

On September 30, 2003 05:33 pm, Ian Clarke wrote:
> Just a sanity check in the RTE code:
>
> Line 223 in StandardNodeEstimator - is this right :
>   rtTransferFailed.report(size);
>
> It looks like it is reporting the size, but shouldn't it be reporting
> the time?

It looks very much like it should read:

rtTransferFailed.report(time);

good catch.

Ed
Edward J. Huff | 1 Oct 2003 01:54

Re: Sanity check

On Tue, 2003-09-30 at 17:33, Ian Clarke wrote:
> Just a sanity check in the RTE code:
> 
> Line 223 in StandardNodeEstimator - is this right :
>   rtTransferFailed.report(size);
> 
> It looks like it is reporting the size, but shouldn't it be reporting 
> the time?
> 
> Ian.

This is correct.  The value is labeled "ms" in a report, and
is used to calculate an estimated time.

You might also want to fix getDataLength() which is broken.
(Apparently it is or would be called from SimpleDataObjectStore.java,
but since many objects don't give accurate values, SDOS uses a
different method).

-- Ed Huff

--- StandardNodeEstimator.java.~1.3.~	2003-09-30 00:23:36.000000000 -0400
+++ StandardNodeEstimator.java	2003-09-30 19:38:58.000000000 -0400
@@ -220,7 +220,7 @@
 		rpDNF.report(0.0);
 		// transferFailed is a peer event to dataNotFound, transferSucceeded
 		// and searchFailed
-		rtTransferFailed.report(size);

Ian Clarke | 1 Oct 2003 03:31

Re: Re: Some suggestions for NG routing

> > Absolutely, I would go one further and say that access times should only be
> > updated when a node successfully returns data - this is a *major* issue and
> > could be having a serious negative impact (it would explain the slow
> > node-turnover in the RT causing it to fill up with overloaded nodes).
> 
> I have looked at this some more.  It's a bit trickier than just changing one
> value.  As it is now, once a node has had one successful query it seems to
> stay in the rt as long as it can be connected to... 

I have not RTFS but Matthew claims that this is already the case - you 
should probably verify this one way or the other or talk to him before 
making any changes.

> > > What I propose is to perturb the rpSearchFail
> > > average down when we see a reference for a node already in the rt.  We
> > > probably want to add code to limit this to one perturb until a request is
> > > processed.  This would prevent a bad node from attacking by
> > > resetting all references. I have the code to implement most of this
> > > written.
> >
> > What do you mean by "perturb"?  Perturb how?  As described here it smells
> > like alchemy, is it?
> 
> Yes, it's a bit of alchemy - can you see any other way to take advantage of this
> info?  What I would do is _one_ rpSearchFailure.report(0).  This will reduce
> rpSearchFailure by 20%.

I don't think the benefits outweigh the disadvantage of polluting NGR 
with alchemy.
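
The 20% figure quoted above would follow if rpSearchFailure is a decaying
running average in which each report carries a weight of 0.2.  That is an
assumption made for this sketch, not something confirmed from the source,
and the class below is hypothetical.  It also shows why the proposal limits
itself to one perturb per request: each further report(0) takes off another
20% of whatever remains.

```java
// Hypothetical decaying running average; the 0.2 weight is inferred from
// the "reduce by 20%" remark and is an assumption.
final class DecayingAverageSketch {
    private final double weight;
    private double current;

    DecayingAverageSketch(double initial, double weight) {
        this.current = initial;
        this.weight = weight;
    }

    // Move the average a fixed fraction of the way toward the new value.
    void report(double value) {
        current += weight * (value - current);
    }

    double currentValue() {
        return current;
    }

    public static void main(String[] args) {
        DecayingAverageSketch rpSearchFailure = new DecayingAverageSketch(0.5, 0.2);
        rpSearchFailure.report(0.0); // the single proposed perturb
        System.out.println("after one report(0):  " + rpSearchFailure.currentValue());
        rpSearchFailure.report(0.0); // a second one compounds the reduction
        System.out.println("after two report(0)s: " + rpSearchFailure.currentValue());
    }
}
```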


Ian Clarke | 1 Oct 2003 03:38

Re: Sanity check

On Tue, Sep 30, 2003 at 07:26:23PM -0400, Ed Tomlinson wrote:
> > It looks like it is reporting the size, but shouldn't it be reporting
> > the time?
> It looks very much like it should be read:
> rtTransferFailed.report(time);
> good catch.

I won't be 100% sure until Toad verifies this, but if there is such an 
obvious error in the RTE code then we really should go over it with a 
fine-tooth comb (and/or come up with some debugging tools to help us do 
sanity checks on this) since this and other bugs are likely to render 
NGR completely useless until they are fixed, and they tend not to 
manifest themselves in anything other than a large-scale performance 
failure (such as the one we are seeing right now).

Ian.

-- 
Ian Clarke                			            ian@...
Coordinator, The Freenet Project	      http://freenetproject.org/
Weblog				     http://slashdot.org/~sanity/journal
Edward J. Huff | 1 Oct 2003 04:29

Re: Sanity check

On Tue, 2003-09-30 at 21:38, Ian Clarke wrote:
> On Tue, Sep 30, 2003 at 07:26:23PM -0400, Ed Tomlinson wrote:
> > > It looks like it is reporting the size, but shouldn't it be reporting
> > > the time?
> > It looks very much like it should be read:
> > rtTransferFailed.report(time);
> > good catch.
> 
> I won't be 100% sure until Toad verifies this, 

find . -name '*.java' -print0 | xargs -0 grep -n transferFailed |
 grep -i key
./src/freenet/node/rt/NGRouting.java:161:
last.transferFailed(key, System.currentTimeMillis() - lastTime, size);
./src/freenet/node/rt/NodeEstimator.java:63:
abstract public void transferFailed(Key key, long time, long size); 
./src/freenet/node/rt/StandardNodeEstimator.java:217:
public void transferFailed(Key key, long time, long size) {

The only call is NGRouting.java:161, and the second argument
is obviously a time and the third is the size argument to
NGRouting.transferFailed().

lastTime is a member of class NGRouting.  It is set to
System.currentTimeMillis() in getNextRoute(), routeConnected(),
verityFailed(), queryRejected(), and dataNotFound().

So the value being averaged in rtTransferFailed was definitely
not a time, and now it is.  But does transferFailed() also get 
called after VF, QR, or DNF?
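
The shape of the fix can be sketched as below.  This is a stand-in, not
StandardNodeEstimator: the class is hypothetical, the Key argument is left
out, and a plain mean replaces whatever averaging rtTransferFailed really
uses.  The only point it makes is that the time argument is the one that
gets averaged, while the size is accepted but not reported.

```java
// Hypothetical stand-in for the estimator; only the parameter order
// (time, then size) is taken from the signatures quoted above.
final class TransferFailedSketch {
    private double totalMs = 0.0;
    private long reports = 0;

    void transferFailed(long timeMs, long sizeBytes) {
        // Average the elapsed time; sizeBytes is deliberately not reported.
        totalMs += timeMs;
        reports++;
    }

    double averageFailTimeMs() {
        return (reports == 0) ? 0.0 : totalMs / reports;
    }

    public static void main(String[] args) {
        TransferFailedSketch estimator = new TransferFailedSketch();
        estimator.transferFailed(1500, 1048576);
        estimator.transferFailed(2500, 32768);
        System.out.println("average time to failure (ms): " + estimator.averageFailTimeMs());
    }
}
```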

freenet | 1 Oct 2003 12:41

Re: Beautifying Freenet

>> IF the smaller installer will automatically chain to the larger
>> installer when a suitable JVM isn't installed, then I would support
>> removing the link to the big one, although I still say it's a concession
>> to the moron party, since somebody might conceivably know that they
>> don't have a JVM, and want to save some time by downloading the big
>> installer straight off.

And I completely agree.  I am working to make the smaller installer
download and run the larger installer, but due to limitations in our
installer system (the free NSIS system), this is proving to be a pain in
the ass (unless the user is expected not to balk when entering the install
location, etc, all over again when the second installer is run)

> How about the same 2 separate installers, with links saying:
> "Download this if you do not have a JVM, or are not sure"
> "Download this if you KNOW you have a JVM"

Sounds good, but then...

> Just how much bigger does the JVM make the installer?

... yeah.  The small installer is approx 50 something KB (and when the new
icons go in it will be nearer 100KB).  The with-java installer is over
15MB...

There's one final alternative.  The freenet installer could automatically
install and run Sun's JAVA Online Installer, which connects to SUN's
website to conditionally download the JVM.  We would have one
moderately-sized installer (I guess about 400KB).  It's almost ideal,
unless the users are expected to balk at the idea of our installer running

Ed Tomlinson | 1 Oct 2003 13:17

Re: Sanity check

On September 30, 2003 10:29 pm, Edward J. Huff wrote:
> On Tue, 2003-09-30 at 21:38, Ian Clarke wrote:
> > On Tue, Sep 30, 2003 at 07:26:23PM -0400, Ed Tomlinson wrote:
> > > > It looks like it is reporting the size, but shouldn't it be reporting
> > > > the time?
> > >
> > > It looks very much like it should be read:
> > > rtTransferFailed.report(time);
> > > good catch.
> >
> > I won't be 100% sure until Toad verifies this,

I am.  Especially with the results below.

> find . -name '*.java' -print0 | xargs -0 grep -n transferFailed |
>  grep -i key
> ./src/freenet/node/rt/NGRouting.java:161:
> last.transferFailed(key, System.currentTimeMillis() - lastTime, size);
> ./src/freenet/node/rt/NodeEstimator.java:63:
> abstract public void transferFailed(Key key, long time, long size);
> ./src/freenet/node/rt/StandardNodeEstimator.java:217:
> public void transferFailed(Key key, long time, long size) {
>
> The only call is NGRouting.java:161, and the second argument
> is obviously a time and the third is the size argument to
> NGRouting.transferFailed().
>
> lastTime is a member of class NGRouting.  It is set to
> System.currentTimeMillis() in getNextRoute(), routeConnected(),
> verityFailed(), queryRejected(), and dataNotFound().

Ed Tomlinson | 1 Oct 2003 13:44

does ocm lie?

Hi,

I have noticed something funny in the ocm stats.  My box with a node
up 14 hours (6214) has 242 connections transmitting data and has 
324M of data queued on them.  My output bandwidth is limited to 
10k/s.  The servlet/nodeinfo/performance/general tells me I am
using about 8k/s.  So either the ocm lies about what is queued OR
most of those connections are stalled...  It may well be a stalling
problem - if I limit the send queue size to a couple of mins of
traffic (by QRing) servlet/nodeinfo/performance/general shows
dropping bandwidth over time.
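
For scale, here is a quick back-of-the-envelope check of those figures.  It
assumes "324M" means 324 * 2^20 bytes and "10k/s" means 10 * 2^10 bytes per
second; neither unit is confirmed above.  Under those assumptions the queue
alone would take roughly nine hours to drain even at the full limit.

```java
// Back-of-the-envelope arithmetic; the unit interpretations are assumptions.
final class QueueDrainSketch {
    // Seconds needed to send queuedBytes at a steady bytesPerSecond.
    static double drainSeconds(double queuedBytes, double bytesPerSecond) {
        return queuedBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        double queuedBytes = 324.0 * 1024 * 1024; // "324M", assumed MiB
        double limit = 10.0 * 1024;               // "10k/s", assumed KiB/s
        double seconds = drainSeconds(queuedBytes, limit);
        System.out.println("seconds to drain: " + seconds);
        System.out.println("hours to drain:   " + seconds / 3600.0);
    }
}
```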

Are we timing out transmits?

Another interesting point.  When I coded the first NG implementation
I ignored Transfer Failed messages.  I put some metrics in to check
for them and there were so few that it was safe to ignore them (it's 
not safe in the long run).  Now we are seeing lots of transfer failures.
Why?  

I think answering any of these questions will solve the others.

Anyone have ideas?
Ed
Some Guy | 1 Oct 2003 17:07

Re: Re: Really Stupid Routing (RSR)

Thanks Theo,

You're right, perhaps I've oversimplified. 

I had to read a prerequisite paper first:
0007235 v2
"Random graphs with arbitrary degree distributions and
their applications" 
M. E. J. Newman, S. H. Strogatz, and D. J. Watts

I don't think you could say that freenet nodes have a
power-law distributed number of neighbors.  For the most
part they should have  connections about proportional
to bandwidth.  I suppose you could try to use "Graphs
with arbitrary specified degree distribution" section
from Newman's paper.

I think there is still one more problem here:   All
these networks assume that where the connections go
from one node is random relative to where they come
from.  This is absolutely not true in any of the DHT
networks and shouldn't be true for freenet either. 
I'm pretty sure if it were true routing couldn't be
O(log(N)/log(m)) but something like O(N/m).  In short,
freenet nodes connect more to others with the same
specialization, which makes freenet non-random in this
sense.
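
To give a feel for the gap between those two growth rates, the sketch below
plugs in purely illustrative values (N = 1,000,000 nodes, m = 100
connections per node) and ignores the constant factors that big-O notation
hides.  The numbers are not measurements of any real network.

```java
// Illustrative comparison of the two growth rates; N and m are made up.
final class RoutingScalingSketch {
    // log(N)/log(m): the scaling claimed for structured routing.
    static double logarithmicHops(double n, double m) {
        return Math.log(n) / Math.log(m);
    }

    // N/m: the scaling suggested for routing with random connections.
    static double linearHops(double n, double m) {
        return n / m;
    }

    public static void main(String[] args) {
        double n = 1_000_000.0;
        double m = 100.0;
        System.out.println("log(N)/log(m) = " + logarithmicHops(n, m));
        System.out.println("N/m           = " + linearHops(n, m));
    }
}
```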

I'm still trying to wrap my head around those papers. 
I'd like to keep it simple and stupid for a moment. 

