Nick Reinking | 1 May 2003 04:54

Really good news!

One fine day, Brion Vibber said:
> Do feel free to ask other free projects and universities if they'd be
> interested in supporting the project...

I thought it might be a good idea to ask, so I sent out an email to ibiblio:
> Greetings - my name is Nick, and I help out with a site called Wikipedia.
> (http://www.wikipedia.org).  It is a free, multi-lingual project to
> create a complete, accurate, and more importantly open content
> encyclopedia.  All of the content is licensed under the GNU FDL (GFDL),
> meaning that anybody has the freedom to copy and redistribute it, with       
> or without modifications, either commercially or non-commercially - 
> although they may not put in place technical measures to conceal the
> content.
> 
> The English language Wikipedia has over 117 thousand articles already,
> and by our calculations, we have about half the content of the
> Encyclopedia Britannica.  We have many other languages, which are also   
> quickly growing in size and diversity.
> 
> However, we are currently facing both a budget and capacity crunch.
> In short, we have neither.  Since Wikipedia is a volunteer-based
> program, it is hard for us to raise the funds to purchase additional
> hardware.  Right now, we are running off of a dual Athlon 1800+ server
> with 2GB of RAM and 36GB of SCSI storage.  Unfortunately, the system is
> being pushed to its absolute limits, with little relief in sight.  We
> are installing a second system this weekend as a front end, but even
> with that, we are not sure how long we can hold out.  The dual Athlon
> system runs at a typical load of 15-20 during normal US working hours.
> 
> Since our mission seems to be very much in line with ibiblio's mission, I was
> [...]

Tim Starling | 1 May 2003 06:48

Please run this DB query to fix Special:Randompage

Someone needs to run this:

UPDATE cur SET cur_random=RAND()

Here's a quote from Village Pump explaining why:

The pseudorandom number generator which backs the Special:Randompage link 
appears to be not so good. I just hit it about two dozen times and had 
several links come up multiple times. -º¡º

Emphasis on pseudo. I got RNA transcription four times in about a dozen 
clicks - twice in succession. Also, Zog's favorite song came up - but 
redirected to Johnny Rebel. I thought redirects were excluded. Geoffrey

Here's what I found out with a few SQL queries. The cur_random field values 
are strongly clustered up around the high end. In fact, there are no 
articles at all between about 0.18 and 0.4, and only a few below 0.18. A few 
minutes of browsing through the old versions of SpecialRandompage.php shows 
why. A previous version of the software selected the lowest-numbered 
cur_random value, and set it to a random value. So here's why we now see 
poor results: Most of the pages are clustered up above 0.9 or so, so when 
you click Special:Randompage , there's a high chance of picking one of the 
few low-numbered articles. The cur_random value is then reset, and there's 
still a high chance of the new value being below 0.9. Hence, the few 
privileged low-numbered articles get selected far more often, and unless 
someone re-randomizes the cur_random column, it takes a long time for the 
high-numbered articles to diffuse back down. -- Tim Starling 04:25 May 1, 
2003 (UTC)

--END QUOTE--
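
The clustering described in the quote can be reproduced with a small simulation. The sketch below is a simplified model, not the actual SpecialRandompage.php code: it assumes the old version always took the article with the lowest cur_random and re-rolled that value uniformly in [0, 1). The function name and its parameters are made up for illustration.

```python
import heapq
import random


def simulate_lowest_first(n_articles, n_clicks, seed=0):
    """Model the assumed old behaviour: take the lowest cur_random, re-roll it."""
    rng = random.Random(seed)
    # Start from a uniform spread of cur_random values, one per article.
    values = [rng.random() for _ in range(n_articles)]
    heapq.heapify(values)
    for _ in range(n_clicks):
        # heapreplace pops the smallest value and pushes a fresh random one,
        # which is one "click" on the random page link in this model.
        heapq.heapreplace(values, rng.random())
    return sorted(values)


if __name__ == "__main__":
    vals = simulate_lowest_first(n_articles=1000, n_clicks=5000)
    share_high = sum(v > 0.5 for v in vals) / len(vals)
    print(f"share of articles with cur_random > 0.5: {share_high:.3f}")
```

Under this model the low end of the range is steadily eaten away, so after a few thousand clicks nearly every value sits well above 0.5. That is the kind of pile-up at the high end that a one-off re-randomization of the column would flatten out again.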

Lee Daniel Crocker | 1 May 2003 08:01

Re: Really good news!

>>  [ibiblio] would LOVE to host wikipedia.org...

I don't think that's an option, but certainly they might be
useful as a mirror, or storage for backups, etc.

-- 
Lee Daniel Crocker <lee@...> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC

Nick Reinking | 1 May 2003 15:00

Re: Really good news!

On Thu, May 01, 2003 at 01:01:06AM -0500, Lee Daniel Crocker wrote:
> >>  [ibiblio] would LOVE to host wikipedia.org...
> 
> I don't think that's an option, but certainly they might be
> useful as a mirror, or storage for backups, etc.

That's something I was thinking - perhaps, for example, the database goes
down.  At that point, we make a quick Apache change, and all request
URLs are rewritten to point at a read-only copy of Wikipedia @ ibiblio.

However, is there a reason why we couldn't have ibiblio host the entire
Wikipedia?  It seems that is what Brion was asking for.  If they're
willing to supply three, four (or more) servers to keep Wikipedia
running, why isn't that a good thing?

-- 
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN

Jimmy Wales | 1 May 2003 15:00

Re: Really good news!

I'm not the least bit interested in moving wikipedia off my servers.

The claims of budget crisis are wildly exaggerated.  I'm donating one
and possibly two more servers to the project (one for a www frontend
and one for a mailserver and auxiliary front end).

Jimmy Wales | 1 May 2003 15:01

Re: Really good news!

Nick Reinking wrote:
> However, is there a reason why we couldn't have ibiblio host the entire
> Wikipedia?  It seems that is what Brion was asking for.  If they're
> willing to supply three, four (or more) servers to keep Wikipedia
> running, why isn't that a good thing?

There are a lot of reasons, but the main reason is that we shouldn't
make ourselves dependent on anyone else.

--Jimbo

Brion Vibber | 1 May 2003 17:10

Re: Please run this DB query to fix Special:Randompage

On Wed, 2003-04-30 at 21:48, Tim Starling wrote:
> Someone needs to run this:
> 
> UPDATE cur SET cur_random=RAND()

Ok, it'll be a bit before it's done. Hooray for big slow databases. :D

-- brion vibber (brion  <at>  pobox.com)

Nick Reinking | 1 May 2003 17:53

Re: Please run this DB query to fix Special:Randompage

On Thu, May 01, 2003 at 08:10:38AM -0700, Brion Vibber wrote:
> On Wed, 2003-04-30 at 21:48, Tim Starling wrote:
> > Someone needs to run this:
> > 
> > UPDATE cur SET cur_random=RAND()
> 
> Ok, it'll be a bit before it's done. Hooray for big slow databases. :D
> 
> -- brion vibber (brion  <at>  pobox.com)

Off the top of my head, I can't think of any simple mathematical way to
do what you want to do.  (that being making articles selected randomly
less likely to be selected randomly over time)  Even if every cur_random
is totally re-randomized, it still isn't going to be fair.

Seems to me that the best way to do this is either:
1) Be truly random.  In PostgreSQL, this would be something like:
     SELECT cur_id FROM cur LIMIT 1
       OFFSET floor(random() * (SELECT count(cur_id) FROM cur))::bigint
   (this function should be really fast, not sure if faster than:)
     SELECT cur_id FROM cur LIMIT 1
       OFFSET floor(random() * (SELECT count(*) FROM cur))::bigint
   (but you get the idea, may need other constraints)
2) Keep a timestamp instead of a random number.  That way, whenever an
   article is "randomly" selected, it gets its timestamp updated to the
   current time.  Always select the oldest article for a "random" page.
   New articles always get a current timestamp here.
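
Both options can be sketched against a toy table. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 instead of the MySQL or PostgreSQL servers discussed in the thread, the last_shown column and all function names are hypothetical, and the random offset is chosen in application code so that no subquery is needed inside OFFSET.

```python
import random
import sqlite3


def make_cur_table(n_articles):
    """Build an in-memory stand-in for the cur table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, last_shown REAL)")
    conn.executemany(
        "INSERT INTO cur (cur_id, last_shown) VALUES (?, 0.0)",
        [(i,) for i in range(1, n_articles + 1)],
    )
    return conn


def pick_by_offset(conn, rng):
    """Option 1: count the rows, then jump to a uniformly random offset."""
    (count,) = conn.execute("SELECT count(*) FROM cur").fetchone()
    if count == 0:
        return None
    offset = rng.randrange(count)  # uniform integer in [0, count)
    row = conn.execute(
        "SELECT cur_id FROM cur ORDER BY cur_id LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()
    return row[0]


def pick_oldest(conn, now):
    """Option 2: show the least recently shown article, then stamp it."""
    row = conn.execute(
        "SELECT cur_id FROM cur ORDER BY last_shown, cur_id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE cur SET last_shown = ? WHERE cur_id = ?", (now, row[0]))
    return row[0]


if __name__ == "__main__":
    conn = make_cur_table(5)
    rng = random.Random()
    print("option 1:", [pick_by_offset(conn, rng) for _ in range(5)])
    print("option 2:", [pick_oldest(conn, now=100 + i) for i in range(7)])
```

In this sketch option 2 is not random at all: it cycles through the articles in a fixed order and only repeats one after every other article has been shown, whereas option 1 can return the same article twice in a row.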

-- 
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN

Lee Daniel Crocker | 1 May 2003 18:39

Re: Really good news!

> (Nick Reinking <nick@...>):
> 
> However, is there a reason why we couldn't have ibiblio host
> the entire Wikipedia?  It seems that is what Brion was asking for.
> If they're willing to supply three, four (or more) servers to keep
> Wikipedia running, why isn't that a good thing?

Because they aren't us. We have a specific vision, and goals to
accomplish. Even if ibiblio totally supported those same goals
now, there's no guarantee they will in the future. Ultimately,
physical ownership of the servers is our guarantee that Wikipedia's
operation will serve our vision.

Also, my experience with ibiblio is that their servers tend to
run close to capacity anyway: try downloading Gentoo Linux, for
example.

-- 
Lee Daniel Crocker <lee@...> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC

Brion Vibber | 1 May 2003 19:07

Re: Please run this DB query to fix Special:Randompage

On Thu, 2003-05-01 at 08:53, Nick Reinking wrote:
> Off the top of my head, I can't think of any simple mathematical way to
> do what you want to do.  (that being making articles selected randomly
> less likely to be selected randomly over time)

Not sure who wants to do that... For me it's quite sufficient to make
the probability of selection random, and let the size of the selection
field make multiple selections _reasonably_ rare.

The resetting of the random weight upon load is just meant to keep the
pot stirring, and "in theory" shouldn't have any effect at all on
randomness if the random indexes were random to begin with.

> Seems to me that the best way to do this is either:
> 1) Be truly random.  In PostgreSQL, this would be something like:
>      SELECT cur_id FROM cur LIMIT 1
>        OFFSET floor(random() * (SELECT count(cur_id) FROM cur))::bigint
>    (this function should be really fast, not sure if faster than:)

MySQL doesn't seem to let you use a non-constant expression for
'LIMIT offset,count' (and besides, still no subqueries), but we can pick
a random number easily enough in PHP. The problem is that we aren't sure
what range to use; hence a random index attached to each article whose
range is known (between 0 and 1).

We don't want to select non-article pages or redirects, but we don't
currently have a count of exactly that. There's the "good articles"
count, but it has other criteria, so if we used it we'd always crop off
relatively recent articles. If we inflate it based on an estimate we'd
get overemphasis on the _most_ recent article if we estimate too high.
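
The random-index approach described above avoids needing an exact article count. A minimal sketch follows, again using Python's built-in sqlite3 as a stand-in for MySQL. The exact query the site runs is not shown in this thread, so the "first cur_random above a random threshold" lookup, the wrap-around fallback, and the cur_namespace and cur_is_redirect filter columns are all assumptions made for illustration.

```python
import random
import sqlite3

# Assumed filter for "real" articles: main namespace, not a redirect.
ARTICLE_FILTER = "cur_namespace = 0 AND cur_is_redirect = 0"


def make_cur_table(rows, rng):
    """rows is a list of (cur_id, cur_namespace, cur_is_redirect) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cur ("
        " cur_id INTEGER PRIMARY KEY,"
        " cur_namespace INTEGER,"
        " cur_is_redirect INTEGER,"
        " cur_random REAL)"
    )
    conn.executemany(
        "INSERT INTO cur VALUES (?, ?, ?, ?)",
        [(cid, ns, redir, rng.random()) for cid, ns, redir in rows],
    )
    return conn


def random_article(conn, rng):
    """Pick the first article whose cur_random lies above a random threshold."""
    threshold = rng.random()  # known range [0, 1), no row count required
    row = conn.execute(
        f"SELECT cur_id FROM cur WHERE {ARTICLE_FILTER} AND cur_random > ?"
        " ORDER BY cur_random LIMIT 1",
        (threshold,),
    ).fetchone()
    if row is None:
        # Nothing above the threshold: wrap around to the lowest value.
        row = conn.execute(
            f"SELECT cur_id FROM cur WHERE {ARTICLE_FILTER}"
            " ORDER BY cur_random LIMIT 1"
        ).fetchone()
    if row is None:
        return None  # no eligible articles at all
    # Re-roll the chosen article's index to "keep the pot stirring".
    conn.execute(
        "UPDATE cur SET cur_random = ? WHERE cur_id = ?", (rng.random(), row[0])
    )
    return row[0]


if __name__ == "__main__":
    rng = random.Random()
    demo_rows = [(1, 0, 0), (2, 0, 1), (3, 1, 0), (4, 0, 0), (5, 0, 0)]
    conn = make_cur_table(demo_rows, rng)
    print([random_article(conn, rng) for _ in range(10)])
```

Because the filter sits in the WHERE clause, redirects and non-article pages are never returned, and no separate count of eligible pages has to be maintained or estimated.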
