Jimmy Wales | 8 Feb 20:03 2002

New list

This is the new list for tech discussions for wikipedia:

wikitech-l@...

Brion L. VIBBER | 8 Feb 20:39 2002

Re: [Wikipedia-l] Plans for RecentChanges page; programmers only!

(moving thread to wikitech-l)

Jan Hidders wrote:

>From: "Brion L. VIBBER" <brion@...>
>
>>Jan Hidders wrote:
>>
>>>Dear fellow programmers,
>>>
>>>I saw that the SQL code for the Recent Changes page is rather inefficient
>>>and causes a lot of database access, so I decided to improve this. However,
>>>I can only do this properly if the timestamp field in the tables is split in
>>>a day and a time field.
>>>
>>Can I ask how exactly that would help? I'm not much of a database guru,
>>so the answer isn't obvious to me and I'm a bit curious.
>>
>
>It allows me to do a GROUP BY on the day. That way I can take a left outer
>join between the cur table and the old table and group on the combination of
>cur_day and cur_title. This allows me to get all the information I need for
>the page in one SQL statement.
>
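
[Editor's sketch] The single-statement query Jan describes might look roughly like the following. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 rather than MySQL, and every column except cur_title and cur_day (old_title, old_day, the *_time columns) and all sample rows are hypothetical, not taken from wikipedia.sql.

```python
import sqlite3

# In-memory stand-in for the cur and old tables, with the timestamp already
# split into a day column and a time column (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cur (cur_title TEXT, cur_day TEXT, cur_time TEXT);
CREATE TABLE old (old_title TEXT, old_day TEXT, old_time TEXT);
INSERT INTO cur VALUES ('Physics', '2002-02-08', '20:03:00');
INSERT INTO cur VALUES ('Biology', '2002-02-07', '09:15:00');
INSERT INTO old VALUES ('Physics', '2002-02-08', '10:00:00');
INSERT INTO old VALUES ('Physics', '2002-02-08', '12:30:00');
INSERT INTO old VALUES ('Physics', '2002-02-01', '08:00:00');
""")

# One statement: left outer join cur to old, grouped on (cur_day, cur_title).
# COUNT(old_title) skips the NULLs produced for pages with no older
# revision on the same day, so those pages report 0.
rows = conn.execute("""
SELECT cur_day, cur_title, COUNT(old_title) AS earlier_edits_that_day
FROM cur LEFT OUTER JOIN old
  ON old_title = cur_title AND old_day = cur_day
GROUP BY cur_day, cur_title
ORDER BY cur_day DESC, cur_title
""").fetchall()

for row in rows:
    print(row)
```

Because the day is a plain column, it can appear directly in GROUP BY, which is the point of the proposed split.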

Magnus Manske | 8 Feb 21:06 2002

Recent changes; flush cache

Jan's change to the database has my blessing, as he is a (the!) database
expert. Anyway, it isn't *my* baby anymore, it is *ours*;) Yeah, the
blessings of modern cloning technology...

As for the cache flushing mechanism Brion proposed in a private mail, the
easiest way would be to link it to the cur_counter field; if "cur_counter
MOD 20 == 0", the cache could be flushed, best by setting the local $cache
variable to "" at the end of the load routine.
20 is just a wild guess...

Finally, thanks to Jimbo for setting up this mailing list.
Gee, I never caused a mailing list before ;)

Magnus
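
[Editor's sketch] The flush rule Magnus suggests can be pictured as below. It is written in Python rather than the PHP of the actual scripts, and the function name cache_after_load and the constant FLUSH_INTERVAL are hypothetical; only cur_counter, the modulus of 20, and the idea of setting the cache to "" come from the message.

```python
FLUSH_INTERVAL = 20  # the message calls this value "just a wild guess"


def cache_after_load(cur_counter: int, cache: str) -> str:
    """Return the cache contents to keep once the load routine has finished."""
    if cur_counter % FLUSH_INTERVAL == 0:
        return ""  # flush, equivalent to setting the cache variable to ""
    return cache


# Counter values in 1..60 at which the cache would be flushed.
flushed_at = [n for n in range(1, 61) if cache_after_load(n, "cached page") == ""]
print(flushed_at)
```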

Jimmy Wales | 8 Feb 21:51 2002

Questions about installing

When I check everything out of the CVS, I do it into a directory that has
nothing to do with the real site.

Then, I copy the files over to the proper location.

It seems that wikiText.php AND wikiTextEn.php are always different, and I have
to edit them... so really, I shouldn't be copying them unless there's a good reason,
right?

Here's my exact question:

wikiTextEn.php warns me:
# ATTENTION:
# To fit your local settings, PLEASE edit wikiText.php ONLY!
# Change settings here ONLY if they're to become global in all wikipedias! 

But that seems a bit "opposite" to me... doesn't wikiTextEn mean "wikiText English"?
If so, then changes here should ONLY affect the English wikipedia, not "global in
all wikipedias"?

Also, whichever way it is supposed to be, I'm sure I should only have to edit one file.
But I have to edit two.

First, $wikiCurrentServer returns http://wikipedia.com in the default configuration, but
we prefer http://www.wikipedia.com/ (see line 12 of wikiTextEn.php, I always edit to
hardcode this.)

And on the next line, $wikiSQLServer is different locally: the database is named
"wiki" instead of "wikipedia".

Brion L. VIBBER | 8 Feb 22:38 2002

Re: Questions about installing

Jimmy Wales wrote:

>wikiTextEn.php warns me:
># ATTENTION:
># To fit your local settings, PLEASE edit wikiText.php ONLY!
># Change settings here ONLY if they're to become global in all wikipedias! 
>
>But that seems a bit "opposite" to me... doesn't wikiTextEn mean "wikiText English"?
>If so, then changes here should ONLY affect the English wikipedia, not "global in
>all wikipedias"?
>
My understanding is that wikiTextEn.php is going to contain all the 
default values. For the other-language versions, alternate server name, 
character set, message strings, etc. will be set in e.g. wikiTextDe.php 
or wikiTextPl.php. The theory is that anything that's *not* set in the 
language-specific file will get the default value; thus if a new feature 
is added that needs a message string $wikiFooBar, the English message 
defined in wikiTextEn.php will show up, rather than nothing, if the 
local wikiTextXx.php isn't updated.

Ultimately, wikiText.php should probably be nothing more than:
  include("wikiTextEn.php");
  include("wikiTextSomeOtherLanguage.php");

Anything additional in that file might be for site-specific data; for 
instance if somebody sets up a read-only mirror of Wikipedia, they could 
customize just that file to include their alternate server name, a title 
string that links to the live 'pedia, and a hypothetical 
refuse-all-edits option.

Jimmy Wales | 8 Feb 22:35 2002

Re: Questions about installing

Brion L. VIBBER wrote:
> But then, should it be called wikiText.php at all? Would 
> wikiSettings.php make more sense, maybe?

Yes!  Or, wikiLocalSettings.  This would be settings which override whatever may
be in the "default" package, but specific to this _site_.

For example, on Magnus's machine, the database is 'wikipedia', and the user/password
for mysql are different.  So those would all go in the wikiLocalSettings file.

> I think the wikiTextEn.php defaults should be what you're actually using 
> on the English wikipedia!

Right, especially for internationalization stuff.

What might make sense would be for us to have wikiSettings or wikiLocalSettings, and
that's where stuff goes that we are _fairly confident_ will be different on different
people's machines.
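
[Editor's sketch] The layering discussed in this thread (English defaults, then a language file, then site-local settings) can be illustrated as follows. This is a Python analogy, not the PHP include mechanism itself; the German values, the message strings, and the helper effective_settings are hypothetical, while the setting names and the 'wiki' database name are the ones mentioned in the messages.

```python
# Layer 1: defaults, as wikiTextEn.php is meant to provide.
DEFAULTS_EN = {
    "wikiCurrentServer": "http://www.wikipedia.com/",
    "wikiSQLServer": "wikipedia",
    "wikiFooBar": "Foo bar (English default message)",  # hypothetical text
}

# Layer 2: a language file that has not yet been updated with wikiFooBar.
LANGUAGE_DE = {
    "wikiCurrentServer": "http://de.wikipedia.com/",  # hypothetical value
}

# Layer 3: site-specific overrides, as a wikiLocalSettings file might hold.
LOCAL_SITE = {
    "wikiSQLServer": "wiki",
}


def effective_settings(*layers):
    """Merge layers in order; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


settings = effective_settings(DEFAULTS_EN, LANGUAGE_DE, LOCAL_SITE)
for name, value in sorted(settings.items()):
    print(name, "=", value)
```

The missing German message falls back to the English default instead of showing nothing, and only the local layer needs editing on a given machine.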

Brion L. VIBBER | 8 Feb 22:43 2002

Re: Re: [Wikipedia-l] Plans for RecentChanges page; programmers only!

Jan Hidders wrote:

>From: "Brion L. VIBBER" <brion@...>
>
>>Right. Hmm, can you use TO_DAYS(cur_timestamp) or some such? Or is that
>>just going to cause problems?
>>
>
>Unfortunately MySQL only allows column names in the GROUP BY clause.
>
D'oh! Well then, two columns it is.

-- brion vibber (brion  <at>  pobox.com)

Axel Boldt | 8 Feb 23:35 2002

Speedup suggestions

Browsing around on http://www.mysql.com/documentation/ and comparing
with the wikipedia.sql in cvs, I have the following suggestions:

1) To speed up searches, we should use a FULLTEXT index on title and
   text, and then use the match operator. That should also yield more
   relevant results.
   (http://www.mysql.com/doc/F/u/Fulltext_Search.html)

2) In special_recentchanges.php, we select with "WHERE
   cur_timestamp>$mindate", but cur_timestamp is not indexed. This
   means that mysql linearly searches through the whole cur table
   every time somebody views RecentChanges.

3) Assuming that php runs as an apache module, we should use
   persistent database connections. That way, we won't repeatedly send
   over the username and password; one connection is reused by apache
   even after the php script dies.
   (http://www.phpbuilder.com/manual/features.persistent-connections.php
   and http://www.phpbuilder.com/manual/function.mysql-pconnect.php)

Axel
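
[Editor's sketch] Suggestion 2 can be demonstrated in miniature. The example below uses Python's built-in sqlite3 instead of MySQL, so the planner output differs from what MySQL's EXPLAIN would show; the index name idx_cur_timestamp, the helper plan, and the generated rows are hypothetical. It only shows the general effect: the same WHERE clause stops scanning the whole table once the column is indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cur (cur_title TEXT, cur_text TEXT, cur_timestamp TEXT)")
conn.executemany(
    "INSERT INTO cur VALUES (?, ?, ?)",
    [(f"Page {i}", "text", f"200202{(i % 28) + 1:02d}120000") for i in range(1000)],
)

QUERY = "SELECT cur_title FROM cur WHERE cur_timestamp > ?"
MINDATE = ("20020208000000",)


def plan(sql, args):
    """Return SQLite's query-plan description (last column of each plan row)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, args))


plan_before = plan(QUERY, MINDATE)  # no index yet: a full scan of cur
conn.execute("CREATE INDEX idx_cur_timestamp ON cur (cur_timestamp)")
plan_after = plan(QUERY, MINDATE)   # the range lookup can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```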

Hr. Daniel Mikkelsen | 8 Feb 23:50 2002

Re: Speedup suggestions

> 2) In special_recentchanges.php, we select with "WHERE
>    cur_timestamp>$mindate", but cur_timestamp is not indexed. This
>    means that mysql linearly searches through the whole cur table
>    every time somebody views RecentChanges.

Why not simply log change-events to a separate table, and display data
from that log depending on the user's filter? Changes older than, say, two
weeks could be discarded by a cronjob.

-- Daniel
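
[Editor's sketch] Daniel's idea of a separate change log with periodic pruning might be sketched like this. It uses Python's built-in sqlite3 as a stand-in for MySQL; the table recent_log, its columns, the helper functions, and the sample rows are all hypothetical. The prune function plays the role of the cronjob.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_log (title TEXT, editor TEXT, changed_at TEXT)")
conn.execute("CREATE INDEX idx_recent_log_changed_at ON recent_log (changed_at)")


def log_change(title, editor, when):
    """Record one change event; called whenever a page is saved."""
    conn.execute(
        "INSERT INTO recent_log VALUES (?, ?, ?)", (title, editor, when.isoformat())
    )


def prune(now, keep=timedelta(weeks=2)):
    """Discard events older than the retention window (the cronjob's task)."""
    conn.execute(
        "DELETE FROM recent_log WHERE changed_at < ?", ((now - keep).isoformat(),)
    )


def recent_changes(limit=50):
    """Read the page data from the small log table only, newest first."""
    query = "SELECT title FROM recent_log ORDER BY changed_at DESC LIMIT ?"
    return [row[0] for row in conn.execute(query, (limit,))]


now = datetime(2002, 2, 8, 23, 50)
log_change("Physics", "user_a", now - timedelta(days=1))
log_change("Old news", "user_b", now - timedelta(days=20))
prune(now)
titles = recent_changes()
print(titles)
```

The log stays small regardless of how large cur grows, at the cost of one extra insert per edit.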

Jimmy Wales | 8 Feb 23:56 2002

Re: Speedup suggestions

Axel Boldt wrote:
> 3) Assuming that php runs as an apache module, we should use
>    persistent database connections. That way, we won't repeatedly send
>    over the username and password; one connection is reused by apache
>    even after the php script dies.
>    (http://www.phpbuilder.com/manual/features.persistent-connections.php
>    and http://www.phpbuilder.com/manual/function.mysql-pconnect.php)

After reading these two pages, it seemed that all I needed to do was change
mysql_connect in databaseFunctions.php to mysql_pconnect.

It's a little early to get *too* excited, but so far the results seem astonishing!

This is the first time I've seen the 5 minute load on the machine under 3 in days.
And it's 0.46 now.

Also, I'm getting 2.52 pages per second from my little benchmarking tool.  This is
dramatically better than the 1/2 page per second pace we've been seeing.

--Jimbo

