Erik Moeller | 1 Feb 2003 02:15
Picon
Picon

Optimizing and tuning

There are two very good IRC channels on irc.freenode.net:

#php and #mysql

There are lots of knowledgeable people there who are eager to help open  
source projects, many have already heard of Wikipedia. I learned a lot  
from just being there for half an hour or so.

Here are some things we need to look into:

1) Composite indexes and index use in general. I do not know which indexes  
are currently used on the live Wikipedia. However, after the *default*  
database generation script, there is just a single composite index, namely  
in the table watchlist. All other indexes are on a single field. If I  
understand correctly, this means that when we do a query with multiple  
comparisons, e.g. WHERE foo=bar AND baz=bat, only one index is used. At  
least, that is what this article claims:

http://www.linux-mag.com/cgi-bin/printer.pl?issue=2001-06&article=mysql

There are also a couple of tables with no indexes (including ARCHIVE,  
which may cause Special:Undelete to create high server load) and some  
unexplainable ones (site_stats has an index, but only one row). We really  
need to clean up our indexes. I can't help much with this without server  
access because I don't know if the table structures have been altered.

2) The HANDLER command. HANDLER, which works with both InnoDB and MyISAM,
provides low-level access to tables *without locking*. You get reads, but  
you don't get consistency, which may cause problems when you're dealing  
with critical, heavily modified data. But it seems perfect for our archive  
(Continue reading)

Erik Moeller | 1 Feb 2003 02:22
Picon
Picon

Re: Optimizing and tuning

Correction:

> - We could ask one of our resident C programmers to help. There's a
> specified interface to access persistent resources in MySQL

Resources in PHP, not MySQL. The MySQL pconnect functions are an example  
that uses this interface.

Regards,

Erik
Brion Vibber | 1 Feb 2003 02:42
Picon
Favicon
Gravatar

Re: Optimizing and tuning

On ven, 2003-01-31 at 17:15, Erik Moeller wrote:
> 1) Composite indexes and index use in general.... If I  
> understand correctly, this means that when we do a query with multiple  
> comparisons, e.g. WHERE foo=bar AND baz=bat, only one index is used.

Might want to check into that, yes...

> There are also a couple of tables with no indexes (including ARCHIVE,  
> which may cause Special:Undelete to create high server load) and some  
> unexplainable ones (site_stats has an index, but only one row).

Archive is very rarely used. But, it probably should have an index stuck
on the namespace & title fields at least.

>  We really  
> need to clean up our indexes. I can't help much with this without server  
> access because I don't know if the table structures have been altered.

They should all be the same...

> 2) The HANDLER command. HANDLER, which works with both InnoDB and MyISAM,
> provides low-level access to tables *without locking*. You get reads, but  
> you don't get consistency, which may cause problems when you're dealing  
> with critical, heavily modified data.

Now you're scaring me. :)

> 3) Upgrading to MySQL4. MySQL4 supports query caching (also subqueries,  
> but I haven't looked into that), which means that heavily queried pages  
> will load faster. When someone mentioned query caching, a lot of other  
(Continue reading)

Daniel Mayer | 1 Feb 2003 03:27
Picon
Favicon

Re: "View user page" edit links on anon talk pages

On Friday 31 January 2003 05:41 pm, Anthere wrote:
> > it up, and I don't think it should be used for
> > personal essays, sorry,  
> > Anthere). Using subpages on Meta might also help for
> > organization.
>
> Ah ? Well, I disagree. Of course permanent deletions
> of personal essays can only occur after a consensus is
> reached about that, no ?

I agree with Anthere and strongly disagree with eliminating personal essays 
from meta. If POV material isn't allowed on meta then where should it go? 
This will only make it more difficult to keep this stuff out of the 
encyclopedias.

Meta can and should be many things. Simply create an alternate Main Page for 
whatever you want to focus on (software for example) and organize everything 
on that page and the pages linked from it in any way you wish. Heck even 
create another namespace if you really want to organize things, but I see no 
reason whatsoever why meta shouldn't be more like a regular wiki with a 
fairly undefined scope. What really is needed is more integration between 
topics discussed on the mailing lists and meta. 

--mav
Erik Moeller | 1 Feb 2003 12:39
Picon
Picon

Re: Re: "View user page" edit links on anon talk pages

> I agree with Anthere and strongly disagree with eliminating personal essays
> from meta. If POV material isn't allowed on meta then where should it go?

Um .. how about just getting rid of it? Why is it within Wikipedia's  
mission to somehow provide storage space for personal essays? We're an  
encyclopedia, not a hosting provider.

Taku is correct in that this only makes Meta hard to use, especially for  
other people who want to help working on the Wikipedia software. While it  
is possible to better organize meta, the Recent Changes list is cluttered  
by this stuff. There are literally hundreds of entries like this:

...
# diff) (hist) . . MN Meta-symbiosis; 15:25 . . Saprtacus
# (diff) (hist) . . M User talk:Saprtacus; 15:43 . . Saprtacus
# (diff) (hist) . . M User talk:Saprtacus; 15:39 . . Saprtacus
# (diff) (hist) . . M User:Saprtacus; 15:38 . . Saprtacus
# (diff) (hist) . . MN Meta-etiology; 15:33 . . Saprtacus
# (diff) (hist) . . M User talk:Saprtacus; 15:31 . . Saprtacus
...

Now try to find the critical "How Wikipedia can be really, really fast"  
development proposal hidden deep within this idiosyncratic nonsense.

Regards,

Erik
Jan Hidders | 1 Feb 2003 12:55
Picon

Re: Optimizing and tuning

On Sat, Feb 01, 2003 at 02:15:00AM +0100, Erik Moeller wrote:
> There are two very good IRC channels on irc.freenode.net:
> 
> #php and #mysql
> 
> There are lots of knowledgeable people there who are eager to help open  
> source projects, many have already heard of Wikipedia. I learned a lot  
> from just being there for half an hour or so.
> 
> Here are some things we need to look into:
> 
> 1) Composite indexes and index use in general. I do not know which indexes  
> are currently used on the live Wikipedia. However, after the *default*  
> database generation script, there is just a single composite index, namely  
> in the table watchlist. All other indexes are on a single field. If I  
> understand correctly, this means that when we do a query with multiple  
> comparisons, e.g. WHERE foo=bar AND baz=bat, only one index is used.

That is correct, and you could be right about this (I don't know what the
current database schema looks exactly like, but last time I was involved
this was still ok.) But keep in mind that a composite index also adds a lot
of overhead for updates, takes up more room in the index cache, and doesn't
really add much speed if the extra columns don't have a high selectivity,
i.e., filter out many unwanted records.

> 2) The HANDLER command. HANDLER, which works with both InnoDB and MyISAM,
> provides low-level access to tables *without locking*. You get reads, but  
> you don't get consistency, which may cause problems when you're dealing  
> with critical, heavily modified data. But it seems perfect for our archive  
> stuff.
(Continue reading)

Erik Moeller | 1 Feb 2003 13:14
Picon
Picon

Re: Optimizing and tuning

> That is correct, and you could be right about this (I don't know what the
> current database schema looks exactly like, but last time I was involved
> this was still ok.) But keep in mind that a composite index also adds a lot
> of overhead for updates, takes up more room in the index cache, and doesn't
> really add much speed if the extra columns don't have a high selectivity,
> i.e., filter out many unwanted records.

True. However, I suspect with some of our queries, our existing single- 
field index only has very low selectivity, which makes stuff like "Most  
wanted" and "History" so slow. Not using any composite indexes seems like  
a bad idea to me, especially given the fact that some of our queries are  
run on huge random-sequence tables.

>> 2) The HANDLER command. HANDLER, which works with both InnoDB and MyISAM,
>> provides low-level access to tables *without locking*. You get reads, but
>> you don't get consistency, which may cause problems when you're dealing
>> with critical, heavily modified data. But it seems perfect for our archive
>> stuff.

> If there are not many updates then this accomplishes next to nothing, except
> that you have made the software more complicated.

HANDLER works almost exactly the same as SELECT from a syntax POV. We have  
many updates on the OLD table, and it is still in MyISAM, meaning table- 
level locking. So unless we upgrade that particular table to InnoDB (which  
Brion seems reluctant to do because of the long downtime), HANDLER may be  
worth giving a try.

However, your point regarding complexity is well taken. We could abstract  
the HANDLER function in wfQuery using a "consistent read" parameter, by  
(Continue reading)

Magnus Manske | 1 Feb 2003 14:02
Picon
Favicon

Re: Spin-off software (was Re: "View user page" edit links on anon talk pages)

Erik Moeller wrote:

>No objections - just a few more "if($wgSomeGlobalVar)"s I guess. But  
>perhaps you could post a more detailed proposal of what you want to change  
>and how.
>
I didn't work the details myself yet - actually, I'm still fighting with 
my latest Linux installation ;-)

>Also, I still think that a well-integrated, generalized group  
>certification system makes more sense than a separate filter project.
>
We could do both in the same code, then set up both as different 
projects, and wait what happens. Nobody expected what happened to 
Nupedia and Wikipedia.

Magnus
Jan Hidders | 1 Feb 2003 15:09
Picon

Re: Optimizing and tuning

On Sat, Feb 01, 2003 at 01:14:00PM +0100, Erik Moeller wrote:
> > That is correct, and you could be right about this (I don't know what
> > the current database schema looks exactly like, but last time I was
> > involved this was still ok.) But keep in mind that a composite index
> > also adds a lot of overhead for updates, takes up more room in the index
> > cache, and doesn't really add much speed if the extra columns don't have
> > a high selectivity, i.e., filter out many unwanted records.
> 
> True. However, I suspect with some of our queries, our existing single- 
> field index only has very low selectivity, which makes stuff like "Most  
> wanted" and "History" so slow.

Adding composite indexes won't help there. What you need is an extra
redundant table that contains the number of wanted links.

> >> 2) The HANDLER command. HANDLER, which works with both InnoDB and MyISAM,
> >> provides low-level access to tables *without locking*. You get reads, but
> >> you don't get consistency, which may cause problems when you're dealing
> >> with critical, heavily modified data. But it seems perfect for our archive
> >> stuff.
> 
> > If there are not many updates then this accomplishes next to nothing, except
> > that you have made the software more complicated.
> 
> HANDLER works almost exactly the same as SELECT from a syntax POV.

That's only part of the problem. If anyone wants to mess with this they have
to understand what it does and why. That doesn't go away if you have similar
syntaxes.

(Continue reading)

Daniel Mayer | 1 Feb 2003 15:29
Picon
Favicon

What to do with Meta?

On Saturday 01 February 2003 04:00 am, Erik Moeller wrote:
> Um .. how about just getting rid of it? Why is it within Wikipedia's  
> mission to somehow provide storage space for personal essays? We're an  
> encyclopedia, not a hosting provider.

Let me repeat:
 If POV material isn't allowed on meta then where should it go?

Would it be better for Saprtacus to still be on en.wiki's Recent Changes? 
Again, create another namespace and then you can filter your view of meta's 
Recent Changes so would only see changes made to entries in those namespaces.

--mav

Gmane