David Gerard | 1 Jul 2008 11:35

TorBlock active on en:wp?

http://en.wikipedia.org/wiki/Special:Version

I see TorBlock is listed as active. I thought this was waiting for
clear local consensus before switching on.

- d.
Tim Starling | 1 Jul 2008 13:54

Re: TorBlock active on en:wp?

David Gerard wrote:
> http://en.wikipedia.org/wiki/Special:Version
> 
> I see TorBlock is listed as active. I thought this was waiting for
> clear local consensus before switching on.

I said I disabled the admin block override feature, not the whole extension.

-- Tim Starling
Nicolas Dumazet | 1 Jul 2008 16:10

Schema change : category redirects

Hello!

Following the latest thread on this [0] (and the older one [1]) about
implementation details for my GSoC project, which aims to add category
redirects / moves, I need more input on the DB schema change
that was being discussed, bearing in mind that:

* We expect category moves to be reversible
* The only "feature" we are in fact adding is category redirects
* I am considering this schema change as a good opportunity to fix bug
#13579 [Categorylinks table should use category ID rather than
category name in cl_to field] (please comment there if you see
specific issues with that change)

While thinking about that schema change, I considered the following
use cases, for categories A, B and C:
1) Move existing A to empty B.
2) A and B contain pages: redirect A to B.
3) A redirects to B: make A redirect to C.
4) A redirects to B: undo the redirect, make A and B plain categories.
5) A redirects to B: invert the redirect, make B redirect to A.
6) On page edits, alter the category_links table.
7) Listing pages that belong to a specific category.

Use case #1 is fairly easy to implement if cl_to points to a cat_id:
you only have to rename the cat_title of the corresponding category
table row, leaving the cat_id unchanged; that's in fact the reason
for switching cl_to's type.
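
A minimal sketch of use case #1, assuming cl_to holds the category's cat_id as proposed in bug #13579. The schema is a hypothetical, heavily simplified stand-in built with Python's sqlite3 module, not MediaWiki's real MySQL tables, and the helper names are illustrative only. The point is that a move touches one category row and no categorylinks rows.

```python
import sqlite3

# Hypothetical, simplified schema: cl_to stores the category ID
# instead of the category name.
SCHEMA = """
CREATE TABLE category (
    cat_id    INTEGER PRIMARY KEY,
    cat_title TEXT NOT NULL UNIQUE
);
CREATE TABLE categorylinks (
    cl_from    INTEGER NOT NULL,  -- page_id of the member page
    cl_to      INTEGER NOT NULL,  -- cat_id, not the category title
    cl_sortkey TEXT NOT NULL
);
"""

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO category VALUES (1, 'A')")
    db.executemany("INSERT INTO categorylinks VALUES (?, 1, ?)",
                   [(10, "Foo"), (11, "Bar")])
    return db

def move_category(db, old_title, new_title):
    # Use case #1: a single-row rename; cat_id (and so every cl_to) is untouched.
    db.execute("UPDATE category SET cat_title = ? WHERE cat_title = ?",
               (new_title, old_title))

def members(db, title):
    # List member pages by category title, ordered by sort key.
    rows = db.execute(
        "SELECT cl_from FROM categorylinks JOIN category ON cl_to = cat_id "
        "WHERE cat_title = ? ORDER BY cl_sortkey", (title,))
    return [r[0] for r in rows]

db = make_db()
move_category(db, "A", "B")
print(members(db, "B"))  # [11, 10]
print(members(db, "A"))  # []
```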

== Original idea : add a cl_final field ==

Simetrical | 1 Jul 2008 17:15

Re: Schema change : category redirects

On Tue, Jul 1, 2008 at 10:10 AM, Nicolas Dumazet <nicdumz@...> wrote:
> When A redirects to B, all of A's category_links entries will share the
> same cl_final value. Being quite inexperienced, and very naive, I may be
> missing something important here... but I think that it makes much more
> sense to add that shared value to the category table, instead of
> duplicating it once for every page in the category.
>
> I'm saying that instead of adding a cl_final field to the
> category_links table, we should perhaps add a cat_final field to the
> category table pointing to the final category it belongs to (see [4]
> ).
>
> * Use cases #2 #3 and #4 :
> Trivial. Change  cat_final in one category table row.
>
> * Use case #5 :
> Damn. Tricky. Alter TWO category table rows :p
>
> * Use case #6 :
> Easy, fetch the corresponding cat_id to fill cl_to.
>
> * Use case #7 :
> The most expensive operation for this proposal. You have to join
> category_links and category (hopefully on cat_id) to retrieve what was
> cl_final in the first proposition, and select ... where cat_final =

The important reason for favoring this way of doing it is that use
case 7 is almost certainly the most common one.  It's rare that
categories will need to be moved or redirected, which is basically
what cases 1-5 are.  Case 6 is moderately common but fast for both
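
The cat_final proposal can be modeled the same way. This is a hypothetical SQLite sketch, not MediaWiki code: it assumes a plain category points cat_final at its own cat_id, it ignores redirect chains, and the helpers redirect, invert and list_members are illustrative names. It shows use cases #2 to #5 as one- or two-row updates, while use case #7, the common case, always needs a join.

```python
import sqlite3

# Hypothetical schema: the redirect target is stored once per category.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE category (
    cat_id    INTEGER PRIMARY KEY,
    cat_title TEXT NOT NULL UNIQUE,
    cat_final INTEGER NOT NULL
);
CREATE TABLE categorylinks (
    cl_from    INTEGER NOT NULL,
    cl_to      INTEGER NOT NULL,
    cl_sortkey TEXT NOT NULL
);
INSERT INTO category VALUES (1, 'A', 1), (2, 'B', 2);
INSERT INTO categorylinks VALUES (10, 1, 'Apple'), (20, 2, 'Banana');
""")

def redirect(db, src_id, dst_id):
    # Use cases #2 to #4: one category row changes, however many pages src holds.
    # (Ignores chains: categories already redirecting to src are not updated.)
    db.execute("UPDATE category SET cat_final = ? WHERE cat_id = ?",
               (dst_id, src_id))

def invert(db, a_id, b_id):
    # Use case #5: A -> B becomes B -> A, which touches two category rows.
    db.execute("UPDATE category SET cat_final = ? WHERE cat_id IN (?, ?)",
               (a_id, a_id, b_id))

def list_members(db, final_id):
    # Use case #7: a join is needed to find every category with this cat_final.
    rows = db.execute(
        "SELECT cl_from FROM categorylinks JOIN category ON cl_to = cat_id "
        "WHERE cat_final = ? ORDER BY cl_sortkey", (final_id,))
    return [r[0] for r in rows]

redirect(db, 1, 2)          # A -> B
print(list_members(db, 2))  # [10, 20]
invert(db, 1, 2)            # now B -> A
print(list_members(db, 1))  # [10, 20]
```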

Roan Kattouw | 1 Jul 2008 17:22

Re: Schema change : category redirects

Nicolas Dumazet wrote:
> == Original idea : add a cl_final field ==
> <snip>
>
> * Use case #6 :
> When adding a category to a page, you have to fetch its corresponding
> cat_id for cl_to, (fairly easy), but you also have to fetch the right
> cl_final.
> You first have to know whether the category is a redirect. And if I'm
> right, with that schema, the only ways to tell this are to
> retrieve the corresponding page_is_redirect in the page table, or to
> check for an entry in the redirect table.
> I believe that this forces us to compute a page-redirect join on
> page_id + a redirect-category join on page_title for each category
> title (see [3]). If there are no results for that query (which happens
> if {1} the category page does not exist, or {2} the category page exists
> but is not a redirect), the category is not a redirect; otherwise it
> returns the cat_id for cl_final.
>   
You wouldn't need to go through the page table here if you added a 
cat_page field to the category table. It's not a big deal, though, 
because joining on page_namespace=14 AND page_title=cat_title is pretty 
cheap.
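
The use case #6 lookup discussed here can be sketched as follows. The tables are simplified SQLite stand-ins that borrow MediaWiki's page and redirect column names; namespace 14 is the Category namespace. The final_cat_id helper is hypothetical: it joins the category page to its redirect row and falls back to the category's own cat_id when nothing matches.

```python
import sqlite3

NS_CATEGORY = 14

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE page (
    page_id          INTEGER PRIMARY KEY,
    page_namespace   INTEGER NOT NULL,
    page_title       TEXT NOT NULL,
    page_is_redirect INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE redirect (
    rd_from      INTEGER PRIMARY KEY,  -- page_id of the redirect page
    rd_namespace INTEGER NOT NULL,
    rd_title     TEXT NOT NULL
);
CREATE TABLE category (
    cat_id    INTEGER PRIMARY KEY,
    cat_title TEXT NOT NULL UNIQUE
);
INSERT INTO category VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO page VALUES (100, 14, 'A', 1), (101, 14, 'B', 0);
INSERT INTO redirect VALUES (100, 14, 'B');
""")
# Category:A redirects to Category:B, and Category:C has no page at all.

def final_cat_id(db, title):
    """Return the cat_id a new categorylinks row would store in cl_final."""
    row = db.execute(
        "SELECT target.cat_id "
        "FROM page "
        "JOIN redirect ON rd_from = page_id AND rd_namespace = ? "
        "JOIN category AS target ON target.cat_title = rd_title "
        "WHERE page_namespace = ? AND page_title = ?",
        (NS_CATEGORY, NS_CATEGORY, title)).fetchone()
    if row is not None:
        return row[0]
    # No result: the category page is missing or is not a redirect,
    # so the category is its own final target.
    return db.execute("SELECT cat_id FROM category WHERE cat_title = ?",
                      (title,)).fetchone()[0]

print(final_cat_id(db, "A"), final_cat_id(db, "B"), final_cat_id(db, "C"))  # 2 2 3
```
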
> * Use case #7 :
> Easy. SELECT ... FROM category_links WHERE cl_final = ##
>
> == But what about adding a cat_final field instead ? ==
>
> When A redirects to B, all of A's category_links entries will share the
> same cl_final value. Being quite inexperienced, and very naive, I may
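
The duplication Nicolas describes shows up clearly in a toy model: under the cl_final schema, redirecting a category rewrites one categorylinks row per member page, while listing a category becomes a single indexed lookup with no join. The SQLite schema, index name and 500-page category are hypothetical, made up for illustration.

```python
import sqlite3

# Hypothetical cl_final schema: every categorylinks row carries its own copy
# of the final (post-redirect) category ID.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE categorylinks (
    cl_from    INTEGER NOT NULL,
    cl_to      INTEGER NOT NULL,
    cl_final   INTEGER NOT NULL,
    cl_sortkey TEXT NOT NULL
);
CREATE INDEX cl_final_sortkey ON categorylinks (cl_final, cl_sortkey);
""")
# Category 1 ("A") holds 500 pages; none are redirected yet.
db.executemany("INSERT INTO categorylinks VALUES (?, 1, 1, ?)",
               [(page_id, f"Page{page_id:04d}") for page_id in range(500)])

# Redirecting A to category 2 ("B") rewrites one row per member page.
cur = db.execute("UPDATE categorylinks SET cl_final = 2 WHERE cl_to = 1")
print(cur.rowcount)  # 500

# Use case #7 is then a single lookup on the (cl_final, cl_sortkey) index.
members = [r[0] for r in db.execute(
    "SELECT cl_from FROM categorylinks WHERE cl_final = ? ORDER BY cl_sortkey",
    (2,))]
print(len(members), members[:3])  # 500 [0, 1, 2]
```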

Roan Kattouw | 1 Jul 2008 17:31

Re: Schema change : category redirects

Simetrical wrote:
> Moreover, case 7 is much worse than you think it is, for this
> proposal.  For large categories retrieved in sorted order, we must use
> an index for sorting.  That requires that the entire result set be
> ordered according to the index.  Currently we have an index on (cl_to,
> cl_sortkey, cl_from).  Then the query SELECT ... WHERE cl_to = X
> ORDER BY cl_sortkey will be able to retrieve in index order: ascending
> cl_to, followed by ascending cl_sortkey in the event of a tie,
> followed by ascending cl_from in the event of another tie.  (So really
> it's like "ORDER BY cl_to, cl_sortkey, cl_from", but cl_to is constant
> and ordering by it does nothing, while ordering by cl_from is
> incidental and so we don't specify it in the query.)
>
> If we add cl_final, then we'll put an index on (cl_final, cl_sortkey)
> or similar (possibly dropping an existing index and/or with other
> stuff on the end).  Then the query will be WHERE cl_final = X ORDER BY
> cl_sortkey, which will use the index.
>
> On the other hand, with cat_final, we can't use an index for sorting.
> The query will be WHERE cl_to=cat_id AND cat_final=X ORDER BY
> cl_sortkey.  There are two possible retrieval orders here: retrieve
> from categorylinks, then category, or vice versa.  Retrieving from
> category first will get us some unknown number of rows, and then we
> would have to join to categorylinks using an index on (cl_to).  But
> even if that index is actually (cl_to, cl_sortkey), we're going to be
> retrieving in cl_to order, and order by cl_sortkey only in the case of
> a tie.
>
> In this case, unlike in the previous one, we have multiple cl_to
> values, so our ORDER BY cl_sortkey is *not* the same as ORDER BY
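
The point about index ordering can be checked with SQLite's EXPLAIN QUERY PLAN, used here as a stand-in for MySQL's EXPLAIN (SQLite reports a required sort as "USE TEMP B-TREE FOR ORDER BY", where MySQL would show "Using filesort"). The schema and index names are assumptions. Plan wording varies between SQLite versions, so the sketch only looks for the sort step.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE category (
    cat_id    INTEGER PRIMARY KEY,
    cat_title TEXT NOT NULL,
    cat_final INTEGER NOT NULL
);
CREATE INDEX cat_final_idx ON category (cat_final);
CREATE TABLE categorylinks (
    cl_from    INTEGER NOT NULL,
    cl_to      INTEGER NOT NULL,
    cl_final   INTEGER NOT NULL,
    cl_sortkey TEXT NOT NULL
);
CREATE INDEX cl_to_sortkey ON categorylinks (cl_to, cl_sortkey, cl_from);
CREATE INDEX cl_final_sortkey ON categorylinks (cl_final, cl_sortkey);
""")

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql, (7,))]

# cl_final: equality on the index prefix, so rows come out in cl_sortkey order.
cl_final_plan = plan(
    "SELECT cl_from FROM categorylinks WHERE cl_final = ? ORDER BY cl_sortkey")
# cat_final: rows from several cl_to values must be merged, so a sort is needed.
cat_final_plan = plan(
    "SELECT cl_from FROM categorylinks JOIN category ON cl_to = cat_id "
    "WHERE cat_final = ? ORDER BY cl_sortkey")

for line in cl_final_plan + ["--"] + cat_final_plan:
    print(line)
```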

Simetrical | 1 Jul 2008 17:37

Re: Schema change : category redirects

On Tue, Jul 1, 2008 at 11:22 AM, Roan Kattouw <roan.kattouw@...> wrote:
> You wouldn't need to go through the page table here if you added a
> cat_page field to the category table.

Categories are not guaranteed to have associated pages, so we must
keep cat_title.  In that case cat_page is redundant and denormalized,
and should only be added if there's some specific performance benefit
to it, which there's not for any application I've heard.

> Can be done on cat_title=cl_to too, doesn't really matter. Joining on
> cat_id is cleaner of course, but I don't think it'll be any faster.

Joining on integers is considerably faster than joining on VARCHARs.
In InnoDB, joining on primary keys is considerably faster than joining
on anything else.  These shouldn't be neglected, in general.  However,
the speed of the join is not the limiting factor here, the problem is
you'll have to filesort the entire category.
Roan Kattouw | 1 Jul 2008 18:02

Re: Schema change : category redirects

Simetrical wrote:
> On Tue, Jul 1, 2008 at 11:22 AM, Roan Kattouw <roan.kattouw@...> wrote:
>   
>> You wouldn't need to go through the page table here if you added a
>> cat_page field to the category table.
>>     
>
> Categories are not guaranteed to have associated pages, so we must
> keep cat_title.  In that case cat_page is redundant and denormalized,
> and should only be added if there's some specific performance benefit
> to it, which there's not for any application I've heard.
>   
I wasn't suggesting ditching cat_title. The fact that cat_page is 
sometimes 0 doesn't really matter, since categories without 
corresponding pages can't be redirects, so they don't have a redirect 
target (which is what the query was about). I believe there actually is 
a performance benefit to introducing cat_page, explained below.
>   
>> Can be done on cat_title=cl_to too, doesn't really matter. Joining on
>> cat_id is cleaner of course, but I don't think it'll be any faster.
>>     
>
> Joining on integers is considerably faster than joining on VARCHARs.
> In InnoDB, joining on primary keys is considerably faster than joining
> on anything else.  These shouldn't be neglected, in general.
I had the idea that joining on integers would be faster; I guess I 
underestimated by how much. I didn't know about the primary key thing, but 
it makes sense (it's called *primary* key for a reason).
>   However,
> the speed of the join is not the limiting factor here, the problem is

Simetrical | 1 Jul 2008 18:11

Re: Schema change : category redirects

On Tue, Jul 1, 2008 at 12:02 PM, Roan Kattouw <roan.kattouw@...> wrote:
> I had the idea that joining on integers would be faster; I guess I
> underestimated by how much. I didn't know about the primary key thing, but
> it makes sense (it's called *primary* key for a reason).

The primary key benefits are specific to InnoDB, since it clusters the
table data in the primary key (basically the table data is in the
leaves of the primary key B-tree, if I understand right).  A primary
key lookup is therefore one B-tree lookup instead of two.  In MyISAM,
and in many other DBMSes, the primary key isn't special.
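
SQLite gives a comparable illustration, though it is an analogy and not InnoDB: an ordinary SQLite table is stored as a B-tree keyed by rowid, and an INTEGER PRIMARY KEY column is an alias for that rowid. A lookup by cat_id is then one B-tree search, whereas a lookup by cat_title first searches a secondary index and then fetches the row by rowid. The schema and index name are hypothetical, and the exact plan text differs between SQLite versions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE category (
    cat_id    INTEGER PRIMARY KEY,  -- alias for the SQLite rowid
    cat_title TEXT NOT NULL,
    cat_pages INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX cat_title_idx ON category (cat_title);
""")

def plan(sql, param):
    # Join the detail strings of all EXPLAIN QUERY PLAN rows.
    return " | ".join(r[-1] for r in db.execute("EXPLAIN QUERY PLAN " + sql, (param,)))

# Primary-key lookup: a single search of the table B-tree.
by_id = plan("SELECT cat_pages FROM category WHERE cat_id = ?", 1)
# Title lookup: search cat_title_idx, then fetch the row by rowid.
by_title = plan("SELECT cat_pages FROM category WHERE cat_title = ?", "A")
print(by_id)
print(by_title)
```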

> With the schema I was backing, yes. However, schema #1 doesn't eliminate
> the need to check for a category's redirect target when adding a page to
> it, and since joining on cat_page=page_id is faster than joining on
> cat_title=page_title (because of the int vs. varchar and primary vs.
> non-primary issues), that would constitute a "specific performance
> benefit" for adding cat_page, wouldn't it?

Not a big enough one.  Adding an extra column means every row is that
much larger, reducing key buffer efficiency and thereby hurting
performance slightly for all queries on the table.  And if you're
going to join on it you also may need an extra index (depending on
join direction), which takes time to maintain on every insert and
delete and also competes for the cache.  Plus you get the headache of
denormalization.

In this case it's almost certainly not worth it.  In the case of
cl_final, where you're avoiding cripplingly large filesorts, it's
definitely worth it, because otherwise the feature is completely
untenable.

Gerard Meijssen | 1 Jul 2008 19:39

Implementing the Babel extension

Hi.
The Babel templates are widely used on the Wikimedia Foundation's wikis.
Implementing them is a lot of work; you need more than 1,000 templates just
to cover the languages that the Wikimedia Foundation supports in its
projects. Several wikis have templates to support additional languages. For
the bigger projects this is no longer an issue, as the templates have already
been created, but for many of the smaller projects getting the Babel
information implemented is a lot of work. It would be great if that time
could instead be spent on something really useful, like writing articles.

At Betawiki we have been working hard to create a Babel extension. The great
thing about an extension is that nothing needs to be done beyond installing
it. We think the software is now at a state where we would like to invite
final comments before it is implemented on all the WMF wikis.

Thanks,
     Gerard

Gmane