A. Stos <alex2cf <at> gmail.com>
2007-05-10 22:12:59 GMT
What can numbers tell us about CZ after the launch?
Here are some selected categories, as of 14th April (the first time I did the 'accountancy') and 8th May.
Mainspace text pages = 2071 -> 2616
Number of redirects = 1173 -> 1504
Disambig pages = 22 -> 30
Articles (=textpages-disambig) = 2049 -> 2586
----
CZ_Live = 1387 -> 1757
Checklisted_Articles = 1238 -> 1900
Internal_Articles = 876 -> 1398
External_Articles = 361 -> 503
Stub_Articles = 223 -> 305
Developing_Articles = 383 -> 624
Developed_Articles = 262 -> 457
Approved_Articles = 11 -> 17
Remarks
* In view of the number of Checklisted Articles, the Big Cleanup is a big
success. For example, the gap between the Internal Articles and
CZ_Live categories is shrinking (they have more or less the same scope).
Still, there is some Cleanup left to do. And, since we are growing, there
will always be some.
* We now have quite a lot of Internal Articles, more
Developed Articles than stubs(!), and not so many External ones. IMHO,
ideally, the absolute number of Externals should stay more or less
constant or even decrease with time (it still grows, although the
proportion External/Checklisted has slightly decreased).
* For a relatively long time there were 11 Approved articles. In the last three weeks this has changed significantly.
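
For anyone who wants to re-check the two remarks about the categories, here is a minimal sketch in Python. It only re-uses the counts quoted above; the helper names `gap` and `share` are mine, made up for illustration.

```python
# Category counts quoted above, as (14 April, 8 May).
counts = {
    "CZ_Live": (1387, 1757),
    "Checklisted_Articles": (1238, 1900),
    "Internal_Articles": (876, 1398),
    "External_Articles": (361, 503),
}


def gap(a, b, when):
    """Difference in size between categories a and b at snapshot `when`."""
    return counts[a][when] - counts[b][when]


def share(part, whole, when):
    """Size of category `part` as a fraction of category `whole`."""
    return counts[part][when] / counts[whole][when]


for when, label in enumerate(("14 April", "8 May")):
    live_gap = gap("CZ_Live", "Internal_Articles", when)
    ext_share = share("External_Articles", "Checklisted_Articles", when)
    print(f"{label}: CZ_Live - Internal = {live_gap}, "
          f"External/Checklisted = {ext_share:.1%}")
```

The gap goes from 511 to 359 and the External share from about 29% to about 26%, which is what the remarks say.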
How about the human resources we have?
Here is a picture (a fixed-width font is needed to display it correctly):
year and | this month |    users    |     new | authors |  backward |  longterm
month    |      users |  >20 : >100 | authors |   daily |  2 months |  6 months
-------------------------------------------------------------------------------
2006-10  |         52 |    8 :   2  |      52 |      10 |   0  0.0% |   0  0.0%
2006-11  |        183 |   41 :  12  |     143 |      24 |   0  0.0% |   0  0.0%
2006-12  |        120 |   28 :  12  |      49 |      17 |  29 24.2% |   0  0.0%
2007-01  |        350 |   43 :  15  |     280 |      29 |  43 12.3% |   0  0.0%
2007-02  |        801 |   85 :  25  |     678 |      62 |  47  5.9% |   0  0.0%
2007-03  |        348 |   76 :  23  |     208 |      38 |  62 17.8% |   0  0.0%
2007-04  |        395 |  111 :  45  |     180 |      63 |  77 19.5% |  19  4.8%
Description:
* this month users = number of users who have at least 1 edit in the
month; the next column gives the 'active' ('very active') users, with
more than 20 (100) edits in the month
* new authors = a user counts as new in the month of their first edit (so the number may differ from the number of new userpages)
* authors daily = average number of interacting authors (editing on the same day)
* backward = how many of the users editing in month n had also been there in months n-1 and n-2
* longterm = like backward, but over 6 months, without a break
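
To make the definitions above concrete, here is a rough Python sketch of how one row of the table could be computed. It is not the script I used, just an illustration: it assumes the edit history is available as (user, date) pairs, the names `monthly_stats` and `previous_months` are hypothetical, and it assumes that "authors daily" is averaged over all calendar days of the month and that "backward"/"longterm" look at the 2 (6) months preceding month n.

```python
import calendar
from collections import Counter, defaultdict
from datetime import date


def previous_months(year, month, k):
    """Return the k months before (year, month), most recent first."""
    months = []
    for _ in range(k):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        months.append((year, month))
    return months


def monthly_stats(edits, year, month):
    """Compute one table row from `edits`, an iterable of (user, date) pairs."""
    per_month = defaultdict(Counter)  # (year, month) -> edits per user
    per_day = defaultdict(set)        # date -> users editing that day
    first_edit = {}                   # user -> date of first edit
    for user, day in edits:
        per_month[(day.year, day.month)][user] += 1
        per_day[day].add(user)
        if user not in first_edit or day < first_edit[user]:
            first_edit[user] = day

    current = per_month.get((year, month), Counter())
    users = set(current)

    def returning(k):
        # Users of this month who also edited in each of the k previous months.
        kept = set(users)
        for ym in previous_months(year, month, k):
            kept &= set(per_month.get(ym, ()))
        return len(kept)

    # Assumption: the daily average runs over all calendar days of the month.
    days = calendar.monthrange(year, month)[1]
    user_days = sum(len(u) for d, u in per_day.items()
                    if (d.year, d.month) == (year, month))
    return {
        "users": len(users),
        "active": sum(1 for n in current.values() if n > 20),
        "very_active": sum(1 for n in current.values() if n > 100),
        "new": sum(1 for u in users
                   if (first_edit[u].year, first_edit[u].month) == (year, month)),
        "authors_daily": user_days / days,
        "backward": returning(2),
        "longterm": returning(6),
    }


# Tiny made-up log: "a" edits three months in a row, "c" two, "b" is new.
log = [("a", date(2007, 2, 10)), ("a", date(2007, 3, 5)), ("a", date(2007, 4, 1)),
       ("c", date(2007, 3, 5)), ("c", date(2007, 4, 1))]
log += [("b", date(2007, 4, 2))] * 21
row = monthly_stats(log, 2007, 4)
print(row)
```

In the toy log only "a" counts as "backward" for April, and "b" is both new and 'active' (21 edits). The percentages in the table are simply backward (or longterm) divided by this month's users.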
Remarks:
* February was exceptional as regards the number of users.
This can be related to the Slashdot reports and the automatic
registration. Clearly, many of the automatically registered users were
just watching CZ, as the proportion of active users in February is
about 10% while in April it's 28%.
* The "authors daily" column seems to be interesting, as it gives some
measure of the 'human resources' and of how 'vibrant' the community is.
Note that this is not the number of all users of the month divided by the
number of days; it's the actual (average) number of users editing on a
given day. For example, in April you could meet about 63 authors each
day. We can also observe that after the launch you could meet as many CZ
authors in one day as in February (i.e. in the self-registration period).
* Columns "backward" and "longterm" are not very meaningful for CZ
at this stage, but maybe at some point they will be. They are meant to
measure the 'human resources rotation' and 'stability' of the wiki.
In terms of the number of edits, CZ was significantly more active after the launch (e.g. mainspace edits roughly doubled from March to April):
month    | total edits |  main* |  act* | new pages
---------------------------------------------------
2006-10  |        1218 |    948 |  23.4 |       131
2006-11  |        5576 |   2968 |  30.5 |      1183
2006-12  |        8291 |   5929 |  69.1 |      1726
2007-01  |        8819 |   4428 |  25.2 |      1545
2007-02  |       16560 |   5654 |  20.7 |      4276
2007-03  |       15526 |   6369 |  44.6 |      3159
2007-04  |       26914 |  13333 |  68.1 |      3846
* main = number of edits in the mainspace
* act = mean 'activity', i.e. edits per user
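
As a quick cross-check of the "act" column, here is a small Python sketch that divides the total edits above by the "this month users" figures from the first table. The function names are made up for the illustration; the edits-per-day helper simply divides by the number of calendar days in the month.

```python
import calendar

# month -> (total edits, users with at least 1 edit), copied from the tables.
rows = {
    "2006-10": (1218, 52),
    "2006-11": (5576, 183),
    "2006-12": (8291, 120),
    "2007-01": (8819, 350),
    "2007-02": (16560, 801),
    "2007-03": (15526, 348),
    "2007-04": (26914, 395),
}


def edits_per_user(month):
    """Mean 'activity': total edits divided by editing users."""
    edits, users = rows[month]
    return edits / users


def edits_per_day(month):
    """Total edits divided by the number of calendar days in the month."""
    year, mon = (int(part) for part in month.split("-"))
    return rows[month][0] / calendar.monthrange(year, mon)[1]


for month in rows:
    print(f"{month}: act = {edits_per_user(month):.1f}, "
          f"edits/day = {edits_per_day(month):.0f}")
```

For April this gives 68.1 edits per user and about 897 edits per day.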
Remarks
* In April there were nearly 900 edits per day, on average (all namespaces).
* New pages include redirects and pages from all namespaces. This
goes some way to explain why there were so many "new pages" in February
(think of the new userpages and talk pages alone).
* A few timestamps in the database are prior to 2006-10 (and were hence adjusted to 2006-10).
Can we reasonably compare our 'human resources' and editing activity to
the English Wikipedia? Of course not. But it may be interesting to
see that CZ is comparable to some smaller, yet still big and active, wikis.
To this end, suppose that we count only registered users. This
seems reasonable, since 'anonymous' IPs, while numerous compared
to registered users, do not make many edits (globally 8-15%,
depending on the wiki). Consider also that the same IP rarely makes
more than a few edits. So 'anons' do not add that much to the
community; they mostly bring in some usually unreliable/unsourced
information, if not vandalism, to be verified by regular users (e.g.
adding those anons who are "more-than-20-edits-active" wouldn't really
change the picture).
Let us then compare CZ to lt.wikipedia (Lithuanian), considering that the
latter is listed on the English WP Main Page as one of the more
significant wikis, i.e. in the category "more than 25K entries" (about
44K, in fact).
In April 2007, CZ had slightly more editing users (395 vs. 338), active
users (111 vs. 93), very active users (45 vs. 43, "bots" included), new
users (180 vs. 142) and users daily (63 vs. 53). Interestingly, ltwiki
users are exceptionally active (consistently more than 100 edits per
user per month, far above the average for the wikis I analyzed; typical
activity is 50-60 edits per user). In April there were 5378 new pages on
ltwiki (vs. 3846 on CZ).
If we look at wikis from the category "more than 50K entries" (11
members), then CZ is still of the same "order of magnitude" but
generally much smaller. For example, on huwiki (Hungarian, probably the
smallest in the category, 58K entries) there were 1078 users editing
in April, 291 active users, 143 very active, 444 new users, 173 users
daily (on average) and about 9800 new pages.
And here are the numbers for nowiki (Norwegian), from the "more than
100K entries" category. Unfortunately, they are from March 2007, since
no dump file was available for April. There were 2439 editing
users, 343 active, 152 very active users, 1259 new users, 249 users
daily, 12626 new pages.
That said, I do not think that a CZ-WP comparison in terms of quantity
is very relevant, at least at the present stage. Clearly, CZ encourages
a different working style and priorities, which do not easily translate
into stats or which make the "counters" turn more slowly (e.g. the accent
on quality/reliability, a narrative/introductory style, not creating
stubs without a clear intent to develop them, etc.). I guess CZ will
always be different. I made this quantitative comparison just to test the
hypothesis (or not-so-rare suggestion) that CZ is likely to 're-create
the failure of Nupedia'. The numbers suggest otherwise: CZ will
(probably) succeed.
Alex.
PS. Any technical details (data sources, scripts, more detailed discussion of methods and results etc.) on request.