Jakob Voss | 7 Aug 12:16 2005

Re: Re: WSJ on Wikipedia

Cormac Lawler wrote:

> To get around these philosophical issues, I believe that the only way
> to measure quality of articles (especially contentious ones) is
> qualitatively, i.e. by asking people/experts their opinions of articles,
> their experience of the community, etc., and analysing the types of
> reactions, the emotional resonance (or lack thereof), the language
> they used, etc. So far, I haven't seen many qualitative studies of
> Wikipedia - I did one last Christmas as a kind of pilot study for my
> dissertation, which you can see here:
> http://wikisource.org/wiki/A_small_scale_study_of_Wikipedia 
> and some others, from Wikimania, include:
> http://en.wikibooks.org/wiki/Wikimania05/Paper-PA1
> http://en.wikibooks.org/wiki/Wikimania05/Paper-JT1
> Note: almost all Wikimania papers are still works in progress, including mine :)

The best study I know of was done by Andreas Brändle:

http://en.wikibooks.org/w/index.php?title=Wikimania05/AB1

He shows that the number of authors is the most important variable
for predicting quality.

Jakob
Kevin Gamble | 17 Aug 18:05 2005

Wikimedia Research and Privacy

Colleagues,

A group of us had a rather interesting discussion on privacy, etc. at
Wikimania. Not wanting to lose the momentum of that discussion, I took
a stab at starting an article on a research policy statement. This is
just a start; please feel free to have at it, and thank you for your
help!

http://meta.wikimedia.org/wiki/Wikimedia_Research_Network_Privacy_Policy

Kevin

Kevin J. Gamble, Ph.D.
Associate Director eXtension Initiative
Box 7641 NCSU
Raleigh, NC 27694-7641
v: 919.515.8447
c: 919.605.5815
AIM: k1v1n
Web: intranet.extension.org
Jeremy Dunck | 17 Aug 18:38 2005

Re: Wikimedia Research and Privacy

On 8/17/05, Kevin Gamble <kevin_gamble@...> wrote:
> This is
> just a start, please feel free to have at it and thank you for your
> help!
> 
> http://meta.wikimedia.org/wiki/Wikimedia_Research_Network_Privacy_Policy

I think it would be useful to define what sort of research this policy
is meant to address, as I'm having a hard time understanding how this
policy relates to the area I'm interested in (which is statistics as a
basis for trust, as well as groundwork for further research).

For example, I have no "subjects", except for the evidence trails left
in the wp download source data.

Perhaps I should take this to the Talk page?
Cormac Lawler | 17 Aug 18:50 2005

Re: Wikimedia Research and Privacy

On 8/17/05, Jeremy Dunck <jdunck@...> wrote:
> On 8/17/05, Kevin Gamble <kevin_gamble@...> wrote:
> > This is
> > just a start, please feel free to have at it and thank you for your
> > help!
> >
> > http://meta.wikimedia.org/wiki/Wikimedia_Research_Network_Privacy_Policy
> 
> I think it would be useful to define what sort of research this policy
> is meant to address, as I'm having a hard time understanding how this
> policy relates to the area I'm interested in (which is statistics as a
> basis for trust, as well as groundwork for further research).
> 
> For example, I have no "subjects", except for the evidence trails left
> in the wp download source data.
> 
> Perhaps I should take this to the Talk page?

Yes, the talk page is a good place to start - that's where I went :-)
Feel free to describe your research and aspects of it relating to this
discussion.

Cormac
Cathy Ma | 19 Aug 11:38 2005

subscription

Am I on the list already?

--
MA, P.S. Cathy
MPhil student,
Department of Sociology
The University of Hong Kong
http://cathyma.net

Cormac Lawler | 19 Aug 12:44 2005

Re: subscription

On 8/19/05, Cathy Ma <cathyma@...> wrote:
> Am I on the list already?
> 

Yep, welcome :-)

Cormac
Cormac Lawler | 21 Aug 16:30 2005

Database hacks

Hi,
Just something that occurs to me as I write up my dissertation - I
keep on thinking it would be nice to be able to cite some basic
figures to back up a point I am making, eg. how many times Wikipedia
is edited on a given day or how many pages link to this policy page -
as I asked in an email to the wikipedia-l list, which has mysteriously
vanished from the archives (August 11, entitled "What links here?"). I
realise these could be done by going to the recent changes or special
pages and counting them all, but I'm basically too lazy to do that -
we're talking about thousands of pages here, right? I'm also thinking
this is something that many people would be interested in finding out
and writing about. So what I'm asking is: to help researchers
generally, wouldn't it be an idea to identify some quick database
hacks that we could provide - almost like a Kate's tools function? Or
are these available on the MediaWiki pages? If they are - and I've
looked at some database-related pages - they're certainly not very
understandable from the perspective of someone who just wants to use
basic functions. You might be thinking of sending me to a page like
http://meta.wikimedia.org/wiki/Links_table - but *what does it mean?*
Can someone either help me out, or suggest what we could do about this
in the future?

Cheers,
Cormac
Jakob Voss | 21 Aug 17:18 2005

Re: Database hacks

Cormac Lawler wrote:

> Just something that occurs to me as I write up my dissertation - I
> keep on thinking it would be nice to be able to cite some basic
> figures to back up a point I am making, eg. how many times Wikipedia
> is edited on a given day or how many pages link to this policy page -
> as I asked in an email to the wikipedia-l list, which has mysteriously
> vanished from the archives (August 11, entitled "What links here?"). I
> realise these could be done by going to the recent changes or special
> pages and counting them all, but I'm basically too lazy to do that.

I've been doing various statistics on Wikipedia data for months. Not
all of the data is available, but there is *a lot* - much more to
analyse than I can manage in my time. You can answer a lot of
questions with the database dumps (recently changed to XML) and the
Python MediaWiki framework, but that means you have to dig into the
data models and do some programming.
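
For instance, a minimal sketch of that digging - plain standard-library
Python rather than the framework, with a placeholder dump filename -
that counts edits per day could look like this:

# Minimal sketch: count edits per day from a decompressed MediaWiki
# full-history XML dump. The filename is a placeholder.
import xml.etree.ElementTree as ET
from collections import Counter

edits_per_day = Counter()

# iterparse streams the file, so multi-gigabyte dumps need little memory
for event, elem in ET.iterparse("pages-meta-history.xml"):
    tag = elem.tag.rsplit("}", 1)[-1]  # strip the XML namespace prefix
    if tag == "timestamp":
        # timestamps look like "2005-08-21T16:30:00Z"; keep the date part
        edits_per_day[elem.text[:10]] += 1
    elif tag == "page":
        elem.clear()  # free the finished page subtree

for day, count in sorted(edits_per_day.items()):
    print(day, count)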

> we're talking about thousands of pages here, right? I'm also thinking
> this is something that many people would be interested in finding out
> and writing about. So what I'm asking is: to help researchers
> generally, wouldn't it be an idea to identify some quick database
> hacks that we could provide - almost like a Kate's tools function?
> Or are these available on the MediaWiki pages?

The only solution is to share your code and data and to publish
results frequently. That's how research works, isn't it? I'm very
interested in having a dedicated server for Wikimetrics, but someone
has to admin it (getting the hardware is not such a problem). For
instance, I could parse the version history dump to extract only
article, user and timestamp, so other people could analyse which
articles are edited on which days, or vice versa, but I just don't
have a server to handle gigabytes of data. Up to now I have only
managed to set up a data warehouse for Personendaten
(http://wdw.sieheauch.de/), but - like most of what's already been
done - it's mostly undocumented :-(
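
A sketch of the article/user/timestamp reduction I mean (assuming the
standard XML export schema, with placeholder file names) might look
like this:

# Sketch: reduce a full-history dump to (article, user, timestamp)
# triples, one tab-separated line per revision.
import xml.etree.ElementTree as ET

def localname(tag):
    return tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix

title = None
with open("triples.tsv", "w", encoding="utf-8") as out:
    for event, elem in ET.iterparse("pages-meta-history.xml"):
        tag = localname(elem.tag)
        if tag == "title":
            title = elem.text  # <title> precedes the revisions of a page
        elif tag == "revision":
            ts = user = None
            for child in elem:
                ctag = localname(child.tag)
                if ctag == "timestamp":
                    ts = child.text
                elif ctag == "contributor":
                    # registered users have <username>, anons have <ip>
                    for sub in child:
                        if localname(sub.tag) in ("username", "ip"):
                            user = sub.text
            out.write("%s\t%s\t%s\n" % (title, user, ts))
            elem.clear()  # keep memory bounded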

> If they are - and I've looked at some database-related pages -
> they're certainly not very understandable from the perspective of
> someone who just wants to use
> basic functions. You might be thinking of sending me to a page like
> http://meta.wikimedia.org/wiki/Links_table - but *what does it mean?*
> Can someone either help me out, or suggest what we could do about this
> in the future?

1.) collect the questions and define exactly what you want (for instance
"the number of articles edited on each day")
2.) collect ways to answer them ("extract data X from Y and calculate Z")
3.) find someone to do it

Well, it sounds like work ;-)

Greetings,
Jakob
Andrew Lih | 21 Aug 17:29 2005

Re: Database hacks

Cormac, it's puzzling why your Aug 11 post is missing.

As for the task of finding what links here, you could do the low-tech
hack of just sending a hand-crafted URL that returns the first 5000
links, like this query, which finds out how many pages link to WP:POINT:

http://en.wikipedia.org/w/index.php?title=Special:Whatlinkshere&target=Wikipedia%3ADon%27t_disrupt_Wikipedia_to_illustrate_a_point&limit=5000&offset=0

I use 5000 since, the last time I checked, the most any query will
return is 5000, for DB performance reasons. If there are more than
5000, alter the "offset" number to 1, then rinse, lather and repeat.

At least that way, you don't have to hack XML, SQL or Python.

If you know some shell scripting, you can automate this somewhat,
using curl/wget to fetch these pages and then some combination of
grep/wc to find out how many user pages, project pages, talk pages,
etc. link to policy pages.
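
If Python suits you better, here's a rough sketch of the same idea - a
crude screen-scrape, so the regex and the offset arithmetic are
assumptions about the HTML and query behaviour of the day - that pages
through the results and tallies the linking pages by namespace:

# Rough Python equivalent of the curl/grep approach: page through
# Special:Whatlinkshere 5000 links at a time and tally linking pages
# by namespace prefix.
import re
import urllib.request
from collections import Counter

URL = ("http://en.wikipedia.org/w/index.php?title=Special:Whatlinkshere"
       "&target=Wikipedia%3ADon%27t_disrupt_Wikipedia_to_illustrate_a_point"
       "&limit=5000&offset=%d")

counts = Counter()
offset = 0
while True:
    html = urllib.request.urlopen(URL % offset).read().decode("utf-8", "replace")
    # result entries are rendered as <li><a href="/wiki/Page_title" ...>
    titles = re.findall(r'<li><a href="/wiki/([^"#]+)"', html)
    for t in titles:
        # "User_talk:Foo" -> "User_talk"; no colon means the article namespace
        counts[t.split(":", 1)[0] if ":" in t else "(article)"] += 1
    if len(titles) < 5000:
        break  # a short page means we've hit the last batch
    offset += 5000

for ns, n in counts.most_common():
    print(ns, n)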

-Andrew (User:Fuzheado)

Cormac Lawler | 21 Aug 20:09 2005

Re: Database hacks

On 8/21/05, Jakob Voss <jakob.voss@...> wrote:
> Cormac Lawler wrote:
> 
> > Just something that occurs to me as I write up my dissertation - I
> > keep on thinking it would be nice to be able to cite some basic
> > figures to back up a point I am making, eg. how many times Wikipedia
> > is edited on a given day or how many pages link to this policy page -
> > as I asked in an email to the wikipedia-l list, which has mysteriously
> > vanished from the archives (August 11, entitled "What links here?"). I
> > realise these could be done by going to the recent changes or special
> > pages and counting them all, but I'm basically too lazy to do that.
> 
> I've been doing various statistics on Wikipedia data for months. Not
> all of the data is available, but there is *a lot* - much more to
> analyse than I can manage in my time. You can answer a lot of
> questions with the database dumps (recently changed to XML) and the
> Python MediaWiki framework, but that means you have to dig into the
> data models and do some programming.

I'm certainly not averse to doing some work ;) and I'd be happy to
look into this as long as there are some clear instructions for doing
it. That's primarily what I'm interested in.

> 
> > we're talking about thousands of pages here, right? I'm also thinking
> > this is something that many people would be interested in finding out
> > and writing about. So what I'm asking is: to help researchers
> > generally, wouldn't it be an idea to identify some quick database
> > hacks that we could provide - almost like a Kate's tools function?
> > Or are these available on the MediaWiki pages?
> 
> The only solution is to share your code and data and to publish
> results frequently. That's how research works, isn't it? I'm very
> interested in having a dedicated server for Wikimetrics, but someone
> has to admin it (getting the hardware is not such a problem). For
> instance, I could parse the version history dump to extract only
> article, user and timestamp, so other people could analyse which
> articles are edited on which days, or vice versa, but I just don't
> have a server to handle gigabytes of data. Up to now I have only
> managed to set up a data warehouse for Personendaten
> (http://wdw.sieheauch.de/), but - like most of what's already been
> done - it's mostly undocumented :-(

It'd be very interesting to see details of your data and methodology;
I'm sure that's something that will be of incredible value as we move
Wikipedia research forward. But not just in the way a paper normally
puts it - "I retrieved this data from an SQL dump of the database" -
and then does things with the data; what I'm looking for, to repeat,
is *how you actually do this* from another researcher's point of view.

> 
> > If they are - and I've looked at some database-related pages -
> > they're certainly not very understandable from the perspective of
> > someone who just wants to use
> > basic functions. You might be thinking of sending me to a page like
> > http://meta.wikimedia.org/wiki/Links_table - but *what does it mean?*
> > Can someone either help me out, or suggest what we could do about this
> > in the future?
> 
> 1.) collect the questions and define exactly what you want (for instance
> "the number of articles edited on each day")
> 2.) collect ways to answer them ("extract data X from Y and calculate Z")
> 3.) find someone to do it
> 
> Well, it sounds like work ;-)

1, 2 and 3 should be written up either on m:Wikimedia Research Network
or on a subpage of m:Research. As for ongoing work in this area, I'll
be taking a quantitative research module as part of my next masters,
and I'll happily intertwine any project we deem fitting/necessary with
my project for that module. I just have to finish off my current
masters first, which means my wikiworkload has to go on hold for about
two weeks.

> 
> Greetings,
> Jakob

Thanks,
Cormac
