Evgeniy Gabrilovich | 5 Dec 2007 00:34

[Call for papers] Wikipedia and AI: An Evolving Synergy

                          AAAI 2008 Workshop
      WIKIPEDIA AND ARTIFICIAL INTELLIGENCE: AN EVOLVING SYNERGY
                   http://lit.csci.unt.edu/~wikiai08

                           CALL FOR PAPERS

OVERVIEW

Since its inception less than seven years ago, Wikipedia has become one of the
largest and fastest growing online sources of encyclopedic knowledge. One of
the reasons why Wikipedia is appealing to contributors and users alike is the
richness of its embedded structural information: articles are hyperlinked to
each other and connected to categories from an ever-expanding taxonomy;
pervasive language phenomena such as synonymy and polysemy are addressed
through redirection and disambiguation pages; entities of the same type are
described in a consistent format using infoboxes; related articles are grouped
together in series templates.

As a large-scale repository of structured knowledge, Wikipedia has become a
valuable resource for a diverse set of Artificial Intelligence (AI)
applications. Major conferences in natural language processing and machine
learning have recently witnessed a significant number of approaches that

Dominik Flejter | 5 Dec 2007 14:11

CfP: 2nd Workshop on Social Aspects of the Web (SAW 2008); deadline: 12 Jan 08

===========================================================================

           2nd Workshop on Social Aspects of the Web  (SAW 2008)
                            in conjunction with
 11th International Conference on Business Information Systems (BIS 2008)

                            Innsbruck, Austria
                            May 5, 6 or 7, 2008

         http://bis.kie.ae.poznan.pl/11th_bis/wscfp.php?ws=saw2008

===========================================================================

                Deadline for submissions: January 12, 2008

===========================================================================

In recent years, the Web has moved from a simple one-way communication
channel extending traditional media, to a complex "peer-to-peer"
communication space with a blurred author/audience distinction and new
ways to create, share and use knowledge in a social way. 
This change of paradigm is currently profoundly transforming most areas of
our life: our interactions with other people, our relationships, ways of
gathering information, ways of developing social norms, opinions,
attitudes and even legal aspects as well as ways of working and doing
business. 
It also raises a strong need for theoretical, empirical and applied
studies related to how people may interact on the Web, how they actually
do so and what new possibilities and challenges are emerging in the
social, business and technology dimensions. 

Brianna Laugher | 7 Dec 2007 13:32

Fwd: [Icommons] Open Knowledge (OKCon) 2008: LSE, London, 15th March 2008

May be of interest.
cheers,
Brianna

---------- Forwarded message ----------
From: Rufus Pollock <rufus.pollock@...>
Date: 7 Dec 2007 23:08
Subject: [Icommons] Open Knowledge (OKCon) 2008: LSE, London, 15th March 2008
To: "icommons@..." <icommons@...>

* OKCon 2008 - 'Open Knowledge: Applications, Tools and Services'
* where: London School of Economics, London, UK
* when: 15th March 2008 (1030-1830)
* www: <http://www.okfn.org/okcon/>
* register: <http://www.okfn.org/okcon/register/>
* last year: <http://www.okfn.org/okcon/2007/>
* wiki: <http://www.okfn.org/wiki/okcon2008/>

Following on from the success of our inaugural conference last year,
we're pleased to announce that the second Open Knowledge conference
(OKCon) will take place on Saturday 15th March 2008.

The event will bring together individuals and groups from across the
open knowledge spectrum for a day of seminars and workshops around the
theme of 'Applications, Tools and Services'. Three main sessions will
focus on 'Transport and Environment', 'Visualization and Analysis' and
'Education and Academia'. In addition there will be an 'Open Space'
suitable for presentations and demos of general open knowledge related
work.


Luca de Alfaro | 19 Dec 2007 22:36

Re: Wikipedia colored according to trust


Dear All,

we have a demo at http://wiki-trust.cse.ucsc.edu/ that features the whole English Wikipedia, as of its February 6, 2007 snapshot, colored according to text trust.
This is the first time that even we can look at how the "trust coloring" looks on the whole of the Wikipedia!
We would be very interested in feedback (the wikiquality-l-RusutVdil2icGmH+5r0DM0B+6BGkLq7r@public.gmane.org mailing list is the best place).

If you find bugs, you can report them to us via http://groups.google.com/group/wiki-trust

Happy Holidays!

Luca

PS: yes, we know, some images look off.  It is currently fairly difficult for a site outside of the Wikipedia to fetch Wikipedia images correctly. 

PPS: there are going to be a few planned power outages on our campus in the next few days, so if the demo is down, try again later.



_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@...
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Desilets, Alain | 20 Dec 2007 15:12

Re: Wikipedia colored according to trust

Here is my feedback based on looking at a few pages on topics that I know very well.

Agile Software Development

· http://wiki-trust.cse.ucsc.edu/index.php/Agile_software_development
· Not bad. I counted 13 highlighted items, 5 of which I would say are questionable.

Usability

· http://wiki-trust.cse.ucsc.edu/index.php/Usability
· Not as good. 14 highlighted items, 3 of which I would say are questionable.

Open Source Software

· http://wiki-trust.cse.ucsc.edu/index.php/Open_source_software
· Not so good either. 23 highlighted items, 3 of which I would say are questionable.

This is a very small sample, but it’s all I have time to do. It will be interesting to see how other people rate the precision of the highlightings on a wider set of topics. Based on these three examples, it’s not entirely clear to me that this system would help me identify questionable items in topics that I am not so familiar with.
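As a rough illustration of the precision figure discussed above, the three counts can be tallied as follows. This is only a sketch: it treats "highlighted items judged questionable" as the numerator of precision, which is the reader's informal criterion and not a metric defined by the demo's authors.

```python
# Counts reported above: (highlighted items, items judged questionable).
samples = {
    "Agile software development": (13, 5),
    "Usability": (14, 3),
    "Open source software": (23, 3),
}


def precision(highlighted: int, questionable: int) -> float:
    """Fraction of highlighted items that the reader judged questionable."""
    return questionable / highlighted


per_page = {page: precision(h, q) for page, (h, q) in samples.items()}

# Pooled precision weights every highlighted item equally across pages.
total_highlighted = sum(h for h, _ in samples.values())
total_questionable = sum(q for _, q in samples.values())
pooled = precision(total_highlighted, total_questionable)

for page, p in per_page.items():
    print(f"{page}: {p:.0%}")
print(f"Pooled: {pooled:.0%}")
```

On these three pages the per-page values come out at about 38%, 21% and 13%, and the pooled value is 11 out of 50, or 22%.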

 

Are you planning to do a larger-scale evaluation with human judges? One issue in that kind of study is avoiding favourable or unfavourable bias on the part of the judges. You also have to make sure that your algorithm does better than random guessing (in other words, there may be so many questionable phrases in a wiki page that random guessing would be bound to guess right once out of every, say, 5 times). One way to address both issues would be to produce pages where half of the highlightings come from your system and the other half each highlight a randomly selected contiguous contribution by a single author.
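The blinded design proposed above could be sketched roughly as follows. Everything here is hypothetical: the function names, the representation of a highlighting as a plain string, and the toy judgements are assumptions made for illustration, not part of the trust-coloring system.

```python
import random


def build_blind_trial(system_spans, all_spans, rng):
    """Mix system highlights with an equal number of randomly chosen spans.

    Returns (items, key): items is the shuffled list shown to judges, and key
    maps each item's index to its hidden source ("system" or "random").
    """
    chosen = set(system_spans)
    pool = [s for s in all_spans if s not in chosen]
    k = min(len(system_spans), len(pool))
    control = rng.sample(pool, k)
    labelled = [(s, "system") for s in system_spans[:k]]
    labelled += [(s, "random") for s in control]
    rng.shuffle(labelled)  # judges must not be able to infer the source from order
    items = [s for s, _ in labelled]
    key = {i: src for i, (_, src) in enumerate(labelled)}
    return items, key


def precision_by_source(judgements, key):
    """judgements[i] is True when a judge marked item i as questionable."""
    hits = {"system": 0, "random": 0}
    totals = {"system": 0, "random": 0}
    for i, questionable in judgements.items():
        totals[key[i]] += 1
        hits[key[i]] += int(questionable)
    return {src: hits[src] / totals[src] for src in totals if totals[src]}


rng = random.Random(0)
all_spans = [f"span-{i}" for i in range(20)]   # placeholder contributions
system_spans = all_spans[:5]                   # placeholder system highlights
items, key = build_blind_trial(system_spans, all_spans, rng)

# Placeholder judgements, NOT real data: every system span flagged, no random one.
judgements = {i: key[i] == "system" for i in range(len(items))}
print(precision_by_source(judgements, key))
```

Comparing the two precision values gives the "better than random guessing" check, and because the judges never see the source of a highlighting, their bias for or against the system cannot leak into the labels.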

I think this is really interesting work worth doing, btw. I just don’t know how useful it is in its current state.

Cheers,

Alain Désilets

Ward Cunningham | 20 Dec 2007 17:00

Re: Wikipedia colored according to trust

Alain -- Is it true that although you've seen 3x to 15x false positive, that you did not see any false negatives? By false negative I would mean a questionable item that was not highlighted. Maybe you weren't looking for these? Best regards. -- Ward

__________________
Ward Cunningham
503-432-5682





Luca de Alfaro | 20 Dec 2007 17:04

Re: Wikipedia colored according to trust

You are evaluating the coloring against a performance criterion that is not the one we designed it for.

Our coloring gives orange color to new information that has been added by low-reputation authors.  New information by high-reputation authors is light orange.  As the information is revised, it gains trust.

Thus, our coloring answers the question, intuitively: has this information been revised already?  Have reputable authors looked at it?
 
You are asking the question: how much information colored orange is questionable?
This is a different question, and one on which we will never be able to do well, for the simple reason that, as is well known, a lot of the correct factual information on Wikipedia comes from occasional contributors, including anonymous authors, and those occasional and anonymous contributors will have low reputation in most conceivable reputation systems.

We do not plan to do any large-scale human study.  For one, we don't have the resources.  For another, in the very limited tests we did, the notion of "questionable" was so subjective that our data contained a HUGE amount of noise.  We each ranked edits as -1 (bad), 0 (neutral), or +1 (good).  The probability that two of us agreed was somewhere below 60%.  We decided this was not a good way to go.
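To make the agreement figure concrete, here is a minimal sketch of how raw agreement and a chance-corrected measure (Cohen's kappa) could be computed for two raters using the -1/0/+1 scale. The label lists are invented for illustration and are not the data from the tests mentioned above; kappa is brought in here as a standard companion to raw agreement, not as something the authors report.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of items on which the two raters gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(a)
    p_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label independently.
    p_e = sum(count_a[label] * count_b[label] for label in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of ten edits: -1 (bad), 0 (neutral), +1 (good).
rater_a = [1, 0, 0, -1, 1, 0, 1, -1, 0, 0]
rater_b = [1, 0, 1, -1, 0, 0, 1, 0, 0, -1]

print(f"raw agreement: {observed_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's kappa: {cohens_kappa(rater_a, rater_b):.2f}")
```

With three categories, a raw agreement near 60% can correspond to a much lower chance-corrected value, which is one way to quantify how noisy such labels are.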

The results of our data-driven evaluation on a random sample of 1000 articles with at least 200 revisions each showed that (quoting from our paper):

  • Recall of deletions. We consider the recall of low-trust as a predictor for deletions. We show that text in the lowest 50% of trust values constitutes only 3.4% of the text of articles, yet corresponds to 66% of the  text that is deleted from one revision to the next.
  • Precision of deletions.  We consider the precision of low-trust as a predictor for deletions. We show that text that is in the bottom half of trust values has a probability of 33% of being deleted in the very next revision, in  contrast with the 1.9% probability for general text.  The deletion probability raises to 62% for text in the bottom 20%  of trust values.

  • Trust of average vs. deleted text. We consider the trust distribution of all text, compared to the trust distribution to the text that is deleted.  We show that 90% of the text overall had trust at least 76%, while the average trust for deleted text was 33%.
  • Trust as a predictor of lifespan.  We select words uniformly at random, and we consider the statistical correlation between the trust of the word at the moment of sampling, and the future lifespan of the word.  We show that words with the highest trust have an expected future lifespan that is 4.5 times longer than words with no trust.  We remark that this is a proper test, since the trust at the time of sampling depends only on the history of the word prior to sampling.
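The precision and recall figures quoted above can be read as follows. This is a minimal sketch on made-up data, assuming each word carries a trust value in [0, 1] and a flag saying whether it was deleted in the next revision; the threshold and the records are illustrative and this is not the evaluation code from the paper.

```python
def deletion_precision_recall(words, threshold):
    """words: list of (trust, deleted_in_next_revision) pairs.

    Precision: share of low-trust words that were deleted.
    Recall: share of deleted words that had low trust.
    """
    low = [deleted for trust, deleted in words if trust < threshold]
    deleted_total = sum(deleted for _, deleted in words)
    deleted_low = sum(low)
    precision = deleted_low / len(low) if low else 0.0
    recall = deleted_low / deleted_total if deleted_total else 0.0
    return precision, recall


# Hypothetical (trust, deleted) records, invented for illustration only.
words = [
    (0.10, True), (0.20, True), (0.30, False), (0.70, False),
    (0.80, False), (0.85, False), (0.90, True), (0.95, False),
]
precision, recall = deletion_precision_recall(words, threshold=0.5)
print(f"precision={precision:.2f} recall={recall:.2f}")

# Lift implied by the two quoted probabilities: 33% for bottom-half trust
# versus 1.9% for general text.
lift = 0.33 / 0.019
print(f"lift over general text: about {lift:.1f}x")
```

Read this way, the quoted 33% versus 1.9% means low-trust text is roughly 17 times more likely than general text to be deleted in the very next revision.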
Luca

Desilets, Alain | 20 Dec 2007 17:06

Re: Wikipedia colored according to trust


Good point.

 

I did not look for false negatives for lack of time. That would have required me to read the whole content of the pages and look for items that I thought were questionable even though they weren’t highlighted by the system.

 

I agree with you that false negatives may be just as important as false positives. It’s just more work to evaluate that metric.

 

Alain

 

 

Brian | 20 Dec 2007 17:07

Re: Wikipedia colored according to trust

It seems that what Ward and others are getting at is that it would be useful to have precision and recall measures for Luca's trust metric. Of course, the metric can't possibly know when a brand new user contributes unusually high quality text to the encyclopedia. Nonetheless, it seems that a tool such as Amazon's Mechanical Turk could allow us to easily measure how often false positives and false negatives occur using random sampling. Although your hammer was not designed for their nail, I imagine it would do quite well.
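A sampling-based estimate of this kind might look like the sketch below. The ratings are invented, the function and key names are hypothetical, and the Wilson score interval is added here only to show how uncertain the rates are on a small sample; none of this comes from the trust-coloring project or from Mechanical Turk itself.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)


def error_counts(ratings):
    """ratings: list of (highlighted, questionable) booleans per sampled passage.

    A false positive is a highlighted passage that raters did not find
    questionable; a false negative is a questionable passage left unhighlighted.
    Each entry is (errors, passages in that group, interval on the share).
    """
    highlighted = [q for h, q in ratings if h]
    plain = [q for h, q in ratings if not h]
    false_pos = sum(not q for q in highlighted)
    false_neg = sum(plain)
    return {
        "false_positive": (false_pos, len(highlighted),
                           wilson_interval(false_pos, len(highlighted))),
        "false_negative": (false_neg, len(plain),
                           wilson_interval(false_neg, len(plain))),
    }


# Hypothetical ratings for ten randomly sampled passages (made up).
ratings = [
    (True, True), (True, False), (True, False), (True, False), (False, False),
    (False, False), (False, True), (False, False), (False, False), (False, False),
]
rates = error_counts(ratings)
for name, (errors, total, (low, high)) in rates.items():
    print(f"{name}: {errors}/{total}, interval {low:.2f} to {high:.2f}")
```

The wide intervals on ten passages show why the sample would need to be considerably larger before such numbers could be compared across systems.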


Desilets, Alain | 20 Dec 2007 17:30

Re: Wikipedia colored according to trust

> You are evaluating the coloring against a performance criterion that is not the one we designed it for.
>
> Our coloring gives orange color to new information that has been added by low-reputation authors.  New information by high-reputation authors is light orange.  As the information is revised, it gains trust.
>
> Thus, our coloring answers the question, intuitively: has this information been revised already?  Have reputable authors looked at it?
>
> You are asking the question: how much information colored orange is questionable?
> This is a different question, and we will never be able to do well, for the simple reason that it is well known that a lot of the correct factual information on Wikipedia comes from occasional contributors, including anonymous authors, and those occasional contributors and anonymous will have low reputation in most conceivable reputation systems.

Before I go further, let me reiterate that I think your work is excellent and has the potential for adding huge value to the wiki world. If I didn't think so, I wouldn't bother writing this message.

I think it's important to evaluate a system like this in terms of a metric that captures some sort of value added to some category of wiki end user.

The system you are trying to build could provide HUGE value for the end user, if it could allow him to tell with a certain amount of certainty (say, > 60%) which parts of the system are questionable and which parts are not. This is the metric I used in my admittedly very small test (Note: I'm sure it's not the only metric that could be used to measure end-user value).

Based on that very preliminary test, it seems your system does not do a great job at that, and you seem to say that you don't think it could.

That's OK. I'm sure there is SOMETHING that this system can do for the end user, because the "internal" performance metrics you list in your message seem to indicate that there is some substance to the predictions of the algorithm.

> We do not plan to do any large-scale human study.  For one, we don't have the resources.  

A study with human judges does not have to be large scale. I would guess 30 subjects would do the trick. 

> For another, in the very limited tests we did, the notion of "questionable" was so subjective that our data contained a HUGE amount of noise.  We asked to rank edits as -1 (bad), 0 (neutral), +1 (good).  The probability that two of us agreed was somewhere below 60%.  We decided this was not a good way to go.

That's interesting. I would have expected a large amount of agreement based on my assumption that the majority of edits are either clearly Good or Neutral. In other words, I would have expected judges to disagree only on the "iffy" portion of the edits, but since I assume that this is a small portion of all edits, you would still have large agreement. I guess my assumptions are wrong.

Is the story the same if you look at only two categories: Reject (= your {-1} set) and Keep (your {0, +1} set)?
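As a sketch of what that two-category comparison would involve, the snippet below collapses -1/0/+1 labels into Reject and Keep and recomputes raw agreement. The two label lists are invented for illustration; they say nothing about what the real ratings would show.

```python
def collapse(label):
    """Map the three-way scale onto Reject ({-1}) versus Keep ({0, +1})."""
    return "reject" if label == -1 else "keep"


def agreement(a, b):
    """Fraction of items on which the two raters gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical ratings of ten edits by two judges.
judge_a = [1, 0, 0, -1, 1, 0, 1, -1, 0, 0]
judge_b = [1, 0, 1, -1, 0, 0, 1, 0, 0, -1]

three_way = agreement(judge_a, judge_b)
two_way = agreement([collapse(x) for x in judge_a],
                    [collapse(x) for x in judge_b])
print(f"three-way: {three_way:.0%}, reject/keep: {two_way:.0%}")
```

Disagreements between 0 and +1 vanish after collapsing, so agreement can only stay the same or go up; whether it rises enough to be useful depends on how many disagreements involve the -1 label.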

> The results of our data-driven evaluation on a random sample of 1000 articles with at least 200 revisions each showed that (quoting from our paper):
> * Recall of deletions. We consider the recall of low-trust as a predictor for deletions. We show that text in the lowest 50% of trust values constitutes only 3.4% of the text of articles, yet corresponds to 66% of the text that is deleted from one revision to the next.
> * Precision of deletions.  We consider the precision of low-trust as a predictor for deletions. We show that text that is in the bottom half of trust values has a probability of 33% of being deleted in the very next revision, in contrast with the 1.9% probability for general text.  The deletion probability raises to 62% for text in the bottom 20% of trust values.
> * Trust of average vs. deleted text. We consider the trust distribution of all text, compared to the trust distribution to the text that is deleted.  We show that 90% of the text overall had trust at least 76%, while the average trust for deleted text was 33%.
> * Trust as a predictor of lifespan.  We select words uniformly at random, and we consider the statistical correlation between the trust of the word at the moment of sampling, and the future lifespan of the word.  We show that words with the highest trust have an expected future lifespan that is 4.5 times longer than words with no trust.  We remark that this is a proper test, since the trust at the time of sampling depends only on the history of the word prior to sampling.

Those measures tell me that there is definitely something to the algorithm, and I am trying to help you define what value it could provide and to which kind of end user.

I have one concern, though. Have you compared your system to a naïve implementation that simply uses the edit's "age" as a measure of its trustworthiness? In other words, don't worry about who created the edit or modified it. Just worry about how long it's been there.
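One way to set up that comparison is to score the same words once by age and once by trust, then apply the same deletion-precision measure to the lowest-scoring words under each. The sketch below assumes hypothetical records with an "age" (in revisions), a "trust" value and a "deleted" flag; the data are constructed for illustration and show nothing about how the two approaches actually compare.

```python
def bottom_k_precision(records, score, k):
    """Share of the k lowest-scoring words that were deleted in the next revision."""
    ranked = sorted(records, key=score)  # stable sort: ties keep input order
    bottom = ranked[:k]
    return sum(r["deleted"] for r in bottom) / len(bottom)


# Hypothetical word records, invented for illustration only.
records = [
    {"age": 1, "trust": 0.90, "deleted": False},   # new text, reputable author
    {"age": 1, "trust": 0.10, "deleted": True},    # new text, unknown author
    {"age": 2, "trust": 0.20, "deleted": True},
    {"age": 3, "trust": 0.70, "deleted": False},
    {"age": 10, "trust": 0.80, "deleted": False},
    {"age": 20, "trust": 0.95, "deleted": False},
]

by_age = bottom_k_precision(records, score=lambda r: r["age"], k=2)
by_trust = bottom_k_precision(records, score=lambda r: r["trust"], k=2)
print(f"age baseline: {by_age:.2f}, trust: {by_trust:.2f}")
```

Running both scores through the identical measure on real revision histories would show how much of the predictive power comes from author reputation rather than from age alone.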

Alain
