Jeffrey V. Merkey | 1 Apr 07:23

Extensive Link Errors related to Proper Names - Needs Fixing


I have been building a machine-compiled lexicon from the link and 
disambiguation pages in the XML dumps.  Oddly, the associations 
contained in [[ARTICLE_NAME | NAME]] links form a comprehensive "real time" 
thesaurus of common associations used by current English speakers in 
Wikipedia, and may well comprise the world's largest and most comprehensive 
thesaurus, embedded within the mesh of these links in the dumps.
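
As a rough illustration of the idea (a minimal sketch only, not the program 
described in this message), the following Python collects the labels used 
for each link target in a batch of wikitext.  The regular expression and the 
name build_link_thesaurus are assumptions made for this example; real dump 
text also contains templates, image links and nested markup that this 
ignores.

```python
import re
from collections import defaultdict

# Matches piped links of the form [[Target|Label]]. Plain [[Target]] links
# are skipped because they carry no alternate label.
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def build_link_thesaurus(wikitext_pages):
    """Map each link target to the set of labels used to link to it."""
    thesaurus = defaultdict(set)
    for text in wikitext_pages:
        for target, label in PIPED_LINK.findall(text):
            thesaurus[target.strip()].add(label.strip())
    return thesaurus


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the dumps.
    sample = [
        "The [[United States|U.S.]] and [[United States | America]] ...",
        "Born in the [[United States|USA]].",
    ]
    for target, labels in build_link_thesaurus(sample).items():
        print(target, sorted(labels))
```

Grouping by target gives the synonym sets; inverting the map (label to set 
of targets) is the view that exposes one name pointing at several people.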

While going through the dumps and constructing associative link maps of 
all these expressions, I have noticed a serious issue with embedded 
links on proper names.  It appears there may be a robot running 
somewhere that takes proper names listed in articles about 
relationships between people and links them blindly to any entry in 
Wikipedia that matches the name.

Posting examples here could stir up controversy given some of the content, 
so I will complete the thesaurus compilation, and folks should go through 
the encyclopedia themselves.  Articles about movie stars and other 
"gossipy" articles seem to have the most errors, linking proper names to 
unrelated people without proper disambiguation pages.  These could be 
interpreted as violations of WP:BLP, and some of the erroneous links could 
be troublesome for the Foundation.

Whoever is running bots that link between articles should look at 
proper name links based on categories and check into this.  I found a 
large number of these types of errors.  They are subtle, but will most 
probably show up when browsing through articles unless you can analyze 
the link targets and relationships in the dumps.

Rob Church | 1 Apr 06:39

Re: Extensive Link Errors related to Proper Names - Needs Fixing

On 01/04/07, Jeffrey V. Merkey <jmerkey@...> wrote:
> Whoever is running bots that link between articles should look at
> proper name links based on categories and check into this.  I found a
> large number of these types of errors.  They are subtle, but will most
> probably show up when browsing through articles unless you can analyze
> the link targets and relationships in the dumps.

If I understand the problem, and I'm not sure I do, then my advice is
that the operators of most such bots won't be subscribed to this list;
try wikipedia-l.

Rob Church

Mets501 | 1 Apr 06:45

Re: Extensive Link Errors related to Proper Names - Needs Fixing

> Whoever is running bots that link between articles should look at 
> proper name links based on categories and check into this.  I found a 
> large number of these types of errors.  They are subtle, but will most 
> probably show up when browsing through articles unless you can analyze 
> the link targets and relationships in the dumps.
>
> Jeff

Try http://en.wikipedia.org/wiki/Wikipedia:Bot_owners%27_noticeboard.
--Mets501

Mohamed Magdy | 1 Apr 06:53

Re: Extensive Link Errors related to Proper Names - Needs Fixing

Mets501 wrote:
>> Whoever is running bots that link between articles should look at 
>> proper name links based on categories and check into this.  I found a 
>> large number of these types of errors.  They are subtle, but will most 
>> probably show up when browsing through articles unless you can analyze 
>> the link targets and relationships in the dumps.
>>
>> Jeff
> 
> Try http://en.wikipedia.org/wiki/Wikipedia:Bot_owners%27_noticeboard.
> --Mets501
> 
> 
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@...
> http://lists.wikimedia.org/mailman/listinfo/wikitech-l
> 

AFAIK links to other articles are made by human editors, NOT bots.

Jeffrey V. Merkey | 1 Apr 09:40

Re: Extensive Link Errors related to Proper Names - Needs Fixing

Mohamed Magdy wrote:

>Mets501 wrote:
>
>>>Whoever is running bots that link between articles should look at 
>>>proper name links based on categories and check into this.  I found a 
>>>large number of these types of errors.  They are subtle, but will most 
>>>probably show up when browsing through articles unless you can analyze 
>>>the link targets and relationships in the dumps.
>>>
>>>Jeff
>>>
>>Try http://en.wikipedia.org/wiki/Wikipedia:Bot_owners%27_noticeboard.
>>--Mets501
>>
>AFAIK links to other articles are made by human editors, NOT bots.
>
I try not to post to the English Wikipedia or its mailing lists. The 
problem was reported on Foundation-l, with a courtesy notice about it to
the developers who control the dumps.

Jeff

Mark Williamson | 1 Apr 08:11

Re: Extensive Link Errors related to Proper Names - Needs Fixing

I think the issue here is that someone linking to [[Thomas Figby]]
might not realize that the article at [[Thomas Figby]] is not about
the man who is currently on trial for the murder of his wife, but
rather about Thomas Figby, the movie star, and that the link they
actually want is [[Thomas Lawrence Figby]].
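
To make that concrete, here is a minimal sketch of how one might list
candidates for this kind of mix-up, assuming all you have is a list of
article titles.  The function name name_collision_groups and the "same
first and last word" heuristic are assumptions made for illustration; the
heuristic will miss nicknames and flag plenty of harmless pairs, so its
output is only a list for a human to review.

```python
from collections import defaultdict


def name_collision_groups(titles):
    """Group article titles that share the same first and last word."""
    groups = defaultdict(list)
    for title in titles:
        tokens = title.split()
        # Single-word titles cannot be matched on first/last name.
        if len(tokens) >= 2:
            groups[(tokens[0], tokens[-1])].append(title)
    # Keep only the keys shared by more than one title.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical titles, reusing the made-up names from this message.
    titles = ["Thomas Figby", "Thomas Lawrence Figby", "Main Page"]
    for key, names in name_collision_groups(titles).items():
        print(key, names)
```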

I don't think this has anything to do with bots.

Mark

On 31/03/07, Mohamed Magdy <mohamed.m.k@...> wrote:
> Mets501 wrote:
> >> Whoever is running bots that link between articles should look at
> >> proper name links based on categories and check into this.  I found a
> >> large number of these types of errors.  They are subtle, but will most
> >> probably show up when browsing through articles unless you can analyze
> >> the link targets and relationships in the dumps.
> >>
> >> Jeff
> >
> > Try http://en.wikipedia.org/wiki/Wikipedia:Bot_owners%27_noticeboard.
> > --Mets501
> >

brion | 1 Apr 10:15

MediaWiki automated test run failure 2007-04-01

An automated run of parserTests.php showed the following failures:

This is MediaWiki version 1.10alpha (r20876).

Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...

  17 still FAILING test(s) :(
      * URL-encoding in URL functions (single parameter)  [Has never passed]
      * URL-encoding in URL functions (multiple parameters)  [Has never passed]
      * Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html)  [Has never passed]
      * Link containing double-single-quotes '' (bug 4598)  [Has never passed]
      * message transform: <noinclude> in transcluded template (bug 4926)  [Has never passed]
      * message transform: <onlyinclude> in transcluded template (bug 4926)  [Has never passed]
      * BUG 1887, part 2: A <math> with a thumbnail- math enabled  [Has never passed]
      * HTML bullet list, unclosed tags (bug 5497)  [Has never passed]
      * HTML ordered list, unclosed tags (bug 5497)  [Has never passed]
      * HTML nested bullet list, open tags (bug 5497)  [Has never passed]
      * HTML nested ordered list, open tags (bug 5497)  [Has never passed]
      * Inline HTML vs wiki block nesting  [Has never passed]
      * Mixing markup for italics and bold  [Has never passed]
      * dt/dd/dl test  [Has never passed]
      * Images with the "|" character in the comment  [Has never passed]
      * Parents of subpages, two levels up, without trailing slash or name.  [Has never passed]
      * Parents of subpages, two levels up, with lots of extra trailing slashes.  [Has never passed]

Passed 494 of 511 tests (96.67%)... 17 tests failed!

Tim Starling | 1 Apr 10:30

Re: Article hit rates -- research at the University of Minnesota

Reid Priedhorsky wrote:
> Tim Starling wrote:
>> Reid Priedhorsky wrote:
>>> Dear Wikitechnicians,
>>>
>>> My name is Reid Priedhorsky, and I'm a Ph.D. student at GroupLens 
>>> Research, which is the human-computer interaction group at the 
>>> University of Minnesota.
>>>
>>> We are currently working on some research which is investigating 
>>> Wikipedia contribution and vandalism. To this end, statistics on the 
>>> view rate of different articles would be extremely helpful to us -- 
>>> something along the lines of Leon Weber's WikiCharts tool, but with a 
>>> larger limit (ideally all 1.7 million articles).
>> Producing such statistics will be a Google Summer of Code project this
>> summer. If you can't wait that long, then we can give you a sampled,
>> anonymised log stream to analyse.
> 
> Yes, summer would be too late: anonymised logs would be excellent for 
> our purposes. Does "stream" mean that we would need to write a program 
> to listen to the real-time log stream, or could you give us files?
> 
> Gregory Maxwell wrote:
>  > Greetings, describe for me what your ideal data would look like.
> 
> Ideal data would be log files that just looked like:
> 
>    Main Page\t1169499304.066
> 
> i.e., article titles as they appear in the XML dumps and request time.
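
For reference, a log in the format requested above (article title, a tab,
then a Unix timestamp) could be tallied with something like the sketch
below.  This is only an illustration under that assumed format; the name
count_views is made up for the example, and nothing here reflects how the
actual sampled log stream is laid out.

```python
from collections import Counter


def count_views(lines):
    """Count requests per article from 'title<TAB>unix_time' lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Split on the last tab so the timestamp is always the final field.
        title, _, stamp = line.rpartition("\t")
        if not title:
            continue  # no tab found, or empty title
        try:
            float(stamp)
        except ValueError:
            continue  # skip lines whose timestamp is malformed
        counts[title] += 1
    return counts


if __name__ == "__main__":
    sample = ["Main Page\t1169499304.066\n", "Main Page\t1169499305.100\n"]
    for title, n in count_views(sample).most_common():
        print(title, n)
```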

Rob Church | 1 Apr 18:19

Re: Extensive Link Errors related to Proper Names - Needs Fixing

On 01/04/07, Jeffrey V. Merkey <jmerkey@...> wrote:
> I try to not post to the English Wikipedia or its mailing lists. The
> problem was reported on Foundation-l with a courtesy notice to
> the developers who control the dumps of the problem.

We control the dumps, we don't control the content of the dumps. The
issue is, in fact, nothing to do with us.

Rob Church

Jeffrey V. Merkey | 1 Apr 20:50

Re: Extensive Link Errors related to Proper Names - Needs Fixing

Rob Church wrote:

>On 01/04/07, Jeffrey V. Merkey <jmerkey@...> wrote:
>
>>I try to not post to the English Wikipedia or its mailing lists. The
>>problem was reported on Foundation-l with a courtesy notice to
>>the developers who control the dumps of the problem.
>>
>We control the dumps, we don't control the content of the dumps. The
>issue is, in fact, nothing to do with us.
>
The original message was in two parts. The second part seems to be the 
contentious part. I tend to agree with you that the developers are 
unconcerned with human errors in the dumps (which has now been verified 
as a non-bot, non-MediaWiki issue, which was unconfirmed before).

The first part of the message discusses a machine-created thesaurus 
based upon these links, which I will post as an XML dump when the 
program is completed. That part may be of interest moving forward, as it 
would enable a built-in thesaurus for MediaWiki. Wikitrans uses this 
thesaurus created from within the dumps. It could have a lot of 
applications for translators. I have found it very useful.