Marcin Cieslak | 3 Feb 02:01 2011

Humans vs bots: 0:1

(cross posted to wikitech-l)

Hello, 

Recently we found a case of an incorrect interwiki entry linking [[:pl:Aktywizm]],
which describes a philosophical concept, to a family of articles on other
Wikipedias about political activism.

This was a human mistake, and it was reverted later. However, there seems
to be no way to tell all the running interwiki bots to stop re-adding this
removed link to the articles.

I have added {{nobots}} on the Polish and English Wikipedias to prevent
the incorrect linking from spreading. From time to time I have tried to use
my own pywikipedia bot to remove the link (using the -localright option),
and the result is the following:

https://secure.wikimedia.org/wikipedia/ro/wiki/Activism?action=history

(Saperka is my script) 

I spoke with the owner of WikitanvirBot, and we stopped it completely for
a while to let the "undo" script run and remove the links.
Unfortunately, another bot (https://secure.wikimedia.org/wikipedia/ro/w/index.php?title=Activism&diff=5071856&oldid=5071793)
changed the link again.

It seems that it is not possible to revert a single mistake unless
ALL running interwiki bots are stopped by their owners.
That sounds like overkill (and imagine coordinating it!).

John | 3 Feb 02:10 2011

Re: [Pywikipedia-l] Humans vs bots: 0:1

Yeah, all you need to do is remove the incorrect links from all affected
articles.

You sorta did that with -localright; however, that only fixed the correct
article and still left some articles pointing to the wrong one. You need
to fix every article: as long as one page has the wrong link, it will be
propagated back.
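To make this point concrete, here is a toy Python model. It assumes, as a simplification and not as a description of how interwiki.py is actually written, that a bot gathers every page reachable through interwiki links and then makes each page link to all the others. `closure`, `bot_run` and `make_wiki` are hypothetical names, and the three-page link set is invented for illustration.

```python
from collections import deque

# Toy model only: pages are (lang, title) tuples, and `links` maps a page
# to the set of pages its interwiki links point to.


def closure(links, start):
    """Return every page reachable from `start` by following interwiki links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, set()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def bot_run(links, start):
    """Simplified bot: make every page in the group link to every other one."""
    group = closure(links, start)
    for page in group:
        links[page] = group - {page}
    return links


def make_wiki():
    """Hypothetical starting state: the wrong pl: link is present everywhere."""
    en, ro, pl = ("en", "Activism"), ("ro", "Activism"), ("pl", "Aktywizm")
    links = {en: {ro, pl}, ro: {en, pl}, pl: {en, ro}}
    return links, en, ro, pl


# Partial fix: clean en: and pl:, but forget ro:.
links, en, ro, pl = make_wiki()
links[en].discard(pl)
links[pl] = set()
bot_run(links, en)
print(pl in links[en])  # True: ro: still pointed at pl:, so the link comes back

# Full fix: remove the wrong link from every page before any bot runs.
links, en, ro, pl = make_wiki()
for page in (en, ro):
    links[page].discard(pl)
links[pl] = set()
bot_run(links, en)
print(pl in links[en])  # False
```

In this model, a single leftover link is enough to pull the wrong page back into the group on the next run.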

John | 3 Feb 03:00 2011

Re: Humans vs bots: 0:1

The issues should be fixed; if you continue to have issues, let me know.

_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l <at> lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
Marcin Cieslak | 3 Feb 03:23 2011

Re: [Pywikipedia-l] Humans vs bots: 0:1

Header content ["Followup-To:" gmane.comp.python.pywikipediabot.general.]
>> John <phoenixoverride <at> gmail.com> wrote:
> Yeah, all you need to do is remove the incorrect links from all affected
> articles.
>
> You sorta did that with -localright. however that just fixed the correct
> article but still left some articles pointing to the wrong article. you need
> to fix every article, as long as one page has the wrong link it will be
> propagated back

On the 22nd, somebody re-added those links manually because the article had been reconstructed.

But earlier, after this revert:

https://secure.wikimedia.org/wikipedia/pl/w/index.php?title=Aktywizm&diff=25027755&oldid=25014331

I tried to use pywiki's interwiki.py to remove 'pl' by making sure that [[en:Activism]]
has no pl: link (enforced by {{nobots}}) and [[pl:Aktywizm]] has no interwiki links
(enforced by {{nobots}}), and then running interwiki.py -localright pointing to en:Activism
to remove it on all other wikis. But the running bots reverted this again within
a few minutes.

Sure, after this:

https://secure.wikimedia.org/wikipedia/pl/w/index.php?title=Aktywizm&diff=25078310&oldid=25028009

you can't reproduce this anymore, but this is pretty strange.

//Saper

_______________________________________________
Wikitech-l mailing list
Wikitech-l <at> lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Andre Engels | 3 Feb 09:44 2011

Re: Humans vs bots: 0:1

The first issue is indeed that the wrong interwiki has to be removed
on _all_ languages to stop it from returning, but even then one
could still run into problems, because some bots might have visited
some languages _before_ your removal and others _after_ it. They
would then consider the wrong interwiki to be missing on the
languages visited afterward, and re-add it there.
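The race described above can be sketched the same way. This hypothetical model (not pywikipedia's real code) assumes the bot reads pages one at a time, so pages read before the human fix are seen with the wrong link and pages read after it are seen without it. `read_view` and `planned_additions` are invented names; the string keys "en", "ro" and "pl" stand for the en:Activism, ro:Activism and pl:Aktywizm pages.

```python
def read_view(before, after, read_order, fix_step):
    """What the bot 'sees': pages read before `fix_step` are in their old state."""
    return {page: (before[page] if step < fix_step else after[page])
            for step, page in enumerate(read_order)}


def planned_additions(view, start):
    """Links the bot believes are missing, based on its (possibly stale) view."""
    group, stack = {start}, [start]
    while stack:
        for target in view.get(stack.pop(), set()):
            if target not in group:
                group.add(target)
                stack.append(target)
    # Every page in the group should link to every other page in it.
    return {page: (group - {page}) - view.get(page, set()) for page in group}


before = {"en": {"ro", "pl"}, "ro": {"en", "pl"}, "pl": {"en", "ro"}}
after = {"en": {"ro"}, "ro": {"en"}, "pl": set()}  # wrong link removed everywhere

# The bot read ro: just before the fix, and en: and pl: just after it.
stale = read_view(before, after, ["ro", "en", "pl"], fix_step=1)
print(planned_additions(stale, "en"))  # en: gets pl: back; pl: gets en: and ro:

# A bot that read everything after the fix plans no additions at all.
fresh = read_view(before, after, ["ro", "en", "pl"], fix_step=0)
print(planned_additions(fresh, "en"))
```

Even though every page was cleaned, the single stale read of ro: is enough for the bot to treat the wrong link as "missing" on en: and pl:.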

Working with {{nobots}} as you have done is not a good solution, I
think. Adding it on the Polish page could be justified, but on the
English one it also stops a good amount of correct edits.

This particular issue I have now resolved by finding that there is a
Dutch page on the same subject as the Polish one, and adding an
interwiki to that one. This way, even if someone mistakenly adds the
incorrect link again, for the bots this will lead to an interwiki
conflict, so they will not automatically propagate the wrong link any
more.
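A rough sketch of why the extra Dutch link helps, under the simplifying assumption that an "interwiki conflict" just means the group of linked pages contains two different titles for one language code (the real interwiki.py logic is more involved). The function names are hypothetical, and so are the Dutch titles: the thread does not name the actual nlwiki pages.

```python
from collections import defaultdict


def link_group(links, start):
    """All pages reachable from `start` through interwiki links."""
    group, stack = {start}, [start]
    while stack:
        for target in links.get(stack.pop(), set()):
            if target not in group:
                group.add(target)
                stack.append(target)
    return group


def conflicts(links, start):
    """Languages for which the group contains more than one page."""
    by_lang = defaultdict(set)
    for lang, title in link_group(links, start):
        by_lang[lang].add(title)
    return {lang: titles for lang, titles in by_lang.items() if len(titles) > 1}


# Hypothetical placeholder titles for the two Dutch pages.
en = ("en", "Activism")
pl = ("pl", "Aktywizm")
nl_political = ("nl", "political-activism-page")
nl_philosophy = ("nl", "philosophy-page")

links = {
    en: {nl_political, pl},   # someone mistakenly re-adds the pl: link
    nl_political: {en},
    pl: {nl_philosophy},      # the new link to the matching Dutch page
    nl_philosophy: {pl},
}
print(conflicts(links, en))  # nl: has two candidate pages, so a conflict is reported
```

A bot that stops on any reported conflict would leave these pages alone instead of spreading the wrong pl: link.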

-- 
André Engels, andreengels <at> gmail.com

Marcin Cieslak | 3 Feb 11:54 2011

Re: Humans vs bots: 0:1

Header content ["Followup-To:" gmane.comp.python.pywikipediabot.general.]
>> Andre Engels <andreengels <at> gmail.com> wrote:

(reordered)

> This particular issue I have now resolved by finding that there is a
> Dutch page on the same subject as the Polish one, and adding an
> interwiki to that one. This way, even if someone mistakenly adds the
> incorrect link again, for the bots this will lead to an interwiki
> conflict, so they will not automatically propagate the wrong link any
> more.

Thank you. I was looking for such a page, but I think I didn't check nlwiki.

I have split the part on political activism out into a separate article
that can now be linked to the rest.

> Working with {{nobots}} as you have done is not a good solution, I
> think. Adding it on the Polish page could be justified, but on the
> English one it also stops a good amount of correct edits.

I fully agree; this was just a temporary measure to keep the enwiki page
clean as a source.

> The first issue is indeed that the wrong interwiki has to be removed
> on _all_ languages to stop it from returning, but even with that one
> could still get into problems because there might be bots that visited
> some languages _before_ your removal, and others _after_ it. They
> would then consider the wrong interwiki to be a missing one on the
> languages visited afterward, and re-add them there.

This is what happened.

And I find this disturbing. I just couldn't stop the army of robots
from doing what they wanted. Maybe we should think of a template like
{{thinkagain}} to make interwiki.py bots flush their caches? I'm not sure
about this either.

//Saper

info | 3 Feb 15:49 2011

Re: Humans vs bots: 0:1

> > The first issue is indeed that the wrong interwiki has to be removed
> > on _all_ languages to stop it from returning, but even with that one
> > could still get into problems because there might be bots that visited
> > some languages _before_ your removal, and others _after_ it. They
> > would then consider the wrong interwiki to be a missing one on the
> > languages visited afterward, and re-add them there.
>

Commenting out interwiki links like
<!-- [[en:Wrong iw link]] -->
prevents bots from bringing the wrong link back.
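A minimal sketch of why this works, assuming the bot strips HTML comments before looking for language links. The regex is a simplification (real bots match against the wiki family's list of language codes), and `language_links` is a hypothetical helper, not a pywikipedia function.

```python
import re

# Non-greedy, and DOTALL so that comments spanning several lines are removed too.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Simplified language-link pattern: [[xx:Title]] or [[xx-yy:Title]].
# A leading colon, as in [[:pl:Aktywizm]], makes an inline link, which is skipped.
LANGLINK = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\[\]|]+)\]\]")


def language_links(wikitext):
    """Return (lang, title) pairs, ignoring anything inside <!-- ... -->."""
    visible = COMMENT.sub("", wikitext)
    return [(m.group(1), m.group(2)) for m in LANGLINK.finditer(visible)]


page = """See also [[:pl:Aktywizm]].
<!-- [[en:Wrong iw link]] -->
[[ro:Activism]]"""
print(language_links(page))  # [('ro', 'Activism')]
```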

Greetings
xqt
Jan Dudík | 4 Feb 09:24 2011

Re: Humans vs bots: 0:1

You can check which pages in all languages have an interwiki link to a certain page:
http://toolserver.org/~merl/reverselanglinks/
(it is case sensitive: en:link vs. en:Link)
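The idea behind such a reverse lookup can be sketched as an in-memory index. This is a hypothetical illustration, not how the toolserver tool is implemented, and the link data is made up. Keys are compared exactly, which also reproduces the case sensitivity mentioned above.

```python
from collections import defaultdict


def reverse_langlinks(links):
    """Map each target (lang, title) to the set of pages that link to it."""
    index = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            index[target].add(source)
    return index


# Hypothetical data: which pages still carry the pl:Aktywizm link?
links = {
    ("en", "Activism"): {("ro", "Activism")},
    ("ro", "Activism"): {("en", "Activism"), ("pl", "Aktywizm")},
}
index = reverse_langlinks(links)
print(index[("pl", "Aktywizm")])             # {('ro', 'Activism')}
print(index.get(("pl", "aktywizm"), set()))  # set(): exact, case-sensitive match
```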

JAnD

-- 
Ing. Jan Dudík

Dr. Trigon | 4 Feb 15:47 2011

ID: 3108403

Hello xqt!
Hello all!

Sorry, I was not able to attach a file to [1] or change 'Assigned' or
anything else. This is why I am posting the patch here.
Could you please commit it? That would be nice... :)

Thanks a lot and Greetings
Dr. Trigon

[1]
https://sourceforge.net/tracker/?func=detail&aid=3108403&group_id=93107&atid=603141
Index: clean_sandbox.py
===================================================================
--- clean_sandbox.py	(Revision 40)
+++ clean_sandbox.py	(Arbeitskopie)
@@ -21,7 +21,11 @@
 # (C) Andre Engels, 2007
 # (C) Siebrand Mazeland, 2007
 # (C) xqt, 2009
+# (C) Dr. Trigon, 2011
 #
+# DrTrigonBot: http://de.wikipedia.org/wiki/Benutzer:DrTrigonBot
+# Clean User Sandbox Robot (clean_user_sandbox.py)
+#
 # Distributed under the terms of the MIT license.
 #
 __version__ = '$Id: clean_sandbox.py 8564 2010-09-15 17:16:53Z xqt $'
@@ -148,24 +152,37 @@
             wait = False
             now = time.strftime("%d %b %Y %H:%M:%S (UTC)", time.gmtime())
             localSandboxTitle = pywikibot.translate(mySite, sandboxTitle)
+            IsUserSandbox = hasattr(self, '_user_list')  # DrTrigonBot (Clean User Sandbox Robot)
+            if IsUserSandbox:
+                localSandboxTitle = [localSandboxTitle % user.name() for user in self._user_list]
             if type(localSandboxTitle) is list:
                 titles = localSandboxTitle
             else:
                 titles = [localSandboxTitle,]
             for title in titles:
                 sandboxPage = pywikibot.Page(mySite, title)
+                pywikibot.output(u'Preparing to process sandbox page %s' % sandboxPage.title(asLink=True))
                 try:
                     text = sandboxPage.get()
                     translatedContent = pywikibot.translate(mySite, content)
                     translatedMsg = pywikibot.translate(mySite, msg)
                     subst = 'subst:' in translatedContent
+                    pos = text.find(translatedContent.strip())
                     if text.strip() == translatedContent.strip():
                         pywikibot.output(u'The sandbox is still clean, no change necessary.')
                     elif subst and sandboxPage.userName() == mySite.loggedInAs():
                         pywikibot.output(u'The sandbox might be clean, no change necessary.')
-                    elif text.find(translatedContent.strip()) <> 0 and not subst:
-                        sandboxPage.put(translatedContent, translatedMsg)
-                        pywikibot.output(u'Standard content was changed, sandbox cleaned.')
+                    elif pos <> 0 and not subst:
+                        if IsUserSandbox:
+                            endpos = pos + len(translatedContent.strip())
+                            if (pos < 0) or (endpos == len(text)):
+                                pywikibot.output(u'The user sandbox is still clean or not set up, no change necessary.')
+                            else:
+                                sandboxPage.put(text[:endpos], translatedMsg)
+                                pywikibot.output(u'Standard content was changed, user sandbox cleaned.')
+                        else:
+                            sandboxPage.put(translatedContent, translatedMsg)
+                            pywikibot.output(u'Standard content was changed, sandbox cleaned.')
                     else:
                         diff = minutesDiff(sandboxPage.editTime(), time.strftime("%Y%m%d%H%M%S", time.gmtime()))
                         if pywikibot.verbose:
@@ -179,6 +196,9 @@
                             wait = True
                 except pywikibot.EditConflict:
                     pywikibot.output(u'*** Loading again because of edit conflict.\n')
+                except pywikibot.NoPage:
+                    pywikibot.output(u'*** The sandbox is not existent, skipping.')
+                    continue
             if self.no_repeat:
                 pywikibot.output(u'\nDone.')
                 return
Dr. Trigon | 5 Feb 00:00 2011

Re: ID: 3108403

Hello xqt!

Here is the new (maybe final? ;) patch for clean_sandbox.py.

By running, e.g.:

python clean_sandbox.py -userlist:Benutzer:DrTrigonBot/Diene_Mir\!

you can clean the sandboxes of all users linked on
'Benutzer:DrTrigonBot/Diene_Mir!'. They are checked for the same
template as the global sandbox, but this can easily be changed
by subclassing, as shown in the (new) clean_user_sandbox.py.

(https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/clean_user_sandbox.py?hb=true)
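For readers of the patch, here is a standalone restatement of the user-sandbox rules in plain Python. It is a sketch that mirrors the new branch of the patch but leaves out the 'subst:' case and the unchanged time-based path (which the diff does not show). The function names, user names and template string are hypothetical; 'Wikipedia:Spielwiese' is used only as an example project sandbox title.

```python
def user_sandbox_titles(linked_titles, project_sandbox):
    """As in the patch: root user page + '/' + sandbox name without namespace."""
    subpage = project_sandbox.split(":")[-1]
    roots = [title.split("/")[0] for title in linked_titles]
    return ["%s/%s" % (root, subpage) for root in roots]


def user_sandbox_action(text, content):
    """Return (decision, new_text) following the checks added for user sandboxes."""
    header = content.strip()
    if text.strip() == header:
        return "clean", None
    pos = text.find(header)
    if pos != 0:
        endpos = pos + len(header)
        if pos < 0 or endpos == len(text):
            return "clean or not set up", None
        # Keep everything up to and including the template, drop what follows.
        return "truncate", text[:endpos]
    return "time-based", None  # template at the very top: existing code path


TEMPLATE = "{{Sandbox header}}"  # hypothetical stand-in for the real template

print(user_sandbox_titles(["Benutzer:Alice", "Benutzer:Bob/Notes"], "Wikipedia:Spielwiese"))
# ['Benutzer:Alice/Spielwiese', 'Benutzer:Bob/Spielwiese']
print(user_sandbox_action("My notes\n{{Sandbox header}}\ntest edits", TEMPLATE))
# ('truncate', 'My notes\n{{Sandbox header}}')
```

Unlike the project sandbox, which is reset to the template as a whole, a user sandbox keeps whatever the user wrote above the template; only text below it is removed.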

Greetings

Index: clean_sandbox.py
===================================================================
--- clean_sandbox.py	(Revision 8916)
+++ clean_sandbox.py	(Arbeitskopie)
@@ -14,6 +14,17 @@
                    hours and limits it between 5 and 15 minutes.
                    The minimum delay time is 5 minutes.

+    -userlist      Use this parameter to run the script in the user
+                   namespace.
+                   > ATTENTION: on most wikis THIS IS FORBIDDEN FOR BOTS ! <
+                   > (please talk with your admin first)                   <
+                   Since it is considered bad style to edit user pages
+                   without permission, you have to pass a page containing
+                   a list of users to process. The argument is given as,
+                   e.g., "-userlist:Benutzer:DrTrigonBot/Diene_Mir\!".
+                   Please also be aware that the rules for when to clean the
+                   user sandbox differ from those for the project sandbox.
+
 """
 #
 # (C) Leonardo Gregianin, 2006
@@ -21,7 +32,12 @@
 # (C) Andre Engels, 2007
 # (C) Siebrand Mazeland, 2007
 # (C) xqt, 2009
+# (C) Dr. Trigon, 2011
 #
+# DrTrigonBot: http://de.wikipedia.org/wiki/Benutzer:DrTrigonBot
+# Clean User Sandbox Robot (clean_user_sandbox.py)
+# https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/clean_user_sandbox.py?hb=true
+#
 # Distributed under the terms of the MIT license.
 #
 __version__ = '$Id$'
@@ -120,13 +136,18 @@
     }

 class SandboxBot:
-    def __init__(self, hours, no_repeat, delay):
+    def __init__(self, hours, no_repeat, delay, userlist):
         self.hours = hours
         self.no_repeat = no_repeat
         if delay == None:
             self.delay = min(15, max(5, int(self.hours *60)))
         else:
             self.delay = max(5, delay)
+        self.site = pywikibot.getSite()
+        if userlist == None:
+            self.userlist = None
+        else:
+            self.userlist = [page.title().split(u'/')[0] for page in pywikibot.Page(self.site, userlist).linkedPages()]

     def run(self):

@@ -143,29 +164,43 @@
                    int(time2[10:12])
             return abs(t2-t1)

-        mySite = pywikibot.getSite()
+        mySite = self.site
         while True:
             wait = False
             now = time.strftime("%d %b %Y %H:%M:%S (UTC)", time.gmtime())
             localSandboxTitle = pywikibot.translate(mySite, sandboxTitle)
+            IsUserSandbox = (self.userlist is not None)  # DrTrigonBot (Clean User Sandbox Robot)
+            if IsUserSandbox:
+                localSandboxTitle = u'%s/' + localSandboxTitle.split(u':')[-1]
+                localSandboxTitle = [localSandboxTitle % user for user in self.userlist]
             if type(localSandboxTitle) is list:
                 titles = localSandboxTitle
             else:
                 titles = [localSandboxTitle,]
             for title in titles:
                 sandboxPage = pywikibot.Page(mySite, title)
+                pywikibot.output(u'Preparing to process sandbox page %s' % sandboxPage.title(asLink=True))
                 try:
                     text = sandboxPage.get()
                     translatedContent = pywikibot.translate(mySite, content)
                     translatedMsg = pywikibot.translate(mySite, msg)
                     subst = 'subst:' in translatedContent
+                    pos = text.find(translatedContent.strip())
                     if text.strip() == translatedContent.strip():
                         pywikibot.output(u'The sandbox is still clean, no change necessary.')
                     elif subst and sandboxPage.userName() == mySite.loggedInAs():
                         pywikibot.output(u'The sandbox might be clean, no change necessary.')
-                    elif text.find(translatedContent.strip()) <> 0 and not subst:
-                        sandboxPage.put(translatedContent, translatedMsg)
-                        pywikibot.output(u'Standard content was changed, sandbox cleaned.')
+                    elif pos <> 0 and not subst:
+                        if IsUserSandbox:
+                            endpos = pos + len(translatedContent.strip())
+                            if (pos < 0) or (endpos == len(text)):
+                                pywikibot.output(u'The user sandbox is still clean or not set up, no change necessary.')
+                            else:
+                                sandboxPage.put(text[:endpos], translatedMsg)
+                                pywikibot.output(u'Standard content was changed, user sandbox cleaned.')
+                        else:
+                            sandboxPage.put(translatedContent, translatedMsg)
+                            pywikibot.output(u'Standard content was changed, sandbox cleaned.')
                     else:
                         diff = minutesDiff(sandboxPage.editTime(), time.strftime("%Y%m%d%H%M%S", time.gmtime()))
                         if pywikibot.verbose:
@@ -179,6 +214,9 @@
                             wait = True
                 except pywikibot.EditConflict:
                     pywikibot.output(u'*** Loading again because of edit conflict.\n')
+                except pywikibot.NoPage:
+                    pywikibot.output(u'*** The sandbox is not existent, skipping.')
+                    continue
             if self.no_repeat:
                 pywikibot.output(u'\nDone.')
                 return
@@ -192,6 +230,7 @@
 def main():
     hours = 1
     delay = None
+    userlist = None
     no_repeat = True
     for arg in pywikibot.handleArgs():
         if arg.startswith('-hours:'):
@@ -199,11 +238,13 @@
             no_repeat = False
         elif arg.startswith('-delay:'):
             delay = int(arg[7:])
+        elif arg.startswith('-userlist:'):
+            userlist = arg[10:]
         else:
             pywikibot.showHelp('clean_sandbox')
             return

-    bot = SandboxBot(hours, no_repeat, delay)
+    bot = SandboxBot(hours, no_repeat, delay, userlist)
     try:
         bot.run()
     except KeyboardInterrupt: