Marlen Caemmerer | 21 May 2013 14:44

TS web broken

Hello,

Due to a serious mistake I made with rsync, I have to say TS web is currently broken.
Because I accidentally deleted the configs, DaB shut down the web service to be on the safe side.

I'll try to repair it.

Cheers
 	marlen/nosy

_______________________________________________
Toolserver-l mailing list (Toolserver-l <at> lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
phil.el | 21 May 2013 10:58

SGE fails to run any of my tasks; they error out with "cgroup change of group failed"

[Asking here, as JIRA doesn't let me file a bug report (I'm logged out when submitting a bug)]

Since the switch to a Linux implementation of SGE, none of my tasks can start successfully.

A typical entry:

qcronsub -l arch=lx -l h_rt=INFINITY -l virtual_free=64M -notify $HOME/wsbot/wshocr

ends up creating a file $HOME/wshocr.e* containing

cgroup change of group failed

then the task is killed.

regards

-- 
Phe

Alex Brollo | 18 May 2013 00:29

Two questions about pywikipedia

As I mentioned, I'm running a pywikipedia bot (Alebot) on willow; I installed pywikipedia in $HOME and set

PYTHONPATH=/home/alebot/pywikipedia [,.....]

in the .environment file, and I am reviewing my Python scripts as best I can so that they can be run via qsub/qcronsub.

Most of my scripts obviously need the statement:

import wikipedia [,.....]

My questions:
1. Do I really need to install pywikipedia into $HOME?
2. My scripts run happily outside SGE, but when I try to run them under SGE I get an error: 
ImportError: No module named wikipedia

I couldn't find any documentation for this ImportError (I suppose PYTHONPATH is wrong).

Where am I going wrong? 
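A plausible cause (an assumption, not confirmed in this thread) is that the PYTHONPATH set in `.environment` reaches login shells but not the environment SGE gives the job. SGE's `qsub` can pass variables explicitly with `-v PYTHONPATH=...` or export the whole submitting environment with `-V`; whether the Toolserver `qcronsub` wrapper forwards those options is also an assumption. A way to avoid depending on the environment at all is to let the script extend `sys.path` itself before importing. The sketch below assumes pywikipedia lives in `~/pywikipedia`; the helper name `ensure_on_path` is hypothetical.

```python
import os
import sys

# Assumption: pywikipedia is checked out in ~/pywikipedia
# (for Alebot that would be /home/alebot/pywikipedia).
PYWIKIPEDIA_DIR = os.path.expanduser("~/pywikipedia")


def ensure_on_path(directory):
    """Prepend `directory` to sys.path unless it is already there.

    Returns True if the directory was added, False if it was already present.
    """
    directory = os.path.abspath(directory)
    already = {os.path.abspath(p) for p in sys.path if p}
    if directory in already:
        return False
    sys.path.insert(0, directory)
    return True


# Call this at the very top of each bot script, before the pywikipedia import.
ensure_on_path(PYWIKIPEDIA_DIR)
# With the directory on sys.path, the usual import should then resolve:
# import wikipedia
```

Because the path is fixed inside the script, the same file behaves the same way from an interactive shell, from cron and under SGE, regardless of which environment each one sets up.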

Alex


Tim Landscheidt | 17 May 2013 03:41

SGE thinks hosts are overloaded while the latter are idling

Hi,

a "qstat -j" of a simple job yields inter alia:

| scheduling info:            queue instance "longrun-sol@willow.toolserver.org" dropped because it is temporarily not available
|                             queue instance "short-sol@willow.toolserver.org" dropped because it is temporarily not available
|                             queue instance "medium-lx@mayapple.toolserver.org" dropped because it is temporarily not available
|                             queue instance "longrun3-sol@willow.toolserver.org" dropped because it is temporarily not available
|                             queue instance "longrun2-sol@clematis.toolserver.org" dropped because it is disabled
|                             queue instance "longrun2-sol@hawthorn.toolserver.org" dropped because it is disabled
|                             queue instance "medium-sol@ortelius.toolserver.org" dropped because it is overloaded: np_load_short=0.791601 (= 0.391601 + 0.8 * 2.000000 with nproc=4) >= 0.75
|                             queue instance "medium-lx@yarrow.toolserver.org" dropped because it is overloaded: np_load_short=1.215000 (= 0.015000 + 0.8 * 6.000000 with nproc=4) >= 1.2
|                             queue instance "medium-lx@nightshade.toolserver.org" dropped because it is overloaded: np_load_short=1.227500 (= 0.127500 + 0.8 * 11.000000 with nproc=8) >= 1.2
|                             queue instance "medium-sol@wolfsbane.toolserver.org" dropped because it is overloaded: np_load_short=0.778613 (= 0.078613 + 0.8 * 7.000000 with nproc=8) >= 0.75
|                             queue instance "short-sol@wolfsbane.toolserver.org" dropped because it is overloaded: np_load_short=1.278613 (= 0.078613 + 0.8 * 12.000000 with nproc=8) >= 1.2
|                             queue instance "short-sol@ortelius.toolserver.org" dropped because it is overloaded: np_load_short=1.391601 (= 0.391601 + 0.8 * 5.000000 with nproc=4) >= 1.2
|                             queue instance "longrun-lx@yarrow.toolserver.org" dropped because it is overloaded: np_load_short=3.215000 (= 0.015000 + 0.8 * 16.000000 with nproc=4) >= 3.1
|                             queue instance "longrun-lx@nightshade.toolserver.org" dropped because it is overloaded: mem_free=-420765696.524288 (= 14098.726562M - 500M * 29.000000) <= 500
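Every np_load_short value in the scheduling info above fits one reading of the printed formula: the host's normalized load plus a per-job adjustment of 0.8 times the printed multiplier, divided by nproc (the log leaves that division implicit in "with nproc=..."). SGE configures such adjustments through `job_load_adjustments` and `load_adjustment_decay_time` in sched_conf(5). The sketch below checks that reading against each logged row; it is an inference from the numbers, not SGE's own code, and the function names are hypothetical. It also shows that without the adjustment none of these hosts would reach its threshold, which suggests the "overloaded" verdicts come entirely from the adjustment term (presumably for jobs the scheduler recently dispatched there).

```python
# Inferred from the qstat -j output: logged np_load_short values match
# np_load + adjustment * multiplier / nproc. A reconstruction, not SGE source.


def adjusted_np_load(np_load, adjustment, multiplier, nproc):
    """Normalized load after the scheduler's per-job load adjustment."""
    return np_load + adjustment * multiplier / nproc


def is_overloaded(np_load, adjustment, multiplier, nproc, threshold):
    """True if the adjusted load reaches the threshold (the log compares with >=)."""
    return adjusted_np_load(np_load, adjustment, multiplier, nproc) >= threshold


# (queue instance, np_load, multiplier, nproc, threshold, logged np_load_short)
LOGGED = [
    ("medium-sol@ortelius", 0.391601, 2, 4, 0.75, 0.791601),
    ("medium-lx@yarrow", 0.015000, 6, 4, 1.2, 1.215000),
    ("medium-lx@nightshade", 0.127500, 11, 8, 1.2, 1.227500),
    ("medium-sol@wolfsbane", 0.078613, 7, 8, 0.75, 0.778613),
    ("short-sol@wolfsbane", 0.078613, 12, 8, 1.2, 1.278613),
    ("short-sol@ortelius", 0.391601, 5, 4, 1.2, 1.391601),
    ("longrun-lx@yarrow", 0.015000, 16, 4, 3.1, 3.215000),
]

if __name__ == "__main__":
    for queue, load, mult, nproc, threshold, logged in LOGGED:
        value = adjusted_np_load(load, 0.8, mult, nproc)
        print(f"{queue}: {value:.6f} (logged {logged:.6f}), "
              f"overloaded={is_overloaded(load, 0.8, mult, nproc, threshold)}, "
              f"overloaded without adjustment={load >= threshold}")
```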

At the moment, we have /no/ jobs scheduled by SGE running.
Meanwhile, the hosts are idling:

| queuename                      qtype resv/used/tot. load_avg arch          states
| ---------------------------------------------------------------------------------
| short-sol@ortelius.toolserver. B     0/0/8          1.52     sol-amd64
| ---------------------------------------------------------------------------------
| short-sol@willow.toolserver.or B     0/0/8          -NA-     sol-amd64     au
| ---------------------------------------------------------------------------------
| short-sol@wolfsbane.toolserver B     0/0/12         0.64     sol-amd64
| ---------------------------------------------------------------------------------
| medium-lx@mayapple.toolserver. B     0/0/32         -NA-     linux-x64     adu
| ---------------------------------------------------------------------------------
| medium-lx@nightshade.toolserve B     0/0/8          1.05     linux-x64
| ---------------------------------------------------------------------------------
| medium-lx@yarrow.toolserver.or B     0/0/8          0.02     linux-x64
| ---------------------------------------------------------------------------------
| longrun-lx@nightshade.toolserv BI    0/0/64         1.05     linux-x64
| ---------------------------------------------------------------------------------
| longrun-lx@yarrow.toolserver.o BI    0/0/64         0.02     linux-x64
| ---------------------------------------------------------------------------------
| longrun-sol@willow.toolserver. BI    0/0/64         -NA-     sol-amd64     au
| ---------------------------------------------------------------------------------
| medium-sol@ortelius.toolserver B     0/0/4          1.52     sol-amd64
| ---------------------------------------------------------------------------------
| medium-sol@wolfsbane.toolserve B     0/0/4          0.64     sol-amd64
| ---------------------------------------------------------------------------------
| longrun2-sol@clematis.toolserv B     0/0/8          0.03     sol-amd64     d
| ---------------------------------------------------------------------------------
| longrun2-sol@hawthorn.toolserv B     0/0/8          0.23     sol-amd64     d
| ---------------------------------------------------------------------------------
| longrun3-sol@willow.toolserver B     0/0/4          -NA-     sol-amd64     aduE
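The single letters in the `states` column of the listing above carry most of the diagnosis; per the qstat(1) man page, `a` is a load alarm, `d` disabled, `u` unknown (the execution daemon cannot be contacted) and `E` error. As a small, hypothetical helper (not a Toolserver tool), the sketch below decodes such rows. It assumes rows as `qstat -f` prints them, with the queue name as a single token such as `longrun3-sol@willow.toolserver`, and treats "free slots" simply as tot minus resv minus used, ignoring load thresholds.

```python
# Queue-instance state letters as described in the qstat(1) man page.
STATE_MEANINGS = {
    "a": "load alarm",
    "A": "suspend alarm",
    "c": "configuration ambiguous",
    "C": "suspended by calendar",
    "d": "disabled",
    "D": "disabled by calendar",
    "E": "error",
    "o": "orphaned",
    "s": "suspended",
    "S": "suspended by subordination",
    "u": "unknown",
}


def parse_queue_row(row):
    """Split one `qstat -f` queue row into its columns.

    Assumes the column order queuename, qtype, resv/used/tot., load_avg,
    arch, followed by an optional states column.
    """
    fields = row.lstrip("| ").split()
    name, qtype, slots, load, arch = fields[:5]
    states = fields[5] if len(fields) > 5 else ""
    resv, used, total = (int(x) for x in slots.split("/"))
    return {
        "queue": name,
        "qtype": qtype,
        "free_slots": total - resv - used,  # rough: ignores load thresholds
        "load_avg": None if load == "-NA-" else float(load),
        "arch": arch,
        "states": [STATE_MEANINGS.get(ch, ch) for ch in states],
    }


def iter_queue_rows(text):
    """Yield parsed rows, skipping the header and dashed separator lines."""
    for line in text.splitlines():
        line = line.lstrip("| ").strip()
        if not line or line.startswith("-") or line.startswith("queuename"):
            continue
        yield parse_queue_row(line)


# Two rows copied from the listing above.
SAMPLE = """\
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
short-sol@ortelius.toolserver. B     0/0/8          1.52     sol-amd64
---------------------------------------------------------------------------------
longrun3-sol@willow.toolserver B     0/0/4          -NA-     sol-amd64     aduE
"""

if __name__ == "__main__":
    for q in iter_queue_rows(SAMPLE):
        print(q["queue"], q["free_slots"], q["load_avg"], ", ".join(q["states"]) or "ok")
```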

I filed https://jira.toolserver.org/browse/TS-1650 on Monday
to no avail so far.

Tim

Peter Körner | 14 May 2013 11:26

weird qcronsub errors (was: Output from "cron" command)

Hi

For a few days now I've been getting weird errors when submitting tasks.

My cron job calls
"/home/mazder/public_html/replicate-sequences/update-submit.sh", which
contains the following command:

qcronsub -l h_rt=0:05:00 -l virtual_free=100M -l arch=* -l sql-user-m=1 \
  -N mazder-replicate-sequences -m as -o \
  '/home/mazder/public_html/replicate-sequences/sge' \
  '/home/mazder/public_html/replicate-sequences/update-run.sh'

Most of these calls produce the error below, which does not seem to be an
error in my code, as I use neither XML nor Python.

Do you have any idea what's going wrong?
Peter

-------- Original Message --------
Subject: Output from "cron" command
Date: Tue, 14 May 2013 08:40:00 +0000 (UTC)
From: mazder <at> toolserver.org (mazder)
To: mazder <at> toolserver.org

Your "cron" job on clematis
/home/mazder/public_html/replicate-sequences/update-submit.sh

produced the following output:

error: JSV stderr: Traceback (most recent call last):
error: JSV stderr: File "/sge/GE/bin/sol-amd64/qjobtest", line 108, in <module>
error: JSV stderr: dom = minidom.parse(child_stdout)
error: JSV stderr: File "/opt/ts/python/2.7/lib/python2.7/site-packages/_xmlplus/dom/minidom.py", line 1915, in parse
error: JSV stderr: return expatbuilder.parse(file)
error: JSV stderr: File "/opt/ts/python/2.7/lib/python2.7/site-packages/_xmlplus/dom/expatbuilder.py", line 930, in parse
error: JSV stderr: result = builder.parseFile(file)
error: JSV stderr: File "/opt/ts/python/2.7/lib/python2.7/site-packages/_xmlplus/dom/expatbuilder.py", line 207, in parseFile
error: JSV stderr: parser.Parse(buffer, 0)
error: JSV stderr: xml.parsers.expat.ExpatError: syntax error: line 1, column 0
Unable to run job: JSV stderr: Traceback (most recent call last):
JSV stderr: File "/sge/GE/bin/sol-amd64/qjobtest", line 108, in <module>
JSV stderr: dom = minidom.parse(child_stdout)
JSV stderr: File "/opt/ts/python/2.7/lib/python2.7/site-packages/_xmlplus/dom/minidom.py", line 1915, in parse
JSV stderr: return expatbuilder.parse(file)
JSV stderr: File "/opt/ts/python/2.7/lib/python2.7/site-packages/_xmlplus/dom/expatbuilder.py", line 930, in parse
JSV stderr: result = builder.parseFile(file)
JSV stderr: File "/opt/ts/python/2.7/lib/python2.7/site-packages/_xmlplus/dom/expatbuilder.py", line 207, in parseFile
JSV stderr: parser.Parse(buffer, 0)
JSV stderr: xml.parsers.expat.ExpatError: syntax error: line 1, column 0
JSV stderr is - xml.parsers.expat.ExpatError: syntax error: line 1, column 0.
Exiting.
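The traceback points at SGE's submission-time check rather than at the submitted scripts: `qjobtest`, running as a JSV (job submission verifier), hands `child_stdout` to `minidom.parse`, and expat rejects it at line 1, column 0, which is what happens when the very first characters are not XML (for example, an error message printed where XML was expected). What the child process actually printed is not visible in this log. As a hedged illustration of that failure mode, here is a defensive Python 3 variant (the function name is hypothetical, not part of qjobtest) that returns the error instead of aborting, so the raw text could be logged alongside it.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_child_output(text):
    """Parse text that is expected to be XML.

    Returns (document, None) on success and (None, error) when the text is
    not well-formed XML, e.g. an error message printed instead of XML.
    """
    try:
        return minidom.parseString(text), None
    except ExpatError as err:
        # err.lineno and err.offset give the position expat complained about.
        return None, err


if __name__ == "__main__":
    doc, err = parse_child_output("<job><name>demo</name></job>")
    print(doc.documentElement.tagName)  # job
    doc, err = parse_child_output("error: something else was printed")
    print(err)
```

Logging `text` whenever `err` is set would show what the child printed in place of XML, which is the piece of information missing from the cron output above.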

Shubinator | 12 May 2013 01:03

Can't login to willow

SSHing into willow doesn't work; SSH apparently doesn't think my key
would be accepted by willow. I logged into willow three hours ago and
it worked fine. Anyone else seeing issues?

Is this a side effect of the maintenance?

Shubinator

Finne Boonen | 10 May 2013 11:42

Amsterdam hackathon workshops

For all Amsterdam hackathon attendees

We've made a first draft of the programme for the Amsterdam Hackathon. We have room for 6 workshops which will be run twice over the course of the Hackathon.

Which topics should we cover in them? Based on your motivations, we've made a list of the top 8 topics that seem to be of interest. Please take a look at the page and let us know which topics you'd be most interested in: https://www.mediawiki.org/wiki/Amsterdam_Hackathon_2013/Topics

For some of these workshops we're still looking for presenters. Are you interested? Let us know! (hackathon <at> wikimedia.nl)

Think you could run a 1-hour (interactive) workshop on one of these topics, but you're not sure? I'll help you create the workshop and practice beforehand.

Finne (henna)
On behalf of the programme committee




Marlen Caemmerer | 6 May 2013 09:27

Maintenance: Solaris Updates

Hello,

I will update the Solaris user and web servers, as well as the head nodes, on

Friday, 10th May after 1830 UTC.

Cheers
 	Marlen/nosy

Marlen Caemmerer | 6 May 2013 09:25

Maintenance: Rebooting ortelius web server

Hey,

I would like to reboot ortelius, one of the web servers, at

tomorrow, Tuesday 1830 UTC

Cheers
 	Marlen/nosy

rupert THURNER | 4 May 2013 18:45

Fwd: Your message to Labs-l awaits moderator approval

This bounce alone makes me sad ... is it really necessary to have another list?

---------- Forwarded message ----------
From: <labs-l-bounces <at> lists.wikimedia.org>
Date: 04.05.2013 15:10
Subject: Your message to Labs-l awaits moderator approval
To: <rupert.thurner <at> gmail.com>

Your mail to 'Labs-l' with the subject

    Re: [Toolserver-l] Migrating to labs from the toolserver

Is being held until the list moderator can review it for approval.

The reason it is being held:

    Post by non-member to a members-only list

Either the message will get posted to the list, or you will receive
notification of the moderator's decision.  If you would like to cancel
this posting, please visit the following URL:

    https://lists.wikimedia.org/mailman/confirm/labs-l/7c60651f25eb0c31e3da598f8485c76d9a6a9c35

Morten Wang | 2 May 2013 23:28

SGE job queue availability and execution

I've noticed some irregularity in job execution through SGE over the past few days.  Currently it seems several queues are either disabled or in an error state.

Is this expected?  Is there an easy way to get an idea about how many jobs are queued and how quickly they're executed, in other words how to predict when a certain job might be run?  Or maybe this is just a temporary issue that'll get resolved shortly?


Cheers,
Morten
