Michal Kawalec | 10 Jul 2011 18:10

Some help with for loop

Hello,
I wanted to parallelize a very big and long for loop in my program,
which looks like:
for a, b in enumerate(files):
    # do a lot of stuff here
    ...

Is there an easy way of doing this with joblib?

Best Regards,
Michal

Gael Varoquaux | 12 Jul 2011 01:49

Re: Some help with for loop

Just look at the online documentation for Parallel; you should be able to use it to solve your problem. Sorry
for being terse; I am replying from my mobile phone.
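Following that pointer, here is a minimal sketch of how the loop above might map onto joblib's Parallel and delayed, assuming a current joblib release. The names process_file and files are hypothetical placeholders for the loop body and the real input, and the loop variables a, b are renamed index, filename for readability.

```python
from joblib import Parallel, delayed

# Hypothetical input; stands in for the real list of files.
files = ["a.txt", "b.txt", "c.txt"]


def process_file(index, filename):
    """Hypothetical loop body: do the per-file work and return a result."""
    # Put "a lot of stuff here" in this function. It must not rely on state
    # shared with other iterations, because each call may run in another process.
    return index, filename.upper()


# Serial version:
#     for index, filename in enumerate(files):
#         process_file(index, filename)
#
# Parallel version: delayed() records each call without running it, and
# Parallel dispatches the recorded calls to n_jobs worker processes.
results = Parallel(n_jobs=2)(
    delayed(process_file)(index, filename)
    for index, filename in enumerate(files)
)
print(results)  # [(0, 'A.TXT'), (1, 'B.TXT'), (2, 'C.TXT')]
```

Parallel returns the results as a list in input order, so anything the original loop accumulated in place should instead be returned from the function and collected afterwards.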


Gustavo Goretkin | 29 Jul 2011 11:13

OSError: [Errno 4] Interrupted system call


I'm running a very simple demo job, but I'm getting an exception from the multiprocessing module. The exception becomes more likely as n_jobs increases: it's rare with n_jobs = 2, but with n_jobs = 8 it's almost guaranteed. I'm running Ubuntu 10.10. I can't tell what's interrupting the parent process. Any suggestions?

Thanks,
Gustavo


In [4]: Parallel(n_jobs=-1, verbose=1)(delayed(sleep)(.1) for _ in range(10))
[Parallel(n_jobs=-1)]: Done   1 out of  10 |elapsed:    0.1s remaining:    0.9s
[Parallel(n_jobs=-1)]: Done   2 out of  10 |elapsed:    0.1s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done   3 out of  10 |elapsed:    0.1s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done   4 out of  10 |elapsed:    0.1s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done   5 out of  10 |elapsed:    0.1s remaining:    0.1s
[Parallel(n_jobs=-1)]: Done   6 out of  10 |elapsed:    0.1s remaining:    0.1s
[Parallel(n_jobs=-1)]: Done   7 out of  10 |elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   8 out of  10 |elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   9 out of  10 |elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  10 out of  10 |elapsed:    0.1s remaining:    0.0s
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "/afs/athena.mit.edu/user/g/o/goretkin/py/lis_env/lib/python2.6/site-packages/joblib-0.4.4.dev-py2.6.egg/joblib/parallel.py", line 257, in __call__
    pool.join()
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 342, in join
    p.join()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 119, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python2.6/multiprocessing/forking.py", line 117, in wait
    return self.poll(0)
  File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 4] Interrupted system call
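The IPython session above omits its imports. A self-contained version of the same demo, assuming a current joblib install, is sketched below; running it as a plain script also takes any embedding IDE out of the picture. The worker count is lowered to 2 here as an assumption for portability; the original session used n_jobs=-1.

```python
from time import sleep

from joblib import Parallel, delayed

# The original session used n_jobs=-1 (one worker per CPU); the report says
# the EINTR error became more frequent as n_jobs grew, so raise this to test.
N_JOBS = 2

# Each task just sleeps for 0.1 s. time.sleep returns None, so the result
# is a list of ten Nones, in input order.
results = Parallel(n_jobs=N_JOBS, verbose=1)(
    delayed(sleep)(0.1) for _ in range(10)
)
```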

Gael Varoquaux | 29 Jul 2011 14:23

Re: OSError: [Errno 4] Interrupted system call

On Fri, Jul 29, 2011 at 05:13:35AM -0400, Gustavo Goretkin wrote:
>    I'm running a very simple demo job, but I'm getting an exception from the
>    multiprocessing module. The probability of the exception occurring
>    increases with n_jobs. That is, it's rarer for me to get the exception
>    with n_jobs = 2, but with n_jobs = 8, it's almost guaranteed. I'm running
>    Ubuntu 10.10. I can't tell what's interrupting the parent process. Any
>    suggestions?

That's really strange. I cannot reproduce the problem on my 10.04
computer with 12 CPUs or on my 10.10 computer with 2 CPUs. I don't
have much of a guess as to what may be interrupting the process. I noticed
that you were running joblib 0.4.4. Have you tried the latest release
(0.5.3)?

Gael
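Errno 4 (EINTR) means that the blocking os.waitpid call in the traceback was interrupted by a signal before it completed; the child process itself did not fail. Python 2, as used here, leaves the retry to the caller, whereas Python 3.5 and later retry most such calls automatically (PEP 475). The sketch below only illustrates that retry pattern; retry_on_eintr and the fake interrupted call are hypothetical, and this is not a patch for joblib or multiprocessing.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), retrying while it fails with EINTR.

    Hypothetical helper, not part of joblib or multiprocessing.
    """
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            # errno 4 (EINTR): a signal arrived while the call was blocked.
            # For a call like os.waitpid the interrupted attempt had no
            # effect, so retrying is safe; any other error is re-raised.
            if exc.errno != errno.EINTR:
                raise


# Fake blocking call that is "interrupted" once, then succeeds. It stands in
# for something like os.waitpid(pid, 0), which returns (pid, status).
attempts = []


def interrupted_once():
    attempts.append(1)
    if len(attempts) == 1:
        raise OSError(errno.EINTR, "Interrupted system call")
    return (1234, 0)


pid, status = retry_on_eintr(interrupted_once)
print(pid, status, len(attempts))  # 1234 0 2
```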

Gustavo Goretkin | 29 Jul 2011 14:43

Re: OSError: [Errno 4] Interrupted system call

There is something very odd with my setup. Thankfully, I have gotten it to work now. I'll try to dig further, but to give an idea:

I'm on a machine without root access, so I'm using virtualenv.
I was using IPython within the Spyder IDE.
Spyder might have been loading the IPython installed globally on the machine, because running joblib inside an IPython session launched from the command line worked fine.

The other, occasional error was raised in delayed() by the line "pickle.dumps(function)" (full stack trace below; it comes from a different machine, so the joblib version is different). This happens on my personal machine in IPython, so virtualenv is out of the picture.

Thanks for the great module!
Gustavo


-----------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/joblib-0.5.3.dev-py2.6.egg/joblib/parallel.py", line 419, in __call__
    for function, args, kwargs in iterable:
  File "<ipython console>", line 1, in <genexpr>
  File "/usr/local/lib/python2.6/dist-packages/joblib-0.5.3.dev-py2.6.egg/joblib/parallel.py", line 86, in delayed
    pickle.dumps(function)
  File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle function objects


Gael Varoquaux | 29 Jul 2011 14:47

Re: OSError: [Errno 4] Interrupted system call

On Fri, Jul 29, 2011 at 08:43:25AM -0400, Gustavo Goretkin wrote:
>    I was using ipython within Spyder IDE.
>    Spyder might have been loading the ipython globally installed on the
>    machine, because running joblib inside ipython which I launched from the
>    command line worked fine.

OK, indeed. Spyder uses a rather old mechanism for embedding IPython, to
which I contributed a bit, and it is not particularly pretty. I
remember that we hacked the interprocess communication a bit. Thankfully, the
core IPython devs have worked on these issues, and I suspect that the newly
released IPython will be much cleaner in this respect.

>    The other occasional error was thrown in delayed() by the line
>    "pickle.dumps(function)" (full stack trace below, from a different machine
>    so joblib version is different). This happens on my personal machine in
>    ipython (virtualenv out of the picture).

>      File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
>        raise TypeError, "can't pickle %s objects" % base.__name__
>    TypeError: can't pickle function objects

Yes, you have to define the function that you want to 'delay' in a
separate module and import it from there. This should solve that problem.

Cheers,

Gael
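The reason a separate module helps is that the standard pickle module serializes a function by reference: it stores the module name and the function name, and unpickling looks the function up again by importing that module. A function without an importable name (a lambda, for instance) cannot be pickled that way. The sketch below uses only the standard library and does not reproduce the 2011 IPython setup; can_pickle is a hypothetical helper, and a module such as a hypothetical mytasks.py holding the task functions would play the role os.path plays here. Recent joblib releases with the default loky backend can serialize many interactively defined functions, but an importable module remains the most portable option.

```python
import os.path
import pickle


def can_pickle(obj):
    """Hypothetical helper: True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A function at the top level of an importable module is pickled by
# reference (module name + function name), so it round-trips to the
# very same object.
restored = pickle.loads(pickle.dumps(os.path.join))
print(restored is os.path.join)  # True

# A lambda has no name that pickle can look up again, so this fails.
print(can_pickle(lambda x: x * 2))  # False
```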

Gmane