Fukumu Tsutsumi | 26 Mar 16:16 2015

About Application for GSoC

Hello, my name is Fukumu Tsutsumi. I'm a student at the University of Tokyo, Japan.
I'm planning to apply for GSoC with one of the SciPy projects, but I deeply regret that I did not notice the deadline was so close (only about a day away).
I wonder if I can still meet the deadline. I'm very eager to participate in GSoC, so I wouldn't like to give up this opportunity.
I would appreciate it if anyone could reply.

Sincerely,

Fukumu Tsutsumi
_______________________________________________
SciPy-Dev mailing list
SciPy-Dev <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev
Ralf Gommers | 25 Mar 20:19 2015

Re: Regarding taking up project ideas and GSoC 2015



On Wed, Mar 25, 2015 at 6:59 PM, Maniteja Nandana <maniteja.modesty067 <at> gmail.com> wrote:
Hi everyone,

I wanted to get some feedback on the application format, and on whether mentioning the methods, API and other packages is necessary in the application, or whether it would be preferable to provide a link to the wiki page which contains that information.

Your proposal is already a lot more detailed than other proposals, so I suggest at least not making it any longer. Moving or keeping some of the background content in the wiki and linking to it from your proposal would be even better.
 
I would also update the timeline as early as possible, after I refine the ideas. It would also be great to have any other feedback.

Your timeline is still empty; it's important to fill that in asap. It's easier to comment on a draft and improve it than to suggest something from scratch. There are a number of things that have been suggested which you could put in (and a few I just thought of):

- write a set of univariate test functions with known first and higher order derivatives
- same exercise for multivariate test functions
- define desired broadcasting behavior and implement
- refactor numdifftools.core.Derivative
- finalize API in a document
- integrate module into Scipy
- replace usages of numpy.diff with new scipy.diff functionality within Scipy
- bonus points, for at the end:
    - write a tutorial section about scipy.diff
    - write a nice set of benchmarks
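The first bullet could start from a small table of function/derivative pairs against which any numerical differentiator can be checked (a sketch; the names and layout here are hypothetical, the calculus facts are standard):

```python
import numpy as np

# Each entry pairs a test function with its known first derivative,
# so numerical results can be compared against exact values.
UNIVARIATE_TEST_FUNCS = [
    (np.exp, np.exp),                            # d/dx exp(x) = exp(x)
    (np.sin, np.cos),                            # d/dx sin(x) = cos(x)
    (lambda x: x**3, lambda x: 3 * x**2),        # d/dx x^3 = 3x^2
    (lambda x: 1.0 / x, lambda x: -1.0 / x**2),  # d/dx 1/x = -1/x^2
]

def check_derivative(num_deriv, x, rtol=1e-6):
    """Compare a numerical differentiator against the exact derivatives."""
    for f, fprime in UNIVARIATE_TEST_FUNCS:
        if not np.allclose(num_deriv(f, x), fprime(x), rtol=rtol):
            return False
    return True

# Even a crude central-difference rule passes at moderate tolerance:
central = lambda f, x, h=1e-6: (f(x + h) - f(x - h)) / (2 * h)
print(check_derivative(central, 2.0))  # True
```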
 
Cheers,
Ralf



On Mon, Mar 23, 2015 at 6:42 AM, Maniteja Nandana <maniteja.modesty067 <at> gmail.com> wrote:
Hi everyone, 
I was thinking it would be nice to put forward my ideas regarding the implementation of the package.

Thanks to Per Brodtkorb for the feedback.

On Thu, Mar 19, 2015 at 7:29 PM, <Per.Brodtkorb <at> ffi.no> wrote:

Hi,

 

For your information I have reimplemented the approx._fprime and approx._hess code found in statsmodels and added the epsilon extrapolation

method of Wynn. The result you can see here:

https://github.com/pbrod/numdifftools/blob/master/numdifftools/nd_cstep.py

 

This is wonderful. The main aim now is to find a way to determine whether the function is analytic, which is necessary for the complex step to work. Though differentiability is one of the main requirements for analyticity, any new suggestions would be really great.
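For reference, the complex step that the analyticity requirement is about is essentially a one-liner (this is the standard textbook formula, not code taken from nd_cstep.py):

```python
import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    """First derivative via the complex step: f'(x) ~= Im(f(x + i*h)) / h.

    Valid only for analytic f: the formula comes from the Taylor expansion
    of f along the imaginary axis, so f must be differentiable as a complex
    function.  There is no subtractive cancellation, so h can be tiny and
    the result is accurate to machine precision.
    """
    return np.imag(f(x + 1j * h)) / h

print(complex_step_derivative(np.exp, 1.0))  # ~= np.exp(1.0)
```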

I have also compared the accuracy and runtimes for the different alternatives here:

https://github.com/pbrod/numdifftools/blob/master/numdifftools/run_benchmark.py

 

Thanks for the information. This would help me better in understanding the pros and cons for various methods.

 

Personally I like the class interface better than the functional one because you can pass the resulting object as function to other methods/functions and these functions/methods do not need to know what it does behind the scenes or what options are used. This simple use case is exemplified here:

 

>>> g = lambda x: 1./x
>>> dg = Derivative(g, **options)
>>> my_plot(dg)
>>> my_plot(g)

 

In order to do this with a functional interface one could wrap it like this:

 

>>> dg2 = lambda x: fprime(g, x, **options)
>>> my_plot(dg2)

 

If you like the one-liner that the function gives, you could call the Derivative class like this:

 

>>> Derivative(g, **options)(x)

 

Which is very similar to the functional way:

>>> fprime(g, x, **options)


This is a really sound example of using classes. I agree that classes are better than functions with multiple arguments, and the object would also be reusable for other evaluations.
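A minimal version of the class interface being discussed could look like this (a sketch with only a central-difference rule; the class name follows the thread, the internals are purely illustrative):

```python
class Derivative(object):
    """Callable first-derivative approximation of f (central differences)."""

    def __init__(self, f, h=1e-6):
        self.f = f
        self.h = h

    def __call__(self, x):
        f, h = self.f, self.h
        return (f(x + h) - f(x - h)) / (2.0 * h)

g = lambda x: 1.0 / x
dg = Derivative(g)   # dg now behaves like any plain function of x
# Anything that expects a callable, e.g. a plotting helper, can take dg
# without knowing what happens behind the scenes:
print(dg(2.0))       # close to g'(2) = -1/4
```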

 

Another argument for having it as a class is that a function will be large, and "large functions are where classes go to hide". This is a quote of Uncle Bob's that we hear frequently in the third and fourth Clean Coders episodes. He states that when a function starts to get big it's most likely doing too much: a function should do one thing only and do that one thing well. Those extra responsibilities that we try to cram into a long function (aka method) can be extracted out into separate classes or functions.

 

The implementation in https://github.com/pbrod/numdifftools/blob/master/numdifftools/nd_cstep.py is an attempt to do this.

 

For the use case where n >= 1 and the Richardson/Romberg extrapolation method, I propose to factor this out in a separate class, e.g.:

>>> class NDerivative(object):
...     def __init__(self, f, n=1, method='central', order=2, **options):

 

It is very difficult to guarantee a certain accuracy for derivatives from finite differences. In order to get error-estimates for the derivatives one must do several function evaluations. In my experience with numdifftools it is very difficult to know exactly which step-size is best. Setting it too large or too small is equally bad and difficult to know in advance. Usually there is a very limited window of useful step-sizes which can be used for extrapolating the evaluated differences to a better final result. The best step-size can often be found around (10*eps)**(1./s)*maximum(log1p(abs(x)), 0.1), where s depends on the method and derivative order. Thus one cannot improve the results indefinitely by adding more terms. With finite differences you can hope the chosen sampling scheme gives you reasonable values and error-estimates, but many times you just have to accept what you get.
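The narrow window of usable step-sizes is easy to see numerically (a sketch; the step-size formula is the one quoted above, with s = 3 chosen here as one plausible value for a second-order central rule):

```python
import numpy as np

f, fprime = np.exp, np.exp   # test function with known derivative
x = 1.0

def central_err(h):
    """Absolute error of the central difference at step-size h."""
    return abs((f(x + h) - f(x - h)) / (2 * h) - fprime(x))

eps = np.finfo(float).eps
s = 3  # depends on method and derivative order; 3 is used for illustration
h_best = (10 * eps) ** (1.0 / s) * max(np.log1p(abs(x)), 0.1)

# Too large -> truncation error dominates; too small -> roundoff dominates.
for h in (1e-1, h_best, 1e-12):
    print("h = %8.1e   error = %8.1e" % (h, central_err(h)))
```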

 

Regarding the proposed API, I wonder how useful the input arguments epsabs and epsrel will be?

I was thinking about controlling the absolute and relative errors of the derivative, but now it seems we should just let the methods take care of it.

I also wonder how one can compute the outputs abserr_round and abserr_truncate accurately?

This idea was from the implementation in this function. I am not sure how accurate the errors would be, but I suppose this is possible to implement.

 

 

Best regards

Per A. Brodtkorb


Regarding the API, after some discussion, the class implementation would be something like:

Derivative():
    def __init__(self, f, h=None, method='central', full_output=False)
    def __call__(self, x, *args, **kwds)

Gradient():
    def __init__(self, f, h=None, method='central', full_output=False)
    def __call__(self, x, *args, **kwds)

Jacobian():
    def __init__(self, f, h=None, method='central', full_output=False)
    def __call__(self, x, *args, **kwds)

Hessian():
    def __init__(self, f, h=None, method='central', full_output=False)
    def __call__(self, x, *args, **kwds)

NDerivative():
    def __init__(self, f, n=1, h=None, method='central', full_output=False, **options)
    def __call__(self, x, *args, **kwds)

where options could be:

options = dict(order=2, Romberg_terms=2)


I would like to hear opinions on this implementation, where the main issues are:

  1. Whether the h=None default would mean the best step-size, found around (10*eps)**(1./s)*maximum(log1p(abs(x)), 0.1) where s depends on the method and derivative order, or a StepGenerator based on the epsilon algorithm of Wynn.
  2. Whether the *args and **kwds should be in __init__ or __call__. Per's preference was for __call__, which makes these objects compatible with scipy.optimize.minimize(fun, x0, args=(), method=None, jac=None, hess=None, ...), where the args are passed both to the function and to jac/hess if they are supplied.
  3. Are the input arguments for __init__ sufficient?
  4. What should we compute and return for full_output=True? I was thinking of the following options:

x : ndarray
    solution array
success : bool
    flag indicating whether the derivative was calculated successfully
message : str
    describes the cause of the error, if one occurred
nfev : int
    number of function evaluations
abserr_round : float
    absolute value of the roundoff error, if applicable
abserr_truncate : float
    absolute value of the truncation error, if applicable
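One lightweight way to carry those outputs would be a small result object, similar in spirit to the result objects in scipy.optimize (a sketch; the field names follow the list above, the type itself is hypothetical):

```python
from collections import namedtuple

# Bundle the proposed full_output fields so callers can unpack them by name.
DerivativeResult = namedtuple(
    "DerivativeResult",
    ["x", "success", "message", "nfev", "abserr_round", "abserr_truncate"],
)

# Illustrative values only:
res = DerivativeResult(
    x=0.25, success=True, message="", nfev=5,
    abserr_round=1e-12, abserr_truncate=3e-11,
)
print(res.success, res.nfev)  # True 5
```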

Any other opinions and suggestions on this would be great.



Cheers,

Maniteja.

James Phillips | 24 Mar 20:00 2015

Robert Kern's accursed excellent DE implementation

I have been using Robert Kern's implementation of the Differential Evolution (DE) genetic algorithm for a decade, for the purpose of guesstimating initial parameter estimates for curve fitting and surface fitting in my open source pyeq2 fitting library.

I can't find anything that works better, which gives rise to my current problem.

In trying to improve the performance of my fitting library, I tried to use GPU calculations for each generation of the genetic algorithm, and I found the following:

1) Robert's 2005 implementation of DE is not parallelizable, as each crossover within a generation can affect the population from which new items will be created. That is, within a given generation the population *changes* as the algorithm runs, and it must run serially in its present form.

2) I can rework the algorithm to be parallelizable by separating out "crossover", "breeding" and "evolving" into three separate steps, but months of testing show that the population size and number of generations must be considerably increased to match the results from Robert's version. That is, making the algorithm parallelizable means slowing it down so I can speed it up!
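The serial dependence in point 1 boils down to this pattern (a stripped-down DE/rand/1/bin inner loop written from the textbook description; helper names are hypothetical, this is not Robert's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

def de_generation_serial(pop, cost, F=0.5, CR=0.9):
    """One DE generation with in-place replacement (serial by construction).

    A successful trial vector replaces its target immediately, so later
    individuals in the SAME generation may be bred from it; the loop order
    matters and cannot be parallelized without changing the algorithm.
    """
    n, d = pop.shape
    for i in range(n):
        # pick three distinct donors (a full implementation also excludes i)
        a, b, c = rng.choice(n, size=3, replace=False)
        mutant = pop[a] + F * (pop[b] - pop[c])
        cross = rng.random(d) < CR
        trial = np.where(cross, mutant, pop[i])
        if cost(trial) < cost(pop[i]):
            pop[i] = trial   # visible to iterations i+1, i+2, ... this generation
    return pop

sphere = lambda v: float(np.sum(v ** 2))
pop = rng.standard_normal((8, 2))
best_before = min(sphere(p) for p in pop)
pop = de_generation_serial(pop, sphere)
best_after = min(sphere(p) for p in pop)  # selection never worsens an individual
```

Making this parallel means computing all trial vectors from the generation-start population and applying the replacements in a batch, which is a slightly different algorithm with different convergence behavior, matching the observation in point 2.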

I would like to increase performance, but cannot find any way to equal Robert's results without reducing performance prior to parallelization.

Any suggestions?

James
Ian Henriksen | 24 Mar 01:22 2015

GSOC 2015 projects

Hi all,

I'm putting together an application for GSOC 2015, and, although the deadline is approaching fast, I would still appreciate your feedback on a few of my ideas.

I am a master's student studying mathematics at Brigham Young University. Thus far my primary contribution to SciPy has been the Cython API for BLAS and LAPACK (see https://github.com/scipy/scipy/pull/4021).

My research is in Isogeometric Analysis, i.e. finite element analysis on spline curves. My open source work and research at BYU have given me a great deal of experience with Python and Cython, but I am also familiar with C, C++, and Fortran. As such, I have been reflecting on projects that would be best suited to my skill set, as well as most beneficial to SciPy.

I'm curious to know which of the following projects would be of greatest interest to the SciPy community:

1. Wrapping the PROPACK library for sparse linear algebra and using it for sparse SVD computation in scipy.sparse. There has been some initial work on f2py wrappers for PROPACK at https://github.com/jakevdp/pypropack, though it appears the wrappers are still incomplete.
2. Implementing an improved library for spline operations in SciPy. I am very familiar with the different refinement techniques used in CAD (knot insertion, degree elevation, etc.) and could implement a library that would be able to perform them all. My ideal here would be to write a C++ or Fortran library to do this and then wrap it via Cython. The emphasis would be primarily on writing code for refinement and evaluation that is both fast and general. I could include code for spline subdivision methods as well.
3. Adding support for Cython to both f2py and numpy.distutils. The goal here would be to allow f2py to generate cython-compatible wrappers from existing pyf files. I would also modify numpy.distutils so it could compile Cython files.
4. Wrap ffts (https://github.com/anthonix/ffts) and use it as an alternative to FFTPACK in scipy.fftpack for use cases where it is faster.

Which of these projects would be most appreciated? I certainly want to be able to make a valid and, more importantly, useful contribution.

Thanks!

- Ian Henriksen
Ralf Gommers | 23 Mar 22:21 2015

GSoC students: please read

Hi all,

It's great to see that this year there are a lot of students interested in doing a GSoC project with Numpy or Scipy. So far five proposals have been submitted, and it looks like several more are being prepared now. I'd like to give you a bit of advice as well as an idea of what's going to happen in the next few weeks.

The deadline for submitting applications is 27 March. Don't wait until the last day to submit your proposal! It has happened before that Melange was overloaded and unavailable - the Google program admins will not accept that as an excuse and allow you to submit later. So as soon as your proposal is in good shape, put it in. You can still continue revising it.

From 28 March until 13 April we will continue to interact with you, as we request slots from the PSF and rank the proposals. We don't know how many slots we will get this year, but to give you an impression: for the last two years we got 2 slots. Hopefully we can get more this year, but that's far from certain.

Our ranking will be based on a combination of factors: the interaction you've had with potential mentors and the community until now (and continue to have), the quality of your submitted PRs, quality and projected impact of your proposal, your enthusiasm, match with potential mentors, etc. We will also organize a video call (Skype / Google Hangout / ...) with each of you during the first half of April to be able to exchange ideas with a higher communication bandwidth medium than email.

Finally a note on mentoring: we will be able to mentor all proposals submitted or suggested until now. Due to the large interest and technical nature of a few topics it has in some cases taken a bit long to provide feedback on draft proposals, however there are no showstoppers in this regard. Please continue improving your proposals and working with your potential mentors.

Cheers,
Ralf





Abraham Escalante | 23 Mar 04:14 2015

`histogram2` and `signaltonoise` deprecation from `scipy.stats`

Hello all,

As part of the StatisticsCleanup milestone (which I aim to complete by late August), `scipy.stats.histogram2` and `scipy.stats.signaltonoise` are to be deprecated, but of course we would like to get opinions from the community.

In short: 
  • `histogram2` is not well tested and is unnecessary since `np.histogram2d` can be used instead.
  • `signaltonoise` doesn't really belong in `scipy.stats` and it is rarely used.
For more details, please refer to issues #602 and #609.
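For anyone relying on the deprecated functions, both have simple replacements (`np.histogram2d` is the one suggested above; the signal-to-noise one-liner below is just mean over standard deviation, which is what `signaltonoise` computes):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.5, 3.0, 4.0])
y = np.array([0.5, 1.5, 2.0, 2.5, 3.5])

# Instead of scipy.stats.histogram2:
hist, xedges, yedges = np.histogram2d(x, y, bins=4)

# Instead of scipy.stats.signaltonoise(a, axis=0, ddof=0):
def signaltonoise(a, axis=0, ddof=0):
    a = np.asanyarray(a)
    m = a.mean(axis=axis)
    sd = a.std(axis=axis, ddof=ddof)
    return np.where(sd == 0, 0, m / sd)

print(signaltonoise(x))
```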

If you have an objection or any opinion regarding this please let me know to take it into account.


Regards,
Abraham Escalante. 
Eric Martin | 21 Mar 07:20 2015

Sparse compressed major axis slicing with sequence is slow

Hi,

I filed https://github.com/scipy/scipy/issues/4573 a few weeks ago and am still waiting for confirmation from someone involved with SciPy development that this work is wanted. I recommend reading the issue, but the summary is that slicing a compressed sparse matrix along the major axis with a sequence is quite slow.

My method offers about a 100x speedup when selecting only a small number of rows/columns, and causes a bit of a slowdown when selecting many rows (though perhaps this slowdown could be limited with more development time). I also observed that compressed sparse matrix initialization spends a large amount of time validating input data.
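The pattern in question can be reproduced in a few lines (a timing sketch; the absolute numbers will vary by machine and SciPy version, the point is the cost of fancy-indexing a few rows of a large CSR matrix):

```python
import time
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
A = sp.random(100_000, 1_000, density=1e-3, format="csr", random_state=0)
rows = rng.choice(A.shape[0], size=10, replace=False)

t0 = time.perf_counter()
B = A[rows]                      # fancy-index a handful of rows
t1 = time.perf_counter()
print("selected %d rows in %.4f s" % (B.shape[0], t1 - t0))
```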

I'd really appreciate some feedback on things like

(1) is it OK if the code takes 2 different paths depending on input size (based on speculation of which would be faster)?
(2) can I add code paths for compressed matrix initialization that skip input data sanity checks?

before I take the time to make a PR.

Thanks a ton,
Eric Martin
Will Adler | 20 Mar 19:29 2015

Speeding up scipy.special.erf()?

I hope this is the right place to post this. A user on StackOverflow told me to report this.

I am trying to transition from MATLAB to Python. The majority of my computational time is spent calling erf on millions or billions of vectors. Unfortunately, it seems that scipy.special.erf() takes about 3 times as long as MATLAB’s erf().

Is there anything that can be done to speed up SciPy’s erf()?

Check for yourself if you wish:
MATLAB
r=rand(1,1e7)
tic;erf(r);toc % repeat this line a few times

Python
import numpy as np
import scipy.special as sps
r=np.random.rand(int(1e7))
%timeit sps.erf(r)
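One thing worth trying before switching libraries: erf is a plain C ufunc, and ufunc inner loops of this kind should run without holding the GIL, so splitting the array across threads can help on multi-core machines (a sketch; whether and how much it helps depends on your build and array size):

```python
import numpy as np
import scipy.special as sps
from concurrent.futures import ThreadPoolExecutor

def erf_threaded(r, nthreads=4):
    """Evaluate erf on chunks of r in parallel threads.

    Only pays off if the ufunc releases the GIL (expected for erf) and
    the array is large enough to amortize the thread overhead.
    """
    chunks = np.array_split(r, nthreads)
    with ThreadPoolExecutor(max_workers=nthreads) as ex:
        return np.concatenate(list(ex.map(sps.erf, chunks)))

r = np.random.rand(10**6)
out = erf_threaded(r)
```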

Thanks!

Will Adler
PhD Candidate
Center for Neural Science
New York University

SciPy 2015 Call for Proposals Open - tutorial & talk submissions due April 1st

**SciPy 2015 Conference (Scientific Computing with Python) Call for Proposals: Submit Your Tutorial and Talk Ideas by April 1, 2015 at http://scipy2015.scipy.org.**  

 

SciPy 2015, the fourteenth annual Scientific Computing with Python conference, will be held July 6-12, 2015 in Austin, Texas. SciPy is a community dedicated to the advancement of scientific computing through open source Python software for mathematics, science, and engineering. The annual SciPy Conference brings together over 500 participants from industry, academia, and government to showcase their latest projects, learn from skilled users and developers, and collaborate on code development. The full program will consist of two days of tutorials followed by three days of presentations, and will conclude with two days of developer sprints. More info is available on the conference website at http://scipy2015.scipy.org; you can also sign up on the website for mailing list updates or follow @scipyconf on Twitter. We hope you'll join us - early bird registration is open until May 15, 2015 at http://scipy2015.scipy.org

 

We encourage you to submit tutorial or talk proposals in the categories below; please also share with others whom you'd like to see participate! Submit via the conference website at http://scipy2015.scipy.org.

 

*SCIPY TUTORIAL SESSION PROPOSALS –  DEADLINE EXTENDED TO WED APRIL 1, 2015*

The SciPy experience kicks off with two days of tutorials. These sessions provide extremely affordable access to expert training, and consistently receive fantastic feedback from participants. We're looking for submissions on topics from introductory to advanced - we'll have attendees across the gamut looking to learn. Whether you are a major contributor to a scientific Python library or an expert-level user, this is a great opportunity to share your knowledge and stipends are available. Submit Your Tutorial Proposal on the SciPy 2015 website: http://scipy2015.scipy.org

 

*SCIPY TALK AND POSTER SUBMISSIONS – DUE April 1, 2015*

SciPy 2015 will include 3 major topic tracks and 7 mini-symposia tracks.  Submit Your Talk Proposal on the SciPy 2015 website: http://scipy2015.scipy.org

 

Major topic tracks include:

- Scientific Computing in Python (General track)

- Python in Data Science

- Quantitative and Computational Social Sciences

 

Mini-symposia will include the applications of Python in:

- Astronomy and astrophysics

- Computational life and medical sciences

- Engineering

- Geographic information systems (GIS)

- Geophysics

- Oceanography and meteorology

- Visualization, vision and imaging


If you have any questions or comments, feel free to contact us at: scipy-organizers <at> scipy.org.

Robert Lucente - Pipeline | 18 Mar 02:07 2015

Tensors

You guys are probably aware of the following article, but just in case:

"Let's build open source tensor libraries for data science - Tensor methods for machine learning are fast, accurate, and scalable, but we'll need well-developed libraries" by Ben Lorica

http://radar.oreilly.com/2015/03/lets-build-open-source-tensor-libraries-for-data-science.html

Maybe this type of thing is inappropriate for this mailing list?
Ralf Gommers | 16 Mar 22:50 2015

Re: GSOC: scipy.stats improvement



On Mon, Mar 16, 2015 at 3:03 PM, Vidit Bhargava <viditvineet <at> gmail.com> wrote:
Hey, I was working on building scipy but I keep getting the error
"no lapack/blas resources found"

I have already tried installing OpenBLAS and changing the environment variable.

We're going to need some more info to be able to help you. Can you report the version of your OS, all the compilers you use, how you installed Python and your BLAS/LAPACK, the build command you used, and the relevant part of the error you get?

Ralf



On Sat, Mar 14, 2015 at 4:42 PM, Ralf Gommers <ralf.gommers <at> gmail.com> wrote:
Hi Vidit, welcome!


On Fri, Mar 13, 2015 at 9:46 PM, Vidit Bhargava <viditvineet <at> gmail.com> wrote:
Hello Everyone,
I know I am late; sorry for that.
My name is Vidit Bhargava. I am a Computer Science and Engineering student at National Institute of Technology, Karnataka, India. I am proficient in Python and C++. I am interested in the stats improvement project.
Any suggestions on how I should go about it?

The first thing to do is to start contributing, i.e. send a pull request on Github addressing an open issue. This way you get a feeling for how everything works, and it allows us to get to know you. There are some issues labeled "easy-fix" which are good to get started with; you can also take any other issue that seems interesting to you.

Regarding the stats project, this is relevant: http://article.gmane.org/gmane.comp.python.scientific.devel/19468?

Cheers,
Ralf

