Yury V. Zaytsev | 1 Jan 2011 01:39

Re: Sometimes fmin_l_bfgs_b tests NaN parameters and then fails to converge

On Fri, 2010-12-31 at 16:35 -0500, josef.pktd <at> gmail.com wrote:

> But your function has a discontinuity, and I wouldn't expect a bfgs
> method to produce anything useful since the method assumes smoothness,
>  as far as I know.

You are perfectly right about the discontinuity, but that was not the
point. I was rather interested in whether anyone else sees the optimizer
trying NaNs as function parameters, as happens in my case.

I have this problem with a completely different (smooth and
differentiable) function; the test script is just something I put
together quickly to illustrate the problem.
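One way to check whether the optimizer really passes NaNs is to wrap the objective and record every parameter vector that is not finite. The sketch below does that for fmin_l_bfgs_b with a finite-difference gradient; the class name NaNWatch and the quadratic test objective are hypothetical stand-ins for the real function.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


class NaNWatch:
    """Hypothetical wrapper: forwards calls to `func` and keeps a copy of
    every parameter vector that contains NaN or inf."""

    def __init__(self, func):
        self.func = func
        self.bad_calls = []

    def __call__(self, x, *args):
        x = np.asarray(x, dtype=float)
        if not np.all(np.isfinite(x)):
            self.bad_calls.append(x.copy())
        return self.func(x, *args)


# Stand-in smooth objective with its minimum at (3, 3).
watched = NaNWatch(lambda x: float(np.sum((x - 3.0) ** 2)))
x_opt, f_opt, info = fmin_l_bfgs_b(watched, np.zeros(2), approx_grad=True)
print(x_opt, len(watched.bad_calls))
```

After a run, `watched.bad_calls` holds any non-finite points the optimizer tried, which makes it easy to compare behaviour across platforms.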

Happy New Year!

-- 
Sincerely yours,
Yury V. Zaytsev
Zufry Malik Ibrahim | 1 Jan 2011 12:12

How to get first-order optimality from scipy.optimize.leastsq module

I am curious how to get the "first-order optimality" measure when using the scipy.optimize.leastsq module.

I have no problem getting the minimum with scipy.optimize.leastsq, but I get confused when I want the first-order optimality...

This is a sample MATLAB script that gets the first-order optimality:

    [x,resnorm,residual,exitflag,output,lambda]= lsqcurvefit(func,x0,xdata,tdata);
    foo = output.firstorderopt %get first-order optimality value

There is a MathWorks reference for foo here.
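leastsq does not report this value itself, but for an unconstrained problem it can be estimated as the largest absolute entry of the gradient J^T r of 0.5 * sum(r**2), where r are the residuals and J their Jacobian. A minimal sketch, assuming a forward-difference Jacobian is accurate enough; the model, data, and the helper name first_order_optimality are hypothetical. MATLAB's definition may differ by a constant factor (sum of squares versus half of it) and by scaling for bounded problems, so compare the two only loosely. Later SciPy releases also expose this measure directly as the `optimality` attribute of the result of scipy.optimize.least_squares.

```python
import numpy as np
from scipy.optimize import leastsq


def residuals(p, x, y):
    """Hypothetical model y = a * exp(b * x); returns data minus model."""
    a, b = p
    return y - a * np.exp(b * x)


def first_order_optimality(func, p, args=(), rel_step=1e-7):
    """Return max |J^T r| at p: the infinity norm of the gradient of
    0.5 * sum(r**2), with J estimated by forward differences."""
    p = np.asarray(p, dtype=float)
    r = np.asarray(func(p, *args), dtype=float)
    J = np.empty((r.size, p.size))
    for j in range(p.size):
        h = rel_step * max(1.0, abs(p[j]))
        p_step = p.copy()
        p_step[j] += h
        J[:, j] = (np.asarray(func(p_step, *args), dtype=float) - r) / h
    return np.max(np.abs(J.T.dot(r)))


x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(1.5 * x)  # noise-free synthetic data
p_opt, ier = leastsq(residuals, [1.0, 1.0], args=(x, y))
print(first_order_optimality(residuals, p_opt, args=(x, y)))
```

A value near zero at the returned parameters indicates a stationary point; the same function evaluated at the starting guess gives a much larger number.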

Thanks for your attention, Happy New Year 2011 :)


--
Zufry Malik Ibrahim
Physics Department
Bandung Institute of Technology, Indonesia

_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
Skipper Seabold | 1 Jan 2011 17:38

Re: Sometimes fmin_l_bfgs_b tests NaN parameters and then fails to converge

On Fri, Dec 31, 2010 at 7:39 PM, Yury V. Zaytsev <yury <at> shurup.com> wrote:
> On Fri, 2010-12-31 at 16:35 -0500, josef.pktd <at> gmail.com wrote:
>
>> But your function has a discontinuity, and I wouldn't expect a bfgs
>> method to produce anything useful since the method assumes smoothness,
>>  as far as I know.
>
> You are perfectly right about the discontinuity, but that was not the
> point. I was rather interested if anyone else is seeing the optimizer
> trying out NaNs as function parameters as in my case or not...
>
> I have this problem with a completely different (smooth and
> differentiable) function, the test script is just something I came up
> with without thinking too much to illustrate the problem.
>

I don't see the NaNs (on 64-bit), but I recently ran into what may be a
similar problem. I switched from fmin_l_bfgs_b to fmin_tnc and was able
to fine-tune the line search (eta in tnc). From a brief look, the step
size for BFGS seems to be adaptive and determined inside the code, but I
don't see a sensible way to change it. For tnc, I start with eta = 1e-8,
and when I get return code 4, I rerun with eta *= 10. The return codes
for tnc are in
>>> from scipy.optimize.tnc import RCSTRINGS
>>> RCSTRINGS
{-1: 'Infeasible (low > up)',
 0: 'Local minima reach (|pg| ~= 0)',
 1: 'Converged (|f_n-f_(n-1)| ~= 0)',
 2: 'Converged (|x_n-x_(n-1)| ~= 0)',
 3: 'Max. number of function evaluations reach',
 4: 'Linear search failed',
 5: 'All lower bounds are equal to the upper bounds',
 6: 'Unable to progress',
 7: 'User requested end of minimization'}

Curious if this approach might work for you.
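The retry scheme described above can be sketched as a loop around fmin_tnc. SciPy documents eta as the severity of the line search, with values below 0 or above 1 replaced by 0.25, so this sketch stops raising it at that bound. Restarting every retry from x0, the 0.25 cap, and the helper name tnc_with_eta_retries are assumptions; the quadratic objective is only a stand-in.

```python
import numpy as np
from scipy.optimize import fmin_tnc

LSFAIL = 4  # tnc return code 4: 'Linear search failed'


def tnc_with_eta_retries(func, x0, fprime, eta=1e-8, max_eta=0.25):
    """Run fmin_tnc and rerun with eta *= 10 while it reports a failed line
    search. Each retry restarts from x0 (an assumption; warm-starting from
    the previous x is another option). Returns (x, rc, eta_used)."""
    while True:
        x, nfeval, rc = fmin_tnc(func, x0, fprime=fprime, eta=eta, disp=0)
        if rc != LSFAIL or eta * 10 > max_eta:
            return x, rc, eta
        eta *= 10


TARGET = np.array([1.0, -2.0, 3.0])


def f(x):
    return float(np.sum((x - TARGET) ** 2))


def grad(x):
    return 2.0 * (x - TARGET)


x, rc, eta_used = tnc_with_eta_retries(f, np.zeros(3), grad)
print(x, rc, eta_used)
```

Returning the final eta alongside x makes it visible how much the line search had to be relaxed for a given problem.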

Skipper
Matwey V. Kornilov | 2 Jan 2011 16:29

numpy I/O question


Hi,

I need help with NumPy I/O. My input text data has a peculiar array
format: due to a bug in the data-producing software, negative values are
concatenated to the preceding values (e.g. "1.0-3.4 3.1"). This was not
critical for me because, oddly enough, this C++ code parses it for me:

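#include <iostream>

int main(){
	double a;
	double b;

	// operator>>(double) stops at the '-' of "-3.4", so for the input
	// "1.0-3.4" this reads a = 1.0 and b = -3.4
	std::cin >> a >> b;
	std::cout << "a=" << a << " b=" << b << std::endl;

	return 0;
}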

This works because operator>>(double) stops before the "-" character, and
the second operator>>(double) continues from that position.

The question is: how can I get the same behaviour from NumPy's I/O?
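The operator>> behaviour can be imitated in pure Python by matching one number at a time: each match stops at the first character that cannot extend the number, and the next match resumes there. A minimal sketch; the regular expression only approximates what operator>>(double) accepts (no inf, nan, or hex floats), and read_doubles and load_fused_lines are hypothetical helper names.

```python
import re

import numpy as np

# Approximation of a C++ double token: optional sign, digits with an
# optional fraction (or a bare fraction), optional exponent.
_DOUBLE = re.compile(r"\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)")


def read_doubles(line):
    """Read numbers from `line` one after another, like repeated
    `std::cin >> x`: each read resumes where the previous one stopped."""
    values, pos = [], 0
    while True:
        m = _DOUBLE.match(line, pos)
        if m is None:
            break
        values.append(float(m.group(1)))
        pos = m.end()
    return values


def load_fused_lines(lines):
    """Build a 2-D array from an iterable of lines (a file object works)."""
    return np.array([read_doubles(line) for line in lines if line.strip()])


print(read_doubles("1.0-3.4 3.1"))
```

Because the exponent is part of the pattern, a token such as "1e-5" is read whole, while the "-" in "1.0-3.4" still starts a new number.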
Zachary Pincus | 2 Jan 2011 16:41

Re: numpy I/O question

>
> I need help with NumPy I/O. I have specific array format in my input
> text data. Due to bug in data-producing software the negative values are
> concatenated to the previous values. (i.e. "1.0-3.4 3.1").

Can you just run the text files through sed or something and replace  
"-" with " -"?
Matwey V. Kornilov | 2 Jan 2011 16:51

Re: numpy I/O question


The input format is outside my control. I have already written a C++ tool
that parses the data well enough.

I will be asked, 'Why should we use Python if it can't even parse as well
as C++ does?' So `sed` isn't a solution.

Zachary Pincus wrote:

>>
>> I need help with NumPy I/O. I have specific array format in my input
>> text data. Due to bug in data-producing software the negative values are
>> concatenated to the previous values. (i.e. "1.0-3.4 3.1").
> 
> Can you just run the text files through sed or something and replace
> "-" with " -"?
Yury V. Zaytsev | 2 Jan 2011 16:59

Re: numpy I/O question

On Sun, 2011-01-02 at 18:51 +0300, Matwey V. Kornilov wrote:
> 
> I will be asked 'why should we use python which even can't parse as good as 
> c++ does?' `sed` isn't a solution.

How big are the files in question?

If you don't want to pre-process the files beforehand, why not just load
them into memory and do the replacement before feeding them into NumPy?
That's just 2-3 lines of code.
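The in-memory replacement suggested above really is a few lines: put a space before every "-" and hand the result to np.loadtxt through a file-like object. A minimal sketch; load_fused is a hypothetical name, and it assumes the data contains no exponent notation, since "1e-5" would become "1e -5".

```python
import io

import numpy as np


def load_fused(text):
    """Split fused negatives ("1.0-3.4" -> "1.0 -3.4") and parse.
    Assumes no exponent notation such as 1e-5 appears in `text`."""
    return np.loadtxt(io.StringIO(text.replace("-", " -")), ndmin=2)


print(load_fused("1.0-3.4 3.1\n2.0 -1.5-0.5\n"))
```

For a file or a pipe, read the whole stream first, e.g. `load_fused(open(path).read())` or `load_fused(sys.stdin.read())`.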

-- 
Sincerely yours,
Yury V. Zaytsev
Matwey V. Kornilov | 2 Jan 2011 17:09

Re: numpy I/O question


These files are piped streams, but when dumped to disk they are about 50M.

The replacement you describe costs O(N) (where N is the line length), but
C++ operator>> needs O(1) for the same parsing.

I was hoping there was a way to make NumPy split the data by a regexp
instead of a delimiter,

i.e.

np.genfromtxt(StringIO(data), regexp=r"-?[\d\.]+")

instead of

np.genfromtxt(StringIO(data), delimiter=None)
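np.genfromtxt has no regexp keyword, but NumPy does provide np.fromregex, which builds a structured array from the groups a regular expression captures (one group per field). A minimal sketch, assuming a recent NumPy, values without an exponent part, and a known number of columns, since fromregex matches across the whole text and does not keep the line structure; load_by_regex is a hypothetical helper name.

```python
import io

import numpy as np


def load_by_regex(text, ncols):
    """Extract every number with np.fromregex, then restore the rows.
    The pattern has exactly one capturing group, so the dtype has one field."""
    found = np.fromregex(io.StringIO(text), r"(-?\d+(?:\.\d*)?)", [("value", float)])
    return found["value"].reshape(-1, ncols)


print(load_by_regex("1.0-3.4 3.1\n2.0 -1.5-0.5\n", 3))
```

The reshape raises an error if the total count is not a multiple of ncols, which doubles as a cheap sanity check on the input.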

Yury V. Zaytsev wrote:

> On Sun, 2011-01-02 at 18:51 +0300, Matwey V. Kornilov wrote:
>> 
>> I will be asked 'why should we use python which even can't parse as good
>> as c++ does?' `sed` isn't a solution.
> 
> How big are these files in question?
> 
> Why can't you just load them in memory and do the replacement before
> feeding them into NumPy if you don't want to pre-process files
> beforehand? This is just 2-3 lines of code.
>  
Zachary Pincus | 2 Jan 2011 17:21

Re: numpy I/O question

> These files are pipe-streams but when they are dumped they are about
> 50M.
>
> Replacement that you described requires O(N) (where N is line
> length) but C++ operator>> requires O(1) for the same parsing.

Reading the file into an array is still an O(N) operation, so if all
you care about is big-O complexity, there's no difference between
doing an O(N) search-and-replace followed by an O(N) load operation
versus an O(1) parsing followed by an O(N) load operation. O(2N) =
O(N), right?

But if you care about constant factors, why are you even proposing  
regexp matching?

Have you even tried writing the simple search-and-replace to
determine whether it's too slow?

If you actually need to optimize the file reading (unlikely), perhaps  
the fastest option will be to use the subprocess module to open a  
pipeline to sed and then feed the stdout of that to numpy.loadtxt --  
sed is well-optimized to have low constant factors.
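The sed pipeline can be sketched with subprocess.Popen, letting np.loadtxt read sed's stdout directly. The expression only splits a "-" that directly follows a digit or ".", so exponents such as 1e-5 survive, unlike a blanket "-" to " -" replacement. This assumes a sed that supports -E (GNU and BSD sed both do); SED_EXPR and load_via_sed are illustrative names, not from the thread.

```python
import subprocess

import numpy as np

# Insert a space before any '-' that directly follows a digit or '.',
# e.g. "1.0-3.4" -> "1.0 -3.4"; "1e-5" is left alone.
SED_EXPR = r"s/([0-9.])-/\1 -/g"


def load_via_sed(path):
    """Stream `path` through sed and parse sed's output with np.loadtxt."""
    proc = subprocess.Popen(["sed", "-E", SED_EXPR, path],
                            stdout=subprocess.PIPE, universal_newlines=True)
    try:
        data = np.loadtxt(proc.stdout, ndmin=2)
    finally:
        proc.stdout.close()
    if proc.wait() != 0:
        raise RuntimeError("sed exited with status %d" % proc.returncode)
    return data
```

Replacing `path` with `"-"` (and passing the input via `stdin=`) would let the same pipeline consume a stream instead of a file.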

Indeed, these days disks are such a bottleneck that it can be faster  
to read a gzipped file from disk and decompress it on the fly and  
parse the contents than just to read the plain file from disk. But as  
you say the input format is out of your hands. (And again, if speed  
matters so much, why are the files ASCII text and not binary? But if  
speed doesn't matter, why the concern about asymptotic complexity?)

Anyway, if for religious reasons sed is unacceptable, another decent  
option if the files are too large for memory (which 50M is  
emphatically not) would be to open the text file in chunks, do the  
search-and-replace, and then cough up those chunks within an iterator  
that acts as a file-like-object.
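np.loadtxt accepts any iterable of lines, so the iterator idea needs no custom file-like class: a generator can fix each line as it is read. Working line by line also avoids a chunk boundary splitting a number in two, and it works on pipes such as sys.stdin. A minimal sketch; the regular expression (split a "-" only after a digit or ".", so exponents like 1e-5 survive) and the names fixed_lines and load_streaming are assumptions.

```python
import re

import numpy as np

# A '-' directly preceded by a digit or '.' starts a new, fused number.
FUSED_MINUS = re.compile(r"(?<=[\d.])-")


def fixed_lines(fileobj):
    """Yield each line with fused negatives separated; only one line is
    held in memory at a time."""
    for line in fileobj:
        yield FUSED_MINUS.sub(" -", line)


def load_streaming(fileobj):
    return np.loadtxt(fixed_lines(fileobj), ndmin=2)
```

Usage on a pipe would be `load_streaming(sys.stdin)`; on a dumped file, `with open(path) as f: data = load_streaming(f)`.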

> I will be asked 'why should we use python which even can't parse as
> good as c++ does?' `sed` isn't a solution.

This sounds like a personal problem. Sed is a perfectly decent  
solution for reformatting broken text files, as is reformatting the  
files internally to python before passing them to a numpy routine  
designed to be flexible and fast at handling *delimited* text.

The fact that C++ has a particular feature that happens to work well  
with your buggy input files doesn't mean that "python can't parse as  
well as c++" -- but hey, if you think c++ is in general a better tool  
than python or sed or perl or whatever for processing text files, go  
for it.
Matwey V. Kornilov | 2 Jan 2011 17:38

Re: numpy I/O question

Zachary Pincus wrote:

>> Replacement that you described requires O(N) (where N is line
>> length) but C++ operator>> requires O(1) for the same parsing.
> 
> Reading the file into an array is still an O(N) operation, so if all
> you care about is big-O complexity, there's no difference between
> doing an O(N) search-and-replace followed by an O(N) load operation
> versus an O(1) parsing followed by an O(N) load operation. O(2N) =
> O(N), right?

Yes, you are right. I confused char-reading and char-inserting operations.

> But if you care about constant factors, why are you even proposing
> regexp matching?

That was 'typing before thinking'.

>> I will be asked 'why should we use python which even can't parse as
>> good as c++ does?' `sed` isn't a solution.
> 
> This sounds like a personal problem. Sed is a perfectly decent
> solution for reformatting broken text files, as is reformatting the
> files internally to python before passing them to a numpy routine
> designed to be flexible and fast at handling *delimited* text.

It sounds quite reasonable, thank you.
