David Montgomery | 1 Jul 01:45 2011

Time Series using 15 minute intervals using scikits.timeseries

Hi,

Using scikits.timeseries I can create daily and hourly time series... no problem.

But...

I have time series at 15-minute intervals... this I don't know how to do.

Can a timeseries array handle 15-minute intervals?
Do I use minute intervals and masked arrays for the missing minutes?
Also, I can figure out how to create an array at minute intervals.

So, what is best practice?  Any examples?

Thanks

    import scikits.timeseries as ts

    # ts_start_date, ts_end_date, ts_start_hour and ts_end_hour come from the
    # surrounding application code.
    st = ts.Date('H', year=ts_start_date.year, month=ts_start_date.month,
                 day=ts_start_date.day, hour=ts_start_hour)
    ed = ts.Date('H', year=ts_end_date.year, month=ts_end_date.month,
                 day=ts_end_date.day, hour=ts_end_hour)
    st_beg = st.asfreq('H', relation='START')
    ed_end = ed.asfreq('H', relation='END')
John Reid | 1 Jul 09:42 2011

Re: Fitting procedure to take advantage of cluster


On 29/06/11 18:18, Giovanni Luca Ciampaglia wrote:
> Hi,
> there are several strategies, depending on your problem. You could use a
> surrogate model, like a Gaussian Process, to fit the data (see for
> example Higdon et al
> http://epubs.siam.org/sisc/resource/1/sjoce3/v26/i2/p448_s1?isAuthorized=no).
> I have personally used scikits.learn for GP estimation but there is also
> PyMC that should do the same (never tried it).
>
I can also immodestly recommend my own code for Gaussian processes. It is
not based on Markov chain Monte Carlo but rather on a maximum-likelihood
approach:
http://sysbio.mrc-bsu.cam.ac.uk/group/index.php/Gaussian_processes_in_python
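
For readers who want to try the scikits.learn route Giovanni mentions, here is a minimal sketch of fitting a GP surrogate to a handful of expensive model evaluations and then querying it with an uncertainty estimate. It assumes the GaussianProcess estimator shipped with scikits.learn around this time; the package was later renamed sklearn and the estimator eventually replaced by GaussianProcessRegressor, so treat the exact import path and keywords as assumptions.

    import numpy as np
    # The package was still named scikits.learn in mid-2011; later releases
    # use 'sklearn', so this import path is an assumption.
    from scikits.learn.gaussian_process import GaussianProcess

    # Toy 1-D surrogate: fit a GP to a few "expensive" model evaluations,
    # then predict (with an uncertainty estimate) anywhere in the input space.
    X = np.atleast_2d(np.linspace(0.0, 10.0, 8)).T
    y = np.sin(X).ravel()

    gp = GaussianProcess(theta0=0.1, thetaL=1e-3, thetaU=1.0)
    gp.fit(X, y)

    x_new = np.atleast_2d(np.linspace(0.0, 10.0, 50)).T
    y_pred, mse = gp.predict(x_new, eval_MSE=True)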
Ning Guo | 1 Jul 09:48 2011

Question: scipy.stats.gamma.fit

Dear scipy-users,

I'm using scipy.stats.gamma.fit to fit a gamma distribution to a set of
random variables. To validate the results I also use the fitdistr function
in R. However, the results from the two packages differ: the shape and
scale parameters of the fitted gamma pdf are not the same. The difference
is not large, but I'm wondering what causes it, since I believe both
packages use maximum likelihood estimation to fit the distribution.

Best regards!
Ning
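
One likely source of the discrepancy (an assumption, not confirmed in this thread) is that scipy's gamma.fit also estimates a location parameter, i.e. a three-parameter gamma, whereas R's fitdistr fits the two-parameter form. A minimal sketch of a like-for-like comparison on synthetic data, assuming a scipy version whose fit method accepts the floc keyword:

    import numpy as np
    from scipy import stats

    # Synthetic gamma-distributed sample standing in for the real data.
    data = stats.gamma.rvs(2.0, scale=3.0, size=1000)

    # Default fit estimates (shape, loc, scale) -- three free parameters.
    shape3, loc3, scale3 = stats.gamma.fit(data)

    # Fixing loc at 0 (requires a scipy whose fit accepts floc) gives the
    # two-parameter fit that R's fitdistr computes.
    shape2, loc2, scale2 = stats.gamma.fit(data, floc=0)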
Pierre GM | 1 Jul 11:07 2011

Re: Time Series using 15 minute intervals using scikits.timeseries


On Jul 1, 2011, at 1:45 AM, David Montgomery wrote:

> Hi,
> 
> Using scikits.timeseries I can create daily and hourly time series... no problem.
> 
> But...
> 
> I have time series at 15-minute intervals... this I don't know how to do.
> 
> Can a timeseries array handle 15-minute intervals?
> Do I use minute intervals and masked arrays for the missing minutes?
> Also, I can figure out how to create an array at minute intervals.
> 
> So, what is best practice?  Any examples?

First possibility: you get the latest experimental version of scikits.timeseries on GitHub. There's support for multiples of frequencies (like 15 min).
If you're not comfortable tinkering with experimental code, you have several solutions, depending on your problem:
1. You create a minute-freq series and mask 14/15 of the data. Simple but wasteful, and problematic if you have a large series. Still, it's the easiest solution (see the sketch after this list).
2. You create an hour-freq series as a 2D array: each column would correspond to the data for one quarter of the hour. That's more compact in terms of memory, but you'll have to jump through some extra hoops if you need to convert the array to another frequency (conversion routines don't really like 2D arrays...).
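
A minimal sketch of option 1, assuming the stable scikits.timeseries API ('T' is assumed to be the minute frequency code, and the exact Date/date_array keywords may differ slightly between versions):

    import numpy.ma as ma
    import scikits.timeseries as ts

    # One hour of 15-minute data (4 observations) on a minute-frequency axis.
    values = [1.0, 2.0, 3.0, 4.0]
    start = ts.Date('T', year=2011, month=7, day=1, hour=0, minute=0)
    dates = ts.date_array(start_date=start, length=60, freq='T')

    # Place the observations every 15 minutes and leave the rest masked.
    data = ma.masked_all(60)
    data[::15] = values
    series = ts.time_series(data, dates=dates)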
David Montgomery | 1 Jul 11:22 2011

Re: Time Series using 15 minute intervals using scikits.timeseries

Awesome...

For the GitHub version... any docs or an example of creating a 15-minute array?

On Fri, Jul 1, 2011 at 7:07 PM, Pierre GM <pgmdevlist <at> gmail.com> wrote:
>
> On Jul 1, 2011, at 1:45 AM, David Montgomery wrote:
>
>> Hi,
>>
>> Using scikits.timeseries I can create daily and hourly time series... no problem.
>>
>> But...
>>
>> I have time series at 15-minute intervals... this I don't know how to do.
>>
>> Can a timeseries array handle 15-minute intervals?
>> Do I use minute intervals and masked arrays for the missing minutes?
>> Also, I can figure out how to create an array at minute intervals.
>>
>> So, what is best practice?  Any examples?
>
> First possibility: you get the latest experimental version of scikits.timeseries on GitHub. There's support for multiples of frequencies (like 15 min).
> If you're not comfortable tinkering with experimental code, you have several solutions, depending on your problem:
> 1. You create a minute-freq series and mask 14/15 of the data. Simple but wasteful, and problematic if you have a large series. Still, it's the easiest solution.
> 2. You create an hour-freq series as a 2D array: each column would correspond to the data for one quarter of the hour. That's more compact in terms of memory, but you'll have to jump through some extra hoops if you need to convert the array to another frequency (conversion routines don't really like 2D arrays...).

Pierre GM | 1 Jul 12:11 2011

Re: Time Series using 15 minute intervals using scikits.timeseries


On Jul 1, 2011, at 11:22 AM, David Montgomery wrote:

> Awesome...
> 
> For the GitHub version... any docs or an example of creating a 15-minute array?

Use the 'timestep' optional argument in scikits.timeseries.date_array.

BTW, make sure you're using the https://github.com/pierregm/scikits.timeseries-sandbox/ repository (that's the experimental one I was telling you about).
Note that support is *very* limited, as I don't really have time to work on scikits.timeseries these days. Anyhow, there'll be a major overhaul in the mid-term future once Mark W.'s new datetime dtype is stable.
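
A guess at how that might look, untested against the sandbox branch, so treat it as a sketch rather than a confirmed API:

    import scikits.timeseries as ts

    # Experimental branch only: a minute-frequency DateArray that advances
    # 15 minutes per step via the 'timestep' argument mentioned above.
    start = ts.Date('T', year=2011, month=7, day=1, hour=0, minute=0)
    dates = ts.date_array(start_date=start, length=96, freq='T', timestep=15)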
Jose Gomez-Dans | 1 Jul 12:36 2011

Weird error in fmin_l_bfgs_b

Hi,
I'm getting an error in scipy.optimize.fmin_l_bfgs_b, apparently related to the Fortran wrapper. This is strange, because exactly the same problem works well with the TNC solver. I have a function that returns both a scalar value (to be minimised) and the derivative of the function at that point. The error from the L-BFGS-B solver is:
File "/usr/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 181, in fmin_l_bfgs_b
    isave, dsave)
ValueError: failed to initialize intent(inout) array -- input not fortran contiguous


My code looks like this:

# x0 is the starting point, a 1d array
>>> solution, x, info = scipy.optimize.fmin_tnc( cost_function, x0,    args=([operators]),  bounds=bounds )
# Using fmin_tnc works well, solution is what I expect it to be

>>> solution, cost, information = scipy.optimize.fmin_l_bfgs_b (  cost_function, solution, bounds=bounds,  args=[ operators ], iprint=101 )
2011-07-01 11:34:24,703 - eoldas.Model - INFO - 46 days, 46 quantised days
tnc: Version 1.3, (c) 2002-2003, Jean-Sebastien Roy (js <at> jeannot.org)
tnc: RCS ID: <at> (#) $Jeannot: tnc.c,v 1.205 2005/01/28 18:27:31 js Exp $
  NIT   NF   F                       GTG
    0    1  1.988301629303336E+02   8.17118991E+06
tnc: fscale = 0.000249879
    1    5  1.338514420154698E+01   1.82689516E+04
tnc: fscale = 0.00528464
    2    9  9.476573219561992E+00   2.21390020E+04
    3   19  6.684083971679802E+00   3.88897225E+03
    4   69  6.274247682836059E+00   2.43671753E+03
tnc: |fn-fn-1] = 4.5037e-13 -> convergence
    5  120  6.274247682835608E+00   2.43671753E+03
tnc: Converged (|f_n-f_(n-1)| ~= 0)
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 1.084D-19
 N =           46     M =           10

 L = -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01
     -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01
     -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01
     -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01
     -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01
     -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01
     -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01
     -2.0000D-01 -2.0000D-01 -2.0000D-01 -2.0000D-01

X0 =  5.6013D-02  1.1717D-01  1.9201D-01  2.7557D-01  3.7013D-01  4.5702D-01
      5.3491D-01  6.0661D-01  6.7624D-01  7.4649D-01  8.0318D-01  8.5203D-01
      8.8633D-01  9.0102D-01  8.9914D-01  8.7521D-01  8.2816D-01  7.6529D-01
      7.0559D-01  6.5371D-01  6.0520D-01  5.5814D-01  5.0991D-01  4.4783D-01
      3.7790D-01  3.0041D-01  2.1894D-01  1.5147D-01  1.0832D-01  8.3926D-02
      6.6473D-02  4.8621D-02  3.2567D-02  2.0086D-02  1.0881D-02  2.4890D-03
      8.8000D-04 -4.2729D-03 -4.6658D-03 -5.5940D-03 -4.1690D-03 -1.2577D-02
     -2.2529D-02 -2.9114D-02 -1.5938D-02  1.9755D-02

 U =  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00
      1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00
      1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00
      1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00
      1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00
      1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00
      1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00
      1.2000D+00  1.2000D+00  1.2000D+00  1.2000D+00

At X0         0 variables are exactly at the bounds
Traceback (most recent call last):
  File "example_identity.py", line 199, in <module>
    main ( sys.argv )
  File "example_identity.py", line 166, in main
    solution, cost, information = scipy.optimize.fmin_l_bfgs_b (  cost_function, solution, bounds=bounds,  args=[ operators ], iprint=101 )
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 181, in fmin_l_bfgs_b
    isave, dsave)
ValueError: failed to initialize intent(inout) array -- input not fortran contiguous


Any clues as to where to look for the problem?
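
For reference, a minimal self-contained sketch of the same calling pattern (the quadratic cost and the 'operators' placeholder are hypothetical stand-ins, not the real model). One thing that might be worth checking, though this is only a guess, is whether the starting point and the returned gradient are contiguous float64 arrays, since that is what the Fortran wrapper expects:

    import numpy as np
    import scipy.optimize

    def cost_function(x, operators):
        # Hypothetical stand-in for the real objective: returns value and gradient.
        f = 0.5 * np.dot(x, x)
        g = x.copy()
        # Returning the gradient as a contiguous float64 array is the thing
        # to verify in the real code.
        return f, np.ascontiguousarray(g, dtype=np.float64)

    x0 = np.ascontiguousarray(np.linspace(-0.2, 1.2, 46), dtype=np.float64)
    bounds = [(-0.2, 1.2)] * x0.size
    solution, cost, info = scipy.optimize.fmin_l_bfgs_b(
        cost_function, x0, bounds=bounds, args=(None,), iprint=-1)
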
Thanks!
jose

Wes McKinney | 1 Jul 18:52 2011

Re: Time Series using 15 minute intervals using scikits.timeseries

On Fri, Jul 1, 2011 at 6:11 AM, Pierre GM <pgmdevlist <at> gmail.com> wrote:
>
> On Jul 1, 2011, at 11:22 AM, David Montgomery wrote:
>
>> Awesome...
>>
>> For the GitHub version... any docs or an example of creating a 15-minute array?
>
> Use the 'timestep' optional argument in scikits.timeseries.date_array.
>
> BTW, make sure you're using the https://github.com/pierregm/scikits.timeseries-sandbox/ repository (that's the experimental one I was telling you about).
> Note that support is *very* limited, as I don't really have time to work on scikits.timeseries these days. Anyhow, there'll be a major overhaul in the mid-term future once Mark W.'s new datetime dtype is stable.

Depending on your data manipulation needs, you could also give pandas
a shot -- generating 15-minute date ranges, for example, is quite simple:

In [3]: DateRange('7/1/2011', '7/2/2011', offset=datetools.Minute(15))
Out[3]:
<class 'pandas.core.daterange.DateRange'>
offset: <15 Minutes>, tzinfo: None
[2011-07-01 00:00:00, ..., 2011-07-02 00:00:00]
length: 97

The date range can be used to conform a time series you loaded from some source:

ts.reindex(dr, method='pad')

('pad', a.k.a. "ffill", propagates values forward into holes; the method argument is optional)
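
Putting the two pieces together, a minimal sketch assuming the 2011-era pandas API (DateRange and datetools were later replaced by date_range and frequency strings, so the exact names are assumptions):

    import numpy as np
    from pandas import DateRange, Series
    from pandas.core import datetools

    # 15-minute grid for one day (97 points, both endpoints included).
    dr = DateRange('7/1/2011', '7/2/2011', offset=datetools.Minute(15))

    # A sparse hourly series, conformed to the 15-minute grid; method='pad'
    # forward-fills each hour's value into its quarter-hours.
    hourly = DateRange('7/1/2011', '7/2/2011', offset=datetools.Hour())
    ts = Series(np.arange(len(hourly), dtype=float), index=hourly)
    ts15 = ts.reindex(dr, method='pad')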

I've got some resampling code in the works that would help with, e.g.,
converting 15-minute data into hourly data and that sort of thing, but
it's in less-than-complete form at the moment, so as I said, it depends
on what you need to do. Give me a few weeks on that bit =)

best,
Wes
J. David Lee | 1 Jul 19:11 2011

Re: Fitting procedure to take advantage of cluster

On 06/29/2011 11:54 AM, J. David Lee wrote:
> Hello,
>
> I'm attempting to perform a fit of a model function's output to some
> measured data. The model has around 12 parameters, and takes tens of
> minutes to run. I have access to a cluster with several thousand
> processors that can run the simulations in parallel, so I'm wondering if
> there are any algorithms out there that I can use to leverage this
> computing power to efficiently solve my problem - that is, besides grid
> searches or Monte-Carlo methods.
>
> Thanks for your help,
>
> David
I want to thank everyone for their suggestions. I've read through most 
of the links presented, and am getting a clearer idea of what I need to do.

Here's a quick clarification of my problem for those who are interested:

I'm running a single-processor plasma simulation modeling an experiment. 
It has tens or hundreds of parameters, but most are constrained by 
measurements. For my purposes, the output consists of several x-ray 
spectra which I am trying to match against measured spectra. I have 
about 12 or 14 parameters in all that I am changing in order to match 
the spectra. Each run of the simulation takes a few to a few tens of 
minutes. I have the ability to run the compiled code on a number of 
machines, but I can't easily run python scripts on the machines.

After some thinking, I'm considering the feasibility of parallelizing 
the routines in scipy's optimize module. My initial thought is to allow 
the user to specify a function that would run the objective function on 
multiple inputs. This would be useful, for example, when performing a 
simplex shrink, or in numerical gradient/Hessian calculations with
multiple variables.
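
As a concrete illustration of that hook, here is a minimal sketch (the objective is a hypothetical stand-in for the plasma simulation and spectral misfit) of a forward-difference gradient whose perturbed points are all evaluated in one parallel batch:

    import numpy as np
    from multiprocessing import Pool

    def objective(x):
        # Hypothetical stand-in for the expensive simulation + spectral misfit.
        return float(np.sum((x - 1.0) ** 2))

    def parallel_gradient(pool, x, eps=1e-6):
        # Evaluate the base point and all forward perturbations in one
        # parallel batch, then form the finite-difference gradient.
        points = [x] + [x + eps * e for e in np.eye(x.size)]
        values = pool.map(objective, points)
        f0 = values[0]
        return np.array([(fi - f0) / eps for fi in values[1:]])

    if __name__ == '__main__':
        pool = Pool()
        print(parallel_gradient(pool, np.zeros(12)))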

From my point of view, this would allow me to use a hybrid
Monte-Carlo/minimization procedure to look for a global minimum.

I'm interested to hear others' opinions on the matter.

Thanks again,

David
