Hi all --

(Apologies if this is not the place to ask questions like this -- please let me know if this is the case!)

I'm a grad student in astrophysics, and in working on some machinery for my own research I've hit some stumbling blocks that I'm hoping someone might be able to help with.

Some background about me: I've been programming in Python for about 8 years and am very comfortable with numerical methods. At one point I was also fairly comfortable in C, but it has been a while... I've tried my best to learn Cython from the documentation, but it is somewhat lacking in details about more complicated use cases.

The core of the current project I'm working on relies on numerically integrating thousands of test-particle orbits for thousands of timesteps, buried within a likelihood function -- so this needs to be fast enough that I can do optimization or sampling with that likelihood. The bulk of the time is therefore spent in a function that computes the acceleration due to a gravitational potential at a given position. I also need it to be easy to swap out the gravitational field (potential) in which these orbits are integrated.

The code has the basic structure:

- loop over optimization (e.g., MCMC)
    - loop over number of particles (~1000's)
        - loop over number of timesteps (~1000's)
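In plain Python, with hypothetical helper names, that nesting looks roughly like this (a toy sketch with a placeholder free-motion step, not my actual integrator):

```python
import numpy as np

def step(x, v, k, dt):
    # placeholder for one integrator step for particle k
    # (the real code does a leapfrog step in a gravitational field)
    x[k] += dt * v[k]

def likelihood(x0, v0, nsteps, dt):
    x, v = x0.copy(), v0.copy()
    for k in range(x.shape[0]):        # loop over particles
        for i in range(nsteps):        # loop over timesteps
            step(x, v, k, dt)
    # stand-in statistic computed from the integrated positions
    return -float(np.sum(x**2))

# the outer optimization/MCMC loop then calls likelihood() once per iteration
x0 = np.zeros((2, 3)); v0 = np.ones((2, 3))
print(likelihood(x0, v0, nsteps=2, dt=0.5))  # → -6.0
```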

Let me describe how I'm doing this now. I have Python classes representing the gravitational potentials that implement some convenient high-level things and support array operations; each one creates and stores a corresponding Cython (cdef) class containing functions that evaluate the quantities at a single position using typed memoryviews. I've written it this way because 1) I wanted to minimize the number of array allocations, so my solution is to pass the entire array of positions along with an index specifying which row to use, and 2) I've thought about using OpenMP or some other parallelization, and the only part of the loops I can parallelize is integrating each orbit separately.

The classes look like this:

class PointMassPotential(CPotential, CartesianPotential):
    """ Gravitational potential for a point mass. """

    def __init__(self, m):
        # ... create the C-instance and store it
        self.c_instance = _PointMassPotential(m)
        # etc.

cdef class _PointMassPotential(_CPotential):

    # here need to cdef all the attributes
    cdef public double G, GM
    cdef public double m, c

    def __init__(self, double m):
        # gravitational constant in crazy units (kpc^3 / (M_sun Myr^2))
        self.G = 4.499753324353495e-12

        # potential parameters
        self.m = m
        self.GM = self.G * self.m

    cdef public inline void _acceleration(self, double[:,::1] r,
                                          double[:,::1] grad, int k) nogil:
        cdef double R, fac
        R = sqrt(r[k,0]*r[k,0] + r[k,1]*r[k,1] + r[k,2]*r[k,2])
        fac = self.GM / (R*R*R)

        grad[k,0] += -fac*r[k,0]
        grad[k,1] += -fac*r[k,1]
        grad[k,2] += -fac*r[k,2]
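As a sanity check on that formula, here's a pure-Python reference (hypothetical names, GM set to 1 for simplicity) comparing the closed form a = -GM r / R^3 against a central-difference gradient of the point-mass potential Phi = -GM/R:

```python
import numpy as np

GM = 1.0

def potential(r):
    # point-mass potential, Phi = -GM / |r|
    return -GM / np.sqrt(np.dot(r, r))

def acceleration(r):
    # closed form: a = -GM r / |r|^3
    R = np.sqrt(np.dot(r, r))
    return -GM * r / R**3

r = np.array([1.0, 2.0, -0.5])
h = 1e-6
num = np.empty(3)
for i in range(3):
    dr = np.zeros(3); dr[i] = h
    # a = -grad(Phi), central difference in each coordinate
    num[i] = -(potential(r + dr) - potential(r - dr)) / (2 * h)

print(np.allclose(acceleration(r), num))  # → True
```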

To do the integration, I just use my own Leapfrog integration, the meat of which looks like this:

cdef inline void leapfrog_step(double[:,::1] r, double[:,::1] v,
                               double[:,::1] v_12, double[:,::1] acc,
                               int k, double dt, _CPotential potential) nogil:
    # increment position by full-step
    r[k,0] = r[k,0] + dt*v_12[k,0]
    r[k,1] = r[k,1] + dt*v_12[k,1]
    r[k,2] = r[k,2] + dt*v_12[k,2]

    # zero out the acceleration container
    acc[k,0] = 0.
    acc[k,1] = 0.
    acc[k,2] = 0.
    potential._acceleration(r, acc, k)

    # increment synced velocity by full-step
    v[k,0] = v[k,0] + dt*acc[k,0]
    v[k,1] = v[k,1] + dt*acc[k,1]
    v[k,2] = v[k,2] + dt*acc[k,2]

    # increment leapfrog velocity by full-step
    v_12[k,0] = v_12[k,0] + dt*acc[k,0]
    v_12[k,1] = v_12[k,1] + dt*acc[k,1]
    v_12[k,2] = v_12[k,2] + dt*acc[k,2]
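For testing the stepper I use things like the following pure-Python analogue (a sketch of the same drift-kick scheme, not the Cython code): integrate a circular orbit in a GM = 1 point-mass field and check that the radius stays approximately constant, since leapfrog is symplectic and shouldn't show secular drift:

```python
import numpy as np

GM = 1.0

def acc(r):
    # point-mass acceleration, a = -GM r / |r|^3
    R = np.sqrt(np.dot(r, r))
    return -GM * r / R**3

# circular orbit: |r| = 1, |v| = sqrt(GM/R) = 1
r = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
dt = 0.01

# offset the velocity by a half step to start the leapfrog
v_12 = v + 0.5 * dt * acc(r)

for i in range(10000):
    r = r + dt * v_12          # drift: full-step position update
    v_12 = v_12 + dt * acc(r)  # kick: full-step update of the offset velocity

# radius error should be ~dt^2, far below 1e-3
print(abs(np.sqrt(np.dot(r, r)) - 1.0) < 1e-3)  # → True
```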

And finally, the likelihood function is just loops:

cpdef likelihood_func(...):
    # define stuff...

    with nogil:
        for k in range(nparticles):
            for i in range(nsteps):
                leapfrog_step(x, v, v_12, acc, k, dt, potential)

    # compute other stuff using integrated positions ...

My question is whether this is a sensible thing to do -- using memoryviews this way -- or whether I'm missing some obvious optimization points? I'm finding that this is only a factor of tens faster than a pure-Python implementation, and I'm a little surprised that it isn't many orders of magnitude faster. But maybe my intuition for this is just way off... Happy to provide more detail if anyone feels up to helping out.
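One thing I've been unsure about is whether I need compiler directives like the following at the top of each .pyx file, or whether the memoryview indexing is already being translated to raw C array access without them:

```cython
# cython: boundscheck=False
# cython: wraparound=False
# cython: cdivision=True
```

I've been meaning to check the annotated HTML output from `cython -a` to see which lines still interact with the Python C-API, but I'd appreciate hearing whether this is the usual culprit.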

Thanks!

- Adrian

You received this message because you are subscribed to the Google Groups "cython-users" group.
