[Numpy-discussion] Numpy Generalized Ufuncs: Pointer Arithmetic and Segmentation Faults (Debugging?)

Discussion:

e***@artorg.unibe.ch

2015-10-25 12:06:46 UTC

Dear Numpy maintainers and developers,

Thanks for providing such a great numerical library!

I'm currently trying to implement the Dynamic Time Warping metric as a set of generalised numpy ufuncs, but unfortunately I keep running into problems with pointer arithmetic and segmentation faults. Is there a way to use GDB or a similar debugger on a python/numpy extension? Furthermore, is it necessary to use pointer arithmetic to access the function arguments (as seen on http://docs.scipy.org/doc/numpy/user/c-info.ufunc-tutorial.html)
or is element access (operator[]) also permissible?

To break it down quickly: I need a fast DTW distance function dist_dtw() with two vector inputs (broadcasting should be possible), two scalar parameters and one scalar output (signature: (i), (j), (), () -> ()), usable from Python for a 1-Nearest Neighbor classification algorithm. The extension also implements two functions, compute_envelope() and piecewise_mean_reduction(), which are used for lower-bounding based on Keogh and Ratanamahatana, 2005. The source code is available at http://pastebin.com/MunNaP7V, and the main segmentation fault happens somewhere in the chain dist_dtw() -> meta_dtw_dist() -> slow_dtw_dist(), but I have not been able to pin it down.

Aside from my primary questions, I wonder how to approach errors/exceptions and unit testing when developing numpy ufuncs. Are there any examples apart from the numpy manual that I could use as reference implementations of generalised numpy ufuncs?

I would greatly appreciate some insight into properly developing generalised ufuncs.

Best,
Eleanore

Jaime Fernández del Río

2015-10-25 14:13:02 UTC


Hi Eleanore,

Thanks for the kind words, you are very welcome!

As for your issues, I think they are coming from the handling of the
strides you are doing in the slow_dtw_dist function. The strides are the
number of bytes you have to advance your pointer to get to the next item.
In your code, you end up doing something akin to:

dtype *v_i = v0;
...
for (...) {
...
v_i += stride_v;
}

This, rather than increasing the v_i pointer by stride_v bytes, increases it
by stride_v * sizeof(dtype) bytes, and with the npy_double you seem to be using
as dtype, it walks out of your allocated memory 8x too fast.
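To make the 8x factor concrete, here is a small C sketch (the function names are hypothetical) that measures how far each kind of pointer moves for the same stride value. It assumes a stride of 8 bytes, which is what NumPy reports for a contiguous float64 array, and it relies on C scaling typed-pointer arithmetic by sizeof(double).

```c
#include <stddef.h>

/* How many bytes does `p += stride` move a double*?
   C scales arithmetic on a typed pointer by sizeof(double). */
static ptrdiff_t bytes_moved_typed(double *base, ptrdiff_t stride) {
    double *p = base;
    p += stride;                      /* advances stride * sizeof(double) bytes */
    return (char *)p - (char *)base;
}

/* The same step on a char* moves exactly `stride` bytes, which is what a
   NumPy stride (always measured in bytes) is meant to do. */
static ptrdiff_t bytes_moved_char(double *base, ptrdiff_t stride) {
    char *p = (char *)base;
    p += stride;                      /* advances exactly stride bytes */
    return p - (char *)base;
}
```

With a buffer of 16 doubles and a stride of 8, the typed pointer moves 8 * sizeof(double) bytes (64 on common platforms), while the char* moves the intended 8 bytes.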

What you increase by stride_v has to be of char* type, so one simple
solution would be to do something like:

char *v_ptr = (char *)v0;
...
for (...) {
dtype v_val = *(dtype *)v_ptr;
...
v_ptr += stride_v;
}

And use v_val directly wherever you were dereferencing v_i before.
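A complete, compilable version of that pattern might look like the sketch below. The helper names are hypothetical, and it assumes the data are correctly aligned doubles. It also bears on the earlier operator[] question as a general C observation: plain indexing such as v[k] is only equivalent when the stride equals sizeof(double), i.e. when the data are contiguous. For sliced or transposed views you need the byte offset.

```c
#include <stddef.h>

/* Sum n doubles laid out `stride` bytes apart (stride as NumPy reports it).
   Works for contiguous, sliced (a[::2]) and reversed views alike. */
static double strided_sum(const char *data, ptrdiff_t n, ptrdiff_t stride) {
    double total = 0.0;
    for (ptrdiff_t k = 0; k < n; ++k) {
        /* Offset in bytes on the char*, then cast and dereference. */
        double v = *(const double *)(data + k * stride);
        total += v;
    }
    return total;
}

/* Indexing with [] is only correct for contiguous data
   (stride == sizeof(double)); otherwise it reads the wrong elements. */
static double contiguous_sum(const double *v, ptrdiff_t n) {
    double total = 0.0;
    for (ptrdiff_t k = 0; k < n; ++k) {
        total += v[k];
    }
    return total;
}
```

For example, passing a stride of 2 * sizeof(double) to strided_sum visits every other element, the same layout NumPy produces for a[::2].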

Jaime

_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion

--
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with
his plans for world domination.

Travis Oliphant

2015-10-26 05:04:28 UTC


Two things that might help you create generalized ufuncs:

1) Look at Numba --- it makes it very easy to write generalized ufuncs in
simple Python code. Numba will compile to machine code so it can be as
fast as writing in C. Here is the documentation for that specific
feature:
http://numba.pydata.org/numba-doc/0.21.0/user/vectorize.html#the-guvectorize-decorator.
One wart of the interface is that scalars need to be treated as
1-element 1-d arrays (but still use '()' in the signature).

2) Look at the linear algebra module in NumPy, which now wraps a bunch of
linear-algebra based generalized ufuncs (all written in C):
https://github.com/numpy/numpy/blob/master/numpy/linalg/umath_linalg.c.src
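For orientation, here is a stripped-down sketch of a C gufunc inner loop in the (args, dimensions, steps, data) calling convention, written for the (i),(j)->() part of Eleanore's signature. It is not taken from umath_linalg.c.src or from Eleanore's code, and it rests on several assumptions. ptrdiff_t stands in for npy_intp so it compiles without NumPy headers. The two scalar parameters are left out because the thread does not say what they control. The |x - y| local cost and the absence of a warping window are illustrative choices. The names dtw_strided and dtw_loop are hypothetical. For a gufunc with three arguments, dimensions holds [outer length, i, j], and steps holds the three outer strides followed by the core strides of x and y.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Plain DTW distance between x (n items) and y (m items), both read through
   byte strides. Local cost |x_a - y_b|, no warping window (assumptions). */
static double dtw_strided(const char *x, ptrdiff_t n, ptrdiff_t sx,
                          const char *y, ptrdiff_t m, ptrdiff_t sy) {
    /* Two rows of the (n+1) x (m+1) cumulative-cost table. */
    double *prev = malloc((size_t)(m + 1) * sizeof(double));
    double *curr = malloc((size_t)(m + 1) * sizeof(double));
    if (prev == NULL || curr == NULL) { free(prev); free(curr); return NAN; }

    prev[0] = 0.0;
    for (ptrdiff_t b = 1; b <= m; ++b) prev[b] = INFINITY;

    for (ptrdiff_t a = 1; a <= n; ++a) {
        double xa = *(const double *)(x + (a - 1) * sx);   /* byte offset */
        curr[0] = INFINITY;
        for (ptrdiff_t b = 1; b <= m; ++b) {
            double yb = *(const double *)(y + (b - 1) * sy);
            double d = xa - yb;
            if (d < 0) d = -d;                              /* |xa - yb| */
            double best = prev[b - 1];                      /* diagonal */
            if (prev[b] < best) best = prev[b];             /* from above */
            if (curr[b - 1] < best) best = curr[b - 1];     /* from the left */
            curr[b] = d + best;
        }
        double *tmp = prev; prev = curr; curr = tmp;        /* row a is now prev */
    }
    double result = prev[m];
    free(prev);
    free(curr);
    return result;
}

/* Inner loop in the gufunc calling convention for signature (i),(j)->(). */
static void dtw_loop(char **args, const ptrdiff_t *dimensions,
                     const ptrdiff_t *steps, void *data) {
    (void)data;                                   /* unused here */
    ptrdiff_t N = dimensions[0];                  /* outer (broadcast) length */
    ptrdiff_t I = dimensions[1];                  /* core length of x */
    ptrdiff_t J = dimensions[2];                  /* core length of y */
    ptrdiff_t s_x = steps[0], s_y = steps[1], s_out = steps[2]; /* outer */
    ptrdiff_t s_xi = steps[3], s_yj = steps[4];                 /* core */
    char *x = args[0], *y = args[1], *out = args[2];

    for (ptrdiff_t k = 0; k < N; ++k) {
        *(double *)out = dtw_strided(x, I, s_xi, y, J, s_yj);
        x += s_x;                                 /* all advances in bytes */
        y += s_y;
        out += s_out;
    }
}
```

In a real extension, a loop like dtw_loop would use npy_intp and be registered with PyUFunc_FromFuncAndDataAndSignature together with a signature string such as "(i),(j)->()". Broadcasting then shows up as an outer stride of 0 for an input that is reused across the outer loop.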

-Travis

--
*Travis Oliphant*
*Co-founder and CEO*

@teoliphant
512-222-5440
http://www.continuum.io

e***@artorg.unibe.ch

2015-10-26 16:25:05 UTC


Dear Jaime, dear Travis

thanks for pointing out my stride errors. This just gets me every time. After trying out Travis’ suggestion to work with numba, I feel that this works best for me. Functions are easier to generalise to different data types and I can make use of my existing Python development environment that way.

Thanks again for your rapid and helpful support!

Best,
Eleanore