[Numpy-discussion] Cross-correlation PR stuck in limbo
Honi Sanders
2016-05-03 15:43:18 UTC
Hello all,
I have completed a pull request that adds “maxlag” functionality to numpy.correlate: https://github.com/numpy/numpy/pull/5978. The pull request has passed all tests and has been ready to be merged for around six months. Several people have asked for it to be included, on Stack Overflow, the mailing list, and GitHub. Can someone please let me know what still needs to be done, or whether it can be merged?

Here is some background:
The problem is that numpy.correlate has no maxlag feature. Even if I only want the correlation between two time series at lags between -100 and +100 ms, for example, it still calculates the correlation at every lag between -20000 and +20000 ms (20000 ms being the length of the time series). In theory, that is roughly a 200x performance hit!
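
As a rough illustration of that cost (a sketch, not a benchmark), this is the workaround available today: compute every lag with mode="full" and then slice out the few that are wanted. The variable names and the 1 ms sampling interval are assumptions made for the example.

```python
import numpy as np

n = 20000      # samples per series; 1 sample per ms is assumed for illustration
maxlag = 100   # only lags in [-100, +100] are wanted

rng = np.random.default_rng(0)
x = rng.standard_normal(n)
y = rng.standard_normal(n)

# Today: compute every lag, then throw most of them away.
full = np.correlate(x, y, mode="full")           # 2*n - 1 lags
zero = len(y) - 1                                # index of lag 0 in the full output
wanted = full[zero - maxlag:zero + maxlag + 1]   # 2*maxlag + 1 lags

print(full.size, wanted.size, full.size / wanted.size)
```

With these sizes, 39999 lags are computed to keep 201 of them, which is where the factor of about 200 comes from.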

I have raised this as a numpy issue <https://github.com/numpy/numpy/issues/5954>, a scipy issue <https://github.com/scipy/scipy/issues/4940> and on the scipy-dev list <http://mail.scipy.org/pipermail/scipy-dev/2015-June/020757.html>. It seems the best place to start is with numpy.correlate, so that is what I am requesting.

Previous discussion of this functionality can be found at another discussion on numpy correlate (and convolution) <http://numpy-discussion.10968.n7.nabble.com/another-discussion-on-numpy-correlate-and-convolution-td32925.html>. Other issues related to correlate functions include ENH: Fold fftconvolve into convolve/correlate functions as a parameter #2651 <https://github.com/scipy/scipy/issues/2651>, Use FFT in np.correlate/convolve? (Trac #1260) #1858 <https://github.com/numpy/numpy/issues/1858>, and normalized cross-correlation (Trac #1714) #2310 <https://github.com/numpy/numpy/issues/2310>.



The new implementation extends the “mode” argument to accept two new kinds of value: an int, which defines the maximum lag for which the cross-correlation should be calculated, or a tuple, which defines the minlag, maxlag, and lagstep to be used, in the same format as the arguments to numpy.arange.
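
To make those semantics concrete, here is a minimal sketch of computing the correlation only at requested lags. It is not the code in the pull request: correlate_at_lags is a hypothetical helper written for this example, it handles real-valued 1-D input only, and treating the int form as the inclusive range -maxlag..+maxlag is an assumption.

```python
import numpy as np

def correlate_at_lags(a, v, lags):
    """Hypothetical helper (not the PR's code): cross-correlate real 1-D
    arrays a and v at the given integer lags only."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    out = np.zeros(len(lags))
    for i, k in enumerate(lags):
        # Same convention as np.correlate: c[k] = sum_n a[n + k] * v[n],
        # summed over the n for which both indices are in range.
        n_lo = max(0, -k)
        n_hi = min(len(v), len(a) - k)
        if n_hi > n_lo:
            out[i] = np.dot(a[n_lo + k:n_hi + k], v[n_lo:n_hi])
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal(500)
v = rng.standard_normal(500)

# int-style request: assumed here to mean lags -maxlag..+maxlag inclusive
maxlag = 10
lags_int = np.arange(-maxlag, maxlag + 1)
c_int = correlate_at_lags(a, v, lags_int)

# tuple-style request (minlag, maxlag, lagstep), read like np.arange arguments
lags_tuple = np.arange(-20, 21, 5)
c_tuple = correlate_at_lags(a, v, lags_tuple)

# Cross-check against the full correlation, where lag k sits at
# index k + len(v) - 1 of the mode="full" output.
full = np.correlate(a, v, mode="full")
assert np.allclose(c_int, full[lags_int + len(v) - 1])
```

The work in this sketch scales with the number of requested lags rather than with the length of the full output, which is the saving the maxlag option is meant to provide.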


Please let me know what should be done to move this pull request forward.

Honi
Pierre Haessig
2016-05-04 12:07:43 UTC
Hi,

I don't know how to push the PR forward, but all I can say is that this
maxlag feature would be a major improvement for using NumPy in time
series analysis! There would be immediate benefits downstream for
Matplotlib and statsmodels.

Thanks Honi for having taken the time to implement this!

best,
Pierre
Elliot Hallmark
2016-05-27 19:51:22 UTC
+1

This would really help with large data sets in certain situations.

Is there still disagreement about whether this should be included? Or are
there just some minor details left? Or did it get lost in the shuffle?

Hopefully,
Elliot
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Jaime Fernández del Río
2016-05-27 23:33:28 UTC
I did an overall review of the code a couple of weeks ago (see the PR for
details), and there is quite a bit of work to be done before we can merge
Honi's code. But if he can find the time to work on the coding, I'll try to
be more diligent about the reviewing.

Jaime
--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.