Discussion:
[Numpy-discussion] Weighted percentile / quantile
Alex Rogozhnikov
2016-03-01 23:03:45 UTC
Hi,
I know this topic was raised long ago:
https://mail.scipy.org/pipermail/numpy-discussion/2010-July/051851.html

There are also several questions on SO:
http://stackoverflow.com/questions/20601872/numpy-or-scipy-to-calculate-weighted-median
http://stackoverflow.com/questions/13546146/percentile-calculation-with-weighted-data
http://stackoverflow.com/questions/26102867/python-weighted-median-algorithm-with-pandas

The only working solution with numpy:
http://stackoverflow.com/questions/21844024/weighted-percentile-using-numpy
uses sorting.
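
For readers who have not followed the link, the sorting-based approach can be sketched roughly as below. This is a minimal illustration and not the exact code from the Stack Overflow answer: the name `weighted_percentile` and the midpoint convention for placing each sample on the cumulative-weight axis are assumptions made for this example, and other conventions give slightly different results.

```python
import numpy as np

def weighted_percentile(values, weights, q):
    """Weighted percentile(s) via sort + cumulative sum.

    Hypothetical helper for illustration; not a NumPy function.
    `q` is given in percent (0..100), as in `np.percentile`.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    q = np.asarray(q, dtype=float) / 100.0

    # The O(n log n) step the thread refers to as "uses sorting".
    order = np.argsort(values)
    values, weights = values[order], weights[order]

    # Place each sample at the midpoint of its weight interval,
    # then rescale so the positions lie in [0, 1].
    positions = np.cumsum(weights) - 0.5 * weights
    positions /= weights.sum()

    # Linear interpolation between neighbouring samples.
    return np.interp(q, positions, values)

# With equal weights this convention reproduces the ordinary median.
median_equal = weighted_percentile([1, 2, 3, 4], [1, 1, 1, 1], 50)
```

The cost is dominated by the `argsort` call, which is why the question asks about alternatives that avoid a full sort.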

Are there better options at the moment (numpy/scipy/pandas)?

Cheers,
Alex.
Joseph Fox-Rabinovitz
2016-03-02 03:27:05 UTC
Alex,

At the moment, there does not appear to be anything in numpy. However,
I am working (slowly) on upgrading the C code for partitioning with
arbitrary arrays of real weights. That will get `partition`, `median`,
`percentile` to work with weights, as well as enabling weights for the
automated bin estimators of `histogram`. `mean` already has an
implementation of weights via `average`.
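
To make the distinction concrete, here is a small sketch of the two existing pieces the message refers to: `np.average` with its `weights` argument, and the unweighted `np.partition`, which selects the k-th smallest element without a full sort. The arrays are made-up sample data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([4.0, 3.0, 2.0, 1.0])

# Weighted mean: sum(x * w) / sum(w).
weighted_mean = np.average(x, weights=w)
plain_mean = np.mean(x)

# Unweighted selection: after partitioning around index k, the element
# at position k is the one that would be there in fully sorted order.
a = np.array([7, 1, 5, 3, 9])
k = 2
kth_smallest = np.partition(a, k)[k]
```

Here the weighted mean is pulled toward the small values that carry the larger weights, while `np.partition` has no notion of weights at all, which is the gap described above.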

You may be interested in my original post to the mailing list here:
https://mail.scipy.org/pipermail/numpy-discussion/2016-February/075000.html.
Josef P. mentioned in one of his responses that statsmodels has a
weighted quantile computation available as of PR 2707:
https://github.com/statsmodels/statsmodels/pull/2707. That should
effectively serve your purpose.

-Joe


_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Alex Rogozhnikov
2016-03-02 12:20:36 UTC
Hi, Joe,
Post by Joseph Fox-Rabinovitz
I am working (slowly) on upgrading the C code for partitioning with
arbitrary arrays of real weights
Really good to know there is some work in this direction.
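
As an illustration of what selection by partitioning with weights could look like, here is a rough pure-NumPy sketch. It is not the C code mentioned in the quoted message; the function name `weighted_select`, the middle-element pivot, and the "lower" convention (smallest value whose cumulative weight reaches the target) are all assumptions chosen for this example.

```python
import numpy as np

def weighted_select(values, weights, target):
    """Smallest value whose cumulative weight (over all elements <= it)
    reaches `target`. Hypothetical helper, quickselect-style, no full sort.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    while True:
        pivot = values[len(values) // 2]
        below = values < pivot
        above = values > pivot
        w_below = weights[below].sum()
        w_equal = weights[values == pivot].sum()

        if target <= w_below and below.any():
            # The answer lies strictly below the pivot.
            values, weights = values[below], weights[below]
        elif target <= w_below + w_equal or not above.any():
            # The pivot itself carries the target weight
            # (or nothing larger remains).
            return pivot
        else:
            # Discard everything up to the pivot and adjust the target.
            target -= w_below + w_equal
            values, weights = values[above], weights[above]

def weighted_median(values, weights):
    # Lower weighted median: the target is half of the total weight.
    return weighted_select(values, weights, 0.5 * np.sum(weights))

m = weighted_median([1, 2, 3], [1, 1, 2])
```

Each pass discards the pivot and one side of the data, so the loop always terminates. With this simple pivot choice the running time is linear on typical inputs but quadratic in the worst case.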
Post by Joseph Fox-Rabinovitz
Josef P. mentioned in one of his responses that statsmodels has a
weighted quantile computation available as of PR 2707:
https://github.com/statsmodels/statsmodels/pull/2707. That should
effectively serve your purpose.
It’s the same sort+cumsum approach, and even worse because it relies on aggregating.
Thanks for letting me know, but I’ll definitely prefer the implementation from SO (until numpy supports weights).

Cheers,
Alex