Discussion:
[Numpy-discussion] Make np.bincount output same dtype as weights
Jaime Fernández del Río
2016-03-26 20:16:13 UTC
Hi all,

I have just submitted a PR (#7464 <https://github.com/numpy/numpy/pull/7464>)
that addresses an enhancement request (#6854
<https://github.com/numpy/numpy/issues/6854>) by making np.bincount return an
array of the same dtype as the weights parameter. This is an important
deviation from current behavior, which always casts weights to double and
always returns a double array, so I would like to hear what others think
about whether this is worthwhile. Main discussion points:

- np.bincount now works with complex weights (yay!); I guess this should
be a pretty uncontroversial enhancement.
- The return is of the same type as weights, which means that small
integers are very likely to overflow. This is exactly what #6854
requested, but perhaps we should promote the output for integers to a
long, as we do in np.sum?
- Boolean weights stay boolean and are ORed together rather than summed. Is
this what one would want? If we decide that integer promotion is the way to
go, perhaps booleans should be handled the same way?
- This new implementation currently supports all of the reasonable
native types, but has no fallback for user-defined types. I guess we
should attempt to cast the array to double as before if no native loop can
be found? It would be good to have a way of testing this, though; any
thoughts on how to go about this?
- Does a behavior change like this require some deprecation period? What
would that look like?
- I have also added broadcasting of weights to the full size of the input
list, so that one can do e.g. np.bincount([1, 2, 3], weights=2j) without
having to tile the single weight to the size of the input list.

Any other thoughts are very welcome as well!

Jaime
--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him in his plans
for world domination.
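A minimal sketch of the semantics proposed above, for trying them without building the PR. It assumes that scatter-adding with np.add.at into a zeros array of the weights' dtype is a fair stand-in for the new C loop; bincount_same_dtype is a hypothetical name, not part of NumPy. It exercises scalar-weight broadcasting, complex weights, bool weights combining as OR, and small-integer overflow.

```python
import numpy as np

def bincount_same_dtype(x, weights, minlength=0):
    """Hypothetical helper sketching the proposed semantics: the result
    has the dtype of `weights`. This is not NumPy's np.bincount API."""
    x = np.asarray(x, dtype=np.intp)
    if x.ndim != 1 or (x.size and x.min() < 0):
        raise ValueError("x must be a 1-D array of non-negative integers")
    # Broadcast the weights (e.g. a scalar) to the length of x, so callers
    # do not have to np.tile a single weight themselves.
    w = np.broadcast_to(np.asarray(weights), x.shape)
    n = max(int(x.max()) + 1 if x.size else 0, minlength)
    out = np.zeros(n, dtype=w.dtype)
    # Unbuffered scatter-add: repeated indices accumulate rather than
    # overwrite. For bool, np.add acts as logical OR; fixed-width integers
    # wrap around on overflow.
    np.add.at(out, x, w)
    return out


c = bincount_same_dtype([1, 2, 3], weights=2j)        # complex128: [0, 2j, 2j, 2j]
small = bincount_same_dtype([0, 0], np.array([100, 100], dtype=np.int8))
flags = bincount_same_dtype([0, 0, 1], [True, True, False])  # bool: [True, False]
print(c.dtype, small, flags)  # small is [-56]: 200 wrapped around in int8
```

The int8 case is the overflow concern in concrete form: 100 + 100 does not fit in int8 and silently wraps to -56.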
Juan Nunez-Iglesias
2016-03-26 22:10:00 UTC
Just to clarify, this will only affect weighted bincounts, right? I can't tell you in how many places my code depends on the return type being integer!!!
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Jaime Fernández del Río
2016-03-26 22:21:46 UTC
Post by Juan Nunez-Iglesias
Just to clarify, this will only affect weighted bincounts, right? I can't
tell you in how many places my code depends on the return type being
integer!!!
Indeed! Unweighted bincounts still return an np.intp array, like all counting
operations. Sorry for the noise!

Jaime
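A quick check of the distinction, assuming a NumPy release without the change proposed in this thread (so weighted counts are still cast to double, as the opening message describes). The variable names are illustrative.

```python
import numpy as np

x = np.array([0, 1, 1, 3])

counts = np.bincount(x)                          # plain counting
weighted = np.bincount(x, weights=[1, 2, 3, 4])  # integer weights

print(counts, counts.dtype)      # [1 2 0 1] with dtype np.intp (int64 on 64-bit)
print(weighted, weighted.dtype)  # [1. 5. 0. 4.] float64
```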
Joseph Fox-Rabinovitz
2016-03-27 01:54:21 UTC
Would it make sense to just make the output type large enough to hold the cumulative sum of the weights?
- Joseph Fox-Rabinovitz
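One way to read this suggestion, sketched under stated assumptions: every bin's sum lies between the total of the negative weights and the total of the positive weights, so pick the smallest integer dtype (from a candidate list that, by assumption, stops at 64 bits) that covers that range and is a safe cast from the weights' dtype. sum_safe_dtype and bincount_widened are hypothetical helpers, not NumPy functions.

```python
import numpy as np

# Candidate output dtypes, smallest first (assumption: stop at 64 bits).
_CANDIDATES = (np.int8, np.uint8, np.int16, np.uint16,
               np.int32, np.uint32, np.int64, np.uint64)


def sum_safe_dtype(weights):
    """Hypothetical helper: smallest integer dtype able to hold any per-bin
    sum of integer or bool `weights`. Other dtypes are returned unchanged."""
    w = np.asarray(weights)
    if not (np.issubdtype(w.dtype, np.integer) or w.dtype == np.bool_):
        return w.dtype
    # Any bin's sum lies between the total of the negative weights and the
    # total of the positive ones; Python ints keep these totals exact.
    hi = sum(int(v) for v in w.ravel() if v > 0)
    lo = sum(int(v) for v in w.ravel() if v < 0)
    for cand in _CANDIDATES:
        info = np.iinfo(cand)
        if np.can_cast(w.dtype, cand) and info.min <= lo and hi <= info.max:
            return np.dtype(cand)
    return np.dtype(object)  # totals do not fit in 64 bits


def bincount_widened(x, weights):
    """Hypothetical weighted bincount accumulated in sum_safe_dtype(weights)."""
    x = np.asarray(x, dtype=np.intp)
    w = np.asarray(weights)
    out = np.zeros(int(x.max()) + 1 if x.size else 0, dtype=sum_safe_dtype(w))
    np.add.at(out, x, w.astype(out.dtype))  # scatter-add per bin
    return out


r = bincount_widened([0, 0, 1], np.array([100, 100, -100], dtype=np.int8))
print(r, r.dtype)  # values [200, -100], dtype int16
```

The trade-off of this sketch is that the output dtype depends on the values in weights rather than only on its dtype, so two calls with int8 weights can return different dtypes.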


j***@gmail.com
2016-03-27 02:58:00 UTC
Post by Joseph Fox-Rabinovitz
The return is of the same type as weights, which means that small integers
are very likely to overflow. This is exactly what #6854 requested, but
perhaps we should promote the output for integers to a long, as we do in
np.sum?
I always thought of bincount with weights just as a group-by sum. So
it would be easier to remember and have fewer surprises if it matches
the behavior of np.sum.
Post by Joseph Fox-Rabinovitz
Boolean weights stay boolean and are ORed together rather than summed. Is this
what one would want? If we decide that integer promotion is the way to go,
perhaps booleans should be handled the same way?
Isn't this calculating the sum, i.e. count of True by group, already?
Based on a quick example with numpy 1.9.2, I don't think I ever used
bool weights before.
Post by Joseph Fox-Rabinovitz
Any other thoughts are very welcome as well!
(2-D weights?)


Josef
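Josef's group-by-sum reading can be checked against the current weighted bincount (which casts weights to double). The snippet compares it with an explicit per-group sum, shows bool weights producing a per-group count of True, and shows np.sum promoting int8 instead of wrapping. The data and variable names are illustrative.

```python
import numpy as np

groups = np.array([0, 2, 0, 1, 2, 2])
values = np.array([1.5, 2.0, 0.5, 4.0, 1.0, 3.0])

# Weighted bincount read as a group-by sum ...
by_bincount = np.bincount(groups, weights=values)
# ... matches summing each group explicitly.
by_loop = np.array([values[groups == g].sum() for g in range(groups.max() + 1)])
print(by_bincount, by_loop)  # both [2. 4. 6.]

# Bool weights: per-group count of True (returned as float today).
mask = values > 1.0
true_counts = np.bincount(groups, weights=mask)
print(true_counts)  # [1. 1. 2.]

# np.sum's rule: small ints are promoted, so int8 does not wrap.
print(np.sum(np.array([100, 100], dtype=np.int8)))  # 200
```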
Juan Nunez-Iglesias
2016-03-27 04:12:44 UTC
Thanks for clarifying, Jaime, and fwiw I agree with Josef: I would expect
np.bincount to behave like np.sum with regard to promoting weights dtypes,
including bool.
CJ Carey
2016-03-28 19:55:14 UTC
Another +1 for Josef's interpretation from me. Consistency with np.sum
seems like the best option.
Jaime Fernández del Río
2016-03-28 22:04:52 UTC
I have modified the PR to do the same "promote integers to at least long" that
we do in np.sum.

Jaime
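A sketch of the "promote integers to at least long, as np.sum does" rule, assuming the accumulator dtype can be taken from np.sum over an empty array of the weights' dtype instead of hard-coding a promotion table. bincount_sum_promoted is a hypothetical helper, not the code in the PR.

```python
import numpy as np

def bincount_sum_promoted(x, weights, minlength=0):
    """Hypothetical helper: weighted bincount whose output dtype follows
    np.sum's promotion (bool and small ints -> default integer, floats and
    complex unchanged). Not the implementation in the PR."""
    x = np.asarray(x, dtype=np.intp)
    if x.ndim != 1 or (x.size and x.min() < 0):
        raise ValueError("x must be a 1-D array of non-negative integers")
    w = np.broadcast_to(np.asarray(weights), x.shape)
    # np.sum over an empty array of w's dtype yields a scalar whose dtype is
    # the one np.sum would accumulate in, so reuse it instead of a table.
    acc_dtype = np.sum(np.zeros(0, dtype=w.dtype)).dtype
    n = max(int(x.max()) + 1 if x.size else 0, minlength)
    out = np.zeros(n, dtype=acc_dtype)
    # Unbuffered scatter-add so repeated indices accumulate.
    np.add.at(out, x, w.astype(acc_dtype))
    return out


print(bincount_sum_promoted([0, 0], np.array([100, 100], dtype=np.int8)))  # [200]
print(bincount_sum_promoted([0, 0, 1], [True, True, False]))               # [2 0]
print(bincount_sum_promoted([1, 2, 3], weights=2j).dtype)                  # complex128
```

Deriving the dtype from np.sum itself keeps bool and small integers promoted to the default integer while floats and complex keep their dtype, which is the consistency with np.sum asked for in the replies.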