Discussion:
[Numpy-discussion] Integers to integer powers, let's make a decision
Charles R Harris
2016-06-04 17:22:52 UTC
Hi All,

I've made a new post so that we can make an explicit decision. AFAICT, the
two proposals are


1. Integers to negative integer powers raise an error.
2. Integers to integer powers always result in floats.

My own sense is that 1. would be closest to current behavior and using a
float exponential when a float is wanted is an explicit way to indicate
that desire. OTOH, 2. would be the most convenient default for everyday
numerical computation, but I think would more likely break current code. I
am going to come down on the side of 1., which I don't think should cause
too many problems if we start with a {Future, Deprecation}Warning
explaining the workaround.
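
For concreteness, a sketch of how the two proposals would behave (illustrative
only; the exact error type under 1. is undecided):

```
import numpy as np

a = np.arange(1, 4)   # integer dtype

# Proposal 1: int ** negative int raises; the workaround is an explicit
# float exponent:
a ** -1.0             # array([1.        , 0.5       , 0.33333333])

# Proposal 2: int ** int always returns float, so even
# a ** 2 would give array([1., 4., 9.]) instead of array([1, 4, 9]).
```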

Chuck
Nathaniel Smith
2016-06-04 17:45:24 UTC
+1

--
Nathaniel J. Smith -- https://vorpus.org
Matthew Brett
2016-06-04 17:46:50 UTC
I agree - error for negative integer powers seems like the safest option.

Cheers,

Matthew
Charles R Harris
2016-06-04 19:43:42 UTC
Note that current behavior in 1.11 is such a mess:
```
In [5]: array([0], dtype=int64) ** -1
Out[5]: array([-9223372036854775808])

In [6]: array([0], dtype=uint64) ** -1
Out[6]: array([ inf])
```
that the simplest approach might be to start by raising an error rather
than by trying to maintain current behavior and issuing a warning.

Chuck
j***@gmail.com
2016-06-04 19:47:47 UTC
I'm in favor of 2., always float for `**`.
I don't see enough pure-integer use cases to throw away a nice operator.

Josef
Matthew Brett
2016-06-04 19:49:22 UTC
Post by j***@gmail.com
I'm in favor of 2. always float for `**`
I don't see enough pure integer usecases to throw away a nice operator.
I can't make sense of 'throw away a nice operator' - you still have
arr ** 2.0 if you want floats.

Matthew
j***@gmail.com
2016-06-04 20:16:21 UTC
Post by Matthew Brett
I can't make sense of 'throw away a nice operator' - you still have
arr ** 2.0 if you want floats.
But if we have x**y, then we always need to check the dtype. If we don't, we
get RuntimeErrors or overflow where we might have forgotten to include the
relevant cases in the unit tests.

numpy has gotten pickier about requiring integers in some areas (indexing, ...).
Now we have to watch out that we convert back to floats for power.

Not a serious problem for a library with unit tests and enough users who
run into the dtype issues and report them.
But I'm sure I will have to fix any scripts or interactive work that I'm
writing.
It's just another thing to watch out for, after we managed to get rid of
integer division (1/2 = 0).
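
For reference, this is the division analogy (Python 3 session):

```
>>> 1 / 2    # true division always returns float
0.5
>>> 1 // 2   # floor division stays integer
0
```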

Josef
V. Armando Sole
2016-06-04 21:07:28 UTC
Also in favor of 2. Always return a float for '**'
Nathaniel Smith
2016-06-04 22:10:28 UTC
Post by V. Armando Sole
Also in favor of 2. Always return a float for '**'
Even if we did want to switch to this, it's such a major
backwards-incompatible change that I'm not sure how we could actually
make the transition without first making it an error for a while.

-n
--
Nathaniel J. Smith -- https://vorpus.org
j***@gmail.com
2016-06-04 23:27:30 UTC
Post by Nathaniel Smith
Post by V. Armando Sole
Also in favor of 2. Always return a float for '**'
Even if we did want to switch to this, it's such a major
backwards-incompatible change that I'm not sure how we could actually
make the transition without first making it an error for a while.
AFAIU, only the dtype for int**int would change. So, what would be the
problem with FutureWarnings as with other dtype changes that were done in
recent releases?

Josef
Charles R Harris
2016-06-05 00:07:22 UTC
Post by j***@gmail.com
AFAIU, only the dtype for int**int would change. So, what would be the
problem with FutureWarnings as with other dtype changes that were done in
recent releases.
The main problem I see with that is that numpy integers would behave
differently than Python integers, and the difference would be silent. With
option 1 it is possible to write code that behaves the same up to overflow
and the error message would supply a warning when the exponent should be
float. One could argue that numpy scalar integer types could be made to
behave like python integers, but then their behavior would differ from
numpy arrays and numpy scalar arrays.
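
For illustration, the silent difference would look something like this (the
numpy exception under option 1 is hypothetical here):

```
>>> 2 ** -2            # Python int: silently switches to float
0.25
>>> import numpy as np
>>> np.int64(2) ** -2  # option 1 would raise instead, e.g. a ValueError
```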

Chuck
j***@gmail.com
2016-06-05 00:17:59 UTC
Post by Charles R Harris
The main problem I see with that is that numpy integers would behave
differently than Python integers, and the difference would be silent. With
option 1 it is possible to write code that behaves the same up to overflow
and the error message would supply a warning when the exponent should be
float. One could argue that numpy scalar integer types could be made to
behave like python integers, but then their behavior would differ from
numpy arrays and numpy scalar arrays.
I'm not sure I understand.

Do you mean

np.arange(5)**2 would behave differently than np.arange(5)**np.int_(2)

or 2**2 would behave differently than np.int_(2)**np.int_(2)

?


AFAICS, there are many cases where numpy scalars don't behave like python
scalars. Also, does different behavior mean different type/dtype or
different numbers? (The first I can live with, the second requires human
memory usage, which is a scarce resource.)
>>> 2**(-2)
0.25

Josef
Charles R Harris
2016-06-05 01:16:39 UTC
Post by j***@gmail.com
I'm not sure I understand.
Do you mean
np.arange(5)**2 would behave differently than np.arange(5)**np.int_(2)
or 2**2 would behave differently than np.int_(2)**np.int_(2)
The second case. Python returns ints for non-negative integer powers of
ints.
But we can't mix types in np.arrays, and we can't depend on the element
values of arrays in the exponent, only on their type, so 2 ** array([1,
-1]) must have a single result dtype, and making that dtype float would
surely break code. Scalar arrays, which are arrays, have the same problem.
We can't do what Python does with ndarrays and numpy scalars, and it would
be best to be consistent. Division was a simpler problem to deal with, as
there were two operators, `//` and `/`. If there were two exponentiation
operators, life would be simpler.
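
Concretely, a minimal sketch of the constraint (illustrative):

```
import numpy as np

e = np.array([1, -1])
# The output dtype of 2 ** e may depend only on the input dtypes, not on
# whether individual elements are negative, so both results must share one
# dtype -- the -1 entry cannot become 0.5 unless the whole result is float.
print((2 ** e).dtype)
```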

Chuck
j***@gmail.com
2016-06-05 01:54:26 UTC
Post by Charles R Harris
[...]
Division was a simpler problem to deal with, as there were two operators,
`//` and `/`. If there were two exponential operators life would be simpler.
What bothers me with the entire argument is that you are putting higher
priority on returning a dtype than on returning the correct numbers.

Reverse the argument: because we cannot make the return type
value-dependent, we **have** to return float in order to get the correct
number. (It's an argument, not what we really have to do.)


Which code really breaks? Code that gets a float instead of an int; and
with some advance warning, users who really need to watch their memory can
use np.power.

My argument before was that I think a simple operator like `**` should work
for 90+% of the users and match their expectations, and the users who need
to watch dtypes can use the function instead.

(I can also live with the exception from case 1., but I really think this
is like the python 2 integer division "surprise")

Josef
Charles R Harris
2016-06-05 02:23:41 UTC
Post by j***@gmail.com
What bothers me with the entire argument is that you are putting higher
priority on returning a dtype than on returning the correct numbers.
Overflow in integer powers would be correct in modular arithmetic, at least
for unsigned. Signed is a bit trickier. But overflow is a known property of
numpy integer types. If we raise an exception for the negative exponents we
at least aren't returning incorrect numbers.
Post by j***@gmail.com
Reverse the argument: Because we cannot make the return type value
dependent we **have** to return float, in order to get the correct number.
(It's an argument not what we really have to do.)
From my point of view, backwards compatibility is the main reason for
choosing 1, otherwise I'd pick 2. If it weren't so easy to get floating
point by using floating exponents I'd probably choose differently.
Post by j***@gmail.com
Which code really breaks, code that gets a float instead of an int, and
with some advance warning users that really need to watch their memory can
use np.power.
My argument before was that I think a simple operator like `**` should
work for 90+% of the users and match their expectation, and the users that
need to watch dtypes can as well use the function.
(I can also live with the exception from case 1., but I really think this
is like the python 2 integer division "surprise")
Well, that is why we would raise an exception, making it less surprising ;)

We could always try the float option and see what breaks, but I expect
there is a fair amount of code using small exponents like 2 or 3 where it
is expected that the result is still integer. I would like more input from
users than we have seen so far...

Chuck
Nathaniel Smith
2016-06-05 03:26:15 UTC
On Jun 4, 2016 7:23 PM, "Charles R Harris" <***@gmail.com>
wrote:
[...]
Post by Charles R Harris
We could always try the float option and see what breaks, but I expect
there is a fair amount of code using small exponents like 2 or 3 where it
is expected that the result is still integer. I would like more input from
users than we have seen so far...

Just to highlight this, if anyone wants to strengthen the argument for
switching to float then this is something you can literally do: tweak a
local checkout of numpy to return float from int**int and
array-of-int**array-of-int, and then try running the test suites of
projects like scikit-learn, astropy, nipy, scikit-image, ...

(The reason I'm phrasing this as something that people who like the float
idea should do is that generally when proposing a risky
compatibility-breaking change, the onus is on the ones proposing it to
demonstrate that the risk is ok.)

-n
Charles R Harris
2016-06-05 03:44:52 UTC
Post by Nathaniel Smith
[...]
Just to highlight this, if anyone wants to strengthen the argument for
switching to float then this is something you can literally do: tweak a
local checkout of numpy to return float from int**int and
array-of-int**array-of-int, and then try running the test suites of
projects like scikit-learn, astropy, nipy, scikit-image, ...
(The reason I'm phrasing this as something that people who like the float
idea should do is that generally when proposing a risky
compatibility-breaking change, the onus is on the ones proposing it to
demonstrate that the risk is ok.)
I was tempted for a bit, but I think the biggest compatibility problem is
not current usage, but the fact that code written assuming float results
will not work for earlier versions of numpy, and that would be a nasty
situation. Given that integers raised to negative integer powers are already
pretty much broken, making folks write around an exception will result in
code compatible with previous numpy versions.

Chuck
Alan Isaac
2016-06-05 13:05:32 UTC
Post by Charles R Harris
From my point of view, backwards compatibility is the main reason for
choosing 1, otherwise I'd pick 2. If it weren't so easy to get
floating point by using floating exponents I'd probably choose
differently.
As an interested user, I offer a summary of
some things I believe are being claimed about
the two proposals on the table (for int**int),
which are:
1. raise an error for negative powers
2. always return float
Here is a first draft comparison (for int**int only)

Proposal 1. effectively throws away an operator
- true in this:
np.arange(10)**10 already overflows
even for int32, much less smaller sizes (see the quick check below),
and negative powers are now errors
- false in this:
you can change an argument to float

Proposal 1. effectively behaves more like Python
- true in this:
for a very small range of numbers, int**int will
return int in Python 2
- false in this:
In Python, negative exponents produce floats,
and int**int does not overflow

Proposal 1 is more backwards compatible:
true, but this really only affects int**2
(larger arguments quickly overflow)

Proposal 2 is a better match for other languages:
basically true (see e.g., C++'s overloaded `pow`)

Proposal 2 better satisfies the principle of least surprise:
probably true for most users,
possibly false for some
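
A quick check of the overflow claim above (illustrative session; dtypes
pinned down so the result is platform-independent):

```
import numpy as np

# 9**10 is about 3.5e9, which exceeds the int32 maximum (~2.15e9),
# so the last entry wraps around silently:
print(np.arange(10, dtype=np.int32) ** 10)

# int64 still holds all of these exactly:
print(np.arange(10, dtype=np.int64) ** 10)
```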


Feel free to add, correct, modify. I think there is a strong
argument to always return float, and the real question is
whether it is strong enough to sacrifice backwards compatibility.

Hope this summary is of some use and not too tendentious,
Alan
Peter Creasey
2016-06-04 20:35:30 UTC
+1 (grudgingly)

My thoughts on this are:
(i) Intuitive APIs are better, and power(a,b) suggests to a lot of
(most?) readers that you are going to invoke a function like the C
pow(double x, double y) on every element. Doing positive integer
powers with the same function name suggests a correspondence that is
in practice not that helpful. With a time machine I’d suggest a
separate function for positive integer powers, however...
(ii) I think that ship has sailed, and particularly with e.g. a**3 the
numpy conventions are backed up by quite a bit of code, probably too
much to change without a lot of problems. So I’d go with integer **
negative integer is an error.

Peter
Sturla Molden
2016-06-06 02:33:00 UTC
Post by Charles R Harris
1. Integers to negative integer powers raise an error.
2. Integers to integer powers always result in floats.
2
Marten van Kerkwijk
2016-06-06 20:11:17 UTC
Hi Chuck,

I consider either proposal an improvement, but among the two I favour
returning float for `**`, because, like for `/`, it ensures one gets
closest to the (mathematically) true answer in most cases, and makes
duck-typing that much easier -- I'd like to be able to do x** y without
having to worry whether x and y are python scalars or numpy arrays of
certain type.

I do agree with Nathaniel that it would be good to check what actually
breaks. Certainly, if anybody is up to making a PR that implements either
suggestion, I'd gladly check whether it breaks anything in astropy.

I should add that I have no idea how to assuage the fear that new code
would break with old versions of numpy, but on the other hand, I don't know
its validity either, as it seems one either develops larger projects for
multiple versions and tests, or writes more scripty things for whatever the
current versions are. Certainly, by this argument I'd better not start using
the new `@` operator!

I do think the argument that for division it was easier because there was
`//` already available is a red herring: here one can use `np.power(a, b,
dtype=...)` if one really needs to.
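
For instance, something along these lines keeps both behaviours available
(illustrative):

```
import numpy as np

a = np.arange(1, 5)

# np.power is a ufunc, so the output dtype can be requested explicitly:
print(np.power(a, 2, dtype=np.int64))  # [ 1  4  9 16]
print(np.power(a, 2, dtype=float))     # [ 1.  4.  9. 16.]
```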

All the best,

Marten
Charles R Harris
2016-06-06 20:17:40 UTC
Post by Marten van Kerkwijk
[...]
It looks to me like users want floats, while developers want the easy path
of raising an error. Darn those users, they just make life sooo difficult...

Chuck
Nathaniel Smith
2016-06-10 06:42:47 UTC
Post by Charles R Harris
It looks to me like users want floats, while developers want the easy path of raising an error. Darn those users, they just make life sooo difficult...
I dunno, with my user hat on I'd be incredibly surprised / confused /
annoyed if an innocent-looking expression like

np.arange(10) ** 2

started returning floats... having exact ints is a really nice feature
of Python/numpy as compared to R/Javascript, and while it's true that
int64 can overflow, there are also large powers that can be more
precisely represented as int64 than float.

-n
--
Nathaniel J. Smith -- https://vorpus.org
Peter Cock
2016-06-10 08:11:17 UTC
Post by Nathaniel Smith
I dunno, with my user hat on I'd be incredibly surprised / confused /
annoyed if an innocent-looking expression like
np.arange(10) ** 2
started returning floats... having exact ints is a really nice feature
of Python/numpy as compared to R/Javascript, and while it's true that
int64 can overflow, there are also large powers that can be more
precisely represented as int64 than float.
I was about to express a preference for (1), preserving integers
on output but treating negative powers as an error. However,
I realised the use case I had in mind does not apply:

Where I've used integer matrices as network topology adjacency
matrices, to get connectivity by paths of n steps you use A**n,
by which I mean A x A x ... A using matrix multiplication. But
in NumPy A**n will do element wise multiplication, so this
example is not helpful.
Post by Charles R Harris
1. Integers to negative integer powers raise an error.
2. Integers to integer powers always result in floats.
As an aside, using boolean matrices can be helpful in the
context of connectivity matrices. How would the proposals
here affect booleans, where there is no risk of overflow?
If we went with (2), using promotion to floats here would presumably
change the current behaviour:

```
>>> import numpy
>>> A = numpy.array([[False,True,False],[True,False,True],[True,True,False]], dtype=numpy.bool)
>>> A
array([[False,  True, False],
       [ True, False,  True],
       [ True,  True, False]], dtype=bool)
>>> A*A
array([[False,  True, False],
       [ True, False,  True],
       [ True,  True, False]], dtype=bool)
>>> A**2
array([[False,  True, False],
       [ True, False,  True],
       [ True,  True, False]], dtype=bool)
>>> numpy.dot(A,A)
array([[ True, False,  True],
       [ True,  True, False],
       [ True,  True,  True]], dtype=bool)
```
Regards,

Peter
Alan Isaac
2016-06-10 12:10:57 UTC
Post by Nathaniel Smith
I dunno, with my user hat on I'd be incredibly surprised / confused /
annoyed if an innocent-looking expression like
np.arange(10) ** 2
started returning floats... having exact ints is a really nice feature
of Python/numpy as compared to R/Javascript, and while it's true that
int64 can overflow, there are also large powers that can be more
precisely represented as int64 than float.
Is np.arange(10)**10 also "innocent looking" to a Python user?

Also, I am confused by what "large powers" means in this context.
Is 2**40 a "large power"?

Finally, is np.arange(1,3)**-2 "innocent looking" to a Python user?

Cheers,
Alan
Juan Nunez-Iglesias
2016-06-10 16:49:10 UTC
+1 to Alan's point. Having different type behaviour depending on the values
of x and y for np.arange(x) ** y would be awful, and it would also be awful
to have to worry about overflow here...

...

Having said that, it would be equally annoying to not have a way to define
integer powers...


Allan Haldane
2016-06-10 17:20:50 UTC
Post by Alan Isaac
Is np.arange(10)**10 also "innocent looking" to a Python user?
This doesn't bother me much because numpy users have to be aware of
overflow issues in lots of other (simple) cases anyway, eg plain
addition and multiplication.

I'll add my +1 for integer powers returning an integer, and an error for
negative powers. Integer powers are a useful operation that I would bet
a lot of code currently depends on.

Allan
Alan Isaac
2016-06-10 17:38:31 UTC
Post by Allan Haldane
numpy users have to be aware of
overflow issues in lots of other (simple) cases anyway, eg plain
addition and multiplication.
This is not comparable because *almost all* integer combinations
overflow for exponentiation. See the discussion at
https://wiki.haskell.org/Power_function
http://stackoverflow.com/questions/6400568/exponentiation-in-haskell

Alan
Nathaniel Smith
2016-06-10 17:34:57 UTC
Post by Alan Isaac
Is np.arange(10)**10 also "innocent looking" to a Python user?
You keep pounding on this example. It's a fine example, but, c'mon. **2 is
probably at least 100x more common in real source code. Maybe 1000x more
common. Why should we break the common case for your edge case?
Post by Alan Isaac
Also, I am confused by what "large powers" means in this context.
Is 2**40 a "large power"?
I meant the range 2**53 -- 2**63 where integers have more precision than
floats. It's not a terribly important point, but it is true that there are
currently ** operations that return exact results, and that would become
impossible to do exactly if we switch ** to return floats.
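
A quick check of that precision point:

```
>>> import numpy as np
>>> np.int64(3)**39 == 3**39   # exact in int64 ...
True
>>> float(3**39) == 3**39      # ... but not exactly representable as float64
False
```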
Post by Alan Isaac
Finally, is np.arange(1,3)**-2 "innocent looking" to a Python user?
Maybe, maybe not. Historically your example here has always silently
returned nonsense, and no one seems to complain. Maybe this just means
everyone's become used to the pain, I dunno. But probably it also has
something to do with how uncommon this code is in practice. OTOH I'm
convinced that making **2 return floats is going to generate a ton of
complaints -- first because of all the code we broke, but then (I predict,
could be wrong) on an ongoing basis, as new users trip over the unexpected
**2 behavior. Because, again, **2 is something that orders of magnitude
more people will actually trip over, and in contexts where they'll have no
idea why it would return float.

(Remember that whole discussion we had at the beginning of the thread,
where very experienced numpy users started out thinking we should return
float for negative powers only, and then we had to carefully think through
how numpy's type system works to convince ourselves that this wasn't
possible? I don't really want to force every new user to recapitulate that
discussion just when they're learning what types even are...)

-n
Alan Isaac
2016-06-10 17:50:47 UTC
Post by Nathaniel Smith
You keep pounding on this example. It's a fine example, but, c'mon. **2 is
probably at least 100x more common in real source code. Maybe 1000x more
common. Why should we break the common case for your edge case?
It is hardly an "edge case".
Again, **almost all** integer combinations overflow: that's the point.
If you were promoting to a Python long integer, that would change things.
But hobbling a whole operator so that people don't have to say `a*a` seems
absurdly wasteful. Additionally, returning floats provides a better
match to Python's behavior (i.e., it allows sensible handling of
negative powers). Users who really want int output and understand
overflow should be supported with a function.

Anyway, I've said my piece and will shut up now.

Cheers,
Alan
Nathaniel Smith
2016-06-10 18:00:48 UTC
Post by Alan Isaac
It is hardly an "edge case".
Again, **almost all** integer combinations overflow: that's the point.
When you say "almost all", you're assuming inputs that are uniformly
sampled integers. I'm much more interested in what proportion of calls to
the ** operator involve inputs that can overflow, and in real life those
inputs are very heavily biased towards small numbers.

(I also think we should default to raising an error on overflow in general,
with a seterr switch to turn it off when desired. But that's another
discussion...)

-n
j***@gmail.com
2016-06-10 19:01:00 UTC
Post by Nathaniel Smith
When you say "almost all", you're assuming inputs that are uniformly
sampled integers. I'm much more interested in what proportion of calls to
the ** operator involve inputs that can overflow, and in real life those
inputs are very heavily biased towards small numbers.
(I also think we should default to raising an error on overflow in
general, with a seterr switch to turn it off when desired. But that's
another discussion...)
But x**2 is just x*x, which some seem to recommend (I have no idea why), and
then there are not so many "common" cases left.


(However, I find integers pretty useless except in some very specific
cases. When I started to clean up scipy.stats.distributions, I threw out
integers for discrete distributions and replaced all or most `**` by
np.power, IIRC mainly because of the old python behavior and better numpy
behavior.)

(I'd rather use robust calculations that provide correct numbers than
chase individual edge cases to save a bit of memory in some common
cases. scipy.stats also avoids factorial in almost all cases, because
special.gamma and variants are more robust.)


Josef
Ian Henriksen
2016-06-10 20:16:06 UTC
Post by Nathaniel Smith
(I also think we should default to raising an error on overflow in
general, with a seterr switch to turn it off when desired. But that's
another discussion...)
Another thing that would need separate discussion...
Making 64 bit integers default in more cases would help here.
Currently arange gives 32 bit integers on 64 bit Windows, but
64 bit integers on 64 bit Linux/OSX. Using size_t (or even
int64_t) as the default size would help with overflows in
the more common use cases. It's a hefty backcompat
break, but 64 bit systems are much more common now,
and using 32 bit integers on 64 bit windows is a bit odd.
Anyway, hopefully that's not too off-topic.
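
For example (illustrative; same numpy version, different platforms):

```
import numpy as np

# The default integer dtype follows the platform's C long:
print(np.arange(3).dtype)  # int32 on 64-bit Windows, int64 on 64-bit Linux/OSX
```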
Best,
Ian Henriksen
Sebastian Berg
2016-06-11 10:05:23 UTC
Post by Ian Henriksen
Another thing that would need separate discussion...
Making 64 bit integers default in more cases would help here.
Currently arange gives 32 bit integers on 64 bit Windows, but
64 bit integers on 64 bit Linux/OSX. Using size_t (or even
int64_t) as the default size would help with overflows in
the more common use cases. It's a hefty backcompat
break, but 64 bit systems are much more common now,
and using 32 bit integers on 64 bit windows is a bit odd.
Anyway, hopefully that's not too off-topic.
I agree, at least on python3 (the reason is that in python 3 the int
subclass thingy goes away, so it is less likely to break anything). I think
we could have a shot at this; it is quirky, but the current inconsistency
is pretty bad too (and probably has a lot of bugs out in the wild,
because of tests on systems where long is 64bits).

A different issue, though; I wouldn't mind if someone ponders
this a bit more and maybe creates a pull request.

- Sebastian
Alan Isaac
2016-06-10 19:38:30 UTC
I guess I have one more question; sorry.

Suppose we stipulate that `np.int_(9)**np.int_(10)` should
just overflow, since that appears to be the clear intent of
the (informed) user.

When a Python 3 user writes `np.arange(10)**10`,
how are we to infer the intended type of the output?
(I specify Python 3, since it has a unified treatment of integers.)
>>> np.find_common_type([np.int32], [int])
dtype('int32')

If this were indeed an enforced numpy convention, I would better see
the point of view on integer exponentiation. But observe:
>>> np.find_common_type([np.int8], [np.int32])
dtype('int8')
>>> (np.arange(10, dtype=np.int8) + np.int32(2**10)).dtype
dtype('int16')

And so on. If these other binary operators upcast based
on the scalar value, why wouldn't exponentiation?
I suppose the answer is: they upcast only insofar
as necessary to fit the scalar value, which I see is
a simple and enforceable rule. However, that seems the wrong
description of what actually happens:
>>> (np.int8(2)**2).dtype
dtype('int32')

OK, my question to those who have argued a**2 should
produce an int32 when a is an int32: what if a is an int8?
(Obviously the overflow problem is becoming extremely pressing ...)

Thanks,
Alan

PS Where are these casting rules documented?
Allan Haldane
2016-06-11 00:28:30 UTC
Post by Alan Isaac
And so on. If these other binary operators upcast based
on the scalar value, why wouldn't exponentiation?
My understanding is that numpy never upcasts based on the values, it
upcasts based on the datatype ranges.

http://docs.scipy.org/doc/numpy-1.10.1/reference/ufuncs.html#casting-rules

For arrays of different datatype, numpy finds the datatype which can
store values in both dtype's ranges, *not* the type which is large
enough to accurately store the result values.

So for instance,
>>> (np.arange(10, dtype=np.uint8) + np.uint32(2**32-1)).dtype
dtype('uint32')

Overflow has occurred, but numpy didn't upcast to uint64.

This rule has some slightly strange consequences. For example, the
ranges of np.int8 and np.uint64 don't match up, and numpy has decided
that the only type covering both ranges is np.float64.

So as an extra twist in this discussion, this means numpy actually
already returns a float for some integer-power combinations:
>>> type( np.uint64(2) ** np.int8(3) )
numpy.float64
Post by Alan Isaac
OK, my question to those who have argued a**2 should
produce an int32 when a is an int32: what if a is an int8?
(Obviously the overflow problem is becoming extremely pressing ...)
To me, whether it's int8 or int32, the user should just be aware of
overflow.

Also, I like to think of numpy as having quite C-like behavior, allowing
you to play with the lowlevel bits and bytes. (I actually wish its
casting behavior was more C-like). I suspect that people working with
uint8 arrays might be doing byte-fiddling hacks and actually *want*
overflow/wraparound to occur, at least when multiplying/adding.

Allan



PS
I would concede that numpy's uint8 integer power currently doesn't
wraparound like multiply does, but it would be cool if it did
(modulo arithmetic is associative, so it should, right?):
>>> x = np.arange(256, dtype='uint8')
>>> x**8             # returns all 0
>>> x*x*x*x*x*x*x*x  # returns wrapped values
Marten van Kerkwijk
2016-06-11 00:44:33 UTC
I do think one of the main arguments for returning float remains the
analogy with division. I don't know about the rest of you, but it has been
such a relief not to have to tell students any more "you should add a ".",
otherwise it does integer division". For most purposes, it simply shouldn't
matter whether one types an integer or a float; if it does, then one has to
think about it, and it seems fine for that relatively specialized case to
have to use a specialized function. -- Marten
Antoine Pitrou
2016-06-13 08:47:08 UTC
On Fri, 10 Jun 2016 20:28:30 -0400
Post by Allan Haldane
Also, I like to think of numpy as having quite C-like behavior, allowing
you to play with the lowlevel bits and bytes. (I actually wish its
casting behavior was more C-like). I suspect that people working with
uint8 arrays might be doing byte-fiddling hacks and actually *want*
overflow/wraparound to occur, at least when multiplying/adding.
I agree. Currently, the choice is simple: if you want an int output,
have an int input; if you want a float output, have a float input.
This fidelity to the user's data type choice allows people to make
informed decisions.

Regards

Antoine.
Alan Isaac
2016-06-13 14:05:08 UTC
Post by Antoine Pitrou
Currently, the choice is simple: if you want an int output,
have an int input; if you want a float output, have a float input.
That is a misunderstanding, which may be influencing the discussion.
>>> type(np.int8(2)**2)
<type 'numpy.int32'>
>>> type(np.uint64(2)**np.int8(2))
<type 'numpy.float64'>

I don't think anyone has proposed first principles
from which the desirable behavior could be deduced.
I do think reference to the reasoning used by other
languages in making this decision could be helpful.

Alan Isaac
(on Windows)
Antoine Pitrou
2016-06-13 14:42:51 UTC
On Mon, 13 Jun 2016 10:05:08 -0400
Post by Alan Isaac
That is a misunderstanding, which may be influencing the discussion.
>>> type(np.int8(2)**2)
<type 'numpy.int32'>
>>> type(np.uint64(2)**np.int8(2))
<type 'numpy.float64'>
The `uint64 x int8 -> float64` case is IMHO an aberration in Numpy's
casting rules, and the same thing already happens with other operators:
>>> np.int64(2) + np.int32(3)
5
>>> np.uint64(2) + np.int32(3)
5.0

The other complications have to do with the type width, which are less
annoying than changing the numeric kind altogether (as would be done by
mandating int x int -> float in all cases).

Regards

Antoine.
j***@gmail.com
2016-06-13 14:49:44 UTC
Post by Alan Isaac
[...]
I don't think anyone has proposed first principles
from which the desirable behavior could be deduced.
I do think reference to the reasoning used by other
languages in making this decision could be helpful.
I think the main principle is whether an operator is a "float" operator.

for example, I don't think anyone would expect sqrt(int) to return int,
even if it would have exact results in a countably infinite number of cases
(theoretically)

another case is division which moved from return-int to return-float
definition in the py2 - py3 move.

My argument is that `**` is like integer division and sqrt, where the domain
in which integer returns are the correct numbers is too small to avoid
headaches for users.
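
e.g., sqrt already works that way:

```
>>> import numpy as np
>>> np.sqrt(np.int64(4))   # integer in, float out
2.0
>>> type(np.sqrt(np.int64(4)))
<type 'numpy.float64'>
```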

Josef
Antoine Pitrou
2016-06-13 15:25:21 UTC
On Mon, 13 Jun 2016 10:49:44 -0400
Post by j***@gmail.com
My argument is that `**` is like integer division and sqrt where the domain
where integer return are the correct numbers is too small to avoid
headaches by users.
>>> math.pow(3, 39) == 3**39
False
>>> np.int64(3)**39 == 3**39
True

(as a sidenote, np.float64's equality operator seems to be slightly
inconsistent:
>>> np.float64(3)**39 == 3**39
True
>>> int(np.float64(3)**39) == 3**39
False
>>> float(np.float64(3)**39) == 3**39
False
)

Regards

Antoine.
j***@gmail.com
2016-06-13 15:51:07 UTC
Post by Antoine Pitrou
>>> math.pow(3, 39) == 3**39
False
>>> np.int64(3)**39 == 3**39
True
but if a user does this, then ??? (headaches or head scratching)
>>> np.array([3])**39
RuntimeWarning: invalid value encountered in power
array([-2147483648], dtype=int32)

Josef
j***@gmail.com
2016-06-13 16:07:11 UTC
Post by j***@gmail.com
[...]
but if a user does this, then ??? (headaches or head scratching)
>>> np.array([3])**39
RuntimeWarning: invalid value encountered in power
array([-2147483648], dtype=int32)
I forgot to add:

the real headaches start in the second call, when we don't get the
RuntimeWarning anymore
>>> np.array([4])**39
array([-2147483648], dtype=int32)


("Now, why do I owe so much money, when I made a huge profit all year." )

Josef
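The disappearing warning is presumably just Python's warning machinery, which
by default reports a given warning only once from the same location. A minimal
sketch of two possible ways to surface it again (assumptions about the
warning/errstate plumbing of this era, not a recommendation):
```
import warnings
import numpy as np

np.array([3]) ** 39   # RuntimeWarning the first time
np.array([4]) ** 39   # silent: this warning was already reported here

# Option 1: report matching warnings every time they occur.
warnings.simplefilter("always", RuntimeWarning)
np.array([4]) ** 39   # warns again

# Option 2: turn the silent wraparound into a hard error, assuming the
# integer power loop reports through errstate as the message suggests.
with np.errstate(invalid="raise"):
    np.array([4]) ** 39   # should raise FloatingPointError
```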
j***@gmail.com
2016-06-13 16:15:40 UTC
Permalink
Post by j***@gmail.com
Post by j***@gmail.com
Post by Antoine Pitrou
On Mon, 13 Jun 2016 10:49:44 -0400
Post by j***@gmail.com
My argument is that `**` is like integer division and sqrt, where the
domain in which an integer return is the correct number is too small to
avoid headaches for users.
```
>>> math.pow(3, 39) == 3**39
False
>>> np.int64(3)**39 == 3**39
True
```
but if a user does this, then ??? (headaches or head scratching)
```
>>> np.array([3])**39
RuntimeWarning: invalid value encountered in power
array([-2147483648], dtype=int32)
```
I forgot to add
the real headaches start in the second call, when we don't get the
RuntimeWarning anymore
```
>>> np.array([4])**39
array([-2147483648], dtype=int32)
```
("Now, why do I owe so much money, when I made a huge profit all year." )
(grumpy off-topic complaint:
The Canadian tax system is like this. They make a mistake in transferring
information to a new computerized system, and then they send a bill for
taxes based on reassessment of something that happened 5 years ago because
their computerized record is wrong.
)
Josef
V. Armando Solé
2016-06-13 09:05:42 UTC
Permalink
Post by Allan Haldane
So as an extra twist in this discussion, this means numpy actually
returns a float for some integer powers already:
```
>>> type( np.uint64(2) ** np.int8(3) )
numpy.float64
```
Shouldn't that example end the discussion? That is already the behaviour
for any integer power of an np.uint64. If something was going to be
broken, it already is.

We were given the choice between:

1 - Integers to negative integer powers raise an error.
2 - Integers to integer powers always result in floats.

and we were never given the choice to adapt the returned type to the
result. Assuming that option is not possible, option 2 is certainly
better than option 1 (why refuse to perform a clearly defined
operation???) *and* returning a float is already the behaviour for
integer powers of np.uint64.

Armando
Allan Haldane
2016-06-13 17:07:53 UTC
Permalink
Post by V. Armando Solé
Post by Allan Haldane
So as an extra twist in this discussion, this means numpy actually
returns a float for some integer powers already:
```
>>> type( np.uint64(2) ** np.int8(3) )
numpy.float64
```
Shouldn't that example end the discussion? That is already the behaviour
for any integer power of an np.uint64. If something was going to be
broken, it already is.
1 - Integers to negative integer powers raise an error.
2 - Integers to integer powers always result in floats.
and we were never given the choice to adapt the returned type to the
result. Assuming that option is not possible, option 2 is certainly
better than option 1 (why refuse to perform a clearly defined
operation???) *and* returning a float is already the behaviour for
integer powers of np.uint64.
Not for any uints: "type( np.uint64(2) ** np.uint8(3) )" is uint64.

Although I brought it up I think the mixed dtype case is a bit of a red
herring. The single-dtype case is better to think about for now, eg
"np.uint64(2) ** np.uint64(3)".

Allan
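
The asymmetry matches the promotion rules: uint64 and a signed type have no
common integer type (holding both would take 65 bits), so numpy falls back to
float64, while an unsigned exponent keeps the result integral. A quick check:
```
import numpy as np

print(np.promote_types(np.uint64, np.int8))    # float64: no common int type
print(np.promote_types(np.uint64, np.uint8))   # uint64

print(type(np.uint64(2) ** np.int8(3)))        # <class 'numpy.float64'>
print(type(np.uint64(2) ** np.uint8(3)))       # <class 'numpy.uint64'>
```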
Marten van Kerkwijk
2016-06-13 17:54:51 UTC
Permalink
Hi All,

I think we're getting a little off the rails, perhaps because two
questions are being conflated:

1. What in principle is the best return type for int ** int (which Josef I
think most properly rephrased as whether `**` should be thought of as a
float operator, like `/` in python3 and `sqrt` etc.);

2. Whether one is willing to possibly break code by implementing this.

My sense is that most discussion is about (1), where a majority may well
agree the answer is float, instead of about (2), where it ends up boiling
down to a judgment call of "eternal small pain" or "possible short-time big
pain but consistency from now on".

Perhaps I can introduce an alternative (likely shot down immediately...).
For this, note that for division at least, numpy follows python closely, so
that one has the following in python2:
```
In [2]: np.arange(10) / 2
Out[2]: array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

In [3]: from __future__ import division

In [4]: np.arange(10) / 2
Out[4]: array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
```
Since negative exponents are really just 1 over the positive one, could we
use the same logic for **? I.e., let the type returned by int1 ** int2
be the same as that returned by int1 / int2. If we then also ensure that,
for integer output type, int1 ** -int2 returns 1 // (int1 ** int2), we have
well-defined rules all around, so there would be no need to raise a
zero-division error.

All the best,

Marten
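
A minimal sketch of Marten's rule for the integer-returning case (a
hypothetical helper for illustration, not numpy code):
```
def int_pow(base, exp):
    """Integer power under Marten's proposed rule."""
    if exp >= 0:
        return base ** exp
    # Negative exponents defined as 1 // (base ** -exp): this gives 0 for
    # |base| > 1, 1 for base == 1, and raises ZeroDivisionError for
    # base == 0, with no special-casing needed.
    return 1 // (base ** -exp)

print(int_pow(2, 10))   # 1024
print(int_pow(2, -2))   # 0
print(int_pow(1, -5))   # 1
# int_pow(0, -1) raises ZeroDivisionError, like 1 // 0
```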
Nathaniel Smith
2016-06-13 18:50:25 UTC
Permalink
Post by Charles R Harris
Hi All,
I think we're getting a little off the rails, perhaps because two
1. What in principle is the best return type for int ** int (which Josef
I think most properly rephrased as whether `**` should be thought of as a
float operator, like `/` in python3 and `sqrt` etc.);
Post by Charles R Harris
2. Whether one is willing to possibly break code by implementing this.
My sense is that most discussion is about (1), where a majority may well
agree the answer is float, instead of about (2), where it ends up boiling
down to a judgment call of "eternal small pain" or "possible short-time big
pain but consistency from now on".
Post by Charles R Harris
Perhaps I can introduce an alternative (likely shot down immediately...).
For this, note that for division at least, numpy follows python closely, so
Post by Charles R Harris
```
In [2]: np.arange(10) / 2
Out[2]: array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
In [3]: from __future__ import division
In [4]: np.arange(10) / 2
Out[4]: array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ,
4.5])
Post by Charles R Harris
```
Since negative exponents are really just 1 over the positive one, could
we use the same logic for **? I.e., let what type is returned by int1 **
int2 be the same as that returned by int1 / int2?

There isn't any reasonable way for numpy's ** operator to check whether the
caller has future division enabled, so I think this proposal boils down to:
int ** int returning int on py2 and float on py3? It has a certain
Solomonic appeal, in that I think it would make everyone equally unhappy
:-). But probably now is not the time to be introducing new py2/py3
incompatibilities...
Post by Charles R Harris
If we then also ensure that for integer output type, int1 ** -int2
returns 1 // (int1 ** int2), we have well-defined rules all around, so
there would be no need for raising zero-division error.

Not sure what to make of this part -- converting int ** -int into 1 // (int
** int) will return zero in almost all cases, which is the unfortunate
behavior that kicked off this whole discussion. AFAICT everyone agrees that
we don't want *that*. And I don't think this gets around the need to decide
how to handle 0 ** -1, whether by raising ZeroDivisionError or what.

-n
Alan Isaac
2016-06-20 20:31:04 UTC
Permalink
Post by Marten van Kerkwijk
1. What in principle is the best return type for int ** int (which
Josef I think most properly rephrased as whether `**` should be
thought of as a float operator, like `/` in python3 and `sqrt` etc.);
Perhaps the question is somewhat different. Maybe it is: what type
should a user expect when the exponent is a Python int? The obvious
choices seem to be an object array of Python ints, or an array of
floats. So far, nobody has proposed the former, and concerns have
been expressed about the latter. More important, either would break
the rule that the scalar type is not important in array operations,
which seems like a good general rule (useful and easy to remember).

How much commitment is there to such a rule? E.g.,
np.int64(2**7)*np.arange(5,dtype=np.int8)
violates this. One thing that has come out of this
discussion for me is that the actual rules in play are
hard to keep track of. Are they all written down in
one place?

I suspect there is general support for the idea that if someone
explicitly specifies the same dtype for the base and the
exponent then the result should also have that dtype.
I think this is already true for array exponentiation
and for scalar exponentiation.

One other thing that a user might expect, I believe, is that
any type promotion rules for scalars and arrays will be the same.
This is not currently the case, and that feels like an
inconsistency. But is it an inconsistency? If the rule is that
the array type dominates the scalar type, that may
be understandable, but then it should be a firm rule.
In this case, an exponent that is a Python int should not
affect the dtype of the (array) result.

In sum, as a user, I've come around to Chuck's original proposal:
integers raised to negative integer powers raise an error.
My reason for coming around is that I believe it meshes
well with a general rule that in binary operations the
scalar dtypes should not influence the dtype of an array result.
Otoh, it is unclear to me how much commitment there is to that rule.

Thanks in advance to anyone who can help me understand better
the issues in play.

Cheers,
Alan Isaac
j***@gmail.com
2016-06-20 22:09:18 UTC
Permalink
Post by Alan Isaac
Post by Marten van Kerkwijk
1. What in principle is the best return type for int ** int (which
Josef I think most properly rephrased as whether `**` should be
thought of as a float operator, like `/` in python3 and `sqrt` etc.);
Perhaps the question is somewhat different. Maybe it is: what type
should a user expect when the exponent is a Python int? The obvious
choices seem to be an object array of Python ints, or an array of
floats. So far, nobody has proposed the former, and concerns have
been expressed about the latter. More important, either would break
the rule that the scalar type is not important in array operations,
which seems like a good general rule (useful and easy to remember).
How much commitment is there to such a rule? E.g.,
np.int64(2**7)*np.arange(5,dtype=np.int8)
violates this. One thing that has come out of this
discussion for me is that the actual rules in play are
hard to keep track of. Are they all written down in
one place?
I suspect there is general support for the idea that if someone
explicitly specifies the same dtype for the base and the
exponent then the result should also have that dtype.
I think this is already true for array exponentiation
and for scalar exponentiation.
One other thing that a user might expect, I believe, is that
any type promotion rules for scalars and arrays will be the same.
This is not currently the case, and that feels like an
inconsistency. But is it an inconsistency? If the rule is that
the array type dominates the scalar type, that may
be understandable, but then it should be a firm rule.
In this case, an exponent that is a Python int should not
affect the dtype of the (array) result.
integers raised to negative integer powers raise an error.
My reason for coming around is that I believe it meshes
well with a general rule that in binary operations the
scalar dtypes should not influence the dtype of an array result.
Otoh, it is unclear to me how much commitment there is to that rule.
Thanks in advance to anyone who can help me understand better
the issues in play.
the main thing I get out of the discussion in this thread is that this
is way too complicated.

which ints do I have?

is it python or one of the many numpy int types, or two different
(u)int types or maybe one is a scalar so it shouldn't count?


scalar dominates here:
```
>>> (np.ones(5, np.int8) * 1.0).dtype
dtype('float64')
```

otherwise a huge amount of code that uses the *1. trick would be broken

Josef
Nathaniel Smith
2016-06-20 22:15:00 UTC
Permalink
Post by j***@gmail.com
Post by Alan Isaac
Post by Marten van Kerkwijk
1. What in principle is the best return type for int ** int (which
Josef I think most properly rephrased as whether `**` should be
thought of as a float operator, like `/` in python3 and `sqrt` etc.);
Perhaps the question is somewhat different. Maybe it is: what type
should a user expect when the exponent is a Python int? The obvious
choices seem to be an object array of Python ints, or an array of
floats. So far, nobody has proposed the former, and concerns have
been expressed about the latter. More important, either would break
the rule that the scalar type is not important in array operations,
which seems like a good general rule (useful and easy to remember).
How much commitment is there to such a rule? E.g.,
np.int64(2**7)*np.arange(5,dtype=np.int8)
violates this. One thing that has come out of this
discussion for me is that the actual rules in play are
hard to keep track of. Are they all written down in
one place?
I suspect there is general support for the idea that if someone
explicitly specifies the same dtype for the base and the
exponent then the result should also have that dtype.
I think this is already true for array exponentiation
and for scalar exponentiation.
One other thing that a user might expect, I believe, is that
any type promotion rules for scalars and arrays will be the same.
This is not currently the case, and that feels like an
inconsistency. But is it an inconsistency? If the rule is that
the array type dominates the scalar type, that may
be understandable, but then it should be a firm rule.
In this case, an exponent that is a Python int should not
affect the dtype of the (array) result.
integers raised to negative integer powers raise an error.
My reason for coming around is that I believe it meshes
well with a general rule that in binary operations the
scalar dtypes should not influence the dtype of an array result.
Otoh, it is unclear to me how much commitment there is to that rule.
Thanks in advance to anyone who can help me understand better
the issues in play.
the main thing I get out of the discussion in this thread is that this
is way too complicated.
which ints do I have?
is it python or one of the many numpy int types, or two different
(u)int types or maybe one is a scalar so it shouldn't count?
scalar dominates here:
```
>>> (np.ones(5, np.int8) * 1.0).dtype
dtype('float64')
```
otherwise a huge amount of code that uses the *1. trick would be broken
I *think* the documented rule is that scalar *kind* matters (so we pay
attention to it being a float) but scalar *type* doesn't (we ignore
whether it's float64 versus float32) and scalar *value* doesn't (we
ignore whether it's 1.0 or 2.0**53). Obviously even this is not 100%
true, but I think it is the original intent.

My suspicion is that a better rule would be: *Python* types (int,
float, bool) are treated as having an unspecified width, but all numpy
types/dtypes are treated the same regardless of whether they're a
scalar or not. So np.int8(2) * 2 would return an int8, but np.int8(2)
* np.int64(2) would return an int64. But this is totally separate from
the issues around **, and would require a longer discussion and larger
overhaul of the typing system.

-n
--
Nathaniel J. Smith -- https://vorpus.org
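
A short illustration of the kind-versus-type distinction under the casting
rules as they stood in this era (a sketch of observed behaviour, not a spec):
```
import numpy as np

# Scalar *type* ignored: a float64 scalar does not upcast a float32 array.
print((np.ones(3, dtype=np.float32) * np.float64(1.0)).dtype)   # float32

# Scalar *kind* respected: a float scalar does upcast an integer array.
print((np.ones(3, dtype=np.int8) * 1.0).dtype)                  # float64

# Under Nathaniel's suggested rule (not current behaviour), np.int8(2) * 2
# would give int8, while np.int8(2) * np.int64(2) would give int64.
```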
Sebastian Berg
2016-06-20 22:22:56 UTC
Permalink
Post by Nathaniel Smith
Post by j***@gmail.com
Post by Marten van Kerkwijk
1. What in principle is the best return type for int ** int (which
Josef I think most properly rephrased as whether `**` should be
thought of as a float operator, like `/` in python3 and `sqrt` etc.);
Perhaps the question is somewhat different. Maybe it is: what type
should a user expect when the exponent is a Python int? The obvious
choices seem to be an object array of Python ints, or an array of
floats. So far, nobody has proposed the former, and concerns have
been expressed about the latter. More important, either would break
the rule that the scalar type is not important in array operations,
which seems like a good general rule (useful and easy to remember).
How much commitment is there to such a rule? E.g.,
np.int64(2**7)*np.arange(5,dtype=np.int8)
violates this. One thing that has come out of this
discussion for me is that the actual rules in play are
hard to keep track of. Are they all written down in
one place?
I suspect there is general support for the idea that if someone
explicitly specifies the same dtype for the base and the
exponent then the result should also have that dtype.
I think this is already true for array exponentiation
and for scalar exponentiation.
One other thing that a user might expect, I believe, is that
any type promotion rules for scalars and arrays will be the same.
This is not currently the case, and that feels like an
inconsistency. But is it an inconsistency? If the rule is that
the array type dominates the scalar type, that may
be understandable, but then it should be a firm rule.
In this case, an exponent that is a Python int should not
affect the dtype of the (array) result.
integers raised to negative integer powers raise an error.
My reason for coming around is that I believe it meshes
well with a general rule that in binary operations the
scalar dtypes should not influence the dtype of an array result.
Otoh, it is unclear to me how much commitment there is to that rule.
Thanks in advance to anyone who can help me understand better
the issues in play.
the main thing I get out of the discussion in this thread is that this
is way too complicated.
which ints do I have?
is it python or one of the many numpy int types, or two different
(u)int types or maybe one is a scalar so it shouldn't count?
scalar dominates here:
```
>>> (np.ones(5, np.int8) * 1.0).dtype
dtype('float64')
```
otherwise a huge amount of code that uses the *1. trick would be broken
I *think* the documented rule is that scalar *kind* matters (so we pay
attention to it being a float) but scalar *type* doesn't (we ignore
whether it's float64 versus float32) and scalar *value* doesn't (we
ignore whether it's 1.0 or 2.0**53). Obviously even this is not 100%
true, but I think it is the original intent.
Except for int types, which force a result type large enough to hold
the input value.
Post by Nathaniel Smith
My suspicion is that a better rule would be: *Python* types (int,
float, bool) are treated as having an unspecified width, but all numpy
types/dtypes are treated the same regardless of whether they're a
scalar or not. So np.int8(2) * 2 would return an int8, but np.int8(2)
* np.int64(2) would return an int64. But this is totally separate from
the issues around **, and would require a longer discussion and larger
overhaul of the typing system.
I agree with that. The rule makes sense for python types, but it somewhat
creates oddities for numpy types, which could probably just be made more
array-like there.

- Sebastian
Post by Nathaniel Smith
-n
Alan Isaac
2016-06-20 21:11:54 UTC
Permalink
Post by Allan Haldane
My understanding is that numpy never upcasts based on the values, it
upcasts based on the datatype ranges.
http://docs.scipy.org/doc/numpy-1.10.1/reference/ufuncs.html#casting-rules
```
>>> (np.int64(2**6)*np.arange(5,dtype=np.int8)).dtype
dtype('int8')
>>> (np.int64(2**7)*np.arange(5,dtype=np.int8)).dtype
dtype('int16')
```

fwiw,
Alan Isaac
Nathaniel Smith
2016-06-20 21:59:44 UTC
Permalink
Post by Alan Isaac
Post by Allan Haldane
My understanding is that numpy never upcasts based on the values, it
upcasts based on the datatype ranges.
http://docs.scipy.org/doc/numpy-1.10.1/reference/ufuncs.html#casting-rules
```
>>> (np.int64(2**6)*np.arange(5,dtype=np.int8)).dtype
dtype('int8')
>>> (np.int64(2**7)*np.arange(5,dtype=np.int8)).dtype
dtype('int16')
```
If you have the time to check for existing bug reports about this, and
file a new bug if you don't find one, then it'd be appreciated. I
suspect it's something that would be better handled as part of an
overhaul of the casting rules in general rather than on its own, but
it'd be good to at least have a record of it somewhere.

-n
--
Nathaniel J. Smith -- https://vorpus.org
Alan Isaac
2016-06-21 22:54:06 UTC
Permalink
Post by Nathaniel Smith
If you have the time to check for existing bug reports about this, and
file a new bug if you don't find one, then it'd be appreciated.
https://github.com/numpy/numpy/issues/7770

Alan

Allan Haldane
2016-06-10 19:59:08 UTC
Permalink
Post by Alan Isaac
Again, **almost all** integer combinations overflow: that's the point.
Don't almost all integer combinations overflow for multiplication as well?

I estimate that for unsigned 32-bit integers, only roughly 1 in 2e8
combinations don't overflow.

The fraction is approximately (np.euler_gamma + 32*np.log(2))/2.0**32,
if I didn't make a mistake.

:)
Allan
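
The estimate is cheap to sanity-check at a smaller width where brute-force
counting is feasible; a sketch at 16 bits, assuming "don't overflow" means
counting pairs of positive values whose product still fits:
```
import numpy as np

# Count pairs (a, b) of positive 16-bit values with a * b < 2**16.
N = 2 ** 16
a = np.arange(1, N, dtype=np.int64)
good = ((N - 1) // a).sum()     # for each a, the number of b with a*b < N

print(good / float(N) ** 2)                           # brute-force fraction
print((np.euler_gamma + 16 * np.log(2)) / 2.0 ** 16)  # the formula at 16 bits
# The two agree to within a few percent, which supports the 32-bit estimate.
```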
Ian Henriksen
2016-06-10 17:20:39 UTC
Permalink
Post by Nathaniel Smith
On Mon, Jun 6, 2016 at 1:17 PM, Charles R Harris
Post by Charles R Harris
On Mon, Jun 6, 2016 at 2:11 PM, Marten van Kerkwijk <
Post by Marten van Kerkwijk
Hi Chuck,
I consider either proposal an improvement, but among the two I favour
returning float for `**`, because, like for `/`, it ensures one gets
closest to the (mathematically) true answer in most cases, and makes
duck-typing that much easier -- I'd like to be able to do x ** y without
having to worry whether x and y are python scalars or numpy arrays of a
certain type.
Post by Charles R Harris
Post by Marten van Kerkwijk
I do agree with Nathaniel that it would be good to check what actually
breaks. Certainly, if anybody is up to making a PR that implements either
suggestion, I'd gladly check whether it breaks anything in astropy.
Post by Charles R Harris
Post by Marten van Kerkwijk
I should add that I have no idea how to assuage the fear that new code
would break with old versions of numpy, but on the other hand, I don't know
its validity either, as it seems one either develops larger projects for
multiple versions and tests, or writes more scripty things for whatever the
current versions are. Certainly, by this argument I better not start using
Post by Charles R Harris
Post by Marten van Kerkwijk
I do think the argument that for division it was easier because there
was `//` already available is a red herring: here one can use `np.power(a,
b, dtype=...)` if one really needs to.
Post by Charles R Harris
It looks to me like users want floats, while developers want the easy
path of raising an error. Darn those users, they just make life sooo
difficult...
I dunno, with my user hat on I'd be incredibly surprised / confused /
annoyed if an innocent-looking expression like
np.arange(10) ** 2
started returning floats... having exact ints is a really nice feature
of Python/numpy as compared to R/Javascript, and while it's true that
int64 can overflow, there are also large powers that can be more
precisely represented as int64 than float.
-n
This is very much my line of thinking as well. Generally when I'm doing
operations with integers, I expect integer output, regardless of floor
division and overflow.
There's a lot to both sides of the argument though. Python's arbitrary
precision integers alleviate overflow concerns very nicely, but forcing
float output for people who actually want integers is not at all ideal
either.
Best,
Ian Henriksen
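
Nathaniel's precision point is easy to make concrete: 3**39 needs about 62
bits, so it fits in int64 exactly but not in float64's 53-bit significand. A
quick check:
```
import numpy as np

exact = 3 ** 39                          # Python int, exact

print(int(np.int64(3) ** 39) == exact)   # True: int64 holds this power exactly
print(float(exact) == exact)             # False: float64 has to round it
print(int(float(exact)) - exact)         # the rounding error, nonzero
```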
Alan Isaac
2016-06-10 17:28:32 UTC
Permalink
forcing float output for people who actually want integers is not at all ideal
Yes, there definitely should be a function supporting this.

Alan
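
For what it's worth, the ufunc `dtype` argument Marten mentioned earlier
already provides one such spelling; a minimal example:
```
import numpy as np

# Ask for float output explicitly, so negative exponents are well defined:
print(np.power(np.arange(1, 5), -1, dtype=np.float64))
# [ 1.          0.5         0.33333333  0.25      ]
```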