Discussion:
[Numpy-discussion] reshaping empty array bug?
Benjamin Root
2016-02-23 16:32:12 UTC
Not exactly sure if this should be a bug or not. This came up in a fairly
general function of mine to process satellite data. Unexpectedly, one of
the satellite files had no scans in it, triggering an exception when I
tried to reshape the data from it.

>>> import numpy as np
>>> a = np.zeros((0, 5*64))
>>> a.shape
(0, 320)
>>> a.shape = (0, 5, 64)
>>> a.shape
(0, 5, 64)
>>> a.shape = (0, 5*64)
>>> a.shape = (0, 5, -1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: total size of new array must be unchanged

So, if I know all of the dimensions, I can reshape just fine. But if I
wanted to use the nifty -1 semantic, it completely falls apart. I can see
arguments going either way for whether this is a bug or not.

Thoughts?

Ben Root
Warren Weckesser
2016-02-23 16:41:01 UTC
When you try `a.shape = (0, 5, -1)`, the size of the third dimension is
ambiguous. From the Zen of Python: "In the face of ambiguity, refuse the
temptation to guess."

Warren
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Benjamin Root
2016-02-23 16:45:38 UTC
but, it isn't really ambiguous, is it? The -1 can only refer to a single
dimension, and if you ignore the zeros in the original and new shape, the
-1 is easily solvable, right?

Ben Root

Sebastian Berg
2016-02-23 18:58:41 UTC
Post by Benjamin Root
but, it isn't really ambiguous, is it? The -1 can only refer to a
single dimension, and if you ignore the zeros in the original and new
shape, the -1 is easily solvable, right?
I think if there is a simple logic (like using 1 for all zeros in both
input and output shape for the -1 calculation), maybe we could do it. I
would like someone to think carefully about whether it would also
allow some unexpected generalizations. And at least I am getting a
BrainOutOfResourcesError right now trying to figure that out :).

- Sebastian
Benjamin Root
2016-02-23 19:57:25 UTC
I'd be more than happy to write up the patch. I don't think it would be
quite like making zeros be ones, but it would be along those lines. One case
I need to wrap my head around is to make sure that a 0 would happen if the
following was true:

>>> a = np.ones((0, 5*64))
>>> a.shape = (-1, 5, 64)

EDIT: Just tried the above, and it works as expected (zero in the first
dim)!

>>> a.shape = (-1,)
>>> a.shape
(0,)
>>> a.shape = (-1, 5, 64)
>>> a.shape
(0, 5, 64)


This is looking more and more like a bug to me.

Ben Root
Sebastian Berg
2016-02-23 20:06:43 UTC
Seems right to me on first sight :). (I don't like shape assignments
though, who cares for one extra view). Well, maybe 1 instead of 0
(ignore 0s), but if the result for -1 is to use 1 and the shape is 0
convert the 1 back to 0. But it is starting to sound a bit tricky,
though I think it might be straightforward (i.e. no real traps and
when it works it always is what you expect).
The main point is, whether you can design cases where the conversion
back to 0 hides bugs by not failing when it should. And whether that
would be a tradeoff we are willing to accept.

- Sebastian
Sebastian Berg
2016-02-23 20:14:03 UTC
Another thought. Maybe you can figure out the -1 correctly, if there is
no *other* 0 involved. If there is any other 0, I could imagine
problems.
Nathaniel Smith
2016-02-23 20:14:25 UTC
Post by Benjamin Root
but, it isn't really ambiguous, is it? The -1 can only refer to a single
dimension, and if you ignore the zeros in the original and new shape, the -1
is easily solvable, right?
Sure, it's totally ambiguous. These are all legal:

In [1]: a = np.zeros((0, 5, 64))

In [2]: a.shape = (0, 5 * 64)

In [3]: a.shape = (0, 5 * 65)

In [4]: a.shape = (0, 5, 102)

In [5]: a.shape = (0, 102, 64)

Generally, the -1 gets replaced by prod(old_shape) //
prod(specified_entries_in_new_shape). If the specified new shape has a
0 in it, then this is a divide-by-zero. In this case it happens
because it's the solution to the equation
prod((0, 5, 64)) == prod((0, 5, x))
for which there is no unique solution for 'x'.
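
The rule described above can be sketched in a few lines of pure Python. This is only an illustration of the arithmetic, not NumPy's actual implementation; the function name `infer_unknown_dim` and the error messages are made up for this example.

```python
from math import prod


def infer_unknown_dim(old_shape, new_shape):
    """Return the size a single -1 in new_shape would stand for.

    Hypothetical helper that mirrors the rule
    prod(old_shape) // prod(specified_entries_in_new_shape).
    """
    if list(new_shape).count(-1) != 1:
        raise ValueError("expected exactly one -1 in new_shape")
    total = prod(old_shape)
    known = prod(d for d in new_shape if d != -1)
    if known == 0:
        # 0 * x == total has no solution when total != 0 and
        # infinitely many solutions when total == 0.
        raise ValueError("cannot infer -1: specified dimensions multiply to 0")
    if total % known != 0:
        raise ValueError("total size of new array must be unchanged")
    return total // known


print(infer_unknown_dim((12, 5), (10, -1, 2)))    # 3
print(infer_unknown_dim((0, 320), (-1, 5, 64)))   # 0
```

With `old_shape=(0, 5, 64)` and `new_shape=(0, 5, -1)` the sketch raises, because the specified entries multiply to 0 and any value of x satisfies the equation.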

Your proposed solution feels very heuristic-y to me, and heuristics
make me very nervous :-/

If what you really want to say is "flatten axes 1 and 2 together",
then maybe there should be some API that lets you directly specify
*that*? As a bonus you might be able to avoid awkward tuple
manipulations to compute the new shape.
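
As a rough sketch of what such an API could look like, here are two small helpers built on top of `reshape`. The names `merge_axes` and `split_axis` are hypothetical and do not exist in NumPy; both assume non-negative axis indices. Because they work on one axis at a time, the unknown size is computed from that axis alone and never involves a zero-length axis elsewhere in the shape.

```python
from math import prod

import numpy as np


def merge_axes(arr, first, last):
    """Collapse axes first..last (inclusive, non-negative) into one axis."""
    if not 0 <= first <= last < arr.ndim:
        raise ValueError("need 0 <= first <= last < arr.ndim")
    shape = arr.shape
    merged = prod(shape[first:last + 1])
    return arr.reshape(shape[:first] + (merged,) + shape[last + 1:])


def split_axis(arr, axis, sizes):
    """Split one non-negative axis into len(sizes) axes.

    At most one entry of sizes may be -1; it is inferred from the
    length of that single axis only.
    """
    if not 0 <= axis < arr.ndim:
        raise ValueError("axis out of range")
    sizes = list(sizes)
    length = arr.shape[axis]
    if sizes.count(-1) > 1:
        raise ValueError("at most one -1 is allowed")
    if -1 in sizes:
        known = prod(s for s in sizes if s != -1)
        if known == 0 or length % known != 0:
            raise ValueError("cannot split axis of length %d" % length)
        sizes[sizes.index(-1)] = length // known
    elif prod(sizes) != length:
        raise ValueError("sizes do not multiply to the axis length")
    shape = arr.shape
    return arr.reshape(shape[:axis] + tuple(sizes) + shape[axis + 1:])


empty = np.zeros((0, 5, 64))
flat = merge_axes(empty, 1, 2)
print(flat.shape)                            # (0, 320)
print(split_axis(flat, 1, (5, -1)).shape)    # (0, 5, 64)
```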

-n
--
Nathaniel J. Smith -- https://vorpus.org
Benjamin Root
2016-02-23 20:23:03 UTC
I would argue that except for the first reshape, all of those should be an
error, and that the current algorithm is buggy.

This isn't a heuristic. It isn't guessing. It is making the semantics
consistent. The fact that I can do:
a.shape = (-1, 5, 64)
or
a.shape = (0, 5, 64)

but not
a.shape = (0, 5, -1)

is totally inconsistent.

Ben Root
Nathaniel Smith
2016-02-23 20:30:41 UTC
Post by Benjamin Root
I would argue that except for the first reshape, all of those should be an
error, and that the current algorithm is buggy.
Reshape doesn't care about axes at all; all it cares about is that the
number of elements stay the same. E.g. this is also totally legal:

np.zeros((12, 5)).reshape((10, 3, 2))

And so are the equivalents

np.zeros((12, 5)).reshape((-1, 3, 2))
np.zeros((12, 5)).reshape((10, -1, 2))
np.zeros((12, 5)).reshape((10, 3, -1))
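
A quick check of the equivalence claimed above, as a minimal sketch. Each -1 form should resolve to the same shape as the fully specified reshape, since 12 * 5 == 10 * 3 * 2 == 60.

```python
import numpy as np

a = np.zeros((12, 5))  # 60 elements
explicit = a.reshape((10, 3, 2)).shape

# Every placement of -1 should infer the same missing dimension.
for target in [(-1, 3, 2), (10, -1, 2), (10, 3, -1)]:
    assert a.reshape(target).shape == explicit

print(explicit)  # (10, 3, 2)
```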
Post by Benjamin Root
This isn't a heuristic. It isn't guessing. It is making the semantics
consistent. The fact that I can do:
a.shape = (-1, 5, 64)
or
a.shape = (0, 5, 64)
but not
a.shape = (0, 5, -1)
is totally inconsistent.
It's certainly annoying and unpleasant, but it follows inevitably from
the most natural way of defining the -1 semantics, so I'm not sure I'd
say "inconsistent" :-)

What should this do?

np.zeros((12, 0)).reshape((10, -1, 2))

-n
--
Nathaniel J. Smith -- https://vorpus.org
Benjamin Root
2016-02-23 20:50:14 UTC
Post by Nathaniel Smith
What should this do?
np.zeros((12, 0)).reshape((10, -1, 2))
It should error out, I already covered that. 12 != 20.

Ben Root
