[Numpy-discussion] Indexing structured masked arrays with multidimensional fields; what with fill

Gerrit Holl

2015-12-01 11:29:14 UTC

Hello,

usually, a masked array's .fill_value attribute has ndim=0 and the
same dtype as the data attribute:

In [27]: ar = array((0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], 0.0),
                    dtype="int, (2,3)float, float")

In [28]: arm = ma.masked_array(ar)

In [29]: arm.fill_value.ndim
Out[29]: 0

In [31]: arm.fill_value.dtype
Out[31]: dtype([('f0', '<i8'), ('f1', '<f8', (2, 3)), ('f2', '<f8')])

What should .fill_value be when I index the multidimensional field
"f1" of this array? The current behaviour is:

In [32]: f = arm["f1"]

In [36]: f.fill_value
Out[36]:
array([[ 1.00000000e+20, 1.00000000e+20, 1.00000000e+20],
       [ 1.00000000e+20, 1.00000000e+20, 1.00000000e+20]])
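
For anyone who wants to check what their own installation does, here is a minimal self-contained sketch of the session above with explicit imports (the session itself relies on array and ma already being in the namespace). The dimensionality of the field's fill value is printed rather than assumed, since the output shown above comes from one particular numpy version and others may behave differently.

```python
import numpy as np
import numpy.ma as ma

# Same record as in the session: an int, a (2, 3) float sub-array, a float.
ar = np.array(
    (0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], 0.0),
    dtype="int, (2,3)float, float",
)
arm = ma.masked_array(ar)

# The parent's fill value is a single record with the array's dtype.
parent_fv = arm.fill_value
print("parent fill_value ndim:", np.ndim(parent_fv))
print("same dtype as array:   ", parent_fv.dtype == arm.dtype)

# arm itself is 0-d, so indexing the field yields the (2, 3) sub-array.
f = arm["f1"]
print("field shape:", f.shape)

# Version-dependent, so only reported here, not assumed.
print("numpy", np.__version__, "field fill_value ndim:", np.ndim(f.fill_value))
```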

This breaks the usual behaviour that .fill_value has ndim=0, which can
cause bugs such as the one reported in issue #6723:
https://github.com/numpy/numpy/issues/6723

What should numpy do instead? In pull request 6728, I propose to
change the behaviour so that arm["f1"].fill_value is set to
arm.fill_value["f1"].flat[0], i.e. the first element of the field's
fill value. This is an arbitrary and somewhat ad-hoc solution. If I
have deliberately set arm.fill_value["f1"] to something non-uniform,
such as array([[1., 2., 3.], [4., 5., 6.]]), then every element except
the first is lost when I index the field. I don't know whether this
might lead to problems. Does it matter? See also
http://stackoverflow.com/questions/33921579/what-practical-impact-if-any-does-the-fill-value-of-a-masked-array-have
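
To make the trade-off concrete, here is a rough sketch on plain ndarrays. The helper collapse_fill_value is hypothetical and is not part of numpy; it only mimics the .flat[0] reduction proposed above. np.where is used as a stand-in for substituting fill values at masked positions and is not meant to reflect how numpy.ma does it internally.

```python
import numpy as np


def collapse_fill_value(field_fill):
    """Hypothetical helper (not part of numpy): keep only the first element.

    Returns the first element as a 0-d array, plus a flag that is True
    when all elements were equal, i.e. when nothing was lost.
    """
    field_fill = np.asarray(field_fill)
    first = np.array(field_fill.flat[0])  # 0-d array, ndim == 0
    lossless = bool((field_fill == first).all())
    return first, lossless


uniform = np.full((2, 3), 1e20)                         # default-like fill
custom = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])   # user-chosen fill

print(collapse_fill_value(uniform))  # nothing lost
print(collapse_fill_value(custom))   # elements 2..6 are discarded

# Effect on filling masked positions (np.where as a stand-in):
data = np.zeros((2, 3))
mask = np.array([[False, True, False], [True, False, True]])

full_fill = np.where(mask, custom, data)       # per-element fill values
collapsed, _ = collapse_fill_value(custom)
scalar_fill = np.where(mask, collapsed, data)  # first element everywhere

print(full_fill)
print(scalar_fill)
```

With a uniform fill value the two results would be identical, so the reduction only changes anything for users who set a non-uniform fill value on a multidimensional field.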

regards,
Gerrit.