[Numpy-discussion] Subclassing ma.masked_array, code broken after version 1.9
Gutenkunst, Ryan N - (rgutenk)
2016-02-13 00:06:32 UTC
Hello all,

In 2009 I developed an application that uses a subclass of masked arrays as a central data object. My subclass Spectrum possesses additional attributes along with many custom methods. It was very convenient to be able to use standard numpy functions for doing arithmetic on these objects. However, my code broke with numpy 1.10. I've finally had a chance to track down the problem, and I am hoping someone can suggest a workaround.

See below for an example, which is as minimal as I could concoct. In this case, I have a Spectrum object that I'd like to take the logarithm of using numpy.ma.log, while preserving the value of the "folded" attribute. Up to numpy 1.9, this worked as expected, but in numpy 1.10 and 1.11 the attribute is not preserved.

The change in behavior appears to be driven by a commit made on Jun 16th, 2015 by Marten van Kerkwijk. In particular, the commit changed _MaskedUnaryOperation.__call__ so that the result array's update_from method is no longer called with the input array as the argument, but rather the result of the numpy UnaryOperation (old line 889, new line 885). Because that UnaryOperation doesn't carry my new attribute, it's not present for update_from to access. I notice that similar changes were made to MaskedBinaryOperation, although I haven't tested those. It's not clear to me from the commit message why this particular change was made, so I don't know whether this new behavior is intentional.
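To make the mechanism described above concrete, here is a toy model in plain Python (no NumPy). The names FakeMasked, raw_log, masked_log_old and masked_log_new are hypothetical. They only mimic the two call patterns described in the paragraph above (metadata copied from the input versus from the bare ufunc result) and are not the actual numpy.ma source.

```python
# Toy model (plain Python, no NumPy) of the behaviour change described above.
# FakeMasked, raw_log, masked_log_old and masked_log_new are hypothetical
# names for illustration only; they are NOT numpy.ma internals.
import math


class FakeMasked:
    """Stand-in for a masked-array subclass carrying an extra attribute."""

    def __init__(self, data, folded="unspecified"):
        self.data = list(data)
        self.folded = folded

    def _update_from(self, obj):
        # Same pattern as Spectrum._update_from: copy 'folded' if obj has it.
        self.folded = getattr(obj, "folded", "unspecified")


def raw_log(values):
    # Plays the role of the plain ufunc result: bare data, no metadata.
    return [math.log(v) for v in values]


def masked_log_old(a):
    # Pre-1.10 pattern as described: metadata copied from the *input*.
    out = FakeMasked(raw_log(a.data))
    out._update_from(a)
    return out


def masked_log_new(a):
    # 1.10 pattern as described: metadata copied from the *raw result*,
    # which has no 'folded' attribute, so the default is used.
    raw = raw_log(a.data)
    out = FakeMasked(raw)
    out._update_from(raw)
    return out


fs1 = FakeMasked([2, 3, 4.0], folded=True)
print(masked_log_old(fs1).folded)  # True
print(masked_log_new(fs1).folded)  # unspecified
```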

I know that subclassing arrays isn't widely encouraged, but it has been very convenient in my code. Is it still possible to subclass masked_array in such a way that functions like numpy.ma.log preserve additional attributes? If so, can someone point me in the right direction?

Thanks!
Ryan

*** Begin example

import numpy
print('Working with numpy {0}'.format(numpy.__version__))

class Spectrum(numpy.ma.masked_array):
    def __new__(cls, data, mask=numpy.ma.nomask, data_folded=None):
        subarr = numpy.ma.masked_array(data, mask=mask, keep_mask=True,
                                       shrink=True)
        subarr = subarr.view(cls)
        subarr.folded = data_folded

        return subarr

    def __array_finalize__(self, obj):
        if obj is None:
            return
        numpy.ma.masked_array.__array_finalize__(self, obj)
        self.folded = getattr(obj, 'folded', 'unspecified')

    def _update_from(self, obj):
        # Diagnostic: show which object numpy.ma passes in
        print('Input to update_from: {0}'.format(repr(obj)))
        numpy.ma.masked_array._update_from(self, obj)
        self.folded = getattr(obj, 'folded', 'unspecified')

    def __repr__(self):
        return 'Spectrum(%s, folded=%s)'\
            % (str(self), str(self.folded))

fs1 = Spectrum([2,3,4.], data_folded=True)
fs2 = numpy.ma.log(fs1)
print('fs2.folded status: {0}'.format(fs2.folded))
print('Expectation is True, achieved with numpy 1.9')

*** End example

--
Ryan Gutenkunst
Assistant Professor
Molecular and Cellular Biology
University of Arizona
phone: (520) 626-0569, office LSS 325
http://gutengroup.mcb.arizona.edu
Latest paper: "Computationally efficient composite likelihood statistics for demographic inference"
Molecular Biology and Evolution; http://dx.doi.org/10.1093/molbev/msv255
Jonathan Helmus
2016-02-13 18:48:57 UTC
Ryan,

I'm not sure if you will be able to get this to work as in NumPy 1.9,
but the __array_wrap__ method is intended to be the mechanism for
subclasses to set their return type, adjust metadata, etc. [1].
Unfortunately, the numpy.ma.log function does not seem to make a call
to __array_wrap__ (at least in NumPy 1.10.2) although numpy.log does:

from __future__ import print_function
import numpy
print('Working with numpy {0}'.format(numpy.__version__))


class Spectrum(numpy.ma.masked_array):
    def __new__(cls, data, mask=numpy.ma.nomask, data_folded=None):
        subarr = numpy.ma.masked_array(data, mask=mask, keep_mask=True,
                                       shrink=True)
        subarr = subarr.view(cls)
        subarr.folded = data_folded

        return subarr

    def __array_finalize__(self, obj):
        if obj is None:
            return
        numpy.ma.masked_array.__array_finalize__(self, obj)
        self.folded = getattr(obj, 'folded', 'unspecified')

    def __array_wrap__(self, out_arr, context=None):
        print('__array_wrap__ called')
        return numpy.ndarray.__array_wrap__(self, out_arr, context)

    def __repr__(self):
        return 'Spectrum(%s, folded=%s)'\
            % (str(self), str(self.folded))

fs1 = Spectrum([2,3,4.], data_folded=True)

print('numpy.ma.log:')
fs2 = numpy.ma.log(fs1)
print('fs2 type:', type(fs2))
print('fs2.folded status: {0}'.format(fs2.folded))

print('numpy.log:')
fs3 = numpy.log(fs1)
print('fs3 type:', type(fs3))
print('fs3.folded status: {0}'.format(fs3.folded))

----
$ python example.py
Working with numpy 1.10.2
numpy.ma.log:
fs2 type: <class '__main__.Spectrum'>
fs2.folded status: unspecified
numpy.log:
__array_wrap__ called
fs3 type: <class '__main__.Spectrum'>
fs3.folded status: True


The change mentioned in the original message was made in pull request
3907 [2] in case anyone wants to have a look.

Cheers,

- Jonathan Helmus

[1] http://docs.scipy.org/doc/numpy-1.10.1/user/basics.subclassing.html#array-wrap-for-ufuncs
[2] https://github.com/numpy/numpy/pull/3907
Gutenkunst, Ryan N - (rgutenk)
2016-02-15 17:06:28 UTC
Thanks, Jonathan,

Good to confirm this isn't something inappropriate I'm doing. I give up transparency here in my application, so I'll just work around it. I leave it up to wiser numpy heads as to whether it's worth altering these numpy.ma functions to enable subclassing.

Best,
Ryan
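Until numpy.ma forwards such attributes itself, one possible workaround is to wrap the numpy.ma functions you use so that they copy the extra attributes from input to output. The sketch below (Python 3) assumes the attributes can simply be copied verbatim from the first argument. `preserve_attributes`, `Tagged` and `plain_double` are hypothetical names; `Tagged` and `plain_double` stand in for Spectrum and numpy.ma.log so that the demonstration runs without NumPy.

```python
import functools


def preserve_attributes(func, names=("folded",)):
    """Hypothetical helper (not part of numpy.ma): wrap a one-argument
    array function so the listed attributes of its input are copied
    onto its output."""
    @functools.wraps(func)
    def wrapper(obj, *args, **kwargs):
        result = func(obj, *args, **kwargs)
        for name in names:
            if hasattr(obj, name):
                try:
                    setattr(result, name, getattr(obj, name))
                except AttributeError:
                    # e.g. a plain float or other object that rejects
                    # new attributes; leave it untouched.
                    pass
        return result
    return wrapper


# Self-contained demonstration with stand-ins instead of NumPy.
class Tagged(list):
    """Minimal stand-in for Spectrum: a list that can carry 'folded'."""


def plain_double(x):
    # Like numpy.ma.log in 1.10: right type back, but 'folded' is lost.
    return Tagged(2 * v for v in x)


t = Tagged([1, 2, 3])
t.folded = True
double = preserve_attributes(plain_double)
print(double(t).folded)  # True
```

With the Spectrum class from the original example, usage would presumably be `ma_log = preserve_attributes(numpy.ma.log)` followed by `ma_log(fs1).folded`; this has not been checked against any particular NumPy release. The wrapper is deliberately blunt: it ignores whatever the library did and always takes the attribute from the first argument, so a two-argument version (for the MaskedBinaryOperation case) would have to decide which operand's attributes win.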
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Sebastian Berg
2016-02-15 18:31:48 UTC
Post by Gutenkunst, Ryan N - (rgutenk)
Thanks, Jonathan,
Good to confirm this isn't something inappropriate I'm doing. I give
up transparency here in my application, so I'll just work around it.
I leave it up to wiser numpy heads as to whether it's worth altering
these numpy.ma functions to enable subclassing.

Frankly, when it comes to masked arrays, I at least am puzzled
most of the time, so input is very welcome.
Most of the people currently contributing barely use masked arrays, as
far as I know, and sometimes it is hard to make good calls. It is not
the easiest code base, and any feedback or nudging is important. A new
release is about to come out, and if you feel there is a serious
regression, we may want to push for fixing it (or, even better, you may
have time to suggest a fix yourself).

- Sebastian
Charles R Harris
2016-02-16 04:09:13 UTC
On Mon, Feb 15, 2016 at 10:06 AM, Gutenkunst, Ryan N - (rgutenk) <
Post by Gutenkunst, Ryan N - (rgutenk)
Thanks, Jonathan,
Good to confirm this isn't something inappropriate I'm doing. I give up
transparency here in my application, so I'll just work around it. I leave
it up to wiser numpy heads as to whether it's worth altering these
numpy.ma functions to enable subclassing.

There is a known bug in MaskedArrays that might account for this. It will
hopefully be fixed in the next beta.

Chuck
