Gutenkunst, Ryan N - (rgutenk)
2016-02-13 00:06:32 UTC
Hello all,
In 2009 I developed an application that uses a subclass of masked arrays as a central data object. My subclass Spectrum possesses additional attributes along with many custom methods. It was very convenient to be able to use standard numpy functions for doing arithmetic on these objects. However, my code broke with numpy 1.10. I've finally had a chance to track down the problem, and I am hoping someone can suggest a workaround.
See below for an example, which is as minimal as I could concoct. In this case, I have a Spectrum object that I'd like to take the logarithm of using numpy.ma.log, while preserving the value of the "folded" attribute. Up to numpy 1.9, this worked as expected, but in numpy 1.10 and 1.11 the attribute is not preserved.
The change in behavior appears to be driven by a commit made on Jun 16th, 2015 by Marten van Kerkwijk. In particular, the commit changed _MaskedUnaryOperation.__call__ so that the result array's _update_from method is no longer called with the input array as the argument, but rather with the result of the underlying numpy unary operation (old line 889, new line 885). Because that result doesn't carry my new attribute, it's not present for _update_from to access. I notice that similar changes were made to _MaskedBinaryOperation, although I haven't tested those. It's not clear to me from the commit message why this particular change was made, so I don't know whether this new behavior is intentional.
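To illustrate why the attribute would go missing, here is a small sketch that does not touch numpy.ma internals. It assumes (my reading of the commit, not something I have confirmed) that the ufunc is applied to the underlying data as returned by numpy.ma.getdata. The attribute name "folded" is set by hand on a plain masked array purely for illustration.

```python
import numpy

# A plain masked array with an extra attribute attached by hand.
a = numpy.ma.masked_array([2, 3, 4.])
a.folded = True

# Assumption: the masked operation applies the ufunc to the underlying data.
raw = numpy.log(numpy.ma.getdata(a))

# The raw result is a base ndarray, so there is no 'folded' to copy from it.
print(type(raw).__name__, hasattr(raw, 'folded'))
```

If that reading is right, anything _update_from copies from this object can only ever fall back to the default.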
I know that subclassing arrays isn't widely encouraged, but it has been very convenient in my code. Is it still possible to subclass masked_array in such a way that functions like numpy.ma.log preserve additional attributes? If so, can someone point me in the right direction?
Thanks!
Ryan
*** Begin example
import numpy
print('Working with numpy {0}'.format(numpy.__version__))

class Spectrum(numpy.ma.masked_array):
    def __new__(cls, data, mask=numpy.ma.nomask, data_folded=None):
        subarr = numpy.ma.masked_array(data, mask=mask, keep_mask=True,
                                       shrink=True)
        subarr = subarr.view(cls)
        subarr.folded = data_folded
        return subarr

    def __array_finalize__(self, obj):
        if obj is None:
            return
        numpy.ma.masked_array.__array_finalize__(self, obj)
        # Inherit 'folded' from the array this one was created from.
        self.folded = getattr(obj, 'folded', 'unspecified')

    def _update_from(self, obj):
        # Show which object the masked operation passes in.
        print('Input to update_from: {0}'.format(repr(obj)))
        numpy.ma.masked_array._update_from(self, obj)
        self.folded = getattr(obj, 'folded', 'unspecified')

    def __repr__(self):
        return 'Spectrum(%s, folded=%s)'\
               % (str(self), str(self.folded))

fs1 = Spectrum([2,3,4.], data_folded=True)
fs2 = numpy.ma.log(fs1)
print('fs2.folded status: {0}'.format(fs2.folded))
print('Expectation is True, achieved with numpy 1.9')
*** End example
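The only stopgap I can think of that does not depend on what _update_from receives is to copy the attributes by hand after the call. Below is a minimal sketch of that idea. Both Tagged (a stripped-down stand-in for Spectrum) and apply_preserving are hypothetical names made up for this illustration, not part of numpy, and this does nothing to fix numpy.ma.log itself.

```python
import numpy


class Tagged(numpy.ma.MaskedArray):
    """Hypothetical stand-in for Spectrum with one extra attribute."""

    def __new__(cls, data, folded=None):
        obj = numpy.ma.MaskedArray(data).view(cls)
        obj.folded = folded
        return obj


def apply_preserving(func, arr, attrs=('folded',)):
    """Call func(arr), then copy the named attributes from arr by hand.

    Hypothetical helper, not part of numpy.
    """
    result = func(arr)
    for name in attrs:
        if hasattr(arr, name):
            setattr(result, name, getattr(arr, name))
    return result


fs1 = Tagged([2, 3, 4.], folded=True)
fs2 = apply_preserving(numpy.ma.log, fs1)
print('fs2.folded status: {0}'.format(fs2.folded))
```

This would mean wrapping every call site, which is exactly the inconvenience I was hoping subclassing would avoid, so I would still prefer a solution inside the subclass.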
--
Ryan Gutenkunst
Assistant Professor
Molecular and Cellular Biology
University of Arizona
phone: (520) 626-0569, office LSS 325
http://gutengroup.mcb.arizona.edu
Latest paper: "Computationally efficient composite likelihood statistics for demographic inference"
Molecular Biology and Evolution; http://dx.doi.org/10.1093/molbev/msv255