Stuart Reynolds
2016-11-15 17:37:42 UTC
I'm trying to subclass ndarray so that I can add some additional fields.
When I do this, however, I get odd new behavior when my object is passed to
a variety of numpy functions. For example, np.nanmin now returns an object
of the type of my new array class, whereas previously I'd get a float64.
Why? Is this a bug in nanmin or in my class?
import numpy as np

class NDArrayWithColumns(np.ndarray):
    def __new__(cls, obj, columns=None):
        obj = obj.view(cls)
        # guard: columns may be None (tuple(None) would raise)
        obj.columns = tuple(columns) if columns is not None else None
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.columns = getattr(obj, 'columns', None)

NAN = float("nan")
r = np.array([1., 0., 1., 0., 1., 0., 1., 0., NAN, 1., 1.])
print "MIN", np.nanmin(r), type(np.nanmin(r))
gives:
MIN 0.0 <type 'numpy.float64'>
but

>>> r = NDArrayWithColumns(r, ["a"])
>>> print "MIN", np.nanmin(r), type(np.nanmin(r))
MIN 0.0 <class '__main__.NDArrayWithColumns'>
>>> print r.shape   # ?!
(11,)

Note the change in type, and also that str(np.nanmin(r)) shows 1 field, not
the 11 indicated by its shape. This seems wrong. Is there a way to get my
subclass to behave more like an ndarray?
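For what it's worth, I can get the old behavior back by dropping to a plain
ndarray before the reduction, or by unwrapping the 0-d result afterwards,
but both defeat the point of carrying the subclass around:

>>> print "MIN", np.nanmin(np.asarray(r)), type(np.nanmin(np.asarray(r)))
MIN 0.0 <type 'numpy.float64'>
>>> np.nanmin(r).item()    # unwrap the 0-d result to a plain Python float
0.0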
I realize from the docs that I can override __array_wrap__, but it's not
clear to me how to use it to solve this issue, or whether it's the right
tool.
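Something like this is what I had in mind -- a rough, untested sketch that
assumes the reduction inside nanmin actually routes through __array_wrap__:

    # Inside NDArrayWithColumns -- sketch only.
    # Unwrap 0-d results so reductions like nanmin hand back a numpy
    # scalar (float64) instead of a 0-d NDArrayWithColumns.
    def __array_wrap__(self, out_arr, context=None):
        if out_arr.ndim == 0:
            return out_arr[()]   # 0-d array -> numpy scalar
        return np.ndarray.__array_wrap__(self, out_arr, context)

i.e. return a true scalar whenever a result collapses to 0-d, and defer to
ndarray for everything else. But I haven't convinced myself this covers all
the cases.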
In case you're interested, I'm subclassing because I'd like to track column
names in matrices of a single type. This is a pretty common wish in
scikit-learn pipelines. Structured arrays and record arrays allow for
varying types. Pandas provides this functionality, but dealing with plain
numpy arrays is easier (and more efficient) when writing Cython extensions.
Also, I think structured arrays and record arrays are unlikely to play
nicely with Cython because they're more freely typed -- I want to deal
exclusively with arrays of doubles.
Any thoughts on how to subclass ndarray while keeping the original behavior
in ufuncs?