On Tue, Sep 29, 2015 at 2:07 PM, Sebastian Berg
[...]
Post by Sebastian Berg
Post by Nathaniel Smith
In general I'm not a big fan of trying to do all kinds of guessing
about how to handle random objects in object arrays, the kind that
ends up with a big chain of type checks and fallback behaviors. Pretty
soon we find ourselves trying to extend the language with our own
generic dispatch system for arbitrary python types, just for object
arrays. (The current hack where for object arrays np.log will try
calling obj.log() is particularly horrible. There is no rule in python
that "log" is a reserved method name for "logarithm" on arbitrary
objects. Ditto for the other ufuncs that implement this hack.)
Plus we hope that many use cases for object arrays will soon be
supplanted by better dtype support, so now may not be the best time to
invest heavily in making object arrays complicated and powerful.
I have the little dream here that what could happen is that we create a
PyFloatDtype kind of thing (it is a bit different from our float because
it would always convert back to a python float and maybe raises more
errors), which "registers" with the dtype system in that it says "I know
how to handle python floats and store them in an array and provide ufunc
implementations for it".
Then, the "object" dtype ufuncs would try to call the ufunc on each
element, including "conversion". They would find a "float" and, since it
is not an array-like container, interpret it as a PyFloatDtype scalar
and call the scalar's ufunc (the PyFloatDtype scalar would be a python
float).
I'm not sure I understand this, but it did make me think of one
possible approach --
in my notebook sketches for what the New and Improved ufunc API might
look like, I was already pondering whether the inner loop should
receive a pointer to the ufunc object itself. Not for any reason in
particular, but just because hey they're sorta vaguely like methods
and methods get pointers to the object. But now I know what this is
useful for :-).
If ufunc loops get a pointer to the ufunc object itself, then we can
define a single inner loop function that looks like (sorta-Cython
code):
cdef generic_object_inner_loop(ufunc, args, strides, n, ...):
    for i in range(n):
        # Collect the i-th element of every operand (inputs, then outputs)
        arg_objs = []
        for j in range(ufunc.nargs):
            arg_objs.append(<object> (args[j] + strides[j] * i))
        # Call the ufunc itself on those elements
        ufunc(*arg_objs[:ufunc.nin], out=arg_objs[ufunc.nin:])
and register it by default in every ufunc with signature
"{}->{}".format("O" * ufunc.nin, "O" * ufunc.nout). And this would in
just a few lines of code provide a pretty sensible generic behavior
for *all* object array ufuncs -- they recursively call the ufunc on
their contents.
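
To make the recursion concrete, here is a rough pure-Python sketch of the same idea. It is only an illustration under loud assumptions: ToyUfunc is a hypothetical stand-in and not a NumPy class, plain lists play the role of object arrays, operands are assumed to have equal length, and only single-output ufuncs are handled. The one point it tries to show is that a loop which receives the ufunc itself can call back into it on each element.

```python
def generic_object_inner_loop(ufunc, columns, n):
    """Generic "O...->O" loop: call the ufunc itself on each element."""
    out = []
    for i in range(n):
        elems = [col[i] for col in columns]
        # Recursive step: nested lists are dispatched here again,
        # plain scalars fall through to the scalar kernel.
        out.append(ufunc(*elems))
    return out


class ToyUfunc:
    """Hypothetical stand-in for a ufunc (not the NumPy API)."""

    def __init__(self, name, nin, nout, scalar_kernel):
        self.name = name
        self.nin = nin
        self.nout = nout
        self.scalar_kernel = scalar_kernel
        # Register the generic object loop by default under "O"*nin -> "O"*nout.
        signature = "{}->{}".format("O" * nin, "O" * nout)
        self.loops = {signature: generic_object_inner_loop}

    def __call__(self, *inputs):
        if not any(isinstance(x, list) for x in inputs):
            # No container involved: this is the base case of the recursion.
            return self.scalar_kernel(*inputs)
        # Assumption: all list operands have the same length.
        n = max(len(x) for x in inputs if isinstance(x, list))
        # Repeat scalars so that every operand has n elements.
        columns = [x if isinstance(x, list) else [x] * n for x in inputs]
        signature = "{}->{}".format("O" * self.nin, "O" * self.nout)
        # The loop gets the ufunc object itself as its first argument.
        return self.loops[signature](self, columns, n)


add = ToyUfunc("add", nin=2, nout=1, scalar_kernel=lambda a, b: a + b)
nested = add([1, [2, 3]], [10, [20, 30]])
mixed = add([1, [2, 3]], 5)
```

In this toy version the recursion stops because a plain scalar is never wrapped back into a list, which mirrors the requirement that unknown objects must not be auto-coerced to object arrays.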
As a prerequisite of course we would need to remove the auto-coercion
of unknown objects to object arrays, otherwise this becomes an
infinite recursion. But we already decided to do that.
And for this to be really useful for arbitrary objects, not just the
ones that asarray recognizes, we need __numpy_ufunc__. But again,
we already decided to do that :-).
-n
--
Nathaniel J. Smith -- http://vorpus.org