[Numpy-discussion] reorganizing numpy internal extensions (was: Re: Should we drop support for "one file" compilation mode?)
Nathaniel Smith
2015-10-06 18:30:53 UTC
[splitting this off into a new thread]

On Tue, Oct 6, 2015 at 3:00 AM, David Cournapeau <***@gmail.com> wrote:
[...]
I also agree the current situation is not sustainable -- as we discussed
privately before, cythonizing numpy.core is made considerably more
complicated by this. I have myself run into quite a few issues
cythonizing the other parts of umath. I would also like to support
static linking better than we do now (do we know some static-link users
we can contact to validate our approach?)
numpy.core.multiarray -> compilation units in numpy/core/src/multiarray/ +
statically link npymath
numpy.core.umath -> compilation units in numpy/core/src/umath + statically
link npymath/npysort + some shenanigans to use things in
numpy.core.multiarray
There are also shenanigans in the other direction - supposedly umath
is layered "above" multiarray, but in practice there are circular
dependencies (see e.g. np.set_numeric_ops).
I would suggest a more layered approach, to enable both 'normal'
build and static build, without polluting the public namespace too much.
This is an approach followed by most large libraries (e.g. MKL), and is
fairly flexible.
Concretely, we could start by putting more common functionalities (aka the
'core' library) into its own static library. The API would be considered
private to numpy (no stability guaranteed outside numpy), and every exported
symbol from that library would be decorated appropriately to avoid potential
clashes (e.g. '_npy_internal_').
I don't see why we need this multi-layered complexity, though.

npymath is a well-defined utility library that other people use, so
sure, it makes sense to keep that somewhat separate as a static
library (as discussed in the other thread).

Beyond that -- NumPy is really not a large library. multiarray is <50k
lines of code, and umath is only ~6k (!). And there's no particular
reason to keep them split up from the user point of view -- all their
functionality gets combined into the flat numpy namespace anyway. So
we *could* rewrite them as three libraries, with a "common core" that
then gets exported via two different wrapper libraries -- but it's
much simpler just to do

mv umath/* multiarray/
rmdir umath

and then make multiarray work the way we want. (After fixing up the
build system of course :-).)

-n
--
Nathaniel J. Smith -- http://vorpus.org
David Cournapeau
2015-10-06 18:52:11 UTC
Post by Nathaniel Smith
[...]
There are also shenanigans in the other direction - supposedly umath
is layered "above" multiarray, but in practice there are circular
dependencies (see e.g. np.set_numeric_ops).
Indeed, I am not arguing against merging umath and multiarray.
Post by Nathaniel Smith
[...]
I don't see why we need this multi-layered complexity, though.
For several reasons:

- when you want to cythonize either extension, it is much easier to
separate it: Cython for the CPython API, C for the rest.
- if numpy.core.multiarray.so is built as cython-based .o + a 'large' C
static library, it should become much simpler to support static linking.
- maybe that's just personal, but I find the whole multiarray + umath
codebase barely manageable in terms of intertwined complexity. You may
argue it is not that big, and we all have different preferences in terms
of organization, but if I look at the binary size of multiarray + umath,
it is considerably larger than the median size of the .so files I have
in my /usr/lib.

I am also hoping that splitting up numpy.core into separate elements that
communicate through internal APIs would make contributing to numpy
easier.

We could also swap the argument: assuming it does not make the build more
complex, and that it does help static linking, why not do it?
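[Editor's note: the '_npy_internal_' decoration mentioned above could look roughly like the sketch below. The macro and function names are hypothetical, not actual NumPy code.]

```c
#include <assert.h>

/* Sketch of the '_npy_internal_' decoration idea: every symbol exported
   from the internal static library is spelled through a prefix macro,
   so it cannot clash with symbols from other extensions loaded into the
   same process. All names here are hypothetical. */
#define NPY_INTERNAL(name) _npy_internal_##name

/* On GCC-compatible toolchains the symbol can additionally be given
   hidden visibility, so it never escapes the final .so at all. */
#if defined(__GNUC__)
#define NPY_INTERNAL_LINKAGE __attribute__((visibility("hidden")))
#else
#define NPY_INTERNAL_LINKAGE
#endif

/* This definition would live in the internal static library... */
NPY_INTERNAL_LINKAGE int NPY_INTERNAL(clip)(int x, int lo, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* ...and the Cython-facing wrapper would call it through the
   decorated name. */
int npy_public_clip(int x, int lo, int hi)
{
    return NPY_INTERNAL(clip)(x, lo, hi);
}
```

The point of the prefix is purely namespacing: even if another extension in the process exports a `clip`, the decorated `_npy_internal_clip` cannot collide with it.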

David
Nathaniel Smith
2015-10-06 19:04:59 UTC
Post by David Cournapeau
[...]
Indeed, I am not arguing against merging umath and multiarray.
Oh, okay :-).
Post by David Cournapeau
[...]
- when you want to cythonize either extension, it is much easier to
separate it: Cython for the CPython API, C for the rest.
I don't think this will help much, because I think we'll want to have
multiple cython files, and that we'll probably move individual
functions between being implemented in C and Cython (including utility
functions). So that means we need to solve the problem of mixing C and
Cython files inside a single library.

If you look at Stefan's PR:
https://github.com/numpy/numpy/pull/6408
it does solve most of these problems. It would help if Cython added a
few tweaks to officially support compiling multiple modules into one
.so, and I'm not sure whether the current code quite handles
initialization of the submodule correctly, but it's actually
surprisingly easy to make work.

(Obviously we won't want to go overboard here -- but the point of
removing the technical constraints is that then it frees us to pick
whatever arrangement makes the most sense, instead of deciding based
on what makes the build system and linker easiest.)
Post by David Cournapeau
- if numpy.core.multiarray.so is built as cython-based .o + a 'large' C
static library, it should become much simpler to support static linking.
I don't see this at all, so I must be missing something? Either way
you build a bunch of .o files, and then you have to either combine
them into a shared library or combine them into a static library. Why
does pre-combining some of them into a static library make this
easier?
Post by David Cournapeau
- maybe that's just personal, but I find the whole multiarray + umath
codebase barely manageable in terms of intertwined complexity. You may
argue it is not that big, and we all have different preferences in terms
of organization, but if I look at the binary size of multiarray + umath,
it is considerably larger than the median size of the .so files I have
in my /usr/lib.
The binary size isn't a good measure here -- most of that is the
bazillions of copies of slightly tweaked loops that we auto-generate,
which take up a lot of space but don't add much intertwined
complexity. (Though now that I think about it, my LOC estimate was
probably a bit low because cloc is probably ignoring those
autogeneration template files.)

We definitely could do a better job with our internal APIs -- I just
think that'll be easiest if everything is in the same directory so
there are minimal obstacles to rearranging and refactoring things.

Anyway, it sounds like we agree that the next step is to merge
multiarray and umath, so possibly we should worry about doing that and
then see what makes sense from there :-).

-n
--
Nathaniel J. Smith -- http://vorpus.org
Charles R Harris
2015-10-06 19:19:41 UTC
Post by Nathaniel Smith
[...]
Anyway, it sounds like we agree that the next step is to merge
multiarray and umath, so possibly we should worry about doing that and
then see what makes sense from there :-).
What about removing the single file build? That seems somewhat orthogonal
to this discussion. Would someone explain to me the advantages of the
single file build for static linking, apart from possibly doing a better
job of hiding symbols? If symbols are the problem, is there not a solution
we could implement?

Chuck
Nathaniel Smith
2015-10-08 07:10:17 UTC
On Tue, Oct 6, 2015 at 12:19 PM, Charles R Harris
[...]
Post by Nathaniel Smith
Anyway, it sounds like we agree that the next step is to merge
multiarray and umath, so possibly we should worry about doing that and
then see what makes sense from there :-).
What about removing the single file build? That seems somewhat orthogonal to
this discussion.
We seem to also have consensus about removing the single file build,
but yeah, it's orthogonal -- notice the changed subject line in this
subthread :-).
Would someone explain to me the advantages of the single
file build for static linking, apart from possibly doing a better job of
hiding symbols? If symbols are the problem, is there not a solution we
could implement?
Hiding symbols is the only advantage that I'm aware of, and as noted
in the other thread there do exist other solutions. The only thing is
that we can't be absolutely certain these tools will work until
someone who needs static builds actually tries it -- the tools
definitely exist on regular linux, but IIUC the people who need static
builds are generally on really weird architectures that we can't test
ourselves. Or for all I know the weird architectures have finally
added shared linking and no-one uses static builds anymore. I think we
need to just try dropping it and see.
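[Editor's note: the symbol-hiding tools referred to above are typically compiler visibility controls. A minimal sketch of the idea, with hypothetical function names, is below.]

```c
#include <assert.h>

/* Sketch of hidden-by-default visibility as a replacement for the
   single-file build's symbol hiding: helpers stay ordinary extern
   functions shared across compilation units, but are marked hidden so
   they never appear in the shared object's dynamic symbol table; only
   the deliberately exported entry point (PyInit_* in a real extension)
   is left visible. Names here are hypothetical. */
#if defined(__GNUC__)
#define NPY_HIDDEN  __attribute__((visibility("hidden")))
#define NPY_VISIBLE __attribute__((visibility("default")))
#else
#define NPY_HIDDEN
#define NPY_VISIBLE
#endif

/* internal helper: callable from any .c file in the extension, but
   invisible to other shared objects in the process */
NPY_HIDDEN int helper_square(int x)
{
    return x * x;
}

/* the one symbol we actually want exported */
NPY_VISIBLE int module_entry(int x)
{
    return helper_square(x) + 1;
}
```

Building with `-fvisibility=hidden` makes hidden the default, so only explicitly marked symbols are exported; whether this works on the unusual platforms mentioned above is exactly the open question.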

-n
--
Nathaniel J. Smith -- http://vorpus.org
Daniele Nicolodi
2015-10-08 10:44:24 UTC
Hello,

sorry for replying in the wrong thread, but I don't find an appropriate
message to reply to in the original one.
Post by Nathaniel Smith
Hiding symbols is the only advantage that I'm aware of, and as noted
in the other thread there do exist other solutions.
Indeed, and those are way easier than maintaining the single file build.
Post by Nathaniel Smith
The only thing is
that we can't be absolutely certain these tools will work until
someone who needs static builds actually tries it -- the tools
definitely exist on regular linux, but IIUC the people who need static
builds are generally on really weird architectures that we can't test
ourselves. Or for all I know the weird architectures have finally
added shared linking and no-one uses static builds anymore. I think we
need to just try dropping it and see.
I don't really see how building from a single source file or multiple
source files affects the linking of a static library. Can you be more
precise about what the problems are? The only thing I can think of
is instructing distutils to do the right thing, but that should not be a
stopper.

Cheers,
Daniele
David Cournapeau
2015-10-08 13:30:03 UTC
Post by Nathaniel Smith
[...]
I don't think this will help much, because I think we'll want to have
multiple cython files, and that we'll probably move individual
functions between being implemented in C and Cython (including utility
functions). So that means we need to solve the problem of mixing C and
Cython files inside a single library.
Separating the pure C code into a static lib is the simple way of achieving
the same goal. Essentially, you write:

# implemented in npyinternal.a
_npy_internal_foo(...)

# implemented in merged_multiarray_umath.pyx
cdef PyArray_Foo(...):
    # use _npy_internal_foo()

then our merged_multiarray_umath.so is built by linking the .pyx and the
npyinternal.a together. IOW, the static link is internal.

Going through npyinternal.a instead of just linking .o from pure C and
Cython together gives us the following:

1. the .a can just use normal linking strategies instead of the awkward
capsule thing. Those are easy to get wrong when using cython, as you may
end up with multiple internal copies of the wrapped object inside the
capsule, causing hard-to-track bugs (this is what we wasted most of the
time on w/ Stefan and Kurt during ds4ds)
2. the only public symbols in the .a are the ones needed by the cython
wrapping, and since those are decorated with npy_internal, clashes are
unlikely to happen
3. since most of the code is already in the .a internally, supporting
static linking should be simpler, since the only difference is how you
statically link the cython-generated code. Because of 1, you are also
less likely to cause nasty surprises when putting everything together.
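[Editor's note: the capsule-style API sharing in point 1 can be sketched in plain C, leaving out the actual PyCapsule calls. All names below are made up for illustration.]

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of capsule-style API sharing: one module exposes a table of
   function pointers through an opaque pointer (PyCapsule_New in the
   real mechanism), and every consumer stores its own copy of that
   pointer. When several generated Cython modules each run the import
   step independently, you can end up with multiple internal copies of
   the table pointer -- the hard-to-track-bug scenario described above.
   All names here are hypothetical. */
typedef struct {
    int (*add)(int, int);
} npy_api_table;

static int impl_add(int a, int b)
{
    return a + b;
}

static const npy_api_table exported_table = { impl_add };

/* provider side: what PyCapsule_GetPointer would hand back */
const npy_api_table *npy_get_api(void)
{
    return &exported_table;
}

/* consumer side: each importing module keeps its own static pointer,
   which is the duplication that plain static linking avoids */
static const npy_api_table *api = NULL;

int npy_import_api(void)
{
    api = npy_get_api();
    return api != NULL;
}

int npy_use_add(int a, int b)
{
    return api->add(a, b);
}
```

With everything linked into one object (the npyinternal.a approach), consumers call `_npy_internal_`-prefixed functions directly and the import step, with its copy of the pointer, disappears.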

When you cythonize umath/multiarray, you need to do most of the underlying
work anyway.

I don't really care if the files are in the same directory or not, we can
keep things as they are now.

David
Julian Taylor
2015-10-08 17:06:09 UTC
On Tue, Oct 6, 2015 at 11:52 AM, David Cournapeau
Post by David Cournapeau
[...]
Going through npyinternal.a instead of just linking .o from pure C and
Cython together gives us the following:
1. the .a can just use normal linking strategies instead of the
awkward capsule thing. Those are easy to get wrong when using cython as
you may end up with multiple internal copies of the wrapped object
inside capsule, causing hard to track bugs (this is what we wasted most
of the time on w/ Stefan and Kurt during ds4ds)
2. the only public symbols in .a are the ones needed by the cython
wrapping, and since those are decorated with npy_internal, clashes are
unlikely to happen
3. since most of the code is already in .a internally, supporting the
static linking should be simpler since the only difference is how you
statically link the cython-generated code. Because of 1, you are also
less likely to cause nasty surprises when putting everything together.
I don't see why static libraries for internals are being discussed at all.
There is not much difference between an .a (archive) file and an .o
(object) file: what you call a static library is just a collection of
object files with an index slapped on top for faster lookup.
Whether a symbol is exported or not is defined in the object file, not
the archive file, so in this regard a static library versus a collection
of .o files makes no difference.
So our current system also produces a library; the only thing that's
"missing" is bundling it into an archive via ar cru *.o

I also don't see how pycapsule plays a role in this. You don't need
pycapsule to link a bunch of object files together.

So for me the issue is simply: what is easier with distutils --
getting the list of object files to link against the cython file, or first
creating a static library from the list of object files and linking that
against the cython object?
I don't think either way should be particularly hard. So there is not
really much to discuss. Do whatever is easier or results in nicer code.


As for adding cython to numpy, I'd start with letting a cython file
provide the multiarraymodule init function, with all regular numpy object
files linked into that thing. Then we have a pyx file with minimal bloat
to get started, and it should also be independent of merging umath (which
I'm in favour of).
When that single pyx module file gets too large, probably concatenating
multiple files together could work until cython supports a split
util/user-code build.
Nathaniel Smith
2015-10-08 19:47:56 UTC
On Oct 8, 2015 06:30, "David Cournapeau" <***@gmail.com> wrote:
[...]
Post by David Cournapeau
[...]
1. the .a can just use normal linking strategies instead of the awkward
capsule thing. Those are easy to get wrong when using cython as you may end
up with multiple internal copies of the wrapped object inside capsule,
causing hard to track bugs (this is what we wasted most of the time on w/
Stefan and Kurt during ds4ds)

Check out Stéfan's branch -- it just uses regular linking to mix cython and
C. I know what you mean about the capsule thing, and I think we shouldn't
use it at all. With a few tweaks, you can treat cython-generated .c files
just like regular .c files (except for the main module file, which if we
port it to cython then we just compile like a regular cython file).

-n
David Cournapeau
2015-10-08 20:07:05 UTC
Post by Nathaniel Smith
[...]
Check out Stéfan's branch -- it just uses regular linking to mix cython
and C.
I know, we worked on this together after all ;)

My suggested organisation is certainly not mandatory; I was not trying to
claim otherwise, sorry if that was unclear.

At that point, I guess the consensus is that I have to prove my suggestion
is useful. I will take a few more hours to submit a PR with the umath
conversion (maybe merging w/ the work from Stéfan). I discovered on my
flight back that you can call PyModule_Init multiple times for a given
module, which is useful while we do the transition C->Cython for the module
initialization (it is not documented as possible, so I would not rely on
it for long either).

David
Nathaniel Smith
2015-10-09 06:14:55 UTC
Post by David Cournapeau
[...]
At that point, I guess the consensus is that I have to prove my suggestion
is useful. I will take a few more hours to submit a PR with the umath
conversion (maybe merging w/ the work from Stéfan).
Okay! Still not sure what capsules have to do with anything, but I
guess the PR will either make it clear or else make it clear that it
doesn't matter :-).
Post by David Cournapeau
I discovered on my
flight back that you can call PyModule_Init multiple times for a given
module, which is useful while we do the transition C->Cython for the module
initialization (it is not documented as possible, so I would not rely on
it for long either).
Oh, right, Stefan mentioned something about this... PyModule_Init is a
python-2-only thing, so whatever it does now is what it will do
forever and ever amen. But I can't think of any good reason to call it
twice -- if your goal is just to get a reference to the new module,
then once PyModule_Init has run once, you can just run
PyImport_ImportModule (assuming you know your fully-qualified module
name).

-n
--
Nathaniel J. Smith -- http://vorpus.org