David Cournapeau
2007-08-03 06:06:58 UTC
Hi,
Following an ongoing discussion with S. Johnson, one of the developers
of fftw3, I would be interested in what people think about adding
infrastructure in numpy related to SIMD alignment (that is, 16-byte
alignment for SSE/AltiVec; I don't know anything about other archs).
The problem is that right now, it is difficult to get alignment
information in numpy. By alignment here, I mean something different from
what is normally meant in the numpy context: in my understanding,
NPY_ALIGNED refers to a pointer which is aligned with respect to its type,
whereas here I am talking about arbitrary alignment.
For example, for fftw3, we need to know whether a given data buffer is
16-byte aligned to get optimal performance; generally, both SSE and
AltiVec need 16-byte alignment for optimal performance. I think it
would be nice to have some infrastructure to help developers get this
kind of information, and maybe to be able to request 16-byte aligned buffers.
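To make the distinction concrete, here is a minimal sketch of the kind of check I mean. The helper name is_aligned is hypothetical, not an existing numpy function. It tests a pointer against an arbitrary power-of-two alignment, which is different from the per-type alignment behind NPY_ALIGNED:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of numpy): returns 1 if p is a multiple
 * of `alignment`, 0 otherwise. `alignment` must be a power of two,
 * e.g. 16 for SSE/AltiVec. */
static int is_aligned(const void *p, size_t alignment)
{
    /* For a power of two, (alignment - 1) masks the low bits that must be zero. */
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```

A double array starting 8 bytes past a 16-byte boundary is perfectly fine for NPY_ALIGNED, since it is aligned for its type, but is_aligned(p, 16) returns 0. That is the case fftw3 needs to detect.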
Here is what I can think of:
- adding an API to know whether a given PyArrayObject has its data
buffer 16-byte aligned, and to request a 16-byte aligned
PyArrayObject. Something like NPY_ALIGNED, basically.
- forcing data allocation to be 16-byte aligned in numpy (e.g.
define PyDataMem_Mem to a 16-byte aligned allocator instead of malloc).
This would mean that many arrays would be "naturally" 16-byte aligned
without effort.
Point 2 is really easy to implement, I think: on some platforms
(Mac OS X and FreeBSD), malloc returns 16-byte aligned buffers
anyway, so I don't think the wasted space is a real problem. Linux with
glibc is 8-byte aligned; I don't know about Windows. Implementing our
own 16-byte aligned memory allocator for cross-platform compatibility
should be relatively easy. I don't see any drawback, but I guess other
people will.
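As a rough illustration of how small such an allocator could be, here is a sketch that only relies on plain malloc/free. The names aligned_malloc16 and aligned_free16 are hypothetical. Where available, POSIX posix_memalign or Windows _aligned_malloc/_aligned_free could be used instead. The trick is to over-allocate, round the pointer up to the next 16-byte boundary, and stash the original malloc pointer just before the aligned block so it can be freed later:

```c
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGNMENT 16  /* assumption: 16 bytes for SSE/AltiVec */

/* Hypothetical sketch: returns a SIMD_ALIGNMENT-aligned block of `size`
 * bytes, or NULL on failure. Must be released with aligned_free16. */
static void *aligned_malloc16(size_t size)
{
    /* Room for up to (SIMD_ALIGNMENT - 1) bytes of padding plus one
     * pointer remembering what malloc actually returned. */
    size_t extra = SIMD_ALIGNMENT - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;
    void *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;
    /* Skip past the stash slot, then round up to the alignment boundary. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + SIMD_ALIGNMENT - 1)
                        & ~(uintptr_t)(SIMD_ALIGNMENT - 1);
    ((void **)aligned)[-1] = raw;  /* stash the original pointer */
    return (void *)aligned;
}

/* Frees a block obtained from aligned_malloc16 (NULL is a no-op). */
static void aligned_free16(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

The cost is at most 15 + sizeof(void *) extra bytes per allocation. On platforms where malloc is already 16-byte aligned, a build could simply map back to plain malloc.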
Point 1 is trickier, as it requires many more changes in the code.
Do the main numpy developers have an opinion on this?
cheers,
David