Discussion:
[Numpy-discussion] Question about structure arrays
aerojockey
2015-11-07 21:18:22 UTC
Hello,

Recently I made some changes to a program I'm working on, and found that the
changes made it four times slower than before. After some digging, I found
out that one of the new costs was that I added structured arrays. Inside a
low-level loop, I create a structured array, populate it in Python, then turn it
over to some handwritten C code for processing. It turned out that, when
passed a structured dtype specification, numpy has to parse it each time, which
included calls to re.match and eval.

Now, this is not a big deal for me to work around by using ordinary slicing
and such, and also I can improve things by reusing arrays. Since this is
inner loop stuff, sacrificing readability for speed is an appropriate
tradeoff.

Nevertheless, I was curious if there was a way (or any plans for there to be
a way) to compile a structured array dtype. I realize it's not the
bread-and-butter of numpy, but it turned out to be a very convenient feature
for my use case (populating an array of structures to pass off to C).
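
For readers unfamiliar with the use case, here is a minimal sketch of an array of structures laid out for C. The Record struct, its field names, and the data are all hypothetical and not taken from the program described above; align=True is used on the assumption that the C side uses the compiler's default struct padding.

```python
import ctypes
import numpy as np

# Hypothetical C struct for illustration: struct { int32_t id; double value; }
class Record(ctypes.Structure):
    _fields_ = [("id", ctypes.c_int32), ("value", ctypes.c_double)]

# align=True asks numpy to pad the fields the way a C compiler would.
RECORD_DTYPE = np.dtype([("id", np.int32), ("value", np.float64)], align=True)

# Populate the array of structures from Python, field by field.
records = np.zeros(3, dtype=RECORD_DTYPE)
records["id"] = [1, 2, 3]
records["value"] = [0.5, 1.5, 2.5]

# Pointer to the same memory, typed as Record*, suitable for handing to C code.
# No copy is made, so keep `records` alive while the pointer is in use.
c_ptr = records.ctypes.data_as(ctypes.POINTER(Record))
```

Reading c_ptr[i].id or c_ptr[i].value from Python is a quick way to check that the numpy layout and the C layout agree before involving the real C code.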

Thanks



--
View this message in context: http://numpy-discussion.10968.n7.nabble.com/Question-about-structure-arrays-tp41653.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.
Nathaniel Smith
2015-11-07 23:49:22 UTC
Does it help to turn your dtype string into a dtype object and then
pass the dtype object around? E.g.

In [1]: dt = np.dtype("i4,i4")

In [2]: np.zeros(2, dtype=dt)
Out[2]:
array([(0, 0), (0, 0)],
      dtype=[('f0', '<i4'), ('f1', '<i4')])
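
Applied to an inner loop, the same idea might look like the sketch below. The field names "x" and "y" and the helper make_points are made up for illustration; the point is only that the dtype specification is parsed once, outside the loop.

```python
import numpy as np

# Parse the dtype specification once, outside the hot loop.
# The field names "x" and "y" are illustrative, not from the original program.
POINT_DTYPE = np.dtype([("x", "<i4"), ("y", "<i4")])

def make_points(n):
    """Create a zero-filled structured array, reusing the prebuilt dtype."""
    return np.zeros(n, dtype=POINT_DTYPE)

# Every array created in the loop shares the already-built dtype object.
arrays = [make_points(2) for _ in range(3)]
```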

-n
--
Nathaniel J. Smith -- http://vorpus.org
aerojockey
2015-11-11 05:40:32 UTC
I actually don't know, since I removed the structured array part about ten
minutes after I posted. However, I did a quick test of your suggestion, and
indeed numpy calls eval and re.match only when creating the dtype object,
not when creating the array. So certainly it would have helped.
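
A rough way to repeat that kind of quick test is sketched below. It only compares passing the string spec against passing a prebuilt dtype object; the function names and the iteration count are arbitrary choices, and the actual numbers will depend on the numpy version installed.

```python
import timeit
import numpy as np

SPEC = "i4,i4"
DT = np.dtype(SPEC)  # the string is parsed here, once

def with_string():
    # numpy has to turn the string into a dtype on every call
    return np.zeros(2, dtype=SPEC)

def with_object():
    # the prebuilt dtype object is used directly
    return np.zeros(2, dtype=DT)

t_string = timeit.timeit(with_string, number=10000)
t_object = timeit.timeit(with_object, number=10000)
print("string spec: %.4f s, dtype object: %.4f s" % (t_string, t_object))
```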

I wasn't actually aware you could do that with dtypes. In fact, I was only
vaguely aware that there were dtype types at all. Thanks for the suggestion.

Carl Banks



David Morris
2015-11-09 14:27:19 UTC
I was just looking into structured arrays. In case it is relevant: Are you
using version 1.10? Structured arrays are apparently a LOT slower there than
in 1.9.3, an issue which will be fixed in a future version.

David
Chris Barker
2015-11-09 23:53:03 UTC
Post by aerojockey
Inside a
low-level loop, I create a structured array, populate it in Python, then turn it
over to some handwritten C code for processing.
Can you do that inside bit of the low-level loop in C (or Cython)? You
often want to put the guts of your loop in C anyway...

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov