Francesc Alted
2015-07-06 15:18:13 UTC
Hi,
I have stumbled into this:
In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0', np.int64), ('f1', np.int32)])
In [63]: %timeit sa['f0'].sum()
100 loops, best of 3: 4.52 ms per loop
In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0', np.int64), ('f1', np.int64)])
In [65]: %timeit sa['f0'].sum()
1000 loops, best of 3: 896 µs per loop
The first structured array is made of 12-byte records and the second of
16-byte records, yet the latter performs 5x faster. As expected, a
structured array made of 8-byte records is the fastest:
In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)), dtype=[('f0', np.int64)])
In [67]: %timeit sa['f0'].sum()
1000 loops, best of 3: 567 µs per loop
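To make the record sizes concrete, here is a small sketch (not part of the original session) that inspects each layout: the record itemsize, the stride of the sa['f0'] view, and NumPy's flags.aligned. It assumes a typical 64-bit platform where int64 requires 8-byte alignment. Under that assumption the 12-byte stride puts every other f0 value on a 4-byte boundary, so NumPy reports that view as not aligned. The align=True variant at the end is a standard np.dtype option that pads each record to a multiple of its largest field alignment.

```python
import numpy as np

n = 1000 * 1000

# The three record layouts from the session above (packed, i.e. align=False).
layouts = {
    "12": np.dtype([('f0', np.int64), ('f1', np.int32)]),
    "16": np.dtype([('f0', np.int64), ('f1', np.int64)]),
    "8": np.dtype([('f0', np.int64)]),
}

info = {}
for name, dt in layouts.items():
    sa = np.zeros(n, dtype=dt)
    f0 = sa['f0']  # a strided view into the records, not a copy
    info[name] = (dt.itemsize, f0.strides[0], f0.flags.aligned)
    print(f"{name}-byte records: itemsize={dt.itemsize}, "
          f"f0 stride={f0.strides[0]}, f0 aligned={f0.flags.aligned}")

# Same fields as the 12-byte case, but padded so each record keeps int64 aligned.
padded = np.dtype([('f0', np.int64), ('f1', np.int32)], align=True)
print("align=True itemsize:", padded.itemsize)
```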
Now, my laptop has an Ivy Bridge processor (i5-3380M), which should perform
quite well on unaligned data:
http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
So, if 4-year-old Intel architectures do not pay a penalty for unaligned
access, why am I seeing one in NumPy? That strikes me as quite strange.
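For anyone who wants to reproduce this outside IPython, here is a standalone sketch of the benchmark using the timeit module in place of %timeit. It fills f0 with np.arange instead of fromiter. It also adds one hypothetical layout that is not in the session above: 16-byte records with f0 placed at byte offset 4. That layout keeps the stride of the fast 16-byte case but makes every f0 element misaligned. Comparing it against the 16-byte, offset-0 case may help tell whether the slowdown follows misalignment or record size. Absolute numbers will depend on the machine and the NumPy version.

```python
import timeit

import numpy as np

n = 1000 * 1000


def make(dt):
    # Fill f0 with 0..n-1, like the fromiter calls above (other fields stay zero).
    sa = np.zeros(n, dtype=dt)
    sa['f0'] = np.arange(n, dtype=np.int64)
    return sa


layouts = {
    "12-byte records, f0 at offset 0": np.dtype([('f0', np.int64), ('f1', np.int32)]),
    "16-byte records, f0 at offset 0": np.dtype([('f0', np.int64), ('f1', np.int64)]),
    # Hypothetical extra case: same 16-byte stride, but f0 starts at offset 4,
    # so every f0 element is misaligned for an 8-byte int64.
    "16-byte records, f0 at offset 4": np.dtype(
        {'names': ['f0'], 'formats': [np.int64], 'offsets': [4], 'itemsize': 16}),
    "8-byte records": np.dtype([('f0', np.int64)]),
}

timings, sums, aligned = {}, {}, {}
for name, dt in layouts.items():
    f0 = make(dt)['f0']
    sums[name] = int(f0.sum())
    aligned[name] = f0.flags.aligned
    # Best of 3 repeats of 20 calls each, reported per call.
    best = min(timeit.repeat(f0.sum, number=20, repeat=3)) / 20
    timings[name] = best
    print(f"{name:33s} aligned={str(f0.flags.aligned):5} {best * 1e6:9.1f} us per sum")
```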
Thanks,
Francesc
--
Francesc Alted