Discussion:
[Numpy-discussion] Question about unaligned access
Francesc Alted
2015-07-06 15:18:13 UTC
Permalink
Hi,

I have stumbled into this:

In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
np.int64), ('f1', np.int32)])

In [63]: %timeit sa['f0'].sum()
100 loops, best of 3: 4.52 ms per loop

In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
np.int64), ('f1', np.int64)])

In [65]: %timeit sa['f0'].sum()
1000 loops, best of 3: 896 µs per loop

The first structured array is made of 12-byte records, while the second is
made of 16-byte records, yet the latter performs 5x faster. Also, a
structured array made of 8-byte records is the fastest (as expected):

In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)), dtype=[('f0',
np.int64)])

In [67]: %timeit sa['f0'].sum()
1000 loops, best of 3: 567 µs per loop

Now, my laptop has an Ivy Bridge processor (i5-3380M) that should perform
quite well on unaligned data:

http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/

So, if 4-year-old Intel architectures carry no penalty for unaligned
access, why am I seeing one in NumPy? That strikes me as quite strange.
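The slowdown can be traced to the record layout: with the packed dtype above, every other 'f0' element starts at an address that is not a multiple of 8. A minimal sketch for inspecting this (my own illustration, not part of the original post):

```python
import numpy as np

# Packed (default) structured dtype: int64 + int32 = 12-byte records,
# so consecutive 'f0' elements sit 12 bytes apart and half of them
# start at addresses not divisible by 8.
packed = np.dtype([('f0', np.int64), ('f1', np.int32)])
print(packed.itemsize)        # 12 -- no padding inserted

sa = np.zeros(4, dtype=packed)
f0 = sa['f0']                 # a strided view into the records
print(f0.strides)             # (12,) -- stride is not a multiple of 8
print(f0.flags['ALIGNED'])    # False on typical 64-bit platforms
```

Because the view is flagged as unaligned, reductions over it cannot take NumPy's fast contiguous path.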

Thanks,
Francesc
--
Francesc Alted
Francesc Alted
2015-07-06 15:28:37 UTC
Permalink
Oops, forgot to mention my NumPy version:

In [72]: np.__version__
Out[72]: '1.9.2'

Francesc
--
Francesc Alted
Jaime Fernández del Río
2015-07-06 16:04:11 UTC
Permalink
I believe that, the way NumPy is set up, it never does unaligned access,
regardless of the platform, in case it gets run on one that would go up in
flames if you tried to. So my guess would be that you are seeing chunked
copies into a buffer, as opposed to bulk copying or no copying at all, and
that would explain your timing differences. But Julian or Sebastian can
probably give you a more informed answer.
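A quick way to see the two code paths side by side (my own sketch, assuming the buffered-copy explanation above) is to compare the strided field view with an explicit contiguous copy of it:

```python
import numpy as np

sa = np.fromiter(((i, i) for i in range(1000 * 1000)),
                 dtype=[('f0', np.int64), ('f1', np.int32)])

f0 = sa['f0']            # strided, unaligned view -> buffered reduction
f0_copy = f0.copy()      # contiguous, aligned copy -> fast bulk reduction

assert f0_copy.flags['C_CONTIGUOUS'] and f0_copy.flags['ALIGNED']
assert f0.sum() == f0_copy.sum()   # same result, different code path
```

Timing `f0_copy.sum()` should land close to the 8-byte-record case, since the copy restores alignment and contiguity.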

Jaime
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion
--
(\__/)
( O.o)
( > <) This is Rabbit. Copy Rabbit into your signature and help him with
his plans for world domination.
Francesc Alted
2015-07-06 16:21:20 UTC
Permalink
Yes, my guess is that you are right. I suppose it would be possible to
improve the NumPy codebase to accelerate this particular access pattern on
Intel platforms, but given that structured arrays are not that widely used
(pandas probably leads this use case by far, and as far as I know, it does
not use structured arrays internally in DataFrames), maybe it is not worth
worrying about this too much.

Thanks anyway,
Francesc
--
Francesc Alted
Julian Taylor
2015-07-06 18:32:34 UTC
Permalink
sorry for the 3 empty mails, my client bugged out...

as a workaround you can align structured dtypes to avoid this issue:

sa = np.fromiter(((i, i) for i in range(1000*1000)),
                 dtype=np.dtype([('f0', np.int64), ('f1', np.int32)],
                                align=True))
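Following this workaround, the aligned dtype pads each record so 'f0' always starts on an 8-byte boundary, and the field view regains the fast path (a small sketch built on Julian's snippet):

```python
import numpy as np

# align=True pads the int64+int32 record from 12 to 16 bytes.
aligned = np.dtype([('f0', np.int64), ('f1', np.int32)], align=True)
print(aligned.itemsize)            # 16 (4 bytes of trailing padding)

sa = np.fromiter(((i, i) for i in range(1000 * 1000)), dtype=aligned)
print(sa['f0'].flags['ALIGNED'])   # True -- the sum can now go fast
```

The trade-off is 4 wasted bytes per record in exchange for aligned field access.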
Todd
2015-07-07 07:53:50 UTC
Permalink
That may be more of a chicken-and-egg problem. Structured arrays are pretty
complicated to set up, which means they don't get used much, which means
they don't get much attention, which means they remain complicated.
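As one illustration of that setup overhead (my example, not Todd's): the same two-field record can be spelled several ways, and none of them is aligned unless you remember to ask for it.

```python
import numpy as np

# Two equivalent spellings of the same packed record type.
list_spec = np.dtype([('f0', np.int64), ('f1', np.int32)])
dict_spec = np.dtype({'names': ['f0', 'f1'],
                      'formats': [np.int64, np.int32]})
assert list_spec == dict_spec

# Both default to packed 12-byte records; alignment is opt-in.
assert list_spec.itemsize == 12
aligned = np.dtype([('f0', np.int64), ('f1', np.int32)], align=True)
assert aligned.itemsize == 16
```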