Discussion: [Numpy-discussion] Casting to np.byte before clearing values
Nicolas P. Rougier
2016-12-26 09:34:06 UTC
Hi all,


I'm trying to understand why viewing an array as bytes before clearing makes the whole operation faster.
I imagine there is some kind of special treatment for byte arrays but I've no clue.


import numpy as np

# Native float and native int arrays
Z_float = np.ones(1000000, float)
Z_int = np.ones(1000000, int)

%timeit Z_float[...] = 0
1000 loops, best of 3: 361 µs per loop

%timeit Z_int[...] = 0
1000 loops, best of 3: 366 µs per loop

%timeit Z_float.view(np.byte)[...] = 0
1000 loops, best of 3: 267 µs per loop

%timeit Z_int.view(np.byte)[...] = 0
1000 loops, best of 3: 266 µs per loop


Nicolas
Sebastian Berg
2016-12-26 10:48:19 UTC
Post by Nicolas P. Rougier
Hi all,
I'm trying to understand why viewing an array as bytes before
clearing makes the whole operation faster.
I imagine there is some kind of special treatment for byte arrays but
I've no clue. 
Sure, if it's a 1-byte-wide type, the code will end up calling
`memset`. If it is not, it will end up calling a loop like:

while (N > 0) {
    *dst = output;  /* write one element */
    dst += 8;       /* or whatever the element size/stride is */
    --N;
}

Now why this gives such a difference I don't really know, but I guess
it is not too surprising and may depend on other things as well.

- Sebastian
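
A minimal sketch of what the byte view buys here, assuming NumPy's default 8-byte float64: the view shares the same memory but exposes it as contiguous 1-byte elements, so the fill can take the memset path, and an all-zero byte pattern is also the float value 0.0.

import numpy as np

Z = np.ones(16, dtype=np.float64)
Zb = Z.view(np.byte)            # same buffer, one element per byte
assert Zb.size == Z.size * 8    # 8 bytes per float64 element
Zb[...] = 0                     # contiguous 1-byte fill, memset territory
assert np.all(Z == 0.0)         # all-zero bytes are float64 0.0 as well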
Nicolas P. Rougier
2016-12-26 11:15:25 UTC
Thanks for the explanation, Sebastian; that makes sense.

Nicolas
Benjamin Root
2016-12-26 16:01:25 UTC
Might be OS-specific, too. Some virtual memory management systems might
special-case the zeroing out of memory. Try doing the same thing with a
value other than zero.
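
A minimal sketch of the test Benjamin suggests, reusing the Z_float array from the original post; a non-zero byte value keeps the same 1-byte fill path but cannot benefit from any zero-specific tricks (timings will vary with OS and hardware):

%timeit Z_float.view(np.byte)[...] = 0   # zero fill through the byte view
%timeit Z_float.view(np.byte)[...] = 1   # same 1-byte fill, non-zero pattern
%timeit Z_float[...] = 0                 # element-wise fill for comparison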

Chris Barker
2016-12-27 19:52:20 UTC
Post by Nicolas P. Rougier
I'm trying to understand why viewing an array as bytes before clearing
makes the whole operation faster.
I imagine there is some kind of special treatment for byte arrays but I've no clue.
I notice that the code is simply setting a value using broadcasting -- I
don't think there is anything special about zero in that case. But your
subject refers to "clearing" an array.

So I wonder if you have a use case where the performance difference
matters, in which case _maybe_ it would be worth having an ndarray.zero()
method that efficiently zeros out an array.

Actually, there is ndarray.fill():

In [7]: %timeit Z_float[...] = 0

1000 loops, best of 3: 380 µs per loop


In [8]: %timeit Z_float.view(np.byte)[...] = 0

1000 loops, best of 3: 271 µs per loop


In [9]: %timeit Z_float.fill(0)

1000 loops, best of 3: 363 µs per loop

which seems to take only an insignificantly shorter time than assignment,
probably because it's doing exactly the same loop.

whereas a .zero() could use memset, as the byte-view assignment does
(see the sketch after this message).

I can't say I have a use case that would justify this, though.

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
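
A minimal sketch of what such a helper could look like, built on the byte-view trick from this thread. The name zero_fill and the helper itself are illustrative only, not an existing NumPy API; the byte-view path is only valid for dtypes whose all-zero bit pattern means zero (integers and IEEE floats) and for contiguous arrays.

import numpy as np

def zero_fill(a):
    """Zero an array in place (illustrative helper, not a NumPy API)."""
    # Contiguous int/float data: a 1-byte view lets the fill run as a memset.
    if a.dtype.kind in "iuf" and a.flags.c_contiguous:
        a.view(np.byte)[...] = 0
    else:
        # Fallback: ordinary element-wise assignment.
        a[...] = 0

Z = np.ones(1000000)
zero_fill(Z)
assert not Z.any()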
Nicolas P. Rougier
2016-12-27 22:11:06 UTC
Yes, "clearing" is not the proper word, but the trick only works for 0: only for 0 do the direct assignment and the byte-view assignment give the same result.


Nicolas
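
A minimal sketch of why the trick is specific to 0: a non-zero value written through the byte view sets every byte, not every element, so the resulting floats are not the value you asked for.

import numpy as np

A = np.empty(4, dtype=np.float64)
B = np.empty(4, dtype=np.float64)

A[...] = 1.0                  # each float64 element becomes 1.0
B.view(np.byte)[...] = 1      # each byte becomes 0x01: some tiny float, not 1.0
assert not np.array_equal(A, B)

A[...] = 0.0
B.view(np.byte)[...] = 0      # only for 0 do the two approaches agree
assert np.array_equal(A, B)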