Discussion: [Numpy-discussion] Casting to np.byte before clearing values
Nicolas P. Rougier
2016-12-26 09:34:06 UTC
Hi all,


I'm trying to understand why viewing an array as bytes before clearing makes the whole operation faster.
I imagine there is some kind of special treatment for byte arrays but I've no clue.


import numpy as np

# Native float and native int arrays
Z_float = np.ones(1000000, float)
Z_int = np.ones(1000000, int)

%timeit Z_float[...] = 0
1000 loops, best of 3: 361 µs per loop

%timeit Z_int[...] = 0
1000 loops, best of 3: 366 µs per loop

%timeit Z_float.view(np.byte)[...] = 0
1000 loops, best of 3: 267 µs per loop

%timeit Z_int.view(np.byte)[...] = 0
1000 loops, best of 3: 266 µs per loop


Nicolas
Sebastian Berg
2016-12-26 10:48:19 UTC
Post by Nicolas P. Rougier
Hi all,
I'm trying to understand why viewing an array as bytes before
clearing makes the whole operation faster.
I imagine there is some kind of special treatment for byte arrays but
I've no clue. 
Sure, if it's a 1-byte-wide type, the code will end up calling
`memset`. If it is not, it will end up calling a loop like:

while (N > 0) {
    *dst = output;  /* write one element */
    dst += 8;       /* or whatever the element size/stride is */
    --N;
}

Now why this gives such a difference I don't really know, but I guess
it is not too surprising and may depend on other things as well.

- Sebastian
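
A minimal sketch of what the byte view buys here, assuming NumPy's default 8-byte float64: the view shares the same memory but exposes it as contiguous 1-byte elements, so the fill can take the memset path, and an all-zero byte pattern is also the float value 0.0.

import numpy as np

Z = np.ones(16, dtype=np.float64)
Zb = Z.view(np.byte)            # same buffer, one element per byte
assert Zb.size == Z.size * 8    # 8 bytes per float64 element
Zb[...] = 0                     # contiguous 1-byte fill, memset territory
assert np.all(Z == 0.0)         # all-zero bytes are float64 0.0 as well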
Nicolas P. Rougier
2016-12-26 11:15:25 UTC
Thanks for the explanation, Sebastian; that makes sense.

Nicolas
Benjamin Root
2016-12-26 16:01:25 UTC
Might be OS-specific, too. Some virtual memory management systems might
special-case the zeroing out of memory. Try doing the same thing with a
value other than zero.
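
A minimal sketch of the test Benjamin suggests, reusing the Z_float array from the original post; a non-zero byte value keeps the same 1-byte fill path but cannot benefit from any zero-specific tricks (timings will vary with OS and hardware):

%timeit Z_float.view(np.byte)[...] = 0   # zero fill through the byte view
%timeit Z_float.view(np.byte)[...] = 1   # same 1-byte fill, non-zero pattern
%timeit Z_float[...] = 0                 # element-wise fill for comparison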

Chris Barker
2016-12-27 19:52:20 UTC
Post by Nicolas P. Rougier
I'm trying to understand why viewing an array as bytes before clearing
makes the whole operation faster.
I imagine there is some kind of special treatment for byte arrays but I've no clue.
I notice that the code is simply setting a value using broadcasting -- I
don't think there is anything special about zero in that case. But your
subject refers to "clearing" an array.

So I wonder if you have a use case where the performance difference
matters, in which case _maybe_ it would be worth having an ndarray.zero()
method that efficiently zeros out an array.

Actually, there is ndarray.fill():

In [7]: %timeit Z_float[...] = 0

1000 loops, best of 3: 380 µs per loop


In [8]: %timeit Z_float.view(np.byte)[...] = 0

1000 loops, best of 3: 271 µs per loop


In [9]: %timeit Z_float.fill(0)

1000 loops, best of 3: 363 µs per loop

which seems to take only an insignificantly shorter time than assignment,
probably because it's doing exactly the same loop.

whereas a .zero() could use memset, as the byte-view assignment does
(see the sketch after this message).

I can't say I have a use case that would justify this, though.

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
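
A minimal sketch of what such a helper could look like, built on the byte-view trick from this thread. The name zero_fill and the helper itself are illustrative only, not an existing NumPy API; the byte-view path is only valid for dtypes whose all-zero bit pattern means zero (integers and IEEE floats) and for contiguous arrays.

import numpy as np

def zero_fill(a):
    """Zero an array in place (illustrative helper, not a NumPy API)."""
    # Contiguous int/float data: a 1-byte view lets the fill run as a memset.
    if a.dtype.kind in "iuf" and a.flags.c_contiguous:
        a.view(np.byte)[...] = 0
    else:
        # Fallback: ordinary element-wise assignment.
        a[...] = 0

Z = np.ones(1000000)
zero_fill(Z)
assert not Z.any()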
Nicolas P. Rougier
2016-12-27 22:11:06 UTC
Yes, "clearing" is not the proper word, but the trick only works for 0: only for 0 do the direct assignment and the byte-view assignment give the same result.


Nicolas
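
A minimal sketch of why the trick is specific to 0: a non-zero value written through the byte view sets every byte, not every element, so the resulting floats are not the value you asked for.

import numpy as np

A = np.empty(4, dtype=np.float64)
B = np.empty(4, dtype=np.float64)

A[...] = 1.0                  # each float64 element becomes 1.0
B.view(np.byte)[...] = 1      # each byte becomes 0x01: some tiny float, not 1.0
assert not np.array_equal(A, B)

A[...] = 0.0
B.view(np.byte)[...] = 0      # only for 0 do the two approaches agree
assert np.array_equal(A, B)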