Discussion:
[Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
Antony Lee
2016-02-14 01:57:59 UTC
Permalink
Compare (on Python3 -- for Python2, read "xrange" instead of "range"):

In [2]: %timeit np.array(range(1000000), np.int64)
10 loops, best of 3: 156 ms per loop

In [3]: %timeit np.arange(1000000, dtype=np.int64)
1000 loops, best of 3: 853 µs per loop


Note that while iterating over a range is not very fast, it is still much
better than the array creation:

In [4]: from collections import deque

In [5]: %timeit deque(range(1000000), 1)
10 loops, best of 3: 25.5 ms per loop


On one hand, special cases are awful. On the other hand, the range builtin
is probably important enough to deserve a special case to make this
construction faster. Or not? I initially opened this as
https://github.com/numpy/numpy/issues/7233 but it was suggested there that
this should be discussed on the ML first.

(The real issue which prompted this suggestion: I was building sparse
matrices using scipy.sparse.csc_matrix with some indices specified using
range, and that construction step turned out to take a significant portion
of the time because of the calls to np.array).
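
For concreteness, a minimal sketch of that kind of construction (sizes and
values are made up, purely for illustration):

import numpy as np
from scipy import sparse

n = 1000000                          # made-up size
data = np.ones(n)
rows = np.zeros(n, dtype=np.int64)
cols = range(n)                      # indices given as a plain range object
# csc_matrix converts the index sequences with np.array internally, which
# is where the slow range -> array conversion shows up.
m = sparse.csc_matrix((data, (rows, cols)), shape=(1, n))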

Antony
j***@gmail.com
2016-02-14 02:43:48 UTC
Permalink
IMO: I don't see a reason why this should be supported. There is np.arange
after all for this use case, and np.fromiter.
range and the other guys are iterators, and in several cases we can use
larange = list(range(...)) as a shortcut to get a python list for python 2/3
compatibility.
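
For example, a minimal sketch of the fromiter route (dtype is required;
count is optional but lets numpy preallocate the output):

import numpy as np

n = 1000000
# fromiter consumes any iterable in a single pass; passing count avoids
# repeatedly resizing the output buffer as elements arrive.
a = np.fromiter(range(n), dtype=np.int64, count=n)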

I think this might be partially a learning effect in the python 2 to 3
transition. After using almost only python 3 for maybe a year, I don't
think it's difficult to remember the differences when writing code that is
py 2.7 and py 3.x compatible.


It's just **another** thing to watch out for if milliseconds matter in your
application.

Josef
j***@gmail.com
2016-02-14 02:48:31 UTC
Permalink
side question: Is there a simple way to distinguish an iterator or generator
from an iterable data structure?

Josef
Antony Lee
2016-02-14 08:21:34 UTC
Permalink
re: no reason why...
This has nothing to do with Python2/Python3 (I personally stopped using
Python2 at least 3 years ago.) Let me put it this way instead: if
Python3's "range" (or Python2's "xrange") were not a builtin type but a type
provided by numpy, I don't think it would be controversial at all to
provide an `__array__` special method to efficiently convert it to an
ndarray. It would be the same if `np.array` used a
`functools.singledispatch` dispatcher rather than an `__array__` special
method (which is obviously not possible for chronological reasons).
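
A minimal sketch of what I mean -- `MyRange` is a hypothetical stand-in for
such a numpy-provided type, not anything that exists:

import numpy as np

class MyRange:
    def __init__(self, start, stop, step=1):
        self.start, self.stop, self.step = start, stop, step
    def __array__(self, dtype=None):
        # np.array() consults __array__ before falling back to the
        # generic sequence paths, so this conversion runs at arange speed.
        return np.arange(self.start, self.stop, self.step, dtype=dtype)

np.array(MyRange(0, 1000000))  # fast: delegates to np.arange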

re: iterable vs iterator: check for the presence of the __next__ special
method (or isinstance(x, Iterator) vs. isinstance(x, Iterable) and not
isinstance(x, Iterator))
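
For example:

from collections.abc import Iterable, Iterator

r = range(5)               # an iterable container, not an iterator
g = (i for i in range(5))  # a generator, which is an iterator

isinstance(r, Iterator)    # False -- range has no __next__
isinstance(r, Iterable)    # True
isinstance(g, Iterator)    # True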

Antony
j***@gmail.com
2016-02-14 14:28:03 UTC
Permalink
Post by Antony Lee
re: no reason why... [...]
But numpy does provide arange.
What's the reason to not use np.arange and use an iterator instead?
Post by Antony Lee
re: iterable vs iterator: [...]
AFAIR and from spot checking the mailing list, in the past the argument was
that it's too complicated to mix array/asarray creation with fromiter
building of arrays.

(I have no idea if array could cheaply delegate to fromiter.)
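
(Purely as a sketch of the delegation idea -- `as_array` is a made-up
helper, not a numpy API:)

import numpy as np
from collections.abc import Iterator

def as_array(obj, dtype=float):
    # One-shot iterators go through fromiter (single pass, no temporary
    # list); everything else takes the normal np.array path.
    if isinstance(obj, Iterator):
        return np.fromiter(obj, dtype=dtype)
    return np.array(obj, dtype=dtype)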


Josef
Ralf Gommers
2016-02-14 14:36:05 UTC
Permalink
I think it's good to do something about this, but it's not clear what the
exact proposal is. I could imagine one or both of:

- special-case the range() object in array (and asarray/asanyarray?) such
that array(range(N)) becomes as fast as arange(N).
- special-case all iterators, such that array(range(N)) becomes as fast
as deque(range(N))

or yet something else?
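
(For concreteness, a Python-level sketch of the first option -- the real
change would live in C, and `array_with_range_fastpath` is only an
illustrative name:)

import numpy as np

def array_with_range_fastpath(obj, dtype=None):
    # A range object exposes start/stop/step, so it can be handed
    # straight to np.arange instead of being iterated element by element.
    if isinstance(obj, range):
        return np.arange(obj.start, obj.stop, obj.step, dtype=dtype)
    return np.array(obj, dtype=dtype)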

Ralf
Antony Lee
2016-02-14 20:23:20 UTC
Permalink
I was thinking (1) (special-case range()); however (2) may be more
generally applicable and useful.

Antony
Charles R Harris
2016-02-14 21:36:00 UTC
Permalink
Post by Ralf Gommers
I think it's good to do something about this, but it's not clear what the
exact proposal is. I could imagine one or both of:
- special-case the range() object in array (and asarray/asanyarray?)
such that array(range(N)) becomes as fast as arange(N).
- special-case all iterators, such that array(range(N)) becomes as fast
as deque(range(N))
I think the last wouldn't help much, as numpy would still need to determine
dimensions and type. I assume that is one of the reasons sparse itself
doesn't do that.

Chuck
Ralf Gommers
2016-02-15 06:21:37 UTC
Permalink
Post by Charles R Harris
I think the last wouldn't help much, as numpy would still need to
determine dimensions and type. I assume that is one of the reasons
sparse itself doesn't do that.
Not orders of magnitude, but this shows that there's something to optimize
for iterators:

In [1]: %timeit np.array(range(100000))
100 loops, best of 3: 14.9 ms per loop

In [2]: %timeit np.array(list(range(100000)))
100 loops, best of 3: 9.68 ms per loop

Ralf
Antony Lee
2016-02-15 07:41:29 UTC
Permalink
I wonder whether numpy is using the "old" iteration protocol (repeatedly
calling x[i] for increasing i until IndexError is raised)? A quick
timing shows that it is indeed slower.
... actually it's not even clear to me what qualifies as a sequence for
`np.array`:

class C:
    def __iter__(self):
        return iter(range(10))  # [0 ... 9] under the new iteration protocol
    def __getitem__(self, i):
        raise IndexError  # [] under the old iteration protocol

np.array(C())
===> array(<__main__.C object at 0x7f3f21ffff28>, dtype=object)


So how can np.array(range(...)) even work?
Sebastian Berg
2016-02-15 08:07:03 UTC
Permalink
Post by Antony Lee
I wonder whether numpy is using the "old" iteration protocol (repeatedly
calling x[i] for increasing i until IndexError is raised)? A quick
timing shows that it is indeed slower. [...]
Numpy currently uses PySequence_Fast, but it has to do a two-pass
algorithm (find dtype+dims), and the range is converted twice to a list
by this call. That explains the speed advantage of converting to a list
manually.

- Sebastian
Nathaniel Smith
2016-02-15 08:10:11 UTC
Permalink
Post by Antony Lee
I wonder whether numpy is using the "old" iteration protocol (repeatedly
calling x[i] for increasing i until IndexError is raised)? A quick
timing shows that it is indeed slower.
Yeah, I'm pretty sure that np.array doesn't know anything about
"iterable", just about "sequence" (calling x[i] for 0 <= i <
x.__len__()).

(See Sequence vs Iterable:
https://docs.python.org/3/library/collections.abc.html)

Personally I'd like it if we could eventually make it so np.array
specifically looks for lists and only lists, because the way it has so
many different fallbacks right now creates a lot of confusion about which
objects are elements. Compare:

In [5]: np.array([(1, 2), (3, 4)]).shape
Out[5]: (2, 2)

In [6]: np.array([(1, 2), (3, 4)], dtype="i4,i4").shape
Out[6]: (2,)

-n
--
Nathaniel J. Smith -- https://vorpus.org
Antony Lee
2016-02-15 16:13:29 UTC
Permalink
Indeed:

In [1]: class C:
   ...:     def __getitem__(self, i):
   ...:         if i < 10: return i
   ...:         else: raise IndexError
   ...:     def __len__(self):
   ...:         return 10
   ...:

In [2]: np.array(C())
Out[2]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


(omitting __len__ results in the creation of an object array, consistent
with the fact that the sequence protocol requires __len__).
Meanwhile, I found a new way to segfault numpy :-)

In [3]: class C:
   ...:     def __getitem__(self, i):
   ...:         if i < 10: return i
   ...:         else: raise IndexError
   ...:     def __len__(self):
   ...:         return 42
   ...:

In [4]: np.array(C())
Fatal Python error: Segmentation fault
Jeff Reback
2016-02-15 16:24:51 UTC
Permalink
just an FYI.

pandas implemented a RangeIndex in upcoming 0.18.0, mainly for memory
savings, see here
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#range-index>,
similar to how python range/xrange work.

though there are substantial perf benefits, mainly with set operations, see
here
<https://github.com/pydata/pandas/blob/master/pandas/indexes/range.py#L274>,
though I didn't officially benchmark these.

Jeff
Robert Kern
2016-02-15 16:49:29 UTC
Permalink
Post by Jeff Reback
just an FYI.
pandas implemented a RangeIndex in upcoming 0.18.0, mainly for memory
savings, see here, similar to how python range/xrange work. [...]
Since it is a numpy-aware object (unlike the builtins), you can (and have,
if I'm reading the code correctly) implement __array__() such that it does
the correct, performant thing and calls np.arange(). RangeIndex won't be
adversely impacted by retaining the status quo.

--
Robert Kern
Chris Barker
2016-02-17 18:50:03 UTC
Permalink
Post by Antony Lee
So how can np.array(range(...)) even work?
range() (in py3) is not a generator, nor is it an iterator. It is a range
object, which is lazily evaluated, and satisfies both the iterable protocol
and the sequence protocol (at least most of it):

In [1]: r = range(10)

In [2]: r[3]
Out[2]: 3

In [3]: len(r)
Out[3]: 10

In [4]: type(r)
Out[4]: range

In [9]: isinstance(r, collections.abc.Sequence)
Out[9]: True

In [10]: l = list()

In [11]: isinstance(l, collections.abc.Sequence)
Out[11]: True

In [12]: isinstance(r, collections.abc.Iterable)
Out[12]: True
I'm still totally confused as to why we'd need to special-case range when
we have arange().

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Antony Lee
2016-02-18 18:15:44 UTC
Permalink
Mostly so that there is no performance loss when someone passes range(...)
instead of np.arange(...). At least I had never realized that one is much
faster than the other and always just passed range() as a convenience.

Antony
j***@gmail.com
2016-02-18 19:12:13 UTC
Permalink
Post by Chris Barker
range() (in py3) is not a generator, nor is it an iterator. It is a range
object, which is lazily evaluated, and satisfies both the iterable protocol
and the sequence protocol [...]
thanks, I didn't know that

the range r here doesn't get eaten by iterating through it, while
r = (i for i in range(5))
is only good for a single pass.

(tried on python 3.4)
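
For example:

r = range(5)
list(r), list(r)
# -> ([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])   # the range can be re-iterated

g = (i for i in range(5))
list(g), list(g)
# -> ([0, 1, 2, 3, 4], [])                # the generator is exhausted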

Josef
Chris Barker
2016-02-18 22:21:02 UTC
Permalink
Post by Antony Lee
Mostly so that there is no performance loss when someone passes range(...)
instead of np.arange(...). At least I had never realized that one is much
faster than the other and always just passed range() as a convenience.
Well, pretty much everything in numpy is faster if you use the numpy array
version rather than plain python -- this hardly seems like the extra code
would be worth it.

numpy's array() constructor can (and should) take an arbitrary iterable.

It does make some sense that we might want to special-case iterators,
as you don't want to loop through them too many times, which is what
np.fromiter() is for.

and _maybe_ it would be worth special-casing python lists, as you can
access items faster, and they are really, really common (or has this
already been done?), but special-casing range() is getting silly. And it
might be hard to do. At the C level I suppose you could actually know what
the parameters and state of the range object are and create an array
directly from that -- but that's what arange is for...

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Antony Lee
2016-02-18 22:46:40 UTC
Permalink
In a sense this discussion is really about making np.array(iterable) more
efficient, so I restarted the discussion at
https://mail.scipy.org/pipermail/numpy-discussion/2016-February/075059.html

Antony