Discussion:
[Numpy-discussion] making "low" optional in numpy.randint
G Young
2016-02-17 15:01:38 UTC
Hello all,

I have a PR open here <https://github.com/numpy/numpy/pull/7151> that makes
"low" an optional parameter in numpy.randint and introduces new behavior
into the API as follows:

1) `low == None` and `high == None`

Numbers are generated over the range `[lowbnd, highbnd)`, where `lowbnd =
np.iinfo(dtype).min`, and `highbnd = np.iinfo(dtype).max`, where `dtype` is
the provided integral type.

2) `low != None` and `high == None`

If `low >= 0`, numbers are *still* generated over the range `[0,
low)`, but if `low < 0`, numbers are generated over the range `[low,
highbnd)`, where `highbnd` is defined as above.

3) `low == None` and `high != None`

Numbers are generated over the range `[lowbnd, high)`, where `lowbnd` is
defined as above.

The primary motivation was the second case, as it is more convenient to
specify a 'dtype' by itself when generating such numbers in a similar vein
to numpy.empty, except with initialized values.
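The three cases can be sketched as a small helper (hypothetical code, not
part of the PR; `resolve_bounds` is a name invented here for illustration):

```python
import numpy as np

def resolve_bounds(low=None, high=None, dtype=np.int_):
    """Hypothetical helper showing how the proposed rules pick the
    half-open range [lo, hi) that randint would draw from."""
    info = np.iinfo(dtype)
    if low is None and high is None:
        return info.min, info.max      # case 1: full dtype range
    if high is None:                   # case 2: only `low` given
        if low >= 0:
            return 0, low              #   backwards-compatible [0, low)
        return low, info.max           #   negative low: [low, highbnd)
    if low is None:                    # case 3: only `high` given
        return info.min, high          #   [lowbnd, high)
    return low, high                   # both given: current behavior

# e.g. for dtype=np.int8 (bounds -128..127):
print(resolve_bounds(dtype=np.int8))          # (-128, 127)
print(resolve_bounds(low=10, dtype=np.int8))  # (0, 10)
print(resolve_bounds(low=-5, dtype=np.int8))  # (-5, 127)
print(resolve_bounds(high=7, dtype=np.int8))  # (-128, 7)
```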

Looking forward to your feedback!

Greg
Alan Isaac
2016-02-17 16:40:05 UTC
Behavior of random integer generation:
Python randint [a,b]
MATLAB randi [a,b]
Mma RandomInteger [a,b]
haskell randomR [a,b]
GAUSS rndi [a,b]
Maple rand [a,b]

In short, NumPy's `randint` is non-standard (and,
I would add, non-intuitive). Presumably this was
due to relying on a float draw from [0,1) along
with the use of floor.

The divergence in behavior from the (later) Python
function of the same name is particularly unfortunate.

So I suggest further work on this function is
not called for, and use of `random_integers`
should be encouraged. Probably NumPy's `randint`
should be deprecated.

If there is any playing with the interface,
I think Mma provides a pretty good model. If I were
designing the interface, I would always require a
tuple argument (for the inclusive range), with possible
`None` values to imply datatype extreme values.
Proposed name (after `randint` deprecation): `randints`.

Cheers,
Alan Isaac
Robert Kern
2016-02-17 16:46:43 UTC
Post by Alan Isaac
Python randint [a,b]
MATLAB randi [a,b]
Mma RandomInteger [a,b]
haskell randomR [a,b]
GAUSS rndi [a,b]
Maple rand [a,b]
In short, NumPy's `randint` is non-standard (and,
I would add, non-intuitive). Presumably this was
due to relying on a float draw from [0,1) along
with the use of floor.
No, it never was. It is implemented this way because Python prefers
semi-open integer intervals, as they play most nicely with 0-based indexing.
Not sure about all of those systems, but some at least are 1-based
indexing, so closed intervals do make sense.

The closed interval of the Python stdlib's random.randint() is considered a
mistake by python-dev, leading to the implementation of and preference for
random.randrange() instead.
Post by Alan Isaac
The divergence in behavior between the (later) Python
function of the same name is particularly unfortunate.
Indeed, but unfortunately, this mistake dates way back to Numeric times,
and easing the migration to numpy was a priority in the heady days of numpy
1.0.
Post by Alan Isaac
So I suggest further work on this function is
not called for, and use of `random_integers`
should be encouraged. Probably NumPy's `randint`
should be deprecated.
Not while I'm here. Instead, `random_integers()` is discouraged and perhaps
might eventually be deprecated.

--
Robert Kern
G Young
2016-02-17 16:48:14 UTC
Actually, it has already been deprecated because I did it myself. :)
Post by Robert Kern
Post by Alan Isaac
Python randint [a,b]
MATLAB randi [a,b]
Mma RandomInteger [a,b]
haskell randomR [a,b]
GAUSS rndi [a,b]
Maple rand [a,b]
In short, NumPy's `randint` is non-standard (and,
I would add, non-intuitive). Presumably this was
due to relying on a float draw from [0,1) along
with the use of floor.
No, it never was. It is implemented this way because Python prefers
semi-open integer intervals, as they play most nicely with 0-based indexing.
Not sure about all of those systems, but some at least are 1-based
indexing, so closed intervals do make sense.
The Python stdlib's random.randint() closed interval is considered a
mistake by python-dev leading to the implementation and preference for
random.randrange() instead.
Post by Alan Isaac
The divergence in behavior between the (later) Python
function of the same name is particularly unfortunate.
Indeed, but unfortunately, this mistake dates way back to Numeric times,
and easing the migration to numpy was a priority in the heady days of numpy
1.0.
Post by Alan Isaac
So I suggest further work on this function is
not called for, and use of `random_integers`
should be encouraged. Probably NumPy's `randint`
should be deprecated.
Not while I'm here. Instead, `random_integers()` is discouraged and
perhaps might eventually be deprecated.
--
Robert Kern
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Alan Isaac
2016-02-17 17:10:52 UTC
Post by Robert Kern
some at least are 1-based indexing, so closed intervals do make sense.
Haskell is 0-indexed.
And quite carefully thought out, imo.

Cheers,
Alan
G Young
2016-02-17 17:28:48 UTC
Perhaps, but we are not coding in Haskell. We are coding in Python, and
the standard is that the endpoint is excluded, which renders your point
moot, I'm afraid.
Post by Alan Isaac
some at least are 1-based indexing, so closed intervals do make sense.
Haskell is 0-indexed.
And quite carefully thought out, imo.
Cheers,
Alan
Alan Isaac
2016-02-17 20:30:45 UTC
Post by G Young
Perhaps, but we are not coding in Haskell. We are coding in Python, and
the standard is that the endpoint is excluded, which renders your point
moot I'm afraid.
I am not sure what "standard" you are talking about.
I thought we were talking about the user interface.

Nobody is proposing changing the behavior of `range`.
That is an entirely separate question.

I'm not trying to change any minds, but let's not rely
on spurious arguments.

Cheers,
Alan
Robert Kern
2016-02-17 20:42:07 UTC
Post by Alan Isaac
Post by G Young
Perhaps, but we are not coding in Haskell. We are coding in Python, and
the standard is that the endpoint is excluded, which renders your point
moot I'm afraid.
I am not sure what "standard" you are talking about.
I thought we were talking about the user interface.
It is a persistent and consistent convention (i.e. "standard") across
Python APIs that deal with integer ranges (range(), slice(),
random.randrange(), ...), particularly those that end up related to
indexing; e.g. `x[np.random.randint(0, len(x))]` to pull a random sample
from an array.

random.randint() was the one big exception, and it was considered a mistake
for that very reason, soft-deprecated in favor of random.randrange().
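For concreteness, the stdlib contrast looks like this (illustrative sketch):

```python
import random

x = ["a", "b", "c", "d"]

# Half-open convention, like range(): randrange(n) draws from [0, n),
# so the result is always a valid index.
i = random.randrange(len(x))

# The closed-interval exception: randint(a, b) draws from [a, b],
# so indexing needs an explicit "- 1".
j = random.randint(0, len(x) - 1)

assert 0 <= i < len(x) and 0 <= j < len(x)
```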

--
Robert Kern
Alan Isaac
2016-02-17 23:29:30 UTC
Post by Robert Kern
random.randint() was the one big exception, and it was considered a
mistake for that very reason, soft-deprecated in favor of
random.randrange().
randrange also has its detractors:
https://code.activestate.com/lists/python-dev/138358/
and following.

I think if we start citing persistent conventions, the
persistent convention across *many* languages that the bounds
provided for a random integer range are inclusive also counts for
something, especially when the names are essentially shared.

But again, I am just trying to be clear about what is at issue,
not push for a change. I think citing non-existent standards
is not helpful. I think the discrepancy between the Python
standard library and numpy for a function going by a common
name is harmful. (But then, I teach.)

fwiw,
Alan
Juan Nunez-Iglesias
2016-02-17 23:48:39 UTC
Also fwiw, I think the 0-based, half-open interval is one of the best
features of Python indexing and yes, I do use random integers to index into
my arrays and would not appreciate having to litter my code with "-1"
everywhere.
Post by Alan Isaac
Post by Robert Kern
random.randint() was the one big exception, and it was considered a
mistake for that very reason, soft-deprecated in favor of
random.randrange().
https://code.activestate.com/lists/python-dev/138358/
and following.
I think if we start citing persistent conventions, the
persistent convention across *many* languages that the bounds
provided for a random integer range are inclusive also counts for
something, especially when the names are essentially shared.
But again, I am just trying to be clear about what is at issue,
not push for a change. I think citing non-existent standards
is not helpful. I think the discrepancy between the Python
standard library and numpy for a function going by a common
name is harmful. (But then, I teach.)
fwiw,
Alan
Robert Kern
2016-02-17 23:55:00 UTC
He was talking consistently about "random integers" not
"random_integers()". :-)
Post by G Young
Your statement is a little self-contradictory, but in any case, you
shouldn't worry about random_integers getting removed from the code-base.
However, it has been deprecated in favor of randint.
Post by Juan Nunez-Iglesias
Also fwiw, I think the 0-based, half-open interval is one of the best
features of Python indexing and yes, I do use random integers to index into
my arrays and would not appreciate having to litter my code with "-1"
everywhere.
Post by Alan Isaac
Post by Robert Kern
random.randint() was the one big exception, and it was considered a
mistake for that very reason, soft-deprecated in favor of
random.randrange().
https://code.activestate.com/lists/python-dev/138358/
and following.
I think if we start citing persistent conventions, the
persistent convention across *many* languages that the bounds
provided for a random integer range are inclusive also counts for
something, especially when the names are essentially shared.
But again, I am just trying to be clear about what is at issue,
not push for a change. I think citing non-existent standards
is not helpful. I think the discrepancy between the Python
standard library and numpy for a function going by a common
name is harmful. (But then, I teach.)
fwiw,
Alan
--
Robert Kern
Alan Isaac
2016-02-17 23:59:15 UTC
Post by Juan Nunez-Iglesias
Also fwiw, I think the 0-based, half-open interval is one of the best
features of Python indexing and yes, I do use random integers to index
into my arrays and would not appreciate having to litter my code with
"-1" everywhere.
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.choice.html

fwiw,
Alan Isaac
Juan Nunez-Iglesias
2016-02-18 00:01:46 UTC
Notice the limitation "1D array-like".
Post by Alan Isaac
Post by Juan Nunez-Iglesias
Also fwiw, I think the 0-based, half-open interval is one of the best
features of Python indexing and yes, I do use random integers to index
into my arrays and would not appreciate having to litter my code with
"-1" everywhere.
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.choice.html
fwiw,
Alan Isaac
Alan Isaac
2016-02-18 00:08:08 UTC
Post by Juan Nunez-Iglesias
Notice the limitation "1D array-like".
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.choice.html
"If an int, the random sample is generated as if a was np.arange(n)"
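In other words (a sketch; `choice` with an int argument has been available
since numpy 1.7):

```python
import numpy as np

# choice(n) samples as if from np.arange(n), so it gives a half-open
# random index without needing randint at all:
x = np.arange(12).reshape(3, 4)
i = np.random.choice(x.shape[0])   # row index drawn from [0, 3)
row = x[i]                         # a randomly chosen row
```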

hth,
Alan Isaac
Juan Nunez-Iglesias
2016-02-18 00:17:24 UTC
Ah! Touché! =) My last and admittedly weak defense is that I've been
writing numpy since before 1.7. =)
Post by Alan Isaac
Post by Juan Nunez-Iglesias
Notice the limitation "1D array-like".
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.choice.html
"If an int, the random sample is generated as if a was np.arange(n)"
hth,
Alan Isaac
j***@gmail.com
2016-02-18 02:24:41 UTC
Post by Juan Nunez-Iglesias
Ah! Touché! =) My last and admittedly weak defense is that I've been
writing numpy since before 1.7. =)
Post by Alan Isaac
Post by Juan Nunez-Iglesias
Notice the limitation "1D array-like".
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.choice.html
"If an int, the random sample is generated as if a was np.arange(n)"
(un)related aside:
my R doc quote about "may lead to undesired behavior" refers to this;
IIRC, R's `sample` was the inspiration for this function,
but numpy distinguishes scalar from one-element (1D) arrays:

for i in range(3, 10): np.random.choice(np.arange(10)[i:])

Josef
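The scalar/array distinction mentioned above, spelled out (illustrative):

```python
import numpy as np

# A Python int is treated as a range to sample from...
s = np.random.choice(5)              # drawn from {0, 1, 2, 3, 4}
# ...but a one-element array is treated as the population itself.
t = np.random.choice(np.array([5]))  # always 5
assert 0 <= s < 5 and t == 5
```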
Post by Juan Nunez-Iglesias
Post by Alan Isaac
hth,
Alan Isaac
Juan Nunez-Iglesias
2016-02-17 23:53:50 UTC
LOL "random integers" != "random_integers". =D
Post by G Young
Your statement is a little self-contradictory, but in any case, you
shouldn't worry about random_integers getting removed from the code-base.
However, it has been deprecated in favor of randint.
Post by Juan Nunez-Iglesias
Also fwiw, I think the 0-based, half-open interval is one of the best
features of Python indexing and yes, I do use random integers to index into
my arrays and would not appreciate having to litter my code with "-1"
everywhere.
Post by Alan Isaac
Post by Robert Kern
random.randint() was the one big exception, and it was considered a
mistake for that very reason, soft-deprecated in favor of
random.randrange().
https://code.activestate.com/lists/python-dev/138358/
and following.
I think if we start citing persistent conventions, the
persistent convention across *many* languages that the bounds
provided for a random integer range are inclusive also counts for
something, especially when the names are essentially shared.
But again, I am just trying to be clear about what is at issue,
not push for a change. I think citing non-existent standards
is not helpful. I think the discrepancy between the Python
standard library and numpy for a function going by a common
name is harmful. (But then, I teach.)
fwiw,
Alan
G Young
2016-02-17 20:43:55 UTC
Joe: fair enough. A separate function seems more reasonable. Perhaps it
was a wording thing, but you kept saying "wrapper," which is not the same
as a separate function.

Josef: I don't think we are making people think more. They're all keyword
arguments, so if you don't want to think about them, then you leave them as
the defaults, and everyone is happy. The 'dtype' keyword was needed by
someone who wanted to generate a large array of uint8 random integers and
could not simply call 'astype' due to memory constraints. I would suggest
you read this issue here <https://github.com/numpy/numpy/issues/6790> and
the PRs that followed so that you have a better understanding as to why
this 'weird' behavior was chosen.
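The memory argument can be seen by drawing into the target dtype directly
(the `dtype` keyword requires NumPy 1.11+):

```python
import numpy as np

n = 10**6

# Draw directly into uint8: the buffer is n bytes, with no wider temporary.
a = np.random.randint(0, 256, size=n, dtype=np.uint8)

# Without the keyword, randint returns the platform default integer
# (typically int64, ~8x the memory) before astype() converts it down.
b = np.random.randint(0, 256, size=n).astype(np.uint8)

assert a.dtype == b.dtype == np.uint8
assert a.nbytes == n
```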
Post by Alan Isaac
Post by G Young
Perhaps, but we are not coding in Haskell. We are coding in Python, and
the standard is that the endpoint is excluded, which renders your point
moot I'm afraid.
I am not sure what "standard" you are talking about.
I thought we were talking about the user interface.
Nobody is proposing changing the behavior of `range`.
That is an entirely separate question.
I'm not trying to change any minds, but let's not rely
on spurious arguments.
Cheers,
Alan
Robert Kern
2016-02-17 20:48:24 UTC
Post by G Young
Josef: I don't think we are making people think more. They're all
keyword arguments, so if you don't want to think about them, then you leave
them as the defaults, and everyone is happy.

I believe that Josef has the code's reader in mind, not the code's writer.
As a reader of other people's code (and I count 6-months-ago-me as one such
"other people"), I am sure to eventually encounter all of the different
variants, so I will need to know all of them.

--
Robert Kern
G Young
2016-02-17 20:58:47 UTC
I sense that this issue is now becoming more of "randint has become too
complicated." I suppose we could always "add" more functions that present
simpler interfaces, though if you really do want simple, there's always
Python's random library you can use.
Post by Robert Kern
Post by G Young
Josef: I don't think we are making people think more. They're all
keyword arguments, so if you don't want to think about them, then you leave
them as the defaults, and everyone is happy.
I believe that Josef has the code's reader in mind, not the code's writer.
As a reader of other people's code (and I count 6-months-ago-me as one such
"other people"), I am sure to eventually encounter all of the different
variants, so I will need to know all of them.
--
Robert Kern
j***@gmail.com
2016-02-17 21:20:49 UTC
Post by G Young
I sense that this issue is now becoming more of "randint has become too
complicated" I suppose we could always "add" more functions that present
simpler interfaces, though if you really do want simple, there's always
Python's random library you can use.
Post by Robert Kern
Post by G Young
Josef: I don't think we are making people think more. They're all
keyword arguments, so if you don't want to think about them, then you leave
them as the defaults, and everyone is happy.
I believe that Josef has the code's reader in mind, not the code's
writer. As a reader of other people's code (and I count 6-months-ago-me as
one such "other people"), I am sure to eventually encounter all of the
different variants, so I will need to know all of them.
I have mostly the users in mind (i.e. me).

I like simple patterns where I don't have to stare at a docstring for five
minutes to understand it, or pull it up again each time I use it.

dtype for storage is different from dtype as distribution parameter.


---
aside, since I just read this
https://news.ycombinator.com/item?id=11112763

An example of what to avoid: you save a few keystrokes and then spend
months trying to figure out what's going on.
(exaggerated)

"*Note* that this convenience feature may lead to undesired behaviour when
..." from R docs

Josef
Post by G Young
Post by Robert Kern
--
Robert Kern
Sebastian Berg
2016-02-17 21:27:37 UTC
Post by Robert Kern
Post by G Young
Josef: I don't think we are making people think more. They're all
keyword arguments, so if you don't want to think about them, then you
leave them as the defaults, and everyone is happy.
I believe that Josef has the code's reader in mind, not the code's
writer. As a reader of other people's code (and I count 6-months-ago-me
as one such "other people"), I am sure to eventually encounter
all of the different variants, so I will need to know all of them.
Completely agree. Greg, if you need more than a few minutes to explain
it in this case, there seems little point. It seems to me even the
worst cases of your examples would be covered by writing code

np.random.randint(np.iinfo(np.uint8).min, 10, dtype=np.uint8)

And *everyone* will immediately know what is meant, with just minor
extra effort for writing it. We should keep the analogy to "range" as
much as possible. Anything going far beyond that can be confusing. On
first sight I am not convinced that there is a serious convenience gain
("Explicit is better than implicit"), since writing the explicit code is
easy. It might also create weird bugs if the completely unexpected
(most users would probably not even realize it existed) happens and you
get huge numbers because you happened to have a `low=0` in there.
Especially your point 2) seems confusing. As for 3), if I see
`np.random.randint(high=3)` I think I would assume [0, 3)....
OK, that was silly, that is what happens of course. So it is explicit
in the sense that you have to pass in at least one `None` explicitly.

But I am still not sure that the added convenience is big and easy to
understand [1]; if it was always lowest for low and highest for high, I
could remember it, but it seems more complex (though None does also look
a bit like "default", and "default" is 0 for low).

- Sebastian

[1] As in the trade-off between added complexity vs. added convenience.
Additionally, I am not sure the maximum int range is such a common need
anyway?
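Sebastian's explicit pattern, written out with a signed dtype so the minimum
is nonzero (requires the `dtype` keyword from NumPy 1.11+):

```python
import numpy as np

info = np.iinfo(np.int8)
# Spell the bounds out with np.iinfo instead of relying on None defaults:
draws = np.random.randint(info.min, 10, size=5, dtype=np.int8)
assert draws.dtype == np.int8
assert ((draws >= info.min) & (draws < 10)).all()
```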
Post by Robert Kern
--
Robert Kern
G Young
2016-02-17 21:53:08 UTC
"Explicit is better than implicit" - can't argue with that. It doesn't
seem like the PR has gained much traction, so I'll close it.
Post by Sebastian Berg
Post by Robert Kern
Post by G Young
Josef: I don't think we are making people think more. They're all
keyword arguments, so if you don't want to think about them, then you
leave them as the defaults, and everyone is happy.
I believe that Josef has the code's reader in mind, not the code's
writer. As a reader of other people's code (and I count 6-months-ago-me
as one such "other people"), I am sure to eventually encounter
all of the different variants, so I will need to know all of them.
Completely agree. Greg, if you need more than a few minutes to explain
it in this case, there seems little point. It seems to me even the
worst cases of your examples would be covered by writing code
np.random.randint(np.iinfo(np.uint8).min, 10, dtype=np.uint8)
And *everyone* will immediately know what is meant, with just minor
extra effort for writing it. We should keep the analogy to "range" as
much as possible. Anything going far beyond that can be confusing. On
first sight I am not convinced that there is a serious convenience gain
("Explicit is better than implicit"), since writing the explicit code is
easy. It might also create weird bugs if the completely unexpected
(most users would probably not even realize it existed) happens and you
get huge numbers because you happened to have a `low=0` in there.
Especially your point 2) seems confusing. As for 3), if I see
`np.random.randint(high=3)` I think I would assume [0, 3)....
OK, that was silly, that is what happens of course. So it is explicit
in the sense that you have to pass in at least one `None` explicitly.
But I am still not sure that the added convenience is big and easy to
understand [1]; if it was always lowest for low and highest for high, I
could remember it, but it seems more complex (though None does also look
a bit like "default", and "default" is 0 for low).
- Sebastian
[1] As in the trade-off between added complexity vs. added convenience.
Additionally, I am not sure the maximum int range is such a common need
anyway?
Post by Robert Kern
--
Robert Kern
Sebastian Berg
2016-02-17 22:18:44 UTC
Post by G Young
"Explicit is better than implicit" - can't argue with that. It
doesn't seem like the PR has gained much traction, so I'll close it.
Thanks for the effort though! Sometimes we get a bit carried away with
doing fancy stuff, and I guess the idea is likely a bit too fancy for
wide application.

- Sebastian
j***@gmail.com
2016-02-17 18:37:04 UTC
Post by G Young
Hello all,
I have a PR open here <https://github.com/numpy/numpy/pull/7151> that
makes "low" an optional parameter in numpy.randint and introduces new
behavior into the API as follows:
1) `low == None` and `high == None`
Numbers are generated over the range `[lowbnd, highbnd)`, where `lowbnd =
np.iinfo(dtype).min`, and `highbnd = np.iinfo(dtype).max`, where `dtype` is
the provided integral type.
2) `low != None` and `high == None`
If `low >= 0`, numbers are still generated over the range `[0,
low)`, but if `low < 0`, numbers are generated over the range `[low,
highbnd)`, where `highbnd` is defined as above.
3) `low == None` and `high != None`
Numbers are generated over the range `[lowbnd, high)`, where `lowbnd` is
defined as above.
My impression (*) is that this will be confusing, and uses a default that I
never ever needed.

Maybe a better way would be to use low=-np.inf and high=np.inf where inf
would be interpreted as the smallest and largest representable number. And
leave the defaults unchanged.

(*) I didn't try to understand how it works for various cases.

Josef
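Josef's sentinel idea, sketched (hypothetical; `resolve_inf_bounds` is an
invented name, not an API anyone proposed implementing verbatim):

```python
import numpy as np

def resolve_inf_bounds(low, high, dtype):
    # +/-inf sentinels map to the dtype's representable extremes;
    # finite values pass through unchanged, so the defaults stay put.
    info = np.iinfo(dtype)
    lo = info.min if low == -np.inf else int(low)
    hi = info.max if high == np.inf else int(high)
    return lo, hi

print(resolve_inf_bounds(-np.inf, np.inf, np.int8))  # (-128, 127)
print(resolve_inf_bounds(0, 10, np.int8))            # (0, 10)
```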
Post by G Young
The primary motivation was the second case, as it is more convenient to
specify a 'dtype' by itself when generating such numbers in a similar vein
to numpy.empty, except with initialized values.
Looking forward to your feedback!
Greg
Joseph Fox-Rabinovitz
2016-02-17 18:52:51 UTC
Post by j***@gmail.com
Post by G Young
Hello all,
I have a PR open here that makes "low" an optional parameter in
numpy.randint and introduces new behavior as follows:
1) `low == None` and `high == None`
Numbers are generated over the range `[lowbnd, highbnd)`, where `lowbnd =
np.iinfo(dtype).min`, and `highbnd = np.iinfo(dtype).max`, where `dtype` is
the provided integral type.
2) `low != None` and `high == None`
If `low >= 0`, numbers are still generated over the range `[0,
low)`, but if `low < 0`, numbers are generated over the range `[low,
highbnd)`, where `highbnd` is defined as above.
3) `low == None` and `high != None`
Numbers are generated over the range `[lowbnd, high)`, where `lowbnd` is
defined as above.
My impression (*) is that this will be confusing, and uses a default that I
never ever needed.
Maybe a better way would be to use low=-np.inf and high=np.inf where inf
would be interpreted as the smallest and largest representable number. And
leave the defaults unchanged.
(*) I didn't try to understand how it works for various cases.
Josef
As I mentioned on the PR discussion, the thing that bothers me is the
inconsistency between the new and the old functionality, specifically
in #2. If `high` is None, the behavior is completely different depending
on the value of `low`. Using `np.inf` instead of `None` may fix that,
although I think that the author's idea was to avoid having to type
the bounds in the `None`/`+/-np.inf` cases. I think that a better
option is to have a separate wrapper to `randint` that implements this
behavior in a consistent manner and leaves the current function
consistent as well.

-Joe
Post by j***@gmail.com
Post by G Young
The primary motivation was the second case, as it is more convenient to
specify a 'dtype' by itself when generating such numbers in a similar vein
to numpy.empty, except with initialized values.
Looking forward to your feedback!
Greg
G Young
2016-02-17 19:09:12 UTC
Yes, you are correct in explaining my intentions. However, as I also
mentioned in the PR discussion, I did not quite understand how your wrapper
idea would make things any more comprehensible, given the cost of additional
overhead and complexity. What do you mean by making the functions
"consistent" (i.e. outline the behavior *exactly* depending on the
inputs)? As I've explained before, and I will state it again: the
different behavior for the `high=None` and `low != None` case is due to
backwards compatibility.

Joseph Fox-Rabinovitz
2016-02-17 19:19:00 UTC
Permalink
My point is that you are proposing to make the overall API have
counter-intuitive behavior for the sake of adding a new feature. It is
worth a little bit of overhead to have two functions that behave
exactly as expected. Josef's footnote is a good example of how people
will feel about having to figure out (not to mention remember) the
different use cases. I think it is better to keep the current API and
just add a "bounded_randint" function for which an input of `None`
always means "limit of that bound, no exceptions".
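A minimal sketch of that `bounded_randint` idea (the signature here is
hypothetical, assuming the `dtype` keyword the PR adds to `randint`):

```python
import numpy as np

def bounded_randint(low=None, high=None, size=None, dtype=np.int64):
    # Hypothetical sketch: None *always* means "the dtype's own bound",
    # with no special-casing on the sign or presence of `low`.
    info = np.iinfo(dtype)
    lo = info.min if low is None else int(low)
    hi = int(info.max) + 1 if high is None else int(high)  # exclusive bound
    return np.random.randint(lo, hi, size=size, dtype=dtype)
```

The point of the separate name is that the existing `randint` keeps its
current semantics untouched, while `None` in the new function has exactly
one meaning.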

-Joe
j***@gmail.com
2016-02-17 19:20:16 UTC
Permalink
One problem is that with only one positional argument, I can still accept
that it might have different meanings. But with two keywords, I would
assume standard Python argument interpretation applies.

If I want to save on typing, then I think it should be for a more
"standard" case. (I also never sample all real numbers, at least not
uniformly.)

Josef
j***@gmail.com
2016-02-17 20:04:37 UTC
Permalink
One more thing I don't like:

So far all distributions are "theoretical" distributions where the
distribution depends on the provided shape, location and scale parameters.
There is a limitation in how they are represented as numbers/dtype and what
range is possible. However, that is not relevant for most use cases.

In this case you are promoting `dtype` from a memory or storage parameter
to an actual shape (or loc and scale) parameter.
That's "weird", and even more so if this would be the default behavior.

There is no proper uniform distribution on all integers. So this forces
users to think about an implementation detail like dtype, when I just want
a random sample from a probability distribution.

Josef