Discussion:
numpy.mean still broken for large float32 arrays
Eelco Hoogendoorn
2014-07-25 03:37:49 UTC
Permalink
Perhaps it is a slightly semantic discussion; but all fp calculations have errors, and there are always strategies for making them smaller. We just don't happen to like the error in this case; but rest assured it won't be hard to find new cases of 'blatantly wrong' results, no matter what accumulator is implemented. That's no reason not to try to be clever about it, but there isn't going to be an algorithm that is best for all possible inputs, and in the end the most important thing is that the algorithm used is specified in the docs.
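For reference, the failure mode under discussion can be reproduced in a few lines; the exact output depends on the NumPy version and on which summation path is taken, so treat the figures in the comments as indicative rather than guaranteed:

import numpy as np

# float32 has a 24-bit significand, so once a running sum reaches
# 2**24 = 16777216, adding 1.0 no longer changes it:
s = np.float32(2 ** 24)
print(s + np.float32(1) == s)        # True

# With a naive float32 accumulator the mean of 2**25 ones therefore
# comes out as 0.5 instead of 1.0; whether np.mean shows this depends
# on the version (pairwise summation in newer releases hides it).
x = np.ones(2 ** 25, dtype=np.float32)
print(x.mean())                      # 0.5 on the naive path, 1.0 with pairwise summation
print(x.mean(dtype=np.float64))      # 1.0 either way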

-----Original Message-----
From: "Alan G Isaac" <***@gmail.com>
Sent: ‎25-‎7-‎2014 00:10
To: "Discussion of Numerical Python" <numpy-***@scipy.org>
Subject: Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays
This isn't a bug report, but rather a feature request.
I'm not sure this statement is correct. The mean of a float32 array
can certainly be computed as a float32. Currently this is
not necessarily what happens, not even approximately.
That feels a lot like a bug, even if we can readily understand
how the algorithm currently used produces it. To say whether
it is a bug or not, don't we have to ask about the intent of `mean`?
If the intent is to sum and divide, then it is not a bug.
If the intent is to produce the mean, then it is a bug.

Alan Isaac
Eelco Hoogendoorn
2014-07-25 08:22:56 UTC
Permalink
To elaborate on that point; knowing that numpy accumulates in a simple
first-to-last sweep, and does not implicitly upcast, the original problem
can be solved in several ways; specifying a higher precision to sum with,
or by a nested summation, like X.mean(0).mean(0)==1.0. I personally like
this explicitness, and am wary of numpy doing overly clever things behind
the scenes, as I can think of other code that might become broken if things
change too radically. For instance, I often sort large arrays with a large
spread in magnitudes before summation, relying on the fact that summing the
smallest values first gives best precision. Any changes made to reduction
behavior should try and be backwards compatible with such properties of
straightforward reductions, or else a lot of code is going to be broken
without warning.

I suppose using maximum precision internally, and nesting all reductions
over multiple axes of an ndarray, are both easy to implement improvements
that do not come with any drawbacks that I can think of. Actually the
maximum precision I am not so sure of, as I personally prefer to make an
informed decision about precision used, and get an error on a platform that
does not support the specified precision, rather than obtain subtly or
horribly broken results without warning when moving your code to a
different platform/compiler whatever.
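Spelled out, the workarounds mentioned above look roughly like this (the array shape is made up for illustration, and the sorted-summation variant assumes non-negative data):

import numpy as np

X = np.ones((5000, 1024), dtype=np.float32)

# 1. Ask for a higher-precision accumulator explicitly.
m1 = X.mean(dtype=np.float64)

# 2. Nested summation: reduce one axis at a time so each partial sum
#    stays small relative to float32's 24-bit significand.
m2 = X.mean(0).mean(0)

# 3. For non-negative data with a large spread in magnitudes, summing
#    the smallest values first loses the least precision.
flat = np.sort(X.ravel())            # ascending, i.e. smallest first
m3 = flat.sum() / flat.size

print(m1, m2, m3)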


RayS
2014-07-25 14:11:52 UTC
Permalink
Post by Eelco Hoogendoorn
Actually the maximum precision I am not so
sure of, as I personally prefer to make an
informed decision about precision used, and get
an error on a platform that does not support
the specified precision, rather than obtain
subtly or horribly broken results without
warning when moving your code to a different platform/compiler whatever.
We were talking on this in the office, as we
realized it does affect a couple of lines dealing
with large arrays, including complex64.
As I expect Python modules to work uniformly
cross platform unless documented otherwise, to me
that includes 32 vs 64 bit platforms, implying
that the modules should automatically use large
enough accumulators for the data type input.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
does mention inaccuracy.
http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.stats.mstats.gmean.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html
etc do not, exactly

- Ray
Robert Kern
2014-07-25 14:22:36 UTC
Permalink
Post by RayS
We were talking on this in the office, as we
realized it does affect a couple of lines dealing
with large arrays, including complex64.
As I expect Python modules to work uniformly
cross platform unless documented otherwise, to me
that includes 32 vs 64 bit platforms, implying
that the modules should automatically use large
enough accumulators for the data type input.
The 32/64-bitness of your platform has nothing to do with floating
point. Nothing discussed in this thread is platform-specific (modulo
some minor details about the hardware FPU, but that should be taken as
read).
--
Robert Kern
RayS
2014-07-25 16:56:39 UTC
Permalink
Post by Robert Kern
Post by RayS
We were talking on this in the office, as we
realized it does affect a couple of lines dealing
with large arrays, including complex64.
As I expect Python modules to work uniformly
cross platform unless documented otherwise, to me
that includes 32 vs 64 bit platforms, implying
that the modules should automatically use large
enough accumulators for the data type input.
The 32/64-bitness of your platform has nothing to do with floating
point.
As a naive end user, I can, and do, download different binaries for
different CPUs/Windows versions and will get different results
http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070747.html
Post by Robert Kern
Nothing discussed in this thread is platform-specific (modulo
some minor details about the hardware FPU, but that should be taken as
read).
And compilers, apparently.

The important point was that it would be best if all of the methods
affected by summing 32 bit floats with 32 bit accumulators had the
same Notes as numpy.mean(). We went through a lot of code yesterday,
assuming that any numpy or Scipy.stats functions that use
accumulators suffer the same issue, whether noted or not, and found it true.

"Depending on the input data, this can cause the results to be
inaccurate, especially for float32 (see example below). Specifying a
higher-precision accumulator using the
<http://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html#numpy.dtype>dtype
keyword can alleviate this issue." seems rather un-Pythonic.

- Ray
Nathaniel Smith
2014-07-25 18:29:16 UTC
Permalink
Post by RayS
The important point was that it would be best if all of the methods affected
by summing 32 bit floats with 32 bit accumulators had the same Notes as
numpy.mean(). We went through a lot of code yesterday, assuming that any
numpy or Scipy.stats functions that use accumulators suffer the same issue,
whether noted or not, and found it true.
Do you have a list of the functions that are affected?
Post by RayS
"Depending on the input data, this can cause the results to be inaccurate,
especially for float32 (see example below). Specifying a higher-precision
accumulator using the dtype keyword can alleviate this issue." seems rather
un-Pythonic.
It's true that in its full generality, this problem just isn't
something numpy can solve. Using float32 is extremely dangerous and
should not be attempted unless you're prepared to seriously analyze
all your code for numeric stability; IME it often runs into problems
in practice, in any number of ways. Remember that it only has as much
precision as a 24 bit integer. There are good reasons why float64 is
the default!

That said, it does seem that np.mean could be implemented better than
it is, even given float32's inherent limitations. If anyone wants to
implement better algorithms for computing the mean, variance, sums,
etc., then we would love to add them to numpy. I'd suggest
implementing them as gufuncs -- there are examples of defining gufuncs
in numpy/linalg/umath_linalg.c.src.
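A rough sketch of the pairwise (divide-and-conquer) summation that keeps coming up in this thread, done with NumPy slicing rather than NumPy's actual C implementation (which handles blocking and remainders differently); the point is only the error behaviour:

import numpy as np

def pairwise_sum(a):
    # Repeatedly add adjacent pairs; all arithmetic stays in a's dtype,
    # but the rounding error grows roughly like O(log n) instead of O(n).
    a = np.asarray(a).ravel().copy()
    while a.size > 1:
        if a.size % 2:                   # odd length: carry the last element over
            a, tail = a[:-1], a[-1:]
        else:
            tail = a[:0]
        a = np.concatenate([a[0::2] + a[1::2], tail])
    return a[0]

x = np.ones(2 ** 25, dtype=np.float32)   # ~128 MB of ones
print(x.cumsum()[-1])                    # 16777216.0: a strictly sequential float32 sum stalls
print(pairwise_sum(x))                   # 33554432.0: the pairwise partial sums stay exact here
print(pairwise_sum(x) / x.size)          # 1.0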

-n
--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
RayS
2014-07-25 20:25:57 UTC
Permalink
Post by Nathaniel Smith
Post by RayS
The important point was that it would be best if all of the methods
affected by summing 32 bit floats with 32 bit accumulators had the same
Notes as numpy.mean(). We went through a lot of code yesterday, assuming
that any numpy or Scipy.stats functions that use accumulators suffer the
same issue, whether noted or not, and found it true.
Do you have a list of the functions that are affected?
We only tested a few we used, but
scipy.stats.nanmean, or any .*mean()
numpy.sum, mean, average, std, var,...

via something like:

import numpy
import scipy.stats
print numpy.__version__
print scipy.__version__
onez = numpy.ones((2**25, 1), numpy.float32)
step = 2**10
func = scipy.stats.nanmean
for s in range(2**24-step, 2**25, step):
    if func(onez[:s+step]) != 1.:
        print '\nbroke', s, func(onez[:s+step])
        break
    else:
        print '\r', s,
Post by Nathaniel Smith
That said, it does seem that np.mean could be implemented better than
it is, even given float32's inherent limitations. If anyone wants to
implement better algorithms for computing the mean, variance, sums,
etc., then we would love to add them to numpy.
Others have pointed out the possible tradeoffs in summation algos,
perhaps a method arg would be appropriate, "better" depending on your
desire for speed vs. accuracy.

It just occurred to me that if the STScI folks (who count photons)
took the mean() or other such func of an image array from Hubble
sensors to find background value, they'd better always be using float64.

- Ray
j***@gmail.com
2014-07-25 21:36:27 UTC
Permalink
Post by RayS
Post by Nathaniel Smith
That said, it does seem that np.mean could be implemented better than
it is, even given float32's inherent limitations. If anyone wants to
implement better algorithms for computing the mean, variance, sums,
etc., then we would love to add them to numpy.
Others have pointed out the possible tradeoffs in summation algos,
perhaps a method arg would be appropriate, "better" depending on your
desire for speed vs. accuracy.
I think this would be a good improvement.
But it doesn't remove the need for users to be aware of the problems. I think
the docstring and the description of the dtype argument is pretty clear.

I'm largely with Eelco, whatever precision or algorithm we use, with
floating point calculations we run into problems in some cases. And I don't
think this is a "broken" function but a design decision that takes the
different tradeoffs into account.
Whether it's the right decision is an open question, if there are better
algorithms with essentially no disadvantages.

Two examples:
I had problems verifying some results against Stata at more than a few
significant digits, until I realized that Stata had used float32 for the
calculations by default in this case, while I was working with float64.
Using single precision linear algebra causes the same numerical problems as
numpy.mean runs into.

A few years ago I tried to match some tougher NIST examples that were
intentionally very badly scaled. numpy.mean at float64 had quite large
errors, but a simple trick with two passes through the data managed to get
very close to the certified NIST examples.
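A sketch of the two-pass idea mentioned here (compute a provisional mean, then correct it by the mean of the residuals); the data below is made up, and this is not necessarily the exact trick used against the NIST sets:

import numpy as np

def two_pass_mean(x):
    x = np.asarray(x)
    m0 = x.mean()                  # pass 1: provisional mean
    return m0 + (x - m0).mean()    # pass 2: mean of residuals corrects m0

# Badly scaled data: a huge constant offset plus a small signal.
rng = np.random.RandomState(0)
x = 1e9 + rng.standard_normal(10 ** 6)
print(x.mean())                    # plain float64 mean
print(two_pass_mean(x))            # the correction pass typically shaves off a bit more error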


my conclusion:
don't use float32 unless you know you don't need any higher precision.
even with float64 it is possible to run into extreme cases where you get
numerical garbage or large precision losses.
However, in the large majority of cases a boring fast "naive"
implementation is enough.

Also, whether we use mean, sum or dot in a calculation is an implementation
detail; dot, for instance, doesn't have a dtype argument and always
depends on the dtype of the arrays, AFAIK.
Separating the accumulation dtype from the array dtype would require a lot
of work except in the simplest cases, like those that only use sum and mean
with specified dtype argument.
>>> X = np.ones((50000, 1024), np.float32)
>>> X.mean()
1.0
>>> X.mean(dtype=np.float32)
1.0
>>> np.dot(X.ravel(), np.ones(X.ravel().shape) *1. / X.ravel().shape)
1.0000000002299174
>>> np.__version__
'1.5.1'

Win32

Josef
RayS
2014-07-25 23:51:03 UTC
Permalink
Post by j***@gmail.com
But it doesn't remove the need for users to be aware of the problems. I
think the docstring and the description of the dtype argument is pretty clear.
Most of the docs for the affected functions do not have a Note with
the same warning as mean()

- Ray
Eelco Hoogendoorn
2014-07-25 17:40:17 UTC
Permalink
Arguably, the whole of floating point numbers and their related shenanigans is not very pythonic in the first place. The accuracy of the output WILL depend on the input, to some degree or another. At the risk of repeating myself: explicit is better than implicit

Alan G Isaac
2014-07-25 18:00:15 UTC
Permalink
Post by Eelco Hoogendoorn
At the risk of repeating myself: explicit is better than implicit
This sounds like an argument for renaming the `mean` function `naivemean`
rather than `mean`. Whatever numpy names `mean`, shouldn't it
implement an algorithm that produces the mean? And obviously, for any
float data type, the mean value of the values in the array is representable
as a value of the same type.

Alan Isaac
Eelco Hoogendoorn
2014-07-25 19:23:43 UTC
Permalink
It need not be exactly representable as such; take the mean of [1, 1+eps]
for instance. Granted, there are at most two numbers in the range of the
original dtype which are closest to the true mean; but I'm not sure that
computing them exactly is a tractable problem for arbitrary input.

I'm not sure what is considered best practice for these problems; or if
there is one, considering the heterogeneity of the problem. As noted earlier,
summing a list of floating point values is a remarkably multifaceted
problem, once you get down into the details.

I think it should be understood that all floating point algorithms are
subject to floating point errors. As long as the algorithm used is
specified, one can make an informed decision if the given algorithm will do
what you expect of it. That's the best we can hope for.

If we are going to advocate doing 'clever' things behind the scenes, we
have to keep backwards compatibility (not creating a possibility of
producing worse results on the same input) and platform independence in
mind. Funny summation orders could violate the former depending on the
implementation details, and 'using the highest machine precision available'
violates the latter (and is horrible practice in general, imo. Either you
don't need the extra accuracy, or you do, and the absence on a given
platform should be an error)

Perhaps pairwise summation in the original order of the data is the best
option:

q = np.ones((2,)*26, np.float32)
print q.mean()
while q.ndim > 0:
    q = q.mean(axis=-1, dtype=np.float32)
    print q

This only requires log(N) space on the stack if properly implemented, and
is not platform dependent, nor should have any backward compatibility
issues that I can think of. But I'm not sure how easy it would be to
implement, given the current framework. The ability to specify different
algorithms per kwarg wouldn't be a bad idea either, imo; or the ability to
explicitly specify a separate output and accumulator dtype.
Sebastian Berg
2014-07-26 09:15:00 UTC
Permalink
Post by Eelco Hoogendoorn
It need not be exactly representable as such; take the mean of [1, 1
+eps] for instance. Granted, there are at most two number in the range
of the original dtype which are closest to the true mean; but im not
sure that computing them exactly is a tractable problem for arbitrary
input.
<snip>
Post by Eelco Hoogendoorn
This only requires log(N) space on the stack if properly implemented,
and is not platform dependent, nor should have any backward
compatibility issues that I can think of. But im not sure how easy it
would be to implement, given the current framework. The ability to
specify different algorithms per kwarg wouldn't be a bad idea either,
imo; or the ability to explicitly specify a separate output and
accumulator dtype.
Well, you already can use dtype to cause an upcast of both arrays.
However this currently will cause a buffered upcast to float64 for the
float32 data. You could also add a d,f->d loop to avoid the cast, but
then you would have to use the out argument currently.
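For concreteness, the existing dtype knob being referred to (the array and printed figures are illustrative):

import numpy as np

x = np.ones(2 ** 25, dtype=np.float32)

# dtype= upcasts the accumulator; internally the float32 data is
# buffered up to float64 chunk by chunk, which is the cast mentioned above.
print(x.mean(dtype=np.float64))                      # 1.0, returned as float64

# The reduction machinery underneath is add.reduce; the same keyword works there.
print(np.add.reduce(x, dtype=np.float64) / x.size)   # 1.0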

In any case, the real solution here is IMO what I think most of us
already thought before would be good, and that is a keyword argument or
maybe context (though I am unsure about details with threading, etc.) to
chose more stable algorithms for such statistical functions. The
pairwise summation that is in master now is very awesome, but it is not
secure enough in the sense that a new user will have difficulty
understanding when he can be sure it is used.

- Sebastian
Post by Eelco Hoogendoorn
Post by Eelco Hoogendoorn
At the risk of repeating myself: explicit is better than
implicit
<snip>
Sturla Molden
2014-07-26 10:39:18 UTC
Permalink
Post by Sebastian Berg
choose more stable algorithms for such statistical functions. The
pairwise summation that is in master now is very awesome, but it is not
secure enough in the sense that a new user will have difficulty
understanding when he can be sure it is used.
Why is it not always used?
Eelco Hoogendoorn
2014-07-26 13:38:46 UTC
Permalink
I was wondering the same thing. Are there any known tradeoffs to this
method of reduction?
Post by Sturla Molden
Post by Sebastian Berg
choose more stable algorithms for such statistical functions. The
pairwise summation that is in master now is very awesome, but it is not
secure enough in the sense that a new user will have difficulty
understanding when he can be sure it is used.
Why is it not always used?
Julian Taylor
2014-07-26 13:53:06 UTC
Permalink
Post by Sturla Molden
Why is it not always used?
for 1d reduction the iterator blocks by 8192 elements even when no
buffering is required. There is a TODO in the source to fix that by
adding additional checks. Unfortunately nobody knows what these
additional tests would need to be and Mark Wiebe who wrote it did not
reply to a ping yet.

Also along the non-fast axes the iterator optimizes the reduction to
remove the strided access, see:
https://github.com/numpy/numpy/pull/4697#issuecomment-42752599


Instead of having a keyword argument to mean I would prefer a context
manager that changes algorithms for different requirements.
This would easily allow changing the accuracy and performance of third
party functions using numpy without changing the third party library as
long as they are using numpy as the base.
E.g.
with np.precisionstate(sum="kahan"):
    scipy.stats.nanmean(d)

We also have cases where numpy uses algorithms that are far more precise
than most people need. E.g. np.hypot and the related complex
absolute value and division.
These are very slow with glibc, as it provides 1 ulp accuracy, which is
hardly ever needed.
Another case that could use dynamic changing is flushing subnormals to zero.

But this API is, like Nathaniel's parameterizable dtypes, just an idea
floating in my head which needs a proper design and implementation written
down. The issue is, as usual, ENOTIME.
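Kahan (compensated) summation, named in the hypothetical np.precisionstate example above, keeps a running correction term for the low-order bits lost at each addition. A pure-Python sketch, not NumPy code:

import numpy as np

def kahan_sum(a):
    s = np.float32(0.0)
    c = np.float32(0.0)                      # running compensation for lost low-order bits
    for v in np.asarray(a, dtype=np.float32):
        y = v - c
        t = s + y                            # low-order bits of y are lost here...
        c = (t - s) - y                      # ...and recovered here
        s = t
    return s

x = np.full(10 ** 5, 0.1, dtype=np.float32)
print(x.cumsum()[-1])    # a plain left-to-right float32 sum drifts noticeably from 10000
print(kahan_sum(x))      # the compensated sum stays very close to 10000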
Benjamin Root
2014-07-26 13:57:05 UTC
Permalink
I could get behind the context manager approach. It would help keep
backwards compatibility, while providing a very easy (and clean) way of
consistently using the same reduction operation. Adding kwargs is just a
road to hell.

Cheers!
Ben Root


j***@gmail.com
2014-07-26 18:29:11 UTC
Permalink
Post by Benjamin Root
I could get behind the context manager approach. It would help keep
backwards compatibility, while providing a very easy (and clean) way of
consistently using the same reduction operation. Adding kwargs is just a
road to hell.
Wouldn't a context manager require a global state that changes how
everything is calculated ?

Josef
Benjamin Root
2014-07-26 18:44:08 UTC
Permalink
That is one way of doing it, and probably the cleanest way. Or else you
have to pass in the context object everywhere anyway. But I am not so
concerned about that (we do that for other things as well). Bigger concerns
would be nested contexts. For example, what if one of the scikit functions
uses such a context to explicitly state that it needs a particular
reduction algorithm, but the call to that scikit function is buried under a
few layers of user functions, at the top of which sits a context manager
that states a different reduction op.

Whose context wins? Naively, the scikit's context wins (because that's how
contexts work). But, does that break with the very broad design goal here?
To let the user specify the reduction kernel? Practically speaking, we
could see users naively putting in context managers all over the place in
their libraries, possibly choosing incorrect algorithms (I am serious here,
how often have we seen stackoverflow instructions just blindly parrot
certain arguments "just because")? This gives the user no real mechanism to
override the library, largely defeating the purpose.

My other concern would be with multi-threaded code (which is where a global
state would be bad).

Ben
j***@gmail.com
2014-07-26 19:00:12 UTC
Permalink
statsmodels still has avoided anything that smells like a global state that
changes calculation.
(We never even implemented different global warning levels.)

https://groups.google.com/d/msg/pystatsmodels/-J9WXKLjyH4/5xvKu9_mbbEJ


Josef
There be Dragons.
Sturla Molden
2014-07-26 19:19:52 UTC
Permalink
Post by j***@gmail.com
statsmodels still has avoided anything that smells like a global state that
changes calculation.
If global states are stored in a stack, as in OpenGL, it is not so bad. A
context manager could push a state in __enter__ and pop the state in
__exit__. This is actually how I write OpenGL code in Python and Cython:
pairs of glBegin/glEnd, glPushMatrix/glPopMatrix, and
glPushAttrib/glPopAttrib fit nicely with Python context managers.

However, the bigger issue is multithreading scalability. You need to
protect the global state with a recursive lock, and it might not scale like
you want. A thread might call a lengthy computation that releases the GIL,
but still hold the rlock that protects the state. All your hopes for seeing
more than one core saturated will go down the drain. It is even bad for i/o
bound code, e.g. on-line signal processing: Data might be ready for
processing in one thread, but the global state is locked by an idle thread
waiting for data.

Sturla
Sturla Molden
2014-07-26 19:04:10 UTC
Permalink
Post by Benjamin Root
My other concern would be with multi-threaded code (which is where a global
state would be bad).
It would presumably require a global threading.RLock for protecting the
global state.

Sturla
Robert Kern
2014-07-26 20:06:21 UTC
Permalink
Post by Sturla Molden
Post by Benjamin Root
My other concern would be with multi-threaded code (which is where a global
state would be bad).
It would presumably require a global threading.RLock for protecting the
global state.
We would use thread-local storage like we currently do with the
np.errstate() context manager. Each thread will have its own "global"
state.
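The pattern being described, sketched in plain Python: np.errstate is the real precedent, while the summation_algorithm state below is purely hypothetical and only illustrates the thread-local mechanics.

import threading
from contextlib import contextmanager

_state = threading.local()              # each thread sees its own "global" state

@contextmanager
def summation_algorithm(name):
    # Hypothetical analogue of np.errstate(): set, yield, always restore.
    old = getattr(_state, 'algo', 'pairwise')
    _state.algo = name
    try:
        yield
    finally:
        _state.algo = old               # restored even on exceptions: no fire-and-forget

def current_algo():
    return getattr(_state, 'algo', 'pairwise')

with summation_algorithm('kahan'):
    print(current_algo())               # 'kahan' inside the block, in this thread only
print(current_algo())                   # back to 'pairwise'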
--
Robert Kern
Sturla Molden
2014-07-26 21:19:06 UTC
Permalink
Post by Robert Kern
Post by Sturla Molden
It would presumably require a global threading.RLock for protecting the
global state.
We would use thread-local storage like we currently do with the
np.errstate() context manager. Each thread will have its own "global"
state.
That sounds like a better plan, yes :)


Sturla
j***@gmail.com
2014-07-27 06:04:50 UTC
Permalink
Post by Sturla Molden
Post by Robert Kern
Post by Sturla Molden
It would presumably require a global threading.RLock for protecting the
global state.
We would use thread-local storage like we currently do with the
np.errstate() context manager. Each thread will have its own "global"
state.
That sounds like a better plan, yes :)
Any "global" state that changes how things are calculated will have
unpredictable results.

And I don't trust python users to be disciplined enough.

issue: Why do I get different results after `import this_funy_package`?

Josef
Robert Kern
2014-07-27 08:24:58 UTC
Permalink
Post by j***@gmail.com
Any "global" state that changes how things are calculated will have
unpredictable results.
And I don't trust python users to be disciplined enough.
issue: Why do I get different results after `import this_funy_package`?
That's why the suggestion is that it be controlled by a context
manager. The state change will only be limited to the `with:`
statement. You would not be able to "fire-and-forget" the state
change.
--
Robert Kern
j***@gmail.com
2014-07-27 08:56:32 UTC
Permalink
Post by Robert Kern
Post by j***@gmail.com
issue: Why do I get different results after `import this_funy_package`?
That's why the suggestion is that it be controlled by a context
manager. The state change will only be limited to the `with:`
statement. You would not be able to "fire-and-forget" the state
change.
Can you implement a context manager without introducing a global variable
that everyone could set, and forget?

Josef
Robert Kern
2014-07-27 09:04:41 UTC
Permalink
Post by j***@gmail.com
Can you implement a context manager without introducing a global variable
that everyone could set, and forget?
Oh sure, with enough effort and digging, someone could search through
the C source, find the hidden, private API that does this, and
deliberately mess with it. But they can already do that with all of
the other necessarily-global state; every module object is a glorified
global variable that can be mutated.

You won't be able to do it by accident or omission or a lack of
discipline. It's not a tempting public target like, say, np.seterr().
--
Robert Kern
RayS
2014-07-27 14:16:47 UTC
Permalink
Post by Robert Kern
You won't be able to do it by accident or omission or a lack of
discipline. It's not a tempting public target like, say, np.seterr().
BTW, why not throw an overflow error in the large float32 sum() case?
Is it too expensive to check while accumulating?

- Ray
Nathaniel Smith
2014-07-27 14:44:46 UTC
Permalink
Post by RayS
Post by Robert Kern
You won't be able to do it by accident or omission or a lack of
discipline. It's not a tempting public target like, say, np.seterr().
BTW, why not throw an overflow error in the large float32 sum() case?
Is it too expensive to check while accumulating?
In the example that started this thread, there's no overflow (in the
technical sense) occurring. Overflow for ints means wrapping around,
and for floats it means exceeding the maximum possible value and
overflowing to infinity.

The problem here is that when summing up the values, the sum gets
large enough that after rounding, x + 1 = x and the sum stops
increasing. (For float32's all this requires is x > 16777216.) So
while the final error is massive, the mechanism is just ordinary
floating-point round-off error.
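This is easy to check directly; 16777216 is 2**24, the point where float32's 24-bit significand stops being able to represent consecutive integers:

import numpy as np

x = np.float32(16777216.0)
print(x + np.float32(1) == x)    # True: the +1 is rounded away
print(x + np.float32(2))         # 16777218.0: steps of 2 are still representable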

-n
--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
RayS
2014-07-27 17:02:16 UTC
Permalink
Thanks for the clarification, but how is the numpy rounding directed?
Round to nearest, ties to even?
http://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules
Just curious, as I couldn't find a reference.

- Ray
Sturla Molden
2014-07-27 18:26:37 UTC
Permalink
Post by Nathaniel Smith
The problem here is that when summing up the values, the sum gets
large enough that after rounding, x + 1 = x and the sum stops
increasing.
Interesting. That explains why the divide-and-conquer reduction is much
more robust.

Thanks :)


Sturla
Eelco Hoogendoorn
2014-07-28 12:37:13 UTC
Permalink
To rephrase my most pressing question: may np.ones((N,2)).mean(0) and
np.ones((2,N)).mean(1) produce different results with the implementation in
the current master? If so, I think that would be very much regrettable; and
if this is a minority opinion, I do hope that at least this gets documented
in a most explicit manner.
Sebastian Berg
2014-07-28 12:46:35 UTC
Permalink
Post by Eelco Hoogendoorn
To rephrase my most pressing question: may np.ones((N,2)).mean(0) and
np.ones((2,N)).mean(1) produce different results with the
implementation in the current master? If so, I think that would be
very much regrettable; and if this is a minority opinion, I do hope
that at least this gets documented in a most explicit manner.
This will always give you different results. Though in master the
difference is more likely to be large, since one of them (often the
second one) may be less likely to run into bigger numerical issues.
alex
2014-07-28 13:21:15 UTC
Permalink
On Mon, Jul 28, 2014 at 8:46 AM, Sebastian Berg
Post by Sebastian Berg
Post by Eelco Hoogendoorn
To rephrase my most pressing question: may np.ones((N,2)).mean(0) and
np.ones((2,N)).mean(1) produce different results with the
implementation in the current master? If so, I think that would be
very much regrettable; and if this is a minority opinion, I do hope
that at least this gets documented in a most explicit manner.
This will always give you different results. Though in master the
difference is more likely to be large, since one of them (often the
second one) may be less likely to run into bigger numerical issues.
Are you sure they always give different results? Notice that
np.ones((N,2)).mean(0)
np.ones((2,N)).mean(1)
compute means of different axes on transposed arrays so these
differences 'cancel out'.

My understanding of the question is to clarify how numpy reduction
algorithms are special-cased for the fast axis vs. other axes.
Sturla Molden
2014-07-28 13:35:23 UTC
Permalink
Post by alex
Are you sure they always give different results? Notice that
np.ones((N,2)).mean(0)
np.ones((2,N)).mean(1)
compute means of different axes on transposed arrays so these
differences 'cancel out'.
They will be different if different algorithms are used. np.ones((N,2)).mean(0)
will have larger accumulated rounding error than np.ones((2,N)).mean(1),
if only the latter uses the divide-and-conquer summation.

I would suggest that in the first case we try to copy the array to a
temporary contiguous buffer and use the same divide-and-conquer
algorithm, unless some heuristics on memory usage fails.
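Roughly what that suggestion amounts to at the Python level; the memory heuristic and the helper name are made up for illustration:

import numpy as np

def mean_axis0(x, copy_limit=2 ** 26):
    # Route the reduction through a contiguous copy of the transpose so it
    # runs along the fast axis, where the more robust summation applies.
    if x.size <= copy_limit:           # made-up memory-usage heuristic
        return np.ascontiguousarray(x.T).mean(axis=-1)
    return x.mean(axis=0)              # fall back to the strided path

X = np.ones((10 ** 6, 2), dtype=np.float32)
print(X.mean(0))                       # strided reduction over the slow axis
print(mean_axis0(X))                   # same values, computed along the fast axis of a copy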

Sturla
Sebastian Berg
2014-07-28 14:06:12 UTC
Permalink
Post by Sturla Molden
Post by alex
Are you sure they always give different results? Notice that
np.ones((N,2)).mean(0)
np.ones((2,N)).mean(1)
compute means of different axes on transposed arrays so these
differences 'cancel out'.
They will be if different algorithms are used. np.ones((N,2)).mean(0)
will have larger accumulated rounding error than np.ones((2,N)).mean(1),
if only the latter uses the divide-and-conquer summation.
What I wanted to point out is that to some extent the algorithm does not
matter. You will not necessarily get identical results already if you
use a different iteration order, and we have been doing that for years
for speed reasons. All libs like BLAS do the same.
Yes, the new changes make this much more dramatic, but they only make
some paths much better, never worse. It might be dangerous, but only in
the sense that you test it with the good path and it works well enough,
but later (also) use the other one in some lib. I am not even sure if I
Post by Sturla Molden
I would suggest that in the first case we try to copy the array to a
temporary contiguous buffer and use the same divide-and-conquer
algorithm, unless some heuristics on memory usage fails.
Sure, but you have to make major changes to the buffered iterator to do
that without larger speed implications. It might be a good idea, but it
requires someone who knows this stuff to spend a lot of time and care in
the depths of numpy.
Eelco Hoogendoorn
2014-07-28 14:31:41 UTC
Permalink
Sebastian:

Those are good points. Indeed iteration order may already produce different
results, even though the semantics of numpy suggest identical operations.
Still, I feel this different behavior without any semantic clues is
something to be minimized.

Indeed copying might have large speed implications. But on second thought,
does it? Either the data is already aligned and no copy is required, or it
isn't aligned, and we need one pass of cache inefficient access to the data
anyway. In fact, if we had one low level function which does
cache-intelligent transposition of numpy arrays (using some block
strategy), this might be faster even than the current simple reduction
operations when forced to work on awkwardly aligned data. Ideally, this
intelligent access and intelligent reduction would be part of a single pass
of course; but that wouldn't really fit within the numpy design, and merely
such an intelligent transpose would provide most of the benefit I think. Or
is the mechanism behind ascontiguousarray already intelligent in this sense?
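A toy version of the blocked, cache-aware transpose idea; real implementations live in C and tune the block size to the cache, so this only shows the access pattern:

import numpy as np

def blocked_transpose(a, bs=64):
    # Copy a.T into a new C-contiguous array block by block, so both the
    # reads and the writes touch memory in roughly cache-sized chunks.
    n, m = a.shape
    out = np.empty((m, n), dtype=a.dtype)
    for i in range(0, n, bs):
        for j in range(0, m, bs):
            out[j:j + bs, i:i + bs] = a[i:i + bs, j:j + bs].T
    return out

a = np.arange(12.0).reshape(3, 4)
print(np.array_equal(blocked_transpose(a), np.ascontiguousarray(a.T)))   # True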
Sebastian Berg
2014-07-28 15:22:33 UTC
Permalink
Post by Eelco Hoogendoorn
Those are good points. Indeed iteration order may already produce
different results, even though the semantics of numpy suggest
identical operations. Still, I feel this different behavior without
any semantic clues is something to be minimized.
Indeed copying might have large speed implications. But on second
thought, does it? Either the data is already aligned and no copy is
required, or it isn't aligned, and we need one pass of cache
inefficient access to the data anyway. In fact, if we had one low level
function which does cache-intelligent transposition of numpy arrays
(using some block strategy), this might be faster even than the
current simple reduction operations when forced to work on awkwardly
aligned data. Ideally, this intelligent access and intelligent
reduction would be part of a single pass of course; but that wouldn't
really fit within the numpy design, and merely such an intelligent
transpose would provide most of the benefit I think. Or is the
mechanism behind ascontiguousarray already intelligent in this sense?
The iterator is currently smart in the sense that it will (at a low
level, obviously) do something like [1]. Most things in numpy use that
iterator; ascontiguousarray does so as well. Such a blocked, cache-aware
iterator is what I meant by saying that someone who knows this would
have to spend a lot of time on it :).

[1] Appendix:

arr = np.ones((100, 100))
arr.sum(1)
# being equivalent (iteration order wise) to:
res = np.zeros(100)
for i in range(100):
    res[i] = arr[i, :].sum()
# while arr.sum(0) would be:
res = np.zeros(100)
for i in range(100):
    res += arr[i, :]
Eelco Hoogendoorn
2014-07-28 21:32:15 UTC
Permalink
I see, thanks for the clarification. Just for the sake of argument, since
unfortunately I don't have the time to go dig in the guts of numpy myself:
a design which always produces results of the same (high) accuracy, but
only optimizes the common access patterns in a hacky way, and may be
inefficient in case it needs to fall back on dumb iteration or array
copying, is the best compromise between features and the ever-limited
amount of time available, I would argue, no? It's preferable if your code
works, but may be hacked to work more efficiently, than that it works
efficiently, but may need hacking to work correctly under all circumstances.

But fun as it is to think about what ought to be, I suppose the people who
do actually pour in the effort have thought about these things already. A
numpy 2.0 could probably borrow/integrate a lot from numexpr, I suppose.

By the way, the hierarchical summation would make it fairly easy to erase
(and in any case would minimize) summation differences due to differences
between logical and actual ordering in memory of the data, no?
Julian Taylor
2014-07-28 22:03:35 UTC
Permalink
Post by Eelco Hoogendoorn
I see, thanks for the clarification. Just for the sake of argument,
since unfortunately I don't have the time to go dig in the guts of numpy
myself: a design which always produces results of the same (high)
accuracy, but only optimizes the common access patterns in a hacky way,
and may be inefficient in case it needs to fall back on dumb iteration
or array copying, is the best compromise between features and the ever
limiting amount of time available, I would argue, no? It's preferable if
your code works, but may be hacked to work more efficiently, than that
it works efficiently, but may need hacking to work correctly under all
circumstances.
I don't see the inconsistency as such a big problem. If applications are
so sensitive to accurate summations over large uniform datasets, they
will most likely implement their own algorithm instead of relying on the
black box in numpy (which has never documented any accuracy bounds or
the summation algorithms it uses, as far as I know).
If they do, they should add test suites that detect accidental use of
the less accurate path in numpy and fix it before they even release.

General-purpose libraries that cannot test every input their third-party
users may give them usually don't have the luxury of supporting only the
latest version of numpy, where pairwise summation would be guaranteed in
the first place, so they would just have to implement their own
algorithms anyway.
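
A minimal regression test of the sort described might look like the
following sketch (the sizes and tolerance are arbitrary illustrative
choices, not anything numpy promises):

import numpy as np

def test_float32_mean_is_accurate():
    # fails if the reduction falls back to a naive float32 accumulation path
    x = np.ones(2**25, dtype=np.float32)
    assert abs(x.mean() - 1.0) < 1e-5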
Daπid
2014-07-28 13:30:24 UTC
Permalink
Post by Sebastian Berg
Post by Eelco Hoogendoorn
To rephrase my most pressing question: may np.ones((N,2)).mean(0) and
np.ones((2,N)).mean(1) produce different results with the
implementation in the current master? If so, I think that would be
very much regrettable; and if this is a minority opinion, I do hope
that at least this gets documented in a most explicit manner.
This will always give you different results. Though in master the
difference is more likely to be large, since the second one may often
be less likely to run into bigger numerical issues.
An example using float16 on Numpy 1.8.1 (I haven't seen differences with
float32):

for N in (int(n) for n in np.logspace(2, 6)):
    print N, (np.ones((N, 2), dtype=np.float16).mean(0),
              np.ones((2, N), dtype=np.float16).mean(1))

The first one gives correct results up to 2049, after which the values start
to fall. The second one, on the other hand, gives correct results up to
65519, where it blows up to infinity. (Presumably 2048 is the point where a
float16 accumulator can no longer grow by 1.0, and 65504 is the largest
finite float16 value, beyond which the running sum overflows to inf.)

Interestingly, in the second case there are fluctuations. For example, for
N = 65424 the mean is 0.99951172, but 1 for the next and previous numbers.
But I think they are just an effect of the rounding, as:

In [33]: np.ones(N+1, dtype=np.float16).sum() - N
Out[33]: 16.0

In [35]: np.ones(N+1, dtype=np.float16).sum() - (N +1)
Out[35]: 15.0

In [36]: np.ones(N-1, dtype=np.float16).sum() - (N -1)
Out[36]: -15.0

In [37]: N = 65519 - 20

In [38]: np.ones(N, dtype=np.float16).sum() - N
Out[38]: 5.0

In [39]: np.ones(N+1, dtype=np.float16).sum() - (N +1)
Out[39]: 4.0

In [40]: np.ones(N-1, dtype=np.float16).sum() - (N -1)
Out[40]: 6.0
Fabien
2014-07-28 13:50:50 UTC
Permalink
Post by Daπid
An example using float16 on Numpy 1.8.1 (I haven't seen differences with float32):
Why aren't there differences between float16 and float32?

Could this be related to my earlier post in this thread where I
mentioned summation problems occurring much earlier in numpy than in IDL?

Fabien
Sebastian Berg
2014-07-28 14:08:39 UTC
Permalink
Post by Fabien
Post by Daπid
An example using float16 on Numpy 1.8.1 (I haven't seen differences with float32):
Why aren't there differences between float16 and float32?
float16 calculations are actually float32 calculations. If done along
the fast axis they will not get rounded in between (within those
8192-element chunks). Basically, something like the difference we are
talking about for float32 versus float64 has existed for years for float16.
Post by Fabien
Could this be related to my earlier post in this thread where I
mentioned summation problems occurring much earlier in numpy than in IDL?
Fabien
Eelco Hoogendoorn
2014-07-26 14:10:50 UTC
Permalink
A context manager makes sense.

I very much appreciate the time constraints and the effort put in thus far,
but if we cannot make something work uniformly, I wonder if we should
include it in master at all. I don't have a problem with customizing
algorithms where fp accuracy demands it; I have more of a problem with
hard-to-predict behavior. If np.ones(bigN).sum() gives different results than
np.ones((bigN,2)).sum(0), aside from the obvious differences, that would be
one hard-to-catch source of bugs.

Wouldn't per-axis reduction, as a limited form of nested reduction, provide
most of the benefits, without any of the drawbacks?


On Sat, Jul 26, 2014 at 3:53 PM, Julian Taylor <
Post by Julian Taylor
Post by Sturla Molden
Why is it not always used?
for 1d reduction the iterator blocks by 8192 elements even when no
buffering is required. There is a TODO in the source to fix that by
adding additional checks. Unfortunately nobody knows what these
additional checks would need to be, and Mark Wiebe, who wrote it, has not
replied to a ping yet.
Also, along the non-fast axes the iterator optimizes the reduction; see
https://github.com/numpy/numpy/pull/4697#issuecomment-42752599
Instead of having a keyword argument to mean I would prefer a context
manager that changes algorithms for different requirements.
This would easily allow changing the accuracy and performance of third
party functions using numpy without changing the third party library as
long as they are using numpy as the base.
E.g.
scipy.stats.nanmean(d)
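
(The example above appears truncated in the archive. Purely as an
illustration of the idea, and not an existing numpy API, such a switch
could be built roughly along these lines; every name here is hypothetical:)

import contextlib

_SUM_MODE = 'fast'  # hypothetical process-wide setting that reductions would consult

@contextlib.contextmanager
def sum_mode(mode):
    # temporarily change the setting, restoring it afterwards
    global _SUM_MODE
    old, _SUM_MODE = _SUM_MODE, mode
    try:
        yield
    finally:
        _SUM_MODE = old

# third-party code would pick the setting up without being modified, e.g.
# with sum_mode('accurate'):
#     scipy.stats.nanmean(d)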
We also have cases where numpy uses algorithms that are far more precise
than most people need. E.g. np.hypot and the related complex
absolute value and division.
These are very slow with glibc as it provides 1ulp accuracy, which is
hardly ever needed.
Another case that could use dynamic changing is flushing subnormals to zero.
But this API is, like Nathaniel's parameterizable dtypes, just an idea
floating in my head which needs proper design and implementation written
down. The issue is, as usual, ENOTIME.
Sebastian Berg
2014-07-26 13:58:51 UTC
Permalink
Post by Eelco Hoogendoorn
I was wondering the same thing. Are there any known tradeoffs to this
method of reduction?
Yes, it is much more complicated and incompatible with naive ufuncs if
you want your memory access to be optimized. And optimizing that is very
much worth it speed wise...

- Sebastian
Post by Eelco Hoogendoorn
On Sat, Jul 26, 2014 at 12:39 PM, Sturla Molden
Post by Sebastian Berg
chose more stable algorithms for such statistical functions. The
pairwise summation that is in master now is very awesome, but it is not
secure enough in the sense that a new user will have difficulty
understanding when he can be sure it is used.
Why is it not always used?
Sturla Molden
2014-07-26 15:11:09 UTC
Permalink
Post by Sebastian Berg
Yes, it is much more complicated and incompatible with naive ufuncs if
you want your memory access to be optimized. And optimizing that is very
much worth it speed wise...
Why? Couldn't we just copy the data chunk-wise to a temporary buffer of say
2**13 numbers and then reduce that? I don't see why we need another
iterator for that.

Sturla
Sturla Molden
2014-07-26 16:34:10 UTC
Permalink
Post by Sturla Molden
Post by Sebastian Berg
Yes, it is much more complicated and incompatible with naive ufuncs if
you want your memory access to be optimized. And optimizing that is very
much worth it speed wise...
Why? Couldn't we just copy the data chunk-wise to a temporary buffer of say
2**13 numbers and then reduce that? I don't see why we need another
iterator for that.
I am sorry if this is a stupid suggestion. My knowledge of how NumPy ufuncs
work could be better.

Sturla
Eelco Hoogendoorn
2014-07-26 19:19:59 UTC
Permalink
Perhaps I in turn am missing something; but I would suppose that any
algorithm that requires multiple passes over the data is off the table?
Perhaps I am being a little old-fashioned and performance-oriented here,
but to make the ultra-majority of use cases suffer a factor-of-two
performance penalty for an odd use case which already has a plethora of
fine and dandy solutions? I'd vote against, fwiw...
Sylvain Corlay
2014-07-26 19:30:06 UTC
Permalink
I completely agree with Eelco. I expect numpy.mean to do something
simple and straightforward. If the naive method is not well suited for
my data, I can deal with it and have my own ad hoc method.

Eelco Hoogendoorn
2014-07-25 21:51:40 UTC
Permalink
Ray: I'm not working with Hubble data, but yeah, these are all issues I've run into with my terabytes of microscopy data as well. Given that such raw data comes as uint16, it's best to do your calculations as much as possible in good old ints. What you compute is what you get, no obscure shenanigans.
 
It just occurred to me that pairwise summation will lead to highly branchy code, and you can forget about any vector extensions. Tradeoffs indeed. Any such hierarchical summation is probably best done by aggregating naively summed blocks.
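
As a rough sketch of "aggregating naively summed blocks" (illustrative
only, not what numpy itself does internally):

import numpy as np

def blocked_sum(x, block=8192):
    # sum contiguous blocks, then combine the much smaller set of partial
    # sums; the final accumulation is over n/block terms instead of n
    x = np.asarray(x).ravel()
    partials = np.add.reduceat(x, np.arange(0, x.size, block))
    if partials.size > block:
        return blocked_sum(partials, block)
    return partials.sum()

x = np.ones(2**25, dtype=np.float32)
print(blocked_sum(x))           # 33554432.0, exact here thanks to the blocking
print(x.sum(dtype=np.float64))  # float64 reference: 33554432.0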

-----Original Message-----
From: "RayS" <***@blue-cove.com>
Sent: ‎25-‎7-‎2014 23:26
To: "Discussion of Numerical Python" <numpy-***@scipy.org>
Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays
Post by Eelco Hoogendoorn
Post by RayS
The important point was that it would be best if all of the methods affected
by summing 32 bit floats with 32 bit accumulators had the same Notes as
numpy.mean(). We went through a lot of code yesterday, assuming that any
numpy or Scipy.stats functions that use accumulators suffer the same issue,
whether noted or not, and found it true.
Do you have a list of the functions that are affected?
We only tested a few we used, but
scipy.stats.nanmean, or any .*mean()
numpy.sum, mean, average, std, var,...

via something like:

import numpy
import scipy.stats
print numpy.__version__
print scipy.__version__
onez = numpy.ones((2**25, 1), numpy.float32)
step = 2**10
func = scipy.stats.nanmean
for s in range(2**24 - step, 2**25, step):
    if func(onez[:s + step]) != 1.:
        print '\nbroke', s, func(onez[:s + step])
        break
    else:
        print '\r', s,
Post by Eelco Hoogendoorn
That said, it does seem that np.mean could be implemented better than
it is, even given float32's inherent limitations. If anyone wants to
implement better algorithms for computing the mean, variance, sums,
etc., then we would love to add them to numpy.
Others have pointed out the possible tradeoffs in summation algos;
perhaps a method arg would be appropriate, with "better" depending on your
desire for speed vs. accuracy.

It just occurred to me that if the STScI folks (who count photons)
took the mean() or other such func of an image array from Hubble
sensors to find a background value, they'd better always be using float64.

- Ray
Julian Taylor
2014-07-25 21:57:51 UTC
Permalink
Post by Eelco Hoogendoorn
Ray: I'm not working with Hubble data, but yeah these are all issues
I've run into with my terrabytes of microscopy data as well. Given that
such raw data comes as uint16, its best to do your calculations as much
as possible in good old ints. What you compute is what you get, no
obscure shenanigans.
integers are dangerous too; they overflow quickly, and signed overflow is
even undefined in the C standard.
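A tiny illustration of the wrap-around (the warning behaviour varies
between numpy versions; the wrapped value does not):

import numpy as np

a = np.array([2**31 - 1], dtype=np.int32)
print(a + 1)                   # wraps around to -2147483648
print(a.astype(np.int64) + 1)  # widening first avoids the overflow: 2147483648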
Post by Eelco Hoogendoorn
It just occurred to me that pairwise summation will lead to highly
branchy code, and you can forget about any vector extensions. Tradeoffs
indeed. Any such hierarchical summation is probably best done by
aggregating naively summed blocks.
pairwise summation is usually implemented with a naive sum cutoff large
enough so the recursion does not matter much.
In numpy 1.9 this cutoff is 128 elements, but the inner loop is unrolled
8 times which makes it effectively 16 elements.
the unrolling factor of 8 was intentionally chosen to allow using AVX in
the inner loop without changing the summation ordering, but last I
tested actually using AVX here only gave mediocre speedups (10%-20% on
an i5).
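In schematic Python the technique looks roughly like this (a sketch only;
the real C implementation additionally unrolls the inner loop by 8 and
works on the blocks handed to it by the iterator):

def pairwise_sum(x, lo, hi, cutoff=128):
    # below the cutoff sum naively; above it split in half and recurse,
    # so worst-case rounding error grows like O(log n) rather than O(n)
    if hi - lo <= cutoff:
        s = 0.0
        for i in range(lo, hi):
            s += x[i]
        return s
    mid = (lo + hi) // 2
    return pairwise_sum(x, lo, mid, cutoff) + pairwise_sum(x, mid, hi, cutoff)

# usage: pairwise_sum(values, 0, len(values))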
Eelco Hoogendoorn
2014-07-26 09:05:08 UTC
Permalink
Cool, sounds like great improvements. I can imagine that after some loop unrolling one becomes memory bound pretty soon. Is the summation guaranteed to traverse the data in its natural order? And do you happen to know what the rules for choosing accumulator dtypes are?
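
(On the accumulator question: for what it's worth, sum/mean accept an explicit
dtype for the accumulator, which sidesteps the issue when float32 precision is
not enough:)

import numpy as np

x = np.ones(2**25, dtype=np.float32)
print(x.mean())                  # accumulated in float32 by default
print(x.mean(dtype=np.float64))  # float64 accumulator: exactly 1.0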

Colin J. Williams
2014-07-29 12:24:41 UTC
Permalink
This version of Numpy does not appear to be available as an installable binary. In any event, the LAPACK and other packages do not seem to be available with the installable versions.

I understand that Visual Studio 2008 is normally used for compiling on Windows. Unfortunately, this is no longer available from Microsoft; the link is replaced by a PowerPoint presentation.

Can anyone suggest an alternative compiler/linker?

Colin W.
Olivier Grisel
2014-07-29 12:50:12 UTC
Permalink
Post by Colin J. Williams
This version of Numpy does not appear to be available as an installable binary. In any event, the LAPACK and other packages do not seem to be available with the installable versions.
I understand that Windows Studio 2008 is normally used for Windows compiling. Unfortunately, this is no longer available from Microsoft. The link is replaced by a Power Point presentation.
Can anyone suggest an alternative compiler/linker?
The web installer for MSVC Express 2008 is still online at:
http://go.microsoft.com/?linkid=7729279

FYI I recently updated the scikit-learn documentation for building
under Windows, both for Python 2 and Python 3, as well as for 32-bit and
64-bit architectures:

http://scikit-learn.org/stable/install.html#building-on-windows

The same build environment should work for numpy (I think).
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Colin J. Williams
2014-07-29 13:48:44 UTC
Permalink
Olivier,
 
Thanks. I've installed Visual Studio 2008 Express.
 
I'll read your building-on-Windows document.

Colin W.