Discussion:
[Numpy-discussion] Multi-distribution Linux wheels - please test
Matthew Brett
2016-02-06 20:26:34 UTC
Hi,

As some of you may have seen, Robert McGibbon and Nathaniel have just
guided a PEP for multi-distribution Linux wheels past the approval
process over on distutils-sig:

https://www.python.org/dev/peps/pep-0513/

The PEP includes a docker image on which y'all can build wheels which
match the PEP:

https://quay.io/repository/manylinux/manylinux

Now we're at the stage where we need stress-testing of the built
wheels to find any problems we hadn't thought of.

I've built numpy and scipy wheels here:

https://nipy.bic.berkeley.edu/manylinux/

So, if you have a Linux distribution handy, we would love to hear from
you about the results of testing these guys, maybe along the lines of:

pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy
python -c 'import numpy; numpy.test()'
python -c 'import scipy; scipy.test()'

These manylinux wheels should soon be available on pypi, and soon
after, installable with latest pip, so we would like to fix as many
problems as possible before going live.
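When reporting results, it can also help to say exactly which install the
interpreter picked up. A minimal stdlib sketch (the `locate` helper is
illustrative; the paths printed are whatever your environment provides):

```python
import importlib.util

def locate(name):
    """Return the file a package would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# Report where numpy and scipy would be imported from, so test results
# can be matched to a specific install (wheel vs. system package).
for name in ("numpy", "scipy"):
    print(name, "->", locate(name))
```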

Cheers,

Matthew
Nadav Horesh
2016-02-07 05:28:48 UTC
Test platform: python 3.4.1 on archlinux x86_64

scipy test: OK

OK (KNOWNFAIL=97, SKIP=1626)


numpy tests: Failed on long double and int128 tests, and got one error:

Traceback (most recent call last):
File "/usr/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/usr/lib/python3.5/site-packages/numpy/core/tests/test_longdouble.py", line 108, in test_fromstring_missing
np.array([1]))
File "/usr/lib/python3.5/site-packages/numpy/testing/utils.py", line 296, in assert_equal
return assert_array_equal(actual, desired, err_msg, verbose)
File "/usr/lib/python3.5/site-packages/numpy/testing/utils.py", line 787, in assert_array_equal
verbose=verbose, header='Arrays are not equal')
File "/usr/lib/python3.5/site-packages/numpy/testing/utils.py", line 668, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not equal

(shapes (6,), (1,) mismatch)
x: array([ 1., -1., 3., 4., 5., 6.])
y: array([1])

----------------------------------------------------------------------
Ran 6019 tests in 28.029s

FAILED (KNOWNFAIL=13, SKIP=12, errors=1, failures=18)



Matthew Brett
2016-02-07 05:52:02 UTC
On Sat, Feb 6, 2016 at 9:28 PM, Nadav Horesh <***@visionsense.com> wrote:
> Test platform: python 3.4.1 on archlinux x86_64
>
> scipy test: OK
>
> OK (KNOWNFAIL=97, SKIP=1626)
>
>
> numpy tests: Failed on long double and int128 tests, and got one error:
>
> Traceback (most recent call last):
> File "/usr/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
> self.test(*self.arg)
> File "/usr/lib/python3.5/site-packages/numpy/core/tests/test_longdouble.py", line 108, in test_fromstring_missing
> np.array([1]))
> File "/usr/lib/python3.5/site-packages/numpy/testing/utils.py", line 296, in assert_equal
> return assert_array_equal(actual, desired, err_msg, verbose)
> File "/usr/lib/python3.5/site-packages/numpy/testing/utils.py", line 787, in assert_array_equal
> verbose=verbose, header='Arrays are not equal')
> File "/usr/lib/python3.5/site-packages/numpy/testing/utils.py", line 668, in assert_array_compare
> raise AssertionError(msg)
> AssertionError:
> Arrays are not equal
>
> (shapes (6,), (1,) mismatch)
> x: array([ 1., -1., 3., 4., 5., 6.])
> y: array([1])
>
> ----------------------------------------------------------------------
> Ran 6019 tests in 28.029s
>
> FAILED (KNOWNFAIL=13, SKIP=12, errors=1, failures=18

Great - thanks so much for doing this.

Do you get a different error if you compile from source?

If you compile from source, do you link to OpenBLAS?

Thanks again,

Matthew
Nadav Horesh
2016-02-07 10:06:43 UTC
The test results of numpy 1.10.4 installed from source:

OK (KNOWNFAIL=4, SKIP=6)


I think I use openblas, as it is installed instead of the normal blas/cblas.

Nadav,
Matthew Brett
2016-02-07 23:33:01 UTC
Hi,

On Sun, Feb 7, 2016 at 2:06 AM, Nadav Horesh <***@visionsense.com> wrote:
> The reult tests of numpy 1.10.4 installed from source:
>
> OK (KNOWNFAIL=4, SKIP=6)
>
>
> I think I use openblas, as it is installed instead the normal blas/cblas.

Thanks again for the further tests.

What do you get for:

python -c 'import numpy; print(numpy.__config__.show())'

Matthew
Nadav Horesh
2016-02-08 06:09:56 UTC
Thank you for reminding me, it is OK now:
$ python -c 'import numpy; print(numpy.__config__.show())'

lapack_opt_info:
library_dirs = ['/usr/local/lib']
language = c
libraries = ['openblas']
define_macros = [('HAVE_CBLAS', None)]
blas_mkl_info:
NOT AVAILABLE
openblas_info:
library_dirs = ['/usr/local/lib']
language = c
libraries = ['openblas']
define_macros = [('HAVE_CBLAS', None)]
openblas_lapack_info:
library_dirs = ['/usr/local/lib']
language = c
libraries = ['openblas']
define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
library_dirs = ['/usr/local/lib']
language = c
libraries = ['openblas']
define_macros = [('HAVE_CBLAS', None)]
None

I updated openblas to the latest version (0.2.15) and it passes the tests.

Nadav.
Matthew Brett
2016-02-08 06:13:49 UTC
On Sun, Feb 7, 2016 at 10:09 PM, Nadav Horesh <***@visionsense.com> wrote:
> Thank you fo reminding me, it is OK now:
> $ python -c 'import numpy; print(numpy.__config__.show())'
>
> lapack_opt_info:
> library_dirs = ['/usr/local/lib']
> language = c
> libraries = ['openblas']
> define_macros = [('HAVE_CBLAS', None)]
> blas_mkl_info:
> NOT AVAILABLE
> openblas_info:
> library_dirs = ['/usr/local/lib']
> language = c
> libraries = ['openblas']
> define_macros = [('HAVE_CBLAS', None)]
> openblas_lapack_info:
> library_dirs = ['/usr/local/lib']
> language = c
> libraries = ['openblas']
> define_macros = [('HAVE_CBLAS', None)]
> blas_opt_info:
> library_dirs = ['/usr/local/lib']
> language = c
> libraries = ['openblas']
> define_macros = [('HAVE_CBLAS', None)]
> None
>
> I updated openblas to the latest version (0.2.15) and it pass the tests

Oh dear - now I'm confused. So you installed the wheel, and tested
it, and it gave a test failure. Then you updated openblas using
pacman, and then reran the tests against the wheel numpy, and they
passed? That's a bit frightening - the wheel should only see its own
copy of openblas...

Thanks for persisting,

Matthew
Nadav Horesh
2016-02-08 07:10:06 UTC
I have atlas-lapack-base installed via pacman (required by sagemath). Since the numpy installation insisted on openblas in /usr/local, I got the openblas source code and installed it in /usr/local.
BTW, I use 1.11b rather than 1.10.x, since 1.10 is very slow in handling recarrays. For the tests I erase the 1.11 installation and install the 1.10.4 wheel. I do verify that I have the right version before running the tests, but I am not sure that there are no unnoticed side effects.

Would it help if I set aside the openblas installation and reran the test?
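One low-tech guard against testing the wrong install is to assert the version before running the suite. A hedged sketch (the `check_version` helper and the stand-in module object are illustrative, not part of numpy):

```python
import types

def check_version(module, expected):
    """Raise if the imported module is not the version under test."""
    found = getattr(module, "__version__", None)
    if found != expected:
        raise RuntimeError("got %s from %s, expected %s"
                           % (found, getattr(module, "__file__", "?"), expected))

# Before a real test run this would be:
#   import numpy
#   check_version(numpy, "1.10.4")
# Demonstrated here with a stand-in object:
fake = types.SimpleNamespace(__version__="1.10.4", __file__="<wheel>")
check_version(fake, "1.10.4")
print("version check passed")
```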

Nadav
Nathaniel Smith
2016-02-08 07:13:29 UTC
(This is not relevant to the main topic of the thread, but FYI I think the
recarray issues are fixed in 1.10.4.)
On Feb 7, 2016 11:10 PM, "Nadav Horesh" <***@visionsense.com> wrote:

> I have atlas-lapack-base installed via pacman (required by sagemath).
> Since the numpy installation insisted on openblas on /usr/local, I got the
> openblas source-code and installed it on /usr/local.
> BTW, I use 1.11b rather then 1.10.x since the 1.10 is very slow in
> handling recarrays. For the tests I am erasing the 1.11 installation, and
> installing the 1.10.4 wheel. I do verify that I have the right version
> before running the tests, but I am not sure if there no unnoticed side
> effects.
>
> Would it help if I put a side the openblas installation and rerun the test?
>
> Nadav
Matthew Brett
2016-02-08 07:48:27 UTC
Hi Nadav,

On Sun, Feb 7, 2016 at 11:13 PM, Nathaniel Smith <***@pobox.com> wrote:
> (This is not relevant to the main topic of the thread, but FYI I think the
> recarray issues are fixed in 1.10.4.)
>
> On Feb 7, 2016 11:10 PM, "Nadav Horesh" <***@visionsense.com> wrote:
>>
>> I have atlas-lapack-base installed via pacman (required by sagemath).
>> Since the numpy installation insisted on openblas on /usr/local, I got the
>> openblas source-code and installed it on /usr/local.
>> BTW, I use 1.11b rather then 1.10.x since the 1.10 is very slow in
>> handling recarrays. For the tests I am erasing the 1.11 installation, and
>> installing the 1.10.4 wheel. I do verify that I have the right version
>> before running the tests, but I am not sure if there no unnoticed side
>> effects.
>>
>> Would it help if I put a side the openblas installation and rerun the
>> test?

Would you mind doing something like this, and posting the output?:

virtualenv test-manylinux
source test-manylinux/bin/activate
pip install -f https://nipy.bic.berkeley.edu/manylinux numpy==1.10.4 nose
python -c 'import numpy; numpy.test()'
python -c 'import numpy; print(numpy.__config__.show())'
deactivate

virtualenv test-from-source
source test-from-source/bin/activate
pip install numpy==1.10.4 nose
python -c 'import numpy; numpy.test()'
python -c 'import numpy; print(numpy.__config__.show())'
deactivate

I'm puzzled that the wheel gives a test error when the source install
does not, and my best guess was an openblas problem, but this is just to
make sure we have the output from the exact same numpy version, at
least.

Thanks again,

Matthew
Nathaniel Smith
2016-02-08 08:03:29 UTC
On Feb 7, 2016 11:49 PM, "Matthew Brett" <***@gmail.com> wrote:
>
> Hi Nadav,
>
> On Sun, Feb 7, 2016 at 11:13 PM, Nathaniel Smith <***@pobox.com> wrote:
> > (This is not relevant to the main topic of the thread, but FYI I think the
> > recarray issues are fixed in 1.10.4.)
> >
> > On Feb 7, 2016 11:10 PM, "Nadav Horesh" <***@visionsense.com> wrote:
> >>
> >> I have atlas-lapack-base installed via pacman (required by sagemath).
> >> Since the numpy installation insisted on openblas on /usr/local, I got the
> >> openblas source-code and installed it on /usr/local.
> >> BTW, I use 1.11b rather then 1.10.x since the 1.10 is very slow in
> >> handling recarrays. For the tests I am erasing the 1.11 installation, and
> >> installing the 1.10.4 wheel. I do verify that I have the right version
> >> before running the tests, but I am not sure if there no unnoticed side
> >> effects.
> >>
> >> Would it help if I put a side the openblas installation and rerun the
> >> test?
>
> Would you mind doing something like this, and posting the output?:
>
> virtualenv test-manylinux
> source test-manylinux/bin/activate
> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy==1.10.4 nose
> python -c 'import numpy; numpy.test()'
> python -c 'import numpy; print(numpy.__config__.show())'
> deactivate
>
> virtualenv test-from-source
> source test-from-source/bin/activate
> pip install numpy==1.10.4 nose
> python -c 'import numpy; numpy.test()'
> python -c 'import numpy; print(numpy.__config__.show())'
> deactivate
>
> I'm puzzled that the wheel gives a test error when the source install
> does not, and my best guess was an openblas problem, but this just to
> make sure we have the output from the exact same numpy version, at
> least.

It's hard to say without seeing the full output, but AFAICT the only
failures mentioned so far are in long double stuff, which shouldn't have
any connection to openblas at all?
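The long double behaviour can be sanity-checked directly, independently of any BLAS, since the extended-precision layout is fixed by the compiler and platform ABI. A sketch (the sizes in the comments are typical for x86-64 Linux, not guaranteed):

```python
import numpy as np

# The layout of np.longdouble comes from the compiler/platform ABI,
# not from which BLAS the build links against.
ld = np.dtype(np.longdouble)
info = np.finfo(np.longdouble)
print("itemsize:", ld.itemsize)          # typically 16 on x86-64 Linux
print("decimal precision:", info.precision)
print("parses '1.5' as:", np.longdouble("1.5"))
```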

-n
Olivier Grisel
2016-02-08 09:35:42 UTC
I found another problem by running the tests of scikit-learn:

python3 -c "import numpy as np; from scipy import linalg;
linalg.eigh(np.random.randn(200, 200))"
Segmentation fault

Note that the following works:

python3 -c "import numpy as np; np.linalg.eigh(np.random.randn(200, 200))"

Also note that all scipy tests pass:

Ran 20180 tests in 366.163s
OK (KNOWNFAIL=97, SKIP=1657)

--
Olivier Grisel
Olivier Grisel
2016-02-08 15:19:59 UTC
Note that the above segfault was found in a VM (a docker-machine
VirtualBox guest VM launched on an OSX host). The DYNAMIC_ARCH feature
of OpenBLAS detects a Sandybridge core (using
https://gist.github.com/ogrisel/ad4e547a32d0eb18b4ff).

Here are the flags of the CPU visible from inside the docker container:

cat /proc/cpuinfo | grep flags
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm
constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq monitor
ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx rdrand hypervisor
lahf_lm

If I force the Nehalem kernel by setting the environment variable, the
problem disappears:

OPENBLAS_CORETYPE=Nehalem python3 -c "import numpy as np; from scipy
import linalg; linalg.eigh(np.random.randn(200, 200))"

So this is an issue with the architecture detection of OpenBLAS.
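The same workaround can be applied from inside a Python script, as long as the variable is set before numpy (and hence OpenBLAS) is first imported. A sketch, using the Nehalem core type from the report above (numpy's own eigvalsh stands in for the scipy call here):

```python
import os

# OpenBLAS reads OPENBLAS_CORETYPE when the shared library is loaded,
# so it must be set before the first numpy/scipy import in the process.
os.environ["OPENBLAS_CORETYPE"] = "Nehalem"

import numpy as np

# The symmetric eigendecomposition that previously segfaulted should
# now run with the Nehalem kernels selected.
a = np.random.randn(50, 50)
w = np.linalg.eigvalsh(a + a.T)
print("eigvalsh ok,", w.shape[0], "eigenvalues")
```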

--
Olivier
Daπid
2016-02-08 16:23:53 UTC
On 8 February 2016 at 16:19, Olivier Grisel <***@ensta.org>
wrote:

>
>
> OPENBLAS_CORETYPE=Nehalem python3 -c "import numpy as np; from scipy
> import linalg; linalg.eigh(np.random.randn(200, 200))"
>
> So this is an issue with the architecture detection of OpenBLAS.


I am seeing the same problem on a native Linux box, with an Ivy Bridge
processor (i5-3317U). According to your script, both my native openblas and
the one in the wheel recognise my CPU as Sandybridge, but the wheel
produces a segmentation fault. Setting the architecture to Nehalem works.
Julian Taylor
2016-02-08 16:40:02 UTC
On 02/08/2016 05:23 PM, Daπid wrote:
>
> On 8 February 2016 at 16:19, Olivier Grisel <***@ensta.org
> <mailto:***@ensta.org>> wrote:
>
>
>
> OPENBLAS_CORETYPE=Nehalem python3 -c "import numpy as np; from scipy
> import linalg; linalg.eigh(np.random.randn(200, 200))"
>
> So this is an issue with the architecture detection of OpenBLAS.
>
>
> I am seeing the same problem on a native Linux box, with Ivy Bridge
> processor (i5-3317U). According to your script, both my native openblas
> and the one in the wheel recognises my CPU as Sandybridge, but the wheel
> produces a segmentation fault. Setting the architecture to Nehalem works.
>

More likely that is a bug in one of openblas's kernels rather than in its
cpu detection. Olivier's cpuinfo indicates it is at least a Sandy Bridge,
and Ivy Bridge is Sandy Bridge compatible.
Is an up-to-date version of openblas being used?
Matthew Brett
2016-02-08 19:25:11 UTC
Hi Julian,

On Mon, Feb 8, 2016 at 8:40 AM, Julian Taylor
<***@googlemail.com> wrote:
> On 02/08/2016 05:23 PM, Daπid wrote:
>>
>> On 8 February 2016 at 16:19, Olivier Grisel <***@ensta.org
>> <mailto:***@ensta.org>> wrote:
>>
>>
>>
>> OPENBLAS_CORETYPE=Nehalem python3 -c "import numpy as np; from scipy
>> import linalg; linalg.eigh(np.random.randn(200, 200))"
>>
>> So this is an issue with the architecture detection of OpenBLAS.
>>
>>
>> I am seeing the same problem on a native Linux box, with Ivy Bridge
>> processor (i5-3317U). According to your script, both my native openblas
>> and the one in the wheel recognises my CPU as Sandybridge, but the wheel
>> produces a segmentation fault. Setting the architecture to Nehalem works.
>>
>
> more likely that is a bug the kernel of openblas instead of its cpu
> detection.
> The cpuinfo of Oliver indicates its at least a sandy bridge, and ivy
> bridge is be sandy bridge compatible.
> Is an up to date version of openblas used?

I used the latest release, v0.2.15:
https://github.com/matthew-brett/manylinux-builds/blob/master/build_openblas.sh#L5

Is there a later version that we should try?

Cheers,

Matthew
Daπid
2016-02-09 09:21:12 UTC
On 8 February 2016 at 20:25, Matthew Brett <***@gmail.com> wrote:

>
> I used the latest release, v0.2.15:
>
> https://github.com/matthew-brett/manylinux-builds/blob/master/build_openblas.sh#L5
>
> Is there a later version that we should try?
>
> Cheers,
>

That is the one in the Fedora repos that is working for me. How are you
compiling it?

Mine is compiled with GCC 5 with the options seen in the source rpm:
http://koji.fedoraproject.org/koji/packageinfo?packageID=15277
Freddy Rietdijk
2016-02-09 20:01:11 UTC
On Nix we also had trouble with OpenBLAS 0.2.15. Version 0.2.14 did not
cause any segmentation faults so we reverted to that version.
https://github.com/scipy/scipy/issues/5620

(hopefully this time the e-mail gets through)

On Tue, Feb 9, 2016 at 10:21 AM, Daπid <***@gmail.com> wrote:

> On 8 February 2016 at 20:25, Matthew Brett <***@gmail.com>
> wrote:
>
>>
>> I used the latest release, v0.2.15:
>>
>> https://github.com/matthew-brett/manylinux-builds/blob/master/build_openblas.sh#L5
>>
>> Is there a later version that we should try?
>>
>> Cheers,
>>
>
> That is the one in the Fedora repos that is working for me. How are you
> compiling it?
>
> Mine is compiled with GCC 5 with the options seen in the source rpm:
> http://koji.fedoraproject.org/koji/packageinfo?packageID=15277
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
Olivier Grisel
2016-02-08 08:09:36 UTC
I used docker to run the numpy tests on base/archlinux. I had to
pacman -Sy python-pip openssl and gcc (required by one of the numpy
tests):

```
Ran 5621 tests in 34.482s
OK (KNOWNFAIL=4, SKIP=9)
```

Everything looks fine.

--
Olivier
Nadav Horesh
2016-02-09 14:07:35 UTC
I do not know what happened: all tests passed, even after removing openblas (Nathaniel was right).

Manylinux config:

python -c 'import numpy; print(numpy.__config__.show())'
blas_opt_info:
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas']
language = c
library_dirs = ['/usr/local/lib']
lapack_opt_info:
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas']
language = c
library_dirs = ['/usr/local/lib']
blas_mkl_info:
NOT AVAILABLE
openblas_lapack_info:
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas']
language = c
library_dirs = ['/usr/local/lib']
openblas_info:
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas']
language = c
library_dirs = ['/usr/local/lib']
None


Source installation:

python -c 'import numpy; print(numpy.__config__.show())'
openblas_info:
library_dirs = ['/usr/local/lib']
libraries = ['openblas', 'openblas']
language = c
runtime_library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
openblas_lapack_info:
library_dirs = ['/usr/local/lib']
libraries = ['openblas', 'openblas']
language = c
runtime_library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
extra_compile_args = ['-g -ftree-vectorize -mtune=native -march=native -O3']
runtime_library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas', 'openblas', 'atlas', 'f77blas', 'cblas', 'blas']
language = c
library_dirs = ['/usr/local/lib', '/usr/lib']
blas_mkl_info:
NOT AVAILABLE
blas_opt_info:
extra_compile_args = ['-g -ftree-vectorize -mtune=native -march=native -O3']
runtime_library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas', 'openblas', 'atlas', 'f77blas', 'cblas', 'blas']
language = c
library_dirs = ['/usr/local/lib', '/usr/lib']
None

Nathaniel Smith
2016-02-08 06:15:13 UTC
On Sat, Feb 6, 2016 at 9:28 PM, Nadav Horesh <***@visionsense.com> wrote:
> Test platform: python 3.4.1 on archlinux x86_64
>
> scipy test: OK
>
> OK (KNOWNFAIL=97, SKIP=1626)
>
>
> numpy tests: Failed on long double and int128 tests, and got one error:

Could you post the complete output from the test suite somewhere?
(Maybe gist.github.com)

-n

--
Nathaniel J. Smith -- https://vorpus.org
Nathaniel Smith
2016-02-07 10:40:10 UTC
On Feb 6, 2016 12:27 PM, "Matthew Brett" <***@gmail.com> wrote:
>
> Hi,
>
> As some of you may have seen, Robert McGibbon and Nathaniel have just
> guided a PEP for multi-distribution Linux wheels past the approval
> process over on distutils-sig:
>
> https://www.python.org/dev/peps/pep-0513/
>
> The PEP includes a docker image on which y'all can build wheels which
> match the PEP:
>
> https://quay.io/repository/manylinux/manylinux

This is the wrong repository :-) It moved, and there are two now:

quay.io/pypa/manylinux1_x86_64
quay.io/pypa/manylinux1_i686

-n
Nathaniel Smith
2016-02-08 00:17:46 UTC
On Feb 7, 2016 15:27, "Charles R Harris" <***@gmail.com> wrote:
>
>
>
> On Sun, Feb 7, 2016 at 2:16 PM, Nathaniel Smith <***@pobox.com> wrote:
>>
>> On Sun, Feb 7, 2016 at 9:49 AM, Charles R Harris
>> <***@gmail.com> wrote:
>> >
>> >
>> > On Sun, Feb 7, 2016 at 3:40 AM, Nathaniel Smith <***@pobox.com> wrote:
>> >>
>> >> On Feb 6, 2016 12:27 PM, "Matthew Brett" <***@gmail.com> wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > As some of you may have seen, Robert McGibbon and Nathaniel have just
>> >> > guided a PEP for multi-distribution Linux wheels past the approval
>> >> > process over on distutils-sig:
>> >> >
>> >> > https://www.python.org/dev/peps/pep-0513/
>> >> >
>> >> > The PEP includes a docker image on which y'all can build wheels which
>> >> > match the PEP:
>> >> >
>> >> > https://quay.io/repository/manylinux/manylinux
>> >>
>> >> This is the wrong repository :-) It moved, and there are two now:
>> >>
>> >> quay.io/pypa/manylinux1_x86_64
>> >> quay.io/pypa/manylinux1_i686
>> >
>> >
>> > I'm going to put out 1.11.0b3 today. What would be the best thing to do
>> > for testing?
>>
>> I'd say, don't worry about building linux wheels as part of the
>> release cycle yet -- it'll still be a bit before they're allowed on
>> pypi or pip will recognize the new special tag. So for now you can
>> leave it to Matthew or someone to build test images and stick them up
>> on a server somewhere, same as before :-)
>
>
> Should I try putting the sources up on pypi?

+1

-n
Daπid
2016-02-08 11:29:58 UTC
Permalink
On 6 February 2016 at 21:26, Matthew Brett <***@gmail.com> wrote:

>
> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy
> python -c 'import numpy; numpy.test()'
> python -c 'import scipy; scipy.test()'
>


All the tests pass on my Fedora 23 with Python 2.7, but it seems to be
linking to the system openblas:

numpy.show_config()
lapack_opt_info:
libraries = ['openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
blas_opt_info:
libraries = ['openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
openblas_info:
libraries = ['openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
openblas_lapack_info:
libraries = ['openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
blas_mkl_info:
NOT AVAILABLE

I can also reproduce Ogrisel's segfault.
Matthew Brett
2016-02-08 18:21:19 UTC
Permalink
Hi,

On Mon, Feb 8, 2016 at 3:29 AM, Daπid <***@gmail.com> wrote:
>
> On 6 February 2016 at 21:26, Matthew Brett <***@gmail.com> wrote:
>>
>>
>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy
>> python -c 'import numpy; numpy.test()'
>> python -c 'import scipy; scipy.test()'
>
>
>
> All the tests pass on my Fedora 23 with Python 2.7, but it seems to be
> linking to the system openblas:
>
> numpy.show_config()
> lapack_opt_info:
> libraries = ['openblas']
> library_dirs = ['/usr/local/lib']
> define_macros = [('HAVE_CBLAS', None)]
> language = c

numpy.show_config() shows the places that numpy found the libraries at
build time. In the case of the manylinux wheel builds, I put openblas
at /usr/local , but the place the wheel should be loading openblas
from is <numpy-install-location>/.libs. For example, I think you'll
find that the numpy tests will still pass if you remove any openblas
installation at /usr/local .
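
For a quick runtime check (a Linux-only sketch, not part of the wheel
tooling), you can list the BLAS shared objects the interpreter has
actually mapped, since show_config() only records build-time paths:

```python
# Linux-only sketch: numpy.show_config() records build-time paths, so
# inspect /proc/self/maps to see which BLAS the loader actually picked.
try:
    import numpy  # noqa: F401 -- imported so its bundled BLAS gets mapped
except ImportError:
    pass  # the check below still runs, it just won't see numpy's BLAS


def mapped_blas_paths():
    """Return paths of BLAS-ish shared objects mapped in this process."""
    paths = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            parts = line.split()
            # the last field is the backing path, when there is one
            if parts and "blas" in parts[-1].lower():
                paths.add(parts[-1])
    return sorted(paths)


print(mapped_blas_paths() or "no BLAS library mapped")
```

On a wheel install this should show something under
<numpy-install-location>/.libs rather than /usr/local/lib.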

Thanks for testing by the way,

Matthew
Evgeni Burovski
2016-02-08 18:41:01 UTC
Permalink
> numpy.show_config() shows the places that numpy found the libraries at
> build time. In the case of the manylinux wheel builds, I put openblas
> at /usr/local , but the place the wheel should be loading openblas
> from is <numpy-install-location>/.libs. For example, I think you'll
> find that the numpy tests will still pass if you remove any openblas
> installation at /usr/local .

Confirmed: I do not have openblas in that location, and tests sort of pass
(see a parallel email in this thread).

By the way, is there a chance you could use a more distinctive location?
"What does your numpy.show_config() show?" is a question we often ask when
receiving bug reports; a recognizable marker location would save us an
iteration on those reports once your wheels are in common use.
Matthew Brett
2016-02-08 18:47:01 UTC
Permalink
On Mon, Feb 8, 2016 at 10:41 AM, Evgeni Burovski
<***@gmail.com> wrote:
>
>> numpy.show_config() shows the places that numpy found the libraries at
>> build time. In the case of the manylinux wheel builds, I put openblas
>> at /usr/local , but the place the wheel should be loading openblas
>> from is <numpy-install-location>/.libs. For example, I think you'll
>> find that the numpy tests will still pass if you remove any openblas
>> installation at /usr/local .
>
> Confirmed: I do not have openblas in that location, and tests sort of pass
> (see a parallel email in this thread).
>
> By the way, is there a chance you could use a more distinctive location?
> "What does your numpy.show_config() show?" is a question we often ask when
> receiving bug reports; a recognizable marker location would save us an
> iteration on those reports once your wheels are in common use.

That's a good idea.

Matthew
Evgeni Burovski
2016-02-08 11:57:48 UTC
Permalink
---------- Forwarded message ----------
From: Evgeni Burovski <***@gmail.com>
Date: Mon, Feb 8, 2016 at 11:56 AM
Subject: Re: [Numpy-discussion] Multi-distribution Linux wheels - please test
To: Discussion of Numerical Python <numpy-***@scipy.org>


On Sat, Feb 6, 2016 at 8:26 PM, Matthew Brett <***@gmail.com> wrote:
> Hi,
>
> As some of you may have seen, Robert McGibbon and Nathaniel have just
> guided a PEP for multi-distribution Linux wheels past the approval
> process over on distutils-sig:
>
> https://www.python.org/dev/peps/pep-0513/
>
> The PEP includes a docker image on which y'all can build wheels which
> match the PEP:
>
> https://quay.io/repository/manylinux/manylinux
>
> Now we're at the stage where we need stress-testing of the built
> wheels to find any problems we hadn't thought of.
>
> I've built numpy and scipy wheels here:
>
> https://nipy.bic.berkeley.edu/manylinux/
>
> So, if you have a Linux distribution handy, we would love to hear from
> you about the results of testing these guys, maybe on the lines of:
>
> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy
> python -c 'import numpy; numpy.test()'
> python -c 'import scipy; scipy.test()'
>
> These manylinux wheels should soon be available on pypi, and soon
> after, installable with latest pip, so we would like to fix as many
> problems as possible before going live.
>
> Cheers,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



Hi,

Bog-standard Ubuntu 12.04, fresh virtualenv:

Python 2.7.3 (default, Jun 22 2015, 19:33:41)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.10.4'
>>> numpy.test()
Running unit tests for numpy
NumPy version 1.10.4
NumPy relaxed strides checking option: False
NumPy is installed in
/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy
Python version 2.7.3 (default, Jun 22 2015, 19:33:41) [GCC 4.6.3]
nose version 1.3.7

<snip>

======================================================================
ERROR: test_multiarray.TestNewBufferProtocol.test_relaxed_strides
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/nose/case.py",
line 197, in runTest
self.test(*self.arg)
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/core/tests/test_multiarray.py",
line 5366, in test_relaxed_strides
fd.write(c.data)
TypeError: 'buffer' does not have the buffer interface

----------------------------------------------------------------------


* Scipy tests pass with one error in TestNanFuncs, but the interpreter
crashes immediately afterwards.


Same machine, python 3.5: both numpy and scipy tests pass.
Matthew Brett
2016-02-08 18:23:17 UTC
Permalink
On Mon, Feb 8, 2016 at 3:57 AM, Evgeni Burovski
<***@gmail.com> wrote:
> ---------- Forwarded message ----------
> From: Evgeni Burovski <***@gmail.com>
> Date: Mon, Feb 8, 2016 at 11:56 AM
> Subject: Re: [Numpy-discussion] Multi-distribution Linux wheels - please test
> To: Discussion of Numerical Python <numpy-***@scipy.org>
>
>
> On Sat, Feb 6, 2016 at 8:26 PM, Matthew Brett <***@gmail.com> wrote:
>> Hi,
>>
>> As some of you may have seen, Robert McGibbon and Nathaniel have just
>> guided a PEP for multi-distribution Linux wheels past the approval
>> process over on distutils-sig:
>>
>> https://www.python.org/dev/peps/pep-0513/
>>
>> The PEP includes a docker image on which y'all can build wheels which
>> match the PEP:
>>
>> https://quay.io/repository/manylinux/manylinux
>>
>> Now we're at the stage where we need stress-testing of the built
>> wheels to find any problems we hadn't thought of.
>>
>> I've built numpy and scipy wheels here:
>>
>> https://nipy.bic.berkeley.edu/manylinux/
>>
>> So, if you have a Linux distribution handy, we would love to hear from
>> you about the results of testing these guys, maybe on the lines of:
>>
>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy
>> python -c 'import numpy; numpy.test()'
>> python -c 'import scipy; scipy.test()'
>>
>> These manylinux wheels should soon be available on pypi, and soon
>> after, installable with latest pip, so we would like to fix as many
>> problems as possible before going live.
>>
>> Cheers,
>>
>> Matthew
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-***@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> Hi,
>
> Bog-standard Ubuntu 12.04, fresh virtualenv:
>
> Python 2.7.3 (default, Jun 22 2015, 19:33:41)
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import numpy
>>>> numpy.__version__
> '1.10.4'
>>>> numpy.test()
> Running unit tests for numpy
> NumPy version 1.10.4
> NumPy relaxed strides checking option: False
> NumPy is installed in
> /home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy
> Python version 2.7.3 (default, Jun 22 2015, 19:33:41) [GCC 4.6.3]
> nose version 1.3.7
>
> <snip>
>
> ======================================================================
> ERROR: test_multiarray.TestNewBufferProtocol.test_relaxed_strides
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/nose/case.py",
> line 197, in runTest
> self.test(*self.arg)
> File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/core/tests/test_multiarray.py",
> line 5366, in test_relaxed_strides
> fd.write(c.data)
> TypeError: 'buffer' does not have the buffer interface
>
> ----------------------------------------------------------------------
>
>
> * Scipy tests pass with one error in TestNanFuncs, but the interpreter
> crashes immediately afterwards.
>
>
> Same machine, python 3.5: both numpy and scipy tests pass.

Ouch - great that you found these, I'll take a look,

Matthew
Matthew Brett
2016-02-09 00:37:09 UTC
Permalink
On Mon, Feb 8, 2016 at 10:23 AM, Matthew Brett <***@gmail.com> wrote:
> On Mon, Feb 8, 2016 at 3:57 AM, Evgeni Burovski
> <***@gmail.com> wrote:
>> ---------- Forwarded message ----------
>> From: Evgeni Burovski <***@gmail.com>
>> Date: Mon, Feb 8, 2016 at 11:56 AM
>> Subject: Re: [Numpy-discussion] Multi-distribution Linux wheels - please test
>> To: Discussion of Numerical Python <numpy-***@scipy.org>
>>
>>
>> On Sat, Feb 6, 2016 at 8:26 PM, Matthew Brett <***@gmail.com> wrote:
>>> Hi,
>>>
>>> As some of you may have seen, Robert McGibbon and Nathaniel have just
>>> guided a PEP for multi-distribution Linux wheels past the approval
>>> process over on distutils-sig:
>>>
>>> https://www.python.org/dev/peps/pep-0513/
>>>
>>> The PEP includes a docker image on which y'all can build wheels which
>>> match the PEP:
>>>
>>> https://quay.io/repository/manylinux/manylinux
>>>
>>> Now we're at the stage where we need stress-testing of the built
>>> wheels to find any problems we hadn't thought of.
>>>
>>> I've built numpy and scipy wheels here:
>>>
>>> https://nipy.bic.berkeley.edu/manylinux/
>>>
>>> So, if you have a Linux distribution handy, we would love to hear from
>>> you about the results of testing these guys, maybe on the lines of:
>>>
>>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy
>>> python -c 'import numpy; numpy.test()'
>>> python -c 'import scipy; scipy.test()'
>>>
>>> These manylinux wheels should soon be available on pypi, and soon
>>> after, installable with latest pip, so we would like to fix as many
>>> problems as possible before going live.
>>>
>>> Cheers,
>>>
>>> Matthew
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-***@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>>
>> Hi,
>>
>> Bog-standard Ubuntu 12.04, fresh virtualenv:
>>
>> Python 2.7.3 (default, Jun 22 2015, 19:33:41)
>> [GCC 4.6.3] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> import numpy
>>>>> numpy.__version__
>> '1.10.4'
>>>>> numpy.test()
>> Running unit tests for numpy
>> NumPy version 1.10.4
>> NumPy relaxed strides checking option: False
>> NumPy is installed in
>> /home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy
>> Python version 2.7.3 (default, Jun 22 2015, 19:33:41) [GCC 4.6.3]
>> nose version 1.3.7
>>
>> <snip>
>>
>> ======================================================================
>> ERROR: test_multiarray.TestNewBufferProtocol.test_relaxed_strides
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>> File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/nose/case.py",
>> line 197, in runTest
>> self.test(*self.arg)
>> File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/core/tests/test_multiarray.py",
>> line 5366, in test_relaxed_strides
>> fd.write(c.data)
>> TypeError: 'buffer' does not have the buffer interface
>>
>> ----------------------------------------------------------------------
>>
>>
>> * Scipy tests pass with one error in TestNanFuncs, but the interpreter
>> crashes immediately afterwards.
>>
>>
>> Same machine, python 3.5: both numpy and scipy tests pass.
>
> Ouch - great that you found these, I'll take a look,

I think these are problems with numpy and Python 2.7.3 - because I got
the same "TypeError: 'buffer' does not have the buffer interface" with
numpy on OS X under Python.org Python 2.7.3, whether installing from a
wheel or from source.

I also get a scipy segfault with scipy 0.17.0 installed from an OSX
wheel, with output ending:

test_check_finite (test_basic.TestLstsq) ...
/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/scipy/linalg/basic.py:884:
RuntimeWarning: internal gelsd driver lwork query error, required
iwork dimension not returned. This is likely the result of LAPACK bug
0038, fixed in LAPACK 3.2.2 (released July 21, 2010). Falling back to
'gelss' driver.
warnings.warn(mesg, RuntimeWarning)
ok
test_random_complex_exact (test_basic.TestLstsq) ... FAIL
test_random_complex_overdet (test_basic.TestLstsq) ... Bus error

This is so whether scipy is running on top of source- or wheel-built
numpy, and for a scipy built from source.

Same numpy error installing on a bare Ubuntu 12.04, either installing
from a wheel built on 12.04 on travis:

pip install -f http://travis-wheels.scikit-image.org --trusted-host
travis-wheels.scikit-image.org --no-index numpy

or from numpy built from source.

I can't replicate the segfault with manylinux wheels and scipy. On
the other hand, I get a new test error for numpy from manylinux, scipy
from manylinux, like this:

$ python -c 'import scipy.linalg; scipy.linalg.test()'

======================================================================
FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line
197, in runTest
self.test(*self.arg)
File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py",
line 658, in eigenhproblem_general
assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
line 892, in assert_array_almost_equal
precision=decimal)
File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
line 713, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not almost equal to 4 decimals

(mismatch 100.0%)
x: array([ 0., 0., 0.], dtype=float32)
y: array([ 1., 1., 1.])

----------------------------------------------------------------------
Ran 1507 tests in 14.928s

FAILED (KNOWNFAIL=4, SKIP=1, failures=1)

This is a very odd error, which we don't get when running over a numpy
installed from source, linked to ATLAS, and doesn't happen when
running the tests via:

nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg

So, something about the copy of numpy (linked to openblas) is
affecting the results of scipy (also linked to openblas), and only
with a particular environment / test order.

If you'd like to try and see whether y'all can do a better job of
debugging than me:

# Run this script inside a docker container started with this incantation:
# docker run -ti --rm ubuntu:12.04 /bin/bash
apt-get update
apt-get install -y python curl
apt-get install libpython2.7 # this won't be necessary with next
iteration of manylinux wheel builds
curl -LO https://bootstrap.pypa.io/get-pip.py
python get-pip.py
pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
python -c 'import scipy.linalg; scipy.linalg.test()'

Cheers,

Matthew
Nathaniel Smith
2016-02-09 01:26:49 UTC
Permalink
On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <***@gmail.com> wrote:
[...]
> I can't replicate the segfault with manylinux wheels and scipy. On
> the other hand, I get a new test error for numpy from manylinux, scipy
> from manylinux, like this:
>
> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>
> ======================================================================
> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line
> 197, in runTest
> self.test(*self.arg)
> File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py",
> line 658, in eigenhproblem_general
> assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
> line 892, in assert_array_almost_equal
> precision=decimal)
> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
> line 713, in assert_array_compare
> raise AssertionError(msg)
> AssertionError:
> Arrays are not almost equal to 4 decimals
>
> (mismatch 100.0%)
> x: array([ 0., 0., 0.], dtype=float32)
> y: array([ 1., 1., 1.])
>
> ----------------------------------------------------------------------
> Ran 1507 tests in 14.928s
>
> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>
> This is a very odd error, which we don't get when running over a numpy
> installed from source, linked to ATLAS, and doesn't happen when
> running the tests via:
>
> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>
> So, something about the copy of numpy (linked to openblas) is
> affecting the results of scipy (also linked to openblas), and only
> with a particular environment / test order.
>
> If you'd like to try and see whether y'all can do a better job of
> debugging than me:
>
> # Run this script inside a docker container started with this incantation:
> # docker run -ti --rm ubuntu:12.04 /bin/bash
> apt-get update
> apt-get install -y python curl
> apt-get install libpython2.7 # this won't be necessary with next
> iteration of manylinux wheel builds
> curl -LO https://bootstrap.pypa.io/get-pip.py
> python get-pip.py
> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
> python -c 'import scipy.linalg; scipy.linalg.test()'

I just tried this and on my laptop it completed without error.

Best guess is that we're dealing with some memory corruption bug
inside openblas, so it's getting perturbed by things like exactly what
other calls to openblas have happened (which is different depending on
whether numpy is linked to openblas), and which core type openblas has
detected.

On my laptop, which *doesn't* show the problem, running with
OPENBLAS_VERBOSE=2 says "Core: Haswell".

Guess the next step is checking what core type the failing machines
use, and running valgrind... anyone have a good valgrind suppressions
file?
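
For reference, a guarded sketch of such a valgrind run. The suppressions
path is an assumption - CPython ships Misc/valgrind-python.supp in its
source tree, not at a fixed installed location:

```shell
# Guarded sketch of a valgrind run over the failing test.  Assumptions:
# valgrind is installed and SUPP points at CPython's Misc/valgrind-python.supp
# (adjust the path to your source checkout).
SUPP=Misc/valgrind-python.supp
if command -v valgrind >/dev/null 2>&1 && [ -f "$SUPP" ]; then
    valgrind --tool=memcheck --suppressions="$SUPP" \
        --track-origins=yes --log-file=vg-scipy.log \
        python -c 'import scipy.linalg; scipy.linalg.test()'
else
    echo "skipping: valgrind or $SUPP not available"
fi
```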

-n

--
Nathaniel J. Smith -- https://vorpus.org
Matthew Brett
2016-02-09 02:04:18 UTC
Permalink
On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <***@pobox.com> wrote:
> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <***@gmail.com> wrote:
> [...]
>> I can't replicate the segfault with manylinux wheels and scipy. On
>> the other hand, I get a new test error for numpy from manylinux, scipy
>> from manylinux, like this:
>>
>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>
>> ======================================================================
>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>> File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line
>> 197, in runTest
>> self.test(*self.arg)
>> File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py",
>> line 658, in eigenhproblem_general
>> assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
>> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>> line 892, in assert_array_almost_equal
>> precision=decimal)
>> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>> line 713, in assert_array_compare
>> raise AssertionError(msg)
>> AssertionError:
>> Arrays are not almost equal to 4 decimals
>>
>> (mismatch 100.0%)
>> x: array([ 0., 0., 0.], dtype=float32)
>> y: array([ 1., 1., 1.])
>>
>> ----------------------------------------------------------------------
>> Ran 1507 tests in 14.928s
>>
>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>
>> This is a very odd error, which we don't get when running over a numpy
>> installed from source, linked to ATLAS, and doesn't happen when
>> running the tests via:
>>
>> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>
>> So, something about the copy of numpy (linked to openblas) is
>> affecting the results of scipy (also linked to openblas), and only
>> with a particular environment / test order.
>>
>> If you'd like to try and see whether y'all can do a better job of
>> debugging than me:
>>
>> # Run this script inside a docker container started with this incantation:
>> # docker run -ti --rm ubuntu:12.04 /bin/bash
>> apt-get update
>> apt-get install -y python curl
>> apt-get install libpython2.7 # this won't be necessary with next
>> iteration of manylinux wheel builds
>> curl -LO https://bootstrap.pypa.io/get-pip.py
>> python get-pip.py
>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>> python -c 'import scipy.linalg; scipy.linalg.test()'
>
> I just tried this and on my laptop it completed without error.
>
> Best guess is that we're dealing with some memory corruption bug
> inside openblas, so it's getting perturbed by things like exactly what
> other calls to openblas have happened (which is different depending on
> whether numpy is linked to openblas), and which core type openblas has
> detected.
>
> On my laptop, which *doesn't* show the problem, running with
> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>
> Guess the next step is checking what core type the failing machines
> use, and running valgrind... anyone have a good valgrind suppressions
> file?

My machine (which does give the failure) gives

Core: Core2

with OPENBLAS_VERBOSE=2

Matthew
Nathaniel Smith
2016-02-09 02:07:04 UTC
Permalink
On Mon, Feb 8, 2016 at 6:04 PM, Matthew Brett <***@gmail.com> wrote:
> On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <***@pobox.com> wrote:
>> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <***@gmail.com> wrote:
>> [...]
>>> I can't replicate the segfault with manylinux wheels and scipy. On
>>> the other hand, I get a new test error for numpy from manylinux, scipy
>>> from manylinux, like this:
>>>
>>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>>
>>> ======================================================================
>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>> ----------------------------------------------------------------------
>>> Traceback (most recent call last):
>>> File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line
>>> 197, in runTest
>>> self.test(*self.arg)
>>> File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py",
>>> line 658, in eigenhproblem_general
>>> assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
>>> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>>> line 892, in assert_array_almost_equal
>>> precision=decimal)
>>> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>>> line 713, in assert_array_compare
>>> raise AssertionError(msg)
>>> AssertionError:
>>> Arrays are not almost equal to 4 decimals
>>>
>>> (mismatch 100.0%)
>>> x: array([ 0., 0., 0.], dtype=float32)
>>> y: array([ 1., 1., 1.])
>>>
>>> ----------------------------------------------------------------------
>>> Ran 1507 tests in 14.928s
>>>
>>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>>
>>> This is a very odd error, which we don't get when running over a numpy
>>> installed from source, linked to ATLAS, and doesn't happen when
>>> running the tests via:
>>>
>>> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>>
>>> So, something about the copy of numpy (linked to openblas) is
>>> affecting the results of scipy (also linked to openblas), and only
>>> with a particular environment / test order.
>>>
>>> If you'd like to try and see whether y'all can do a better job of
>>> debugging than me:
>>>
>>> # Run this script inside a docker container started with this incantation:
>>> # docker run -ti --rm ubuntu:12.04 /bin/bash
>>> apt-get update
>>> apt-get install -y python curl
>>> apt-get install libpython2.7 # this won't be necessary with next
>>> iteration of manylinux wheel builds
>>> curl -LO https://bootstrap.pypa.io/get-pip.py
>>> python get-pip.py
>>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>>> python -c 'import scipy.linalg; scipy.linalg.test()'
>>
>> I just tried this and on my laptop it completed without error.
>>
>> Best guess is that we're dealing with some memory corruption bug
>> inside openblas, so it's getting perturbed by things like exactly what
>> other calls to openblas have happened (which is different depending on
>> whether numpy is linked to openblas), and which core type openblas has
>> detected.
>>
>> On my laptop, which *doesn't* show the problem, running with
>> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>>
>> Guess the next step is checking what core type the failing machines
>> use, and running valgrind... anyone have a good valgrind suppressions
>> file?
>
> My machine (which does give the failure) gives
>
> Core: Core2
>
> with OPENBLAS_VERBOSE=2

Yep, that allows me to reproduce it:

***@f7153f0cc841:/# OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=Core2 python
-c 'import scipy.linalg; scipy.linalg.test()'
Core: Core2
[...]
======================================================================
FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
----------------------------------------------------------------------
[...]

So this is indeed sounding like an OpenBLAS issue... next stop
valgrind, I guess :-/

--
Nathaniel J. Smith -- https://vorpus.org
Nathaniel Smith
2016-02-09 03:59:07 UTC
Permalink
On Mon, Feb 8, 2016 at 6:07 PM, Nathaniel Smith <***@pobox.com> wrote:
> On Mon, Feb 8, 2016 at 6:04 PM, Matthew Brett <***@gmail.com> wrote:
>> On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <***@pobox.com> wrote:
>>> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <***@gmail.com> wrote:
>>> [...]
>>>> I can't replicate the segfault with manylinux wheels and scipy. On
>>>> the other hand, I get a new test error for numpy from manylinux, scipy
>>>> from manylinux, like this:
>>>>
>>>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>
>>>> ======================================================================
>>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>>> ----------------------------------------------------------------------
>>>> Traceback (most recent call last):
>>>> File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line
>>>> 197, in runTest
>>>> self.test(*self.arg)
>>>> File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py",
>>>> line 658, in eigenhproblem_general
>>>> assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
>>>> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>>>> line 892, in assert_array_almost_equal
>>>> precision=decimal)
>>>> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>>>> line 713, in assert_array_compare
>>>> raise AssertionError(msg)
>>>> AssertionError:
>>>> Arrays are not almost equal to 4 decimals
>>>>
>>>> (mismatch 100.0%)
>>>> x: array([ 0., 0., 0.], dtype=float32)
>>>> y: array([ 1., 1., 1.])
>>>>
>>>> ----------------------------------------------------------------------
>>>> Ran 1507 tests in 14.928s
>>>>
>>>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>>>
>>>> This is a very odd error, which we don't get when running over a numpy
>>>> installed from source, linked to ATLAS, and doesn't happen when
>>>> running the tests via:
>>>>
>>>> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>>>
>>>> So, something about the copy of numpy (linked to openblas) is
>>>> affecting the results of scipy (also linked to openblas), and only
>>>> with a particular environment / test order.
>>>>
>>>> If you'd like to try and see whether y'all can do a better job of
>>>> debugging than me:
>>>>
>>>> # Run this script inside a docker container started with this incantation:
>>>> # docker run -ti --rm ubuntu:12.04 /bin/bash
>>>> apt-get update
>>>> apt-get install -y python curl
>>>> apt-get install libpython2.7 # this won't be necessary with next
>>>> iteration of manylinux wheel builds
>>>> curl -LO https://bootstrap.pypa.io/get-pip.py
>>>> python get-pip.py
>>>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>>>> python -c 'import scipy.linalg; scipy.linalg.test()'
>>>
>>> I just tried this and on my laptop it completed without error.
>>>
>>> Best guess is that we're dealing with some memory corruption bug
>>> inside openblas, so it's getting perturbed by things like exactly what
>>> other calls to openblas have happened (which is different depending on
>>> whether numpy is linked to openblas), and which core type openblas has
>>> detected.
>>>
>>> On my laptop, which *doesn't* show the problem, running with
>>> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>>>
>>> Guess the next step is checking what core type the failing machines
>>> use, and running valgrind... anyone have a good valgrind suppressions
>>> file?
>>
>> My machine (which does give the failure) gives
>>
>> Core: Core2
>>
>> with OPENBLAS_VERBOSE=2
>
> Yep, that allows me to reproduce it:
>
> ***@f7153f0cc841:/# OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=Core2 python
> -c 'import scipy.linalg; scipy.linalg.test()'
> Core: Core2
> [...]
> ======================================================================
> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
> ----------------------------------------------------------------------
> [...]
>
> So this is indeed sounding like an OpenBLAS issue... next stop
> valgrind, I guess :-/

Here's the valgrind output:
https://gist.github.com/njsmith/577d028e79f0a80d2797

There's a lot of it, but no smoking guns have jumped out at me :-/

-n

--
Nathaniel J. Smith -- https://vorpus.org
Julian Taylor
2016-02-09 19:37:26 UTC
Permalink
On 09.02.2016 04:59, Nathaniel Smith wrote:
> On Mon, Feb 8, 2016 at 6:07 PM, Nathaniel Smith <***@pobox.com> wrote:
>> On Mon, Feb 8, 2016 at 6:04 PM, Matthew Brett <***@gmail.com> wrote:
>>> On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <***@pobox.com> wrote:
>>>> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <***@gmail.com> wrote:
>>>> [...]
>>>>> I can't replicate the segfault with manylinux wheels and scipy. On
>>>>> the other hand, I get a new test error for numpy from manylinux, scipy
>>>>> from manylinux, like this:
>>>>>
>>>>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>>
>>>>> ======================================================================
>>>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>>>> ----------------------------------------------------------------------
>>>>> Traceback (most recent call last):
>>>>> File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line
>>>>> 197, in runTest
>>>>> self.test(*self.arg)
>>>>> File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py",
>>>>> line 658, in eigenhproblem_general
>>>>> assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
>>>>> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>>>>> line 892, in assert_array_almost_equal
>>>>> precision=decimal)
>>>>> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>>>>> line 713, in assert_array_compare
>>>>> raise AssertionError(msg)
>>>>> AssertionError:
>>>>> Arrays are not almost equal to 4 decimals
>>>>>
>>>>> (mismatch 100.0%)
>>>>> x: array([ 0., 0., 0.], dtype=float32)
>>>>> y: array([ 1., 1., 1.])
>>>>>
>>>>> ----------------------------------------------------------------------
>>>>> Ran 1507 tests in 14.928s
>>>>>
>>>>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>>>>
>>>>> This is a very odd error, which we don't get when running over a numpy
>>>>> installed from source, linked to ATLAS, and doesn't happen when
>>>>> running the tests via:
>>>>>
>>>>> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>>>>
>>>>> So, something about the copy of numpy (linked to openblas) is
>>>>> affecting the results of scipy (also linked to openblas), and only
>>>>> with a particular environment / test order.
>>>>>
>>>>> If you'd like to try and see whether y'all can do a better job of
>>>>> debugging than me:
>>>>>
>>>>> # Run this script inside a docker container started with this incantation:
>>>>> # docker run -ti --rm ubuntu:12.04 /bin/bash
>>>>> apt-get update
>>>>> apt-get install -y python curl
>>>>> apt-get install libpython2.7 # this won't be necessary with next
>>>>> iteration of manylinux wheel builds
>>>>> curl -LO https://bootstrap.pypa.io/get-pip.py
>>>>> python get-pip.py
>>>>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>>>>> python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>
>>>> I just tried this and on my laptop it completed without error.
>>>>
>>>> Best guess is that we're dealing with some memory corruption bug
>>>> inside openblas, so it's getting perturbed by things like exactly what
>>>> other calls to openblas have happened (which is different depending on
>>>> whether numpy is linked to openblas), and which core type openblas has
>>>> detected.
>>>>
>>>> On my laptop, which *doesn't* show the problem, running with
>>>> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>>>>
>>>> Guess the next step is checking what core type the failing machines
>>>> use, and running valgrind... anyone have a good valgrind suppressions
>>>> file?
>>>
>>> My machine (which does give the failure) gives
>>>
>>> Core: Core2
>>>
>>> with OPENBLAS_VERBOSE=2
>>
>> Yep, that allows me to reproduce it:
>>
>> ***@f7153f0cc841:/# OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=Core2 python
>> -c 'import scipy.linalg; scipy.linalg.test()'
>> Core: Core2
>> [...]
>> ======================================================================
>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>> ----------------------------------------------------------------------
>> [...]
>>
>> So this is indeed sounding like an OpenBLAS issue... next stop
>> valgrind, I guess :-/
>
> Here's the valgrind output:
> https://gist.github.com/njsmith/577d028e79f0a80d2797
>
> There's a lot of it, but no smoking guns have jumped out at me :-/
>
> -n
>

plenty of smoking guns, e.g.:

==3695== Invalid read of size 8
==3695==    at 0x7AAA9C0: daxpy_k_CORE2 (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
==3695==    by 0x76BEEFC: ger_kernel (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
==3695==    by 0x788F618: exec_blas (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
==3695==    by 0x76BF099: dger_thread (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
==3695==    by 0x767DC37: dger_ (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)


I think I have reported that to OpenBLAS already; they said they do that
intentionally, though last I checked they were missing the code that
verifies this is actually allowed (if you're not crossing a page boundary
you can read beyond the buffer). It's pretty likely a pointless
micro-optimization; you normally only use that trick for string
functions, where you don't know the length of the string.

Your valgrind output also indicates it ran on Core2, while the issues
occur on Sandy Bridge; maybe valgrind messes with the CPU detection, so
it won't show anything.
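Since OPENBLAS_CORETYPE overrides the runtime CPU detection, one way to take the detection out of the picture is to sweep the kernel sets by hand. A sketch (the core-type names below are the usual OpenBLAS ones and are assumptions that may vary by build):

```shell
# Hedged sketch: force each OpenBLAS kernel set in turn and keep the
# tail of the test output, so a failure on e.g. Core2 but not Haswell
# points at that kernel family rather than at the detection code.
for core in Core2 Penryn Nehalem Sandybridge Haswell; do
  echo "=== $core ==="
  OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE="$core" \
    python -c 'import scipy.linalg; scipy.linalg.test()' 2>&1 | tail -n 2
done
```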
Matthew Brett
2016-02-09 19:40:17 UTC
Permalink
On Tue, Feb 9, 2016 at 11:37 AM, Julian Taylor
<***@googlemail.com> wrote:
> [...]
>
> plenty of smoking guns, e.g.:
>
> ==3695== Invalid read of size 8
> ==3695==    at 0x7AAA9C0: daxpy_k_CORE2 (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
> ==3695==    by 0x76BEEFC: ger_kernel (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
> ==3695==    by 0x788F618: exec_blas (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
> ==3695==    by 0x76BF099: dger_thread (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
> ==3695==    by 0x767DC37: dger_ (in /usr/local/lib/python2.7/dist-packages/numpy/.libs/libopenblas.so.0)
>
>
> I think I have reported that to OpenBLAS already; they said they do that
> intentionally, though last I checked they were missing the code that
> verifies this is actually allowed (if you're not crossing a page boundary
> you can read beyond the buffer). It's pretty likely a pointless
> micro-optimization; you normally only use that trick for string
> functions, where you don't know the length of the string.
>
> Your valgrind output also indicates it ran on Core2, while the issues
> occur on Sandy Bridge; maybe valgrind messes with the CPU detection, so
> it won't show anything.

Julian - thanks for having a look. Do you happen to remember the
openblas issue number for this?

Was there an obvious place we could patch openblas to avoid this error
in particular?

Cheers,

Matthew
Nathaniel Smith
2016-02-09 20:01:16 UTC
Permalink
On Tue, Feb 9, 2016 at 11:37 AM, Julian Taylor
<***@googlemail.com> wrote:
> [...]
>
> I think I have reported that to OpenBLAS already; they said they do that
> intentionally, though last I checked they were missing the code that
> verifies this is actually allowed (if you're not crossing a page boundary
> you can read beyond the buffer). It's pretty likely a pointless
> micro-optimization; you normally only use that trick for string
> functions, where you don't know the length of the string.

Yeah, I thought that was intentional, and we're not getting a segfault
so I don't think they're hitting any page boundaries. It's possible
they're screwing it up and somehow the random data they're reading can
affect the results, and that's why we get the wrong answer sometimes,
but that's just a wild guess.

> Your valgrind output also indicates it ran on Core2, while the issues
> occur on Sandy Bridge; maybe valgrind messes with the CPU detection, so
> it won't show anything.

It ran on core2 because I set OPENBLAS_CORETYPE=core2, since that
seems to trigger the particular issue that Matthew ran into (and
indeed, the relevant test failure did occur in that valgrind run). The
sandybridge thing is a different issue I think.

-n

--
Nathaniel J. Smith -- https://vorpus.org
Julian Taylor
2016-02-09 20:08:25 UTC
Permalink
On 09.02.2016 21:01, Nathaniel Smith wrote:
> On Tue, Feb 9, 2016 at 11:37 AM, Julian Taylor
> <***@googlemail.com> wrote:
>> [...]
>
> Yeah, I thought that was intentional, and we're not getting a segfault
> so I don't think they're hitting any page boundaries. It's possible
> they're screwing it up and somehow the random data they're reading can
> affect the results, and that's why we get the wrong answer sometimes,
> but that's just a wild guess.

With OpenBLAS everything is possible, especially this exact type of issue.
See e.g.:
https://github.com/xianyi/OpenBLAS/issues/171
There it loaded too much data, partly uninitialized, and if it is filled
with NaN that spreads into the actually used data.
That was a lot of fun to debug, and OpenBLAS is riddled with this stuff...

E.g. here is my favourite comment in OpenBLAS (which is probably the
source of https://github.com/scipy/scipy/issues/5528):

51 /* make it volatile because some function (ex: dgemv_n.S) */ \
52 /* do not restore all register */ \

https://github.com/xianyi/OpenBLAS/blob/develop/common_stackalloc.h#L51
Matthew Brett
2016-02-09 19:52:35 UTC
Permalink
On Mon, Feb 8, 2016 at 7:59 PM, Nathaniel Smith <***@pobox.com> wrote:
> [...]
>
> Here's the valgrind output:
> https://gist.github.com/njsmith/577d028e79f0a80d2797
>
> There's a lot of it, but no smoking guns have jumped out at me :-/

Could you send me instructions on replicating the valgrind run? I'll
run it on the actual Core2 machine...

Matthew
Julian Taylor
2016-02-09 19:55:28 UTC
Permalink
On 09.02.2016 20:52, Matthew Brett wrote:
> On Mon, Feb 8, 2016 at 7:59 PM, Nathaniel Smith <***@pobox.com> wrote:
>> [...]
>>
>> Here's the valgrind output:
>> https://gist.github.com/njsmith/577d028e79f0a80d2797
>>
>> There's a lot of it, but no smoking guns have jumped out at me :-/
>
> Could you send me instructions on replicating the valgrind run, I'll
> run on on the actual Core2 machine...
>
> Matthew


Please also use this suppression file; it should reduce the Python noise
significantly, though it might be a bit out of date. It used to work fine
on an Ubuntu-built Python.
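A sketch of the replication recipe Matthew asked for. The suppressions
filename is an assumption (the file attached here, or the
Misc/valgrind-python.supp that CPython ships in its source tree, should
both work); adjust paths to your environment:

```shell
# Write a small driver script for the valgrind run on the Core2 machine.
# Assumes valgrind is installed and valgrind-python.supp is in the
# current directory.
cat > run_valgrind.sh <<'EOF'
#!/bin/sh
# Force the OpenBLAS kernel selection that reproduces the failure, then
# run the scipy.linalg tests under memcheck, logging to a file so the
# (very long) output can be triaged afterwards.
OPENBLAS_CORETYPE=Core2 \
valgrind \
    --tool=memcheck \
    --suppressions=valgrind-python.supp \
    --track-origins=yes \
    --log-file=valgrind-scipy.log \
    python -c 'import scipy.linalg; scipy.linalg.test()'
EOF
chmod +x run_valgrind.sh
```

--track-origins makes valgrind report where any uninitialised value was
created, which is usually the useful part when chasing suspected memory
corruption in a BLAS kernel.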
Evgeni Burovski
2016-02-09 13:19:29 UTC
Permalink
>>> ======================================================================
>>> ERROR: test_multiarray.TestNewBufferProtocol.test_relaxed_strides
>>> ----------------------------------------------------------------------
>>> Traceback (most recent call last):
>>> File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/nose/case.py",
>>> line 197, in runTest
>>> self.test(*self.arg)
>>> File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/core/tests/test_multiarray.py",
>>> line 5366, in test_relaxed_strides
>>> fd.write(c.data)
>>> TypeError: 'buffer' does not have the buffer interface
>>>
>>> ----------------------------------------------------------------------
>>>
>>>
>>> * Scipy tests pass with one error in TestNanFuncs, but the interpreter
>>> crashes immediately afterwards.
>>>
>>>
>>> Same machine, python 3.5: both numpy and scipy tests pass.
>>
>> Ouch - great that you found these, I'll take a look,
>
> I think these are problems with numpy and Python 2.7.3 - because I got
> the same "TypeError: 'buffer' does not have the buffer interface" on
> numpy with OS X with Python.org python 2.7.3, installing from a wheel,
> or installing from source.
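A minimal stdlib-only illustration of the distinction: on later 2.7.x
point releases and on Python 3 (updating to 2.7.11 resolved it here),
binary streams accept any object exposing the new buffer protocol, which
is what `fd.write(c.data)` relies on; 2.7.3's old-style `buffer` object
predates that, hence the TypeError in test_relaxed_strides:

```python
import io

# memoryview implements the (new) buffer protocol, so writing it to a
# binary stream works; on Python 2.7.3 ndarray .data was an old-style
# `buffer` object, which BytesIO/file objects rejected.
buf = io.BytesIO()
mv = memoryview(b"\x01\x02\x03")
buf.write(mv)
assert buf.getvalue() == b"\x01\x02\x03"
print("ok")
```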


Indeed --- updated to python 2.7.11 (Thanks Felix Krull!) and the
failure is gone, `numpy.test()` passes. However:


>>> numpy.test("full")
Running unit tests for numpy
NumPy version 1.10.4
NumPy relaxed strides checking option: False
NumPy is installed in
/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy
Python version 2.7.11 (default, Dec 14 2015, 22:56:59) [GCC 4.6.3]
nose version 1.3.7

<snip>

======================================================================
ERROR: test_kind.TestKind.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/nose/case.py",
line 381, in setUp
try_run(self.inst, ('setup', 'setUp'))
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/nose/util.py",
line 471, in try_run
return func()
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/f2py/tests/util.py",
line 367, in setUp
module_name=self.module_name)
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/f2py/tests/util.py",
line 79, in wrapper
memo[key] = func(*a, **kw)
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/f2py/tests/util.py",
line 150, in build_module
__import__(module_name)
ImportError: /home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/core/../.libs/libgfortran.so.3:
version `GFORTRAN_1.4' not found (required by
/tmp/tmpPVjYDE/_test_ext_module_5405.so)

======================================================================
ERROR: test_mixed.TestMixed.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/nose/case.py",
line 381, in setUp
try_run(self.inst, ('setup', 'setUp'))
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/nose/util.py",
line 471, in try_run
return func()
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/f2py/tests/util.py",
line 367, in setUp
module_name=self.module_name)
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/f2py/tests/util.py",
line 79, in wrapper
memo[key] = func(*a, **kw)
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/f2py/tests/util.py",
line 150, in build_module
__import__(module_name)
ImportError: /home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/core/../.libs/libgfortran.so.3:
version `GFORTRAN_1.4' not found (required by
/tmp/tmpPVjYDE/_test_ext_module_5405.so)

======================================================================
ERROR: test_mixed.TestMixed.test_docstring
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/nose/case.py",
line 381, in setUp
try_run(self.inst, ('setup', 'setUp'))
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/nose/util.py",
line 471, in try_run
return func()
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/f2py/tests/util.py",
line 367, in setUp
module_name=self.module_name)
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/f2py/tests/util.py",
line 85, in wrapper
raise ret
ImportError: /home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/core/../.libs/libgfortran.so.3:
version `GFORTRAN_1.4' not found (required by
/tmp/tmpPVjYDE/_test_ext_module_5405.so)
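These `GFORTRAN_1.4' errors mean the libgfortran.so.3 vendored into the
wheel is older than the one the locally compiled f2py test modules link
against. A sketch for confirming the mismatch; both example paths are
assumptions taken from the traceback above, substitute your own:

```shell
# Write a helper that compares the GFORTRAN_* symbol-version tags a
# libgfortran provides against the ones an extension module requires.
cat > check_gfortran.sh <<'EOF'
#!/bin/sh
lib="$1"    # e.g. .../site-packages/numpy/.libs/libgfortran.so.3
ext="$2"    # e.g. /tmp/tmpXXXXXX/_test_ext_module_NNNN.so
echo "versions provided by $lib:"
objdump -T "$lib" | grep -o 'GFORTRAN_[0-9.]*' | sort -u
echo "versions required by $ext:"
objdump -T "$ext" | grep -o 'GFORTRAN_[0-9.]*' | sort -u
EOF
chmod +x check_gfortran.sh
```

If the extension requires a GFORTRAN_1.4 tag that the vendored library
does not list, the wheel's libgfortran needs to come from a newer gcc.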

======================================================================
ERROR: test_basic (test_function_base.TestMedian)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/lib/tests/test_function_base.py",
line 2361, in test_basic
assert_(w[0].category is RuntimeWarning)
IndexError: list index out of range

======================================================================
ERROR: test_nan_behavior (test_function_base.TestMedian)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/lib/tests/test_function_base.py",
line 2464, in test_nan_behavior
assert_(w[0].category is RuntimeWarning)
IndexError: list index out of range

======================================================================
FAIL: test_default (test_numeric.TestSeterr)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py",
line 281, in test_default
under='ignore',
AssertionError: {'over': 'raise', 'divide': 'warn', 'invalid': 'warn',
'under': 'ignore'} != {'over': 'warn', 'divide': 'warn', 'invalid':
'warn', 'under': 'ignore'}
- {'divide': 'warn', 'invalid': 'warn', 'over': 'raise', 'under': 'ignore'}
? ^^^^

+ {'divide': 'warn', 'invalid': 'warn', 'over': 'warn', 'under': 'ignore'}
? ++ ^
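For context, the test_default failure shows over='raise' leaking in where
the library default over='warn' is expected, i.e. some earlier test
changed the global error state and did not restore it. A quick sketch of
checking and safely overriding the policy (assumes a working numpy
install):

```python
import numpy as np

# The default floating-point error policy: 'warn' for divide, over and
# invalid, 'ignore' for under -- exactly what test_default asserts.
defaults = np.geterr()
print(defaults)

# np.errstate is the leak-proof way to change the policy in a test:
# it restores the previous state on exit, unlike a bare np.seterr call.
with np.errstate(over='raise'):
    assert np.geterr()['over'] == 'raise'
assert np.geterr()['over'] == defaults['over']
```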


======================================================================
FAIL: test_allnans (test_nanfunctions.TestNanFunctions_Median)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/lib/tests/test_nanfunctions.py",
line 544, in test_allnans
assert_(len(w) == 1)
File "/home/br/virtualenvs/manylinux/local/lib/python2.7/site-packages/numpy/testing/utils.py",
line 53, in assert_
raise AssertionError(smsg)
AssertionError

----------------------------------------------------------------------
Ran 6148 tests in 77.301s

FAILED (KNOWNFAIL=3, SKIP=6, errors=5, failures=2)
<nose.result.TextTestResult run=6148 errors=5 failures=2>


Not sure if any of these are present for python 2.7.3; I can no
longer easily test that.

`scipy.test("full")` almost passes: there's a bunch of
warnings-related noise, plus https://github.com/scipy/scipy/issues/5823
Nothing too bad on that machine, it seems :-).


> I also get a scipy segfault with scipy 0.17.0 installed from an OSX
> wheel, with output ending:
>
> test_check_finite (test_basic.TestLstsq) ...
> /Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/scipy/linalg/basic.py:884:
> RuntimeWarning: internal gelsd driver lwork query error, required
> iwork dimension not returned. This is likely the result of LAPACK bug
> 0038, fixed in LAPACK 3.2.2 (released July 21, 2010). Falling back to
> 'gelss' driver.
> warnings.warn(mesg, RuntimeWarning)
> ok
> test_random_complex_exact (test_basic.TestLstsq) ... FAIL
> test_random_complex_overdet (test_basic.TestLstsq) ... Bus error

Oh, that one again...




> This is so whether scipy is running on top of source- or wheel-built
> numpy, and for a scipy built from source.
>
> Same numpy error installing on a bare Ubuntu 12.04, either installing
> from a wheel built on 12.04 on travis:
>
> pip install -f http://travis-wheels.scikit-image.org --trusted-host
> travis-wheels.scikit-image.org --no-index numpy
>
> or from numpy built from source.
>
> I can't replicate the segfault with manylinux wheels and scipy. On
> the other hand, I get a new test error for numpy from manylinux, scipy
> from manylinux, like this:
>
> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>
> ======================================================================
> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line
> 197, in runTest
> self.test(*self.arg)
> File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py",
> line 658, in eigenhproblem_general
> assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
> line 892, in assert_array_almost_equal
> precision=decimal)
> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
> line 713, in assert_array_compare
> raise AssertionError(msg)
> AssertionError:
> Arrays are not almost equal to 4 decimals
>
> (mismatch 100.0%)
> x: array([ 0., 0., 0.], dtype=float32)
> y: array([ 1., 1., 1.])
>
> ----------------------------------------------------------------------
> Ran 1507 tests in 14.928s
>
> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>
> This is a very odd error, which we don't get when running over a numpy
> installed from source, linked to ATLAS, and doesn't happen when
> running the tests via:
>
> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>
> So, something about the copy of numpy (linked to openblas) is
> affecting the results of scipy (also linked to openblas), and only
> with a particular environment / test order.
>
> If you'd like to try and see whether y'all can do a better job of
> debugging than me:
>
> # Run this script inside a docker container started with this incantation:
> # docker run -ti --rm ubuntu:12.04 /bin/bash
> apt-get update
> apt-get install -y python curl
> apt-get install libpython2.7 # this won't be necessary with next
> iteration of manylinux wheel builds
> curl -LO https://bootstrap.pypa.io/get-pip.py
> python get-pip.py
> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
> python -c 'import scipy.linalg; scipy.linalg.test()'
>
> Cheers,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion