[Numpy-discussion] Is numpy.test() supposed to be multithreaded?

Discussion:

Michael Ward

2016-06-28 20:36:26 UTC

Heya, I'm not a numbers guy, but I maintain servers for scientists and
researchers who are. Someone pointed out that our numpy installation on
a particular server was only using one core. I don't know who installed
the previous versions of numpy/OpenBLAS or how, so I installed them from
scratch and confirmed that the user's test code now runs on multiple
cores as expected, drastically improving performance.

Now the user is writing back to say, "my test code is fast now, but
numpy.test() is still about three times slower than <some other server
we don't manage>". When I watch htop as numpy.test() executes, sure
enough, it's using one core. Now I'm not sure if that's the expected
behavior or not. Questions:

* if numpy.test() is supposed to be using multiple cores, why isn't it,
when we've established with other test code that it's now using multiple
cores?

* if numpy.test() is not supposed to be using multiple cores, what could
be the reason that the performance is drastically slower than another
server with a comparable CPU, when the user's test code performs
comparably?

For what it's worth, the user's "test" code, which does run on multiple
cores, is as simple as:

import numpy as np

size = 4000
a = np.random.random_sample((size, size))
b = np.random.random_sample((size, size))
x = np.dot(a, b)  # matrix product, dispatched to the BLAS

Whereas this uses only one core:

numpy.test()

---------------------------

OpenBLAS 0.2.18 was basically just compiled with "make", nothing special
to it. Numpy 1.11.0 was installed from source (python setup.py
install), using a site.cfg file to point numpy to the new OpenBLAS.
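
One way to check which BLAS a numpy build was linked against is to look at the output of np.show_config(). The snippet below is only a minimal sketch: the names buffer, config_text and uses_openblas are made up for illustration, and the substring check is a rough heuristic, not an official test.

```python
import contextlib
import io

import numpy as np

# Capture whatever np.show_config() prints so it can be inspected as text.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    np.show_config()
config_text = buffer.getvalue()

print(config_text)

# Rough heuristic (an assumption, not an official check): look for the
# string "openblas" anywhere in the reported build configuration.
uses_openblas = "openblas" in config_text.lower()
print("OpenBLAS mentioned in build config:", uses_openblas)
```

If OpenBLAS does not show up here, the site.cfg settings were probably not picked up at build time.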

Thanks,
Mike

Ralf Gommers

2016-06-28 20:53:09 UTC

Post by Michael Ward
* if numpy.test() is supposed to be using multiple cores, why isn't it,
when we've established with other test code that it's now using multiple
cores?

Some numpy.linalg functions (like np.dot) will be using multiple cores, but
np.linalg.test() takes only ~1% of the time of the full test suite.
Everything else will be running single core. So your observations are not
surprising.

Cheers,
Ralf

Chris Barker - NOAA Federal

2016-06-29 01:27:21 UTC

Post by Michael Ward
* if numpy.test() is supposed to be using multiple cores, why isn't it,
when we've established with other test code that it's now using multiple
cores?

Ralf Gommers

2016-06-29 07:07:14 UTC

On Wed, Jun 29, 2016 at 3:27 AM, Chris Barker - NOAA Federal wrote:

Post by Michael Ward
* if numpy.test() is supposed to be using multiple cores, why isn't it,
when we've established with other test code that it's now using multiple
cores?

Post by Ralf Gommers
Some numpy.linalg functions (like np.dot) will be using multiple cores,
but np.linalg.test() takes only ~1% of the time of the full test suite.
Everything else will be running single core. So your observations are not
surprising.

Post by Chris Barker - NOAA Federal
Though why it would run slower on one box than another comparable box is a
mystery...

Maybe just hardware config? I see a similar difference between how long the
test suite runs on TravisCI vs my Linux desktop (the latter is slower,
surprisingly).

Ralf

Nathaniel Smith

2016-06-29 09:03:43 UTC

As a general rule I wouldn't worry too much about test speed. Speed is
extremely dependent on exact workloads. And this is doubly so for test
suites -- production workloads tend to do a small number of normal
things over and over, while a good test suite never does the same
thing twice and spends most of its time exercising weird edge
conditions. So unless your actual workload is running the numpy test
suite :-), it's probably not worth trying to track down.

And yeah, numpy does not in general do automatic multithreading -- the
only automatic multithreading you should see is when using linear
algebra functions (matrix multiply, eigenvalue calculations, etc.)
that dispatch to the BLAS.
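
To see this difference on a given machine, one rough approach is to compare CPU time with wall-clock time for a BLAS-backed call and for an elementwise operation. The following is a minimal sketch, not a rigorous benchmark: the helper cpu_to_wall_ratio and the array size are made up for illustration. A ratio well above 1 suggests that several threads were busy, while a ratio near 1 suggests a single core.

```python
import time

import numpy as np


def cpu_to_wall_ratio(func, repeats=5):
    """Call func() `repeats` times and return CPU time / wall-clock time.

    time.process_time() sums CPU time over all threads of the process,
    so the ratio grows above 1 when several cores work in parallel.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(repeats):
        func()
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return cpu_elapsed / wall_elapsed


size = 1000  # hypothetical size; increase it for a clearer signal
a = np.random.random_sample((size, size))
b = np.random.random_sample((size, size))

# Matrix multiply is handed to the BLAS, which may use several threads.
blas_ratio = cpu_to_wall_ratio(lambda: np.dot(a, b))

# Elementwise math runs inside numpy itself, normally on a single core.
elementwise_ratio = cpu_to_wall_ratio(lambda: np.sqrt(a) + b)

print("np.dot       CPU/wall ratio: %.2f" % blas_ratio)
print("elementwise  CPU/wall ratio: %.2f" % elementwise_ratio)
```

With a threaded OpenBLAS one would expect the first ratio to be noticeably larger than the second, though the exact numbers depend on the machine and its load.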

-n

_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Nathaniel J. Smith -- https://vorpus.org

Sebastian Berg

2016-06-29 09:59:15 UTC

Post by Nathaniel Smith
As a general rule I wouldn't worry too much about test speed. Speed is
extremely dependent on exact workloads. And this is doubly so for test
suites -- production workloads tend to do a small number of normal
things over and over, while a good test suite never does the same
thing twice and spends most of its time exercising weird edge
conditions. So unless your actual workload is running the numpy test
suite :-), it's probably not worth trying to track down.

Agreed, the test suite, and likely also the few tests that end up taking
most of the time, could be arbitrarily weird and skewed. I could, for
example, imagine IO speed being a big factor. Also, depending on system
configuration (or numpy version), a different number of tests may be
run.

What might make somewhat more sense would be to compare some of the
benchmarks (`python runtests.py --bench`) if you have airspeed velocity
installed. While not extensive, a lot of those at least test more
typical use cases. Though in any case I think the user should probably
just test something else.
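
If airspeed velocity is not available, a handful of typical operations can still be timed by hand on both servers. The snippet below is only a rough sketch using the standard timeit module. It is not the numpy benchmark suite, and the helper best_time, the chosen operations and the array sizes are all made up for illustration.

```python
import timeit

import numpy as np


def best_time(func, repeat=3, number=5):
    """Return the best per-call time in seconds over `repeat` rounds."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


rng = np.random.RandomState(0)  # fixed seed so both servers see the same data
x = rng.random_sample((500, 500))
v = rng.random_sample(200000)

timings = {
    "matrix multiply (BLAS)": best_time(lambda: np.dot(x, x)),
    "elementwise exp": best_time(lambda: np.exp(v)),
    "sort": best_time(lambda: np.sort(v)),
    "sum along axis": best_time(lambda: x.sum(axis=0)),
}

for name, seconds in sorted(timings.items()):
    print("%-24s %.6f s" % (name, seconds))
```

Running the same script on both machines gives a like-for-like comparison of a few common operations, which is probably more telling than the total runtime of the test suite.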

- Sebastian
