Discussion:
[Numpy-discussion] Linking Numpy with parallel OpenBLAS
Daπid
2015-10-29 17:25:01 UTC
I have installed all the OpenBLAS versions available in the Fedora repos,
which include OpenMP and pthreads versions. But the Numpy installed by pip in
a virtualenv seems to link only to the serial version. Is there a way to
convince it to use the parallel one?

Here are my libraries:

(py27)[***@SQUIDS lib64]$ ls libopenblas*
libopenblas64.a libopenblaso64.so.0 libopenblasp64.so.0
libopenblas64-r0.2.14.so libopenblaso.a libopenblasp.a
libopenblas64.so libopenblaso-r0.2.14.so libopenblasp-r0.2.14.so
libopenblas64.so.0 libopenblaso.so libopenblasp.so
libopenblas.a libopenblaso.so.0 libopenblasp.so.0
libopenblaso64.a libopenblasp64.a libopenblas-r0.2.14.so
libopenblaso64-r0.2.14.so libopenblasp64-r0.2.14.so libopenblas.so
libopenblaso64.so libopenblasp64.so libopenblas.so.0

And after importing numpy, lsof shows that the serial version is the only one open:

(py27)[***@SQUIDS lib64]$ lsof libopenbl*
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ipython 2355 david mem REG 8,2 32088056 2372346 libopenblas-r0.2.14.so
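As an alternative to lsof, a process can inspect its own memory map to see which OpenBLAS build it actually loaded. A minimal, Linux-only sketch, assuming /proc/self/maps is available; the helper names openblas_paths and loaded_openblas are hypothetical:

```python
import os


def openblas_paths(maps_text):
    """Return the sorted OpenBLAS shared-object paths listed in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]; the pathname may be absent.
        fields = line.split(maxsplit=5)
        if len(fields) == 6:
            path = fields[5].strip()
            if os.path.basename(path).startswith("libopenblas"):
                paths.add(path)
    return sorted(paths)


def loaded_openblas():
    """OpenBLAS libraries mapped into the current process (Linux only)."""
    with open("/proc/self/maps") as maps:
        return openblas_paths(maps.read())
```

Usage: run `import numpy` first, then `print(loaded_openblas())`; on the setup above this should report the same library that lsof showed.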


This is the output of np.show_config():

lapack_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib64']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib64']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib64']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_lapack_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib64']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_mkl_info:
  NOT AVAILABLE


Thanks,


/David.
Julian Taylor
2015-10-29 19:25:47 UTC
It should be possible by putting this into ~/.numpy-site.cfg:

[openblas]
libraries = openblasp

LD_PRELOADing the library file should also work.
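Because LD_PRELOAD is read by the dynamic loader when a process starts, it has to be set in the environment of a fresh interpreter rather than from inside one that is already running. The ~/.numpy-site.cfg route, by contrast, only takes effect when numpy is built from source. A minimal sketch of launching a child Python with a given OpenBLAS build preloaded; the helper names preload_env and run_python_with_preload are hypothetical:

```python
import os
import subprocess
import sys


def preload_env(library, base_env=None):
    """Copy of the environment with `library` prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The loader accepts space-separated entries; keep any existing preloads.
    env["LD_PRELOAD"] = f"{library} {existing}" if existing else library
    return env


def run_python_with_preload(library, code):
    """Run `code` in a fresh interpreter with `library` preloaded; return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=preload_env(library),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_python_with_preload("/usr/lib64/libopenblasp.so", "import numpy; ...")` runs the given code with the pthreads build's symbols taking precedence over those of the library numpy was linked against.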
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Daπid
2015-10-29 20:50:34 UTC
Thanks!

I timed a dot product of a 10000x10000 square matrix, LD_PRELOADing each of
the different versions in turn. I checked that all the cores were busy
whenever any version other than the plain libopenblas/libopenblas64 was
selected. Here are the timings in seconds:


Library                          Intel i5-3317U (s)   Intel i7-4770 (s)
/usr/lib64/libopenblas.so        114.60265708         37.9794859886
/usr/lib64/libopenblas64.so      109.000799179        38.0571939945
/usr/lib64/libopenblasp.so       107.927740097        12.3455951214
/usr/lib64/libopenblasp64.so     96.8817200661        12.5558650494
/usr/lib64/libopenblaso.so       86.3651878834        13.4787950516
/usr/lib64/libopenblaso64.so     97.5418870449        12.4118559361
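A timing like the one above can be reproduced with a small script run once per library, with LD_PRELOAD set on its command line. A sketch, assuming two random n x n float64 matrices (the thread does not say exactly how the matrix was built); the function names best_time and time_dot and the smaller default size are hypothetical choices:

```python
import time


def best_time(fn, repeats=3):
    """Best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def time_dot(n=4000, repeats=3):
    """Time np.dot on two random n x n float64 matrices using whatever BLAS numpy loaded."""
    import numpy as np  # imported here so best_time() is usable without numpy

    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    return best_time(lambda: np.dot(a, b), repeats)
```

Run it as, e.g., `LD_PRELOAD=/usr/lib64/libopenblasp.so python -c 'import bench; print(bench.time_dot())'`, where `bench` is a hypothetical name for the file holding this code. Taking the best of several runs reduces noise from other processes; with n=10000 each call takes tens of seconds on these machines, so a smaller n is handy for quick comparisons.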

Both computers have the same software and OS. So it seems that OpenBLAS
doesn't get a significant advantage from going parallel on the older i5,
while the i7, using all its cores (4 + 4 hyperthreads), gets a 3x speedup,
and there is no big difference between OpenMP and pthreads.
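The speedup figures can be checked directly from the table as serial time divided by parallel time. A small sketch using only the non-`64` libraries from the timings above; the names TIMINGS and speedups are hypothetical:

```python
# Timings in seconds as reported above (dot product, matrix size 10000).
TIMINGS = {
    "i5-3317U": {"serial": 114.60265708, "openmp": 86.3651878834, "pthreads": 107.927740097},
    "i7-4770": {"serial": 37.9794859886, "openmp": 13.4787950516, "pthreads": 12.3455951214},
}


def speedups(times):
    """Speedup of each parallel build relative to the serial one (serial_time / parallel_time)."""
    serial = times["serial"]
    return {build: serial / t for build, t in times.items() if build != "serial"}


for cpu, times in TIMINGS.items():
    for build, s in speedups(times).items():
        print(f"{cpu} {build}: {s:.2f}x")
```

This gives about 3.08x (pthreads) and 2.82x (OpenMP) on the i7, but only about 1.06x and 1.33x on the i5.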

I am particularly puzzled by the i5 results: shouldn't threads give a
noticeable speedup?


/David.
Julian Taylor
2015-10-29 21:07:57 UTC
Try with only the 2 physical cores instead of the 2+2 hyperthreads via
OMP_NUM_THREADS=2; it's possible the hyperthreading is just leading to cache
thrashing. Also, when only one core is active the CPUs overclock themselves a
bit (Intel Turbo Boost), which decreases the relative parallelization
speedup.
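This suggestion can be scripted by timing the same product in fresh interpreters with the thread count capped. A sketch, assuming numpy is installed in the interpreter running it; the helper names and the default size n=4000 are hypothetical. OpenBLAS builds using OpenMP take their thread count from OMP_NUM_THREADS, while the pthreads build reads OPENBLAS_NUM_THREADS, so the sketch sets both:

```python
import os
import subprocess
import sys

# Child-process snippet: time one n x n matrix product with whatever BLAS numpy loaded.
TIMING_SNIPPET = """
import time
import numpy as np
n = {n}
a = np.random.rand(n, n)
b = np.random.rand(n, n)
start = time.perf_counter()
np.dot(a, b)
print(time.perf_counter() - start)
"""


def thread_env(num_threads, base_env=None):
    """Copy of the environment with the OpenMP and OpenBLAS thread counts capped."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(num_threads)  # read by OpenMP builds
    env["OPENBLAS_NUM_THREADS"] = str(num_threads)  # read by the pthreads build
    return env


def time_with_threads(num_threads, n=4000):
    """Seconds for one n x n np.dot in a fresh interpreter limited to num_threads."""
    out = subprocess.run(
        [sys.executable, "-c", TIMING_SNIPPET.format(n=n)],
        env=thread_env(num_threads),
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return float(out)
```

For example, `for t in (1, 2, 4): print(t, time_with_threads(t))` on the i5-3317U (2 cores, 4 hardware threads) compares one core, the two physical cores, and all hyperthreads. Each measurement runs in a new process because OpenBLAS reads these variables when the library initializes.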
