Discussion:
[Numpy-discussion] Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]
Robert McGibbon
2016-01-09 11:52:39 UTC
Hi all,

I went ahead and tried to collect a list of all of the libraries that could
be considered to constitute the "base" system for linux-64. The strategy I
used was to build on the work done by the folks at Continuum by
searching through their pre-compiled binaries from
https://repo.continuum.io/pkgs/free/linux-64/ to find shared libraries that
were depended on (according to ldd) but were not accounted for by the
declared dependencies that each package made known to the conda package
manager.
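
A minimal sketch of this strategy in Python, for anyone who wants to try something similar. It is not the script used to produce the numbers in this post. It parses the text that ldd prints for one binary and subtracts the libraries supplied by the package's declared dependencies. The function names and the sample ldd output are hypothetical, and lines such as "libfoo.so => not found" are simply ignored.

```python
import re

# Matches the library name at the start of an ldd output line, e.g.
#   "\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f...)"
#   "\tlinux-vdso.so.1 (0x00007ffc...)"
_LDD_LINE = re.compile(r"^\s*(\S+)\s+(?:=>\s+\S+\s+)?\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd_output(text):
    """Return the set of library names listed in ldd's output."""
    names = set()
    for line in text.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            # The dynamic loader is printed as a full path; keep the file name.
            names.add(match.group(1).rsplit("/", 1)[-1])
    return names


def undeclared_libraries(ldd_text, provided_by_declared_deps):
    """Libraries the binary loads that no declared dependency ships."""
    return parse_ldd_output(ldd_text) - set(provided_by_declared_deps)


# Hypothetical ldd output for one extension module.
SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffc0a5f2000)
\tlibgfortran.so.1 => /usr/lib/libgfortran.so.1 (0x00007f1c2a000000)
\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1c29c00000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c29800000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f1c2a400000)
"""

# Pretend a declared conda dependency ships libgfortran.so.1.
print(sorted(undeclared_libraries(SAMPLE, {"libgfortran.so.1"})))
```

Everything left after the subtraction is a candidate for the "base" system list; counting how many packages share each name gives the ordering used below.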

The full list of these system libraries, sorted from
most-commonly-depended-on to rarest, is below. There are 158 of them.

['linux-vdso.so.1', 'libc.so.6', 'libpthread.so.0', 'libm.so.6',
'libdl.so.2', 'libutil.so.1', 'libgcc_s.so.1', 'libstdc++.so.6',
'libexpat.so.1', 'librt.so.1', 'libpng12.so.0', 'libcrypt.so.1',
'libffi.so.6', 'libresolv.so.2', 'libkeyutils.so.1', 'libcom_err.so.2',
'libp11-kit.so.0', 'libkrb5.so.26', 'libheimntlm.so.0', 'libtasn1.so.6',
'libheimbase.so.1', 'libgssapi.so.3', 'libroken.so.18', 'libhcrypto.so.4',
'libhogweed.so.4', 'libnettle.so.6', 'libhx509.so.5', 'libwind.so.0',
'libgnutls-deb0.so.28', 'libasn1.so.8', 'libgmp.so.10', 'libsasl2.so.2',
'libidn.so.11', 'librtmp.so.1', 'liblber-2.4.so.2', 'libldap_r-2.4.so.2',
'libXdmcp.so.6', 'libX11.so.6', 'libXau.so.6', 'libxcb.so.1',
'libgssapi_krb5.so.2', 'libkrb5.so.3', 'libk5crypto.so.3',
'libkrb5support.so.0', 'libicudata.so.55', 'libicuuc.so.55',
'libhdf5_serial.so.10', 'libcurl-gnutls.so.4', 'libhdf5_serial_hl.so.10',
'libtinfo.so.5', 'libgcrypt.so.20', 'libgpg-error.so.0', 'libnsl.so.1',
'libXext.so.6', 'libncursesw.so.5', 'libpanelw.so.5', 'libXrender.so.1',
'libjbig.so.0', 'libpcre.so.3', 'libglib-2.0.so.0',
'libnvidia-tls.so.352.41', 'libnvidia-glcore.so.352.41', 'libGL.so.1',
'libuuid.so.1', 'libSM.so.6', 'libICE.so.6', 'libgobject-2.0.so.0',
'libgfortran.so.1', 'liblzma.so.5', 'libXt.so.6', 'libgmodule-2.0.so.0',
'libXi.so.6', 'libgstpbutils-1.0.so.0', 'liborc-0.4.so.0',
'libgstreamer-1.0.so.0', 'libgsttag-1.0.so.0', 'libgstvideo-1.0.so.0',
'libxslt.so.1', 'libaudio.so.2', 'libjpeg.so.8', 'libgstaudio-1.0.so.0',
'libgstbase-1.0.so.0', 'libgstapp-1.0.so.0', 'libz.so.1',
'libgthread-2.0.so.0', 'libfreetype.so.6', 'libfontconfig.so.1',
'libdbus-1.so.3', 'libsystemd.so.0', 'libltdl.so.7', 'libGLU.so.1',
'libsqlite3.so.0', 'libpgm-5.1.so.0', 'libgomp.so.1', 'libxcb-render.so.0',
'libxcb-shm.so.0', 'libncurses.so.5', 'libxml2.so.2', 'libXss.so.1',
'libXft.so.2', 'libtk.so', 'libtcl.so', 'libasound.so.2',
'libharfbuzz.so.0', 'libpixman-1.so.0', 'libgio-2.0.so.0',
'libXinerama.so.1', 'libselinux.so.1', 'libXcomposite.so.1',
'libthai.so.0', 'libXdamage.so.1', 'libgdk-x11-2.0.so.0',
'libpangoft2-1.0.so.0', 'libcairo.so.2', 'libpangocairo-1.0.so.0',
'libdatrie.so.1', 'libatk-1.0.so.0', 'libXcursor.so.1', 'libXfixes.so.3',
'libgraphite2.so.3', 'libgdk_pixbuf-2.0.so.0', 'libgtk-x11-2.0.so.0',
'libquadmath.so.0', 'libpango-1.0.so.0', 'libXrandr.so.2',
'libgfortran.so.3', 'libjson-c.so.2', 'libshiboken-python2.7.so.1.1',
'libogg.so.0', 'libvorbis.so.0', 'libatlas.so.3', 'libcurl.so.4',
'libhdf5.so.9', 'libodbcinst.so.1', 'libpcap.so.0.9', 'libnetcdf.so.7',
'libblas.so.3', 'libpulse.so.0', 'libcaca.so.0', 'libgstreamer-0.10.so.0',
'libXxf86vm.so.1', 'libhdf5_hl.so.9', 'libpulse-simple.so.0',
'libasyncns.so.0', 'libwrap.so.0', 'libvorbisenc.so.2', 'libmagic.so.1',
'libssl.so.1.0.0', 'libFLAC.so.8', 'libSDL-1.2.so.0', 'libsndfile.so.1',
'libslang.so.2', 'libglapi.so.0', 'libaio.so.1',
'libgstinterfaces-0.10.so.0', 'libpulsecommon-6.0.so', 'libjpeg.so.62',
'libcrypto.so.1.0.0']


This list actually contains a fair number of false positives, so it would
need to be pruned manually. If you stare at it a little while, you might
see some libraries in there that you recognize that shouldn't be part of
the base system, like libatlas.so.3.

This gist https://gist.github.com/rmcgibbo/a13e7623c38ec54fcc93 contains
some more detailed data -- for each of the libraries in the list above, it
gives a list of names of the packages that depend on this library. For
example, for libatlas.so.3, there is only a single package which
depends on it, ["scikit-learn-0.11-np16py27_ce0"]. So, probably a bug.

"libgfortran.so.1" is also in the list. It's depended on by
["cvxopt-1.1.6-py27_0", "cvxopt-1.1.7-py27_0", "cvxopt-1.1.7-py34_0",
"cvxopt-1.1.7-py35_0", "numpy-1.5.1-py27_1", "numpy-1.5.1-py27_3",
"numpy-1.5.1-py27_4", "numpy-1.5.1-py27_ce0", "numpy-1.6.2-py27_1",
"numpy-1.6.2-py27_3", "numpy-1.6.2-py27_4", "numpy-1.6.2-py27_ce0",
"numpy-1.7.0-py27_0", "numpy-1.7.0b2-py27_ce0", "numpy-1.7.0rc1-py27_0",
"numpy-1.7.1-py27_0", "numpy-1.7.1-py27_2", "numpy-1.8.0-py27_0",
"numpy-1.8.1-py27_0", "numpy-1.8.1-py34_0", "numpy-1.8.2-py27_0",
"numpy-1.8.2-py34_0", "numpy-1.9.0-py27_0", "numpy-1.9.0-py34_0",
"numpy-1.9.1-py27_0", "numpy-1.9.1-py34_0", "numpy-1.9.2-py27_0",
"numpy-1.9.2-py34_0"].

Note that this list of numpy versions doesn't include the latest ones --
all of the numpy-1.10 binaries made by Continuum pick up libgfortran from a
conda package and don't depend on it being provided by the system. Also,
the final '_0' or '_1' segment of many of these package names is the build
number, which is incremented to publish a new build of the same release of
a package, usually because of a packaging problem. So many of these
packages were probably built incorrectly and superseded by new builds with
a higher build number.
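
To make the build-number point concrete, here is a small Python sketch that splits package names like the ones above and keeps only the highest build of each release. The helper names are hypothetical, and it assumes the build number is the run of digits at the end of the build string, which matches the names quoted in this post but is not taken from conda itself.

```python
import re
from typing import NamedTuple


class CondaBuild(NamedTuple):
    name: str
    version: str
    build_string: str
    build_number: int


def parse_package_name(filename):
    """Split 'name-version-build' on the last two hyphens."""
    name, version, build_string = filename.rsplit("-", 2)
    # Assumption: the build number is the run of digits ending the build string.
    match = re.search(r"(\d+)$", build_string)
    if match is None:
        raise ValueError(f"no build number in {filename!r}")
    return CondaBuild(name, version, build_string, int(match.group(1)))


def latest_builds(filenames):
    """Keep, per (name, version, build variant), the highest build number."""
    best = {}
    for filename in filenames:
        pkg = parse_package_name(filename)
        # The variant is the build string without its trailing number,
        # e.g. 'py27_' for 'py27_4' and 'py27_ce' for 'py27_ce0'.
        variant = re.sub(r"\d+$", "", pkg.build_string)
        key = (pkg.name, pkg.version, variant)
        if key not in best or pkg.build_number > best[key].build_number:
            best[key] = pkg
    return sorted(best.values())


names = [
    "numpy-1.5.1-py27_1",
    "numpy-1.5.1-py27_3",
    "numpy-1.5.1-py27_4",
    "numpy-1.5.1-py27_ce0",
]
for pkg in latest_builds(names):
    print(pkg.name, pkg.version, pkg.build_string)
```

Filtering the per-library package lists this way would drop superseded builds before counting dependencies.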

So it's not perfect. But it might be a useful starting place.

-Robert
Julian Taylor
2016-01-09 12:20:38 UTC
Post by Robert McGibbon
Hi all,
I went ahead and tried to collect a list of all of the libraries that
could be considered to constitute the "base" system for linux-64. The
strategy I used was to leverage off the work done by the folks at
Continuum by searching through their pre-compiled binaries
from https://repo.continuum.io/pkgs/free/linux-64/ to find shared
libraries that were depended on (according to ldd) that were not
accounted for by the declared dependencies that each package made known
to the conda package manager.

do those packages use ld --as-needed for linking?
there are a lot of libraries in that list that I highly doubt are directly
used by the packages.
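
One way to separate direct from indirect dependencies, sketched in Python. ldd resolves the whole dependency tree, so it also lists libraries that are only pulled in by other libraries, while the DT_NEEDED entries printed by `readelf -d` are the ones recorded in the binary itself. The function names and the sample text are hypothetical. This does not show whether --as-needed was used at link time; `ldd -u` reports unused direct dependencies and is closer to that question.

```python
import re

# A DT_NEEDED line in `readelf -d` output looks like:
#  0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def direct_dependencies(readelf_dynamic_text):
    """Return the DT_NEEDED sonames from `readelf -d` output, in file order."""
    return _NEEDED.findall(readelf_dynamic_text)


def indirect_only(ldd_names, readelf_dynamic_text):
    """Names reported by ldd that the binary does not request directly."""
    return set(ldd_names) - set(direct_dependencies(readelf_dynamic_text))


# Hypothetical `readelf -d` output for one shared object.
SAMPLE = """\
Dynamic section at offset 0x2d90 contains 27 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libcurl-gnutls.so.4]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [libexample.so.1]
"""

ldd_names = {"libcurl-gnutls.so.4", "libc.so.6", "libgnutls-deb0.so.28"}
print(direct_dependencies(SAMPLE))
print(sorted(indirect_only(ldd_names, SAMPLE)))
```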
Robert McGibbon
2016-01-09 12:29:13 UTC
Post by Julian Taylor
do those packages use ld --as-needed for linking?

Is it possible to check this? I mean, there are over 7000 packages that I
checked. I don't know how they were all built.

It's totally possible for many of them to be unused. A reasonably common
thing might be that packages use ctypes or dlopen to dynamically load
shared libraries that are actually just optional (and catch the error and
recover gracefully if the library can't be loaded).
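
For illustration, the optional-loading pattern looks roughly like this with ctypes. This is a generic sketch, not code from any of the packages in the list, and `load_optional` is a hypothetical helper name.

```python
import ctypes
import ctypes.util


def load_optional(name):
    """Return a ctypes handle for the named library, or None if unavailable."""
    path = ctypes.util.find_library(name)
    if path is None:
        # Nothing by that name could be located on this system.
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        # The library was located but could not be loaded; treat it as absent.
        return None


libm = load_optional("m")
missing = load_optional("surely_not_an_installed_library_xyz")
print("libm available:", libm is not None)
print("optional library available:", missing is not None)
```

A package written this way keeps working when the library is absent, so the library is a soft dependency rather than a hard requirement on the base system.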

-Robert
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
David Cournapeau
2016-01-09 13:23:42 UTC
On Sat, Jan 9, 2016 at 12:20 PM, Julian Taylor wrote:
Post by Julian Taylor
Post by Robert McGibbon
Hi all,
I went ahead and tried to collect a list of all of the libraries that
could be considered to constitute the "base" system for linux-64. The
strategy I used was to leverage off the work done by the folks at
Continuum by searching through their pre-compiled binaries
from https://repo.continuum.io/pkgs/free/linux-64/ to find shared
libraries that were depended on (according to ldd) that were not
accounted for by the declared dependencies that each package made known
to the conda package manager.
do those packages use ld --as-needed for linking?
there are a lot of libraries in that list that I highly doubt are directly
used by the packages.

It is also a common problem when building packages without using a "clean"
build environment, as it is too easy to pick up dependencies accidentally,
especially for autotools-based packages (unless one uses pbuilder or
similar tools).

David
Nathaniel Smith
2016-01-09 23:04:44 UTC
Post by Robert McGibbon
Hi all,
I went ahead and tried to collect a list of all of the libraries that could
be considered to constitute the "base" system for linux-64. The strategy I
used was to leverage off the work done by the folks at Continuum by
searching through their pre-compiled binaries from
https://repo.continuum.io/pkgs/free/linux-64/ to find shared libraries that
were depended on (according to ldd) that were not accounted for by the
declared dependencies that each package made known to the conda package
manager.
The full list of these system libraries, sorted from
most-commonly-depended-on to rarest, is below. There are 158 of them.
[...]
Post by Robert McGibbon
So it's not perfect. But it might be a useful starting place.

Unfortunately, yeah, it looks like there's a lot of false positives in
here :-(. For example your list contains liblzma and libsqlite, but
both of these are shipped as dependencies of python itself. So
probably someone just forgot to declare the dependency explicitly, but
got away with it because the libraries were pulled in anyway.

Maybe a better approach would be to look at what libraries are used
by an up-to-date default Anaconda install (on the assumption that this
is the best tested configuration), and then erase from the list all
libraries that are shipped by this configuration (ignoring declared
dependencies since those seem to be unreliable)? It's better to be
conservative here, since the end goal is to come up with a list of
external libraries that we're confident have actually been tested for
compatibility by lots and lots of different users.
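
As a rough sketch of that filtering step in Python: collect every library name needed by any binary in the install, then remove whatever the install ships itself. The function name, file paths, and sample data are hypothetical, chosen only to show the set arithmetic.

```python
def required_from_system(needed_by_binary, shipped_sonames):
    """Sonames that some binary needs but the install does not ship itself.

    needed_by_binary maps a binary's path to the library names it loads;
    shipped_sonames is every shared library found inside the install,
    regardless of what the package metadata declares.
    """
    needed = set()
    for sonames in needed_by_binary.values():
        needed.update(sonames)
    return sorted(needed - set(shipped_sonames))


# Hypothetical data for a tiny install.
needed_by_binary = {
    "bin/python": ["libpthread.so.0", "libc.so.6"],
    "lib/python2.7/lib-dynload/_sqlite3.so": ["libsqlite3.so.0", "libc.so.6"],
}
shipped = {"libsqlite3.so.0", "liblzma.so.5"}

print(required_from_system(needed_by_binary, shipped))
```

Because the shipped set comes from the files on disk and not from declared dependencies, a forgotten declaration (as with libsqlite) no longer produces a false positive.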

-n
--
Nathaniel J. Smith -- http://vorpus.org
Robert McGibbon
2016-01-10 00:42:48 UTC
Post by Nathaniel Smith
Maybe a better approach would be to look at what libraries are used
by an up-to-date default Anaconda install (on the assumption that this
is the best tested configuration)

That's not a bad idea. I also have a couple other ideas about how to filter
this based on using debian popularity-contests and the package graph. I
will report back when I have more info.

-Robert
Robert McGibbon
2016-01-10 09:19:07 UTC
Hi all,

I followed Nathaniel's advice and restricted the search to the
packages included in the Anaconda release (as opposed to all of the
packages in their repositories), and fixed some technical issues with the
way I was doing the analysis.

The new list is much smaller. Here are the shared libraries that the
components of Anaconda require that the system provides on Linux 64:

libpanelw.so.5, libncursesw.so.5, libgcc_s.so.1, libstdc++.so.6, libm.so.6,
libdl.so.2, librt.so.1, libcrypt.so.1, libc.so.6, libnsl.so.1,
libutil.so.1, libpthread.so.0, libX11.so.6, libXext.so.6,
libgobject-2.0.so.0, libgthread-2.0.so.0, libglib-2.0.so.0,
libXrender.so.1, libICE.so.6, libSM.so.6, libGL.so.1.

Many of these libraries are required simply for the interpreter. The
remaining ones, which aren't required by the interpreter but are
required by some other package in Anaconda, are:

libgcc_s.so.1, libstdc++.so.6, libXext.so.6, libSM.so.6,
libgthread-2.0.so.0, libgobject-2.0.so.0, libglib-2.0.so.0, libICE.so.6,
libXrender.so.1, and libGL.so.1.

Most of these are parts of X11 required by Qt (
http://doc.qt.io/qt-5/linux-requirements.html).

-Robert
Robert McGibbon
2016-01-11 14:06:31 UTC
I started working on a tool for checking Linux wheels for "manylinux"
compatibility, and fixing them up if possible, based on the same ideas as
Matthew Brett's delocate <https://github.com/matthew-brett/delocate> for OS
X. Current WIP code, if anyone wants to help / throw peanuts, is here:
https://github.com/rmcgibbo/deloc8.

It's currently fairly modest and can only list non-whitelisted external
shared library dependencies, and verify that sufficiently old versioned
symbols for glibc and its ilk are used.
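
The two checks can be sketched in Python as follows. This is an illustration of the idea only, not code from deloc8: the function names, the whitelist contents, and the version limits are all assumptions made up for the example.

```python
import re

# A symbol version reference such as 'GLIBC_2.14' or 'GLIBCXX_3.4.19'.
_VERSION_REF = re.compile(r"^([A-Z_]+)_(\d+(?:\.\d+)*)$")


def parse_symbol_version(ref):
    """Split 'GLIBC_2.14' into ('GLIBC', (2, 14)); None if it has no number."""
    match = _VERSION_REF.match(ref)
    if match is None:
        return None
    numbers = tuple(int(part) for part in match.group(2).split("."))
    return match.group(1), numbers


def too_new(version_refs, max_allowed):
    """Return the references newer than the per-family limit, sorted."""
    offending = []
    for ref in version_refs:
        parsed = parse_symbol_version(ref)
        if parsed is None:
            continue  # e.g. GLIBC_PRIVATE; not handled by this sketch
        family, numbers = parsed
        limit = max_allowed.get(family)
        if limit is not None and numbers > limit:
            offending.append(ref)
    return sorted(offending)


def non_whitelisted(needed_sonames, whitelist):
    """External libraries a wheel needs that the base system may not have."""
    return sorted(set(needed_sonames) - set(whitelist))


# Assumed limits and whitelist, for illustration only.
limits = {"GLIBC": (2, 5), "GLIBCXX": (3, 4, 9)}
refs = ["GLIBC_2.2.5", "GLIBC_2.14", "GLIBCXX_3.4.9", "GLIBC_PRIVATE"]
print(too_new(refs, limits))
print(non_whitelisted(["libc.so.6", "libatlas.so.3"], {"libc.so.6", "libm.so.6"}))
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.14" sorts before "2.5".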

-Robert
Nathaniel Smith
2016-01-12 03:24:43 UTC
Post by Robert McGibbon
I started working on a tool for checking linux wheels for "manylinux"
compatibility, and fixing them up if possible, based on the same ideas as
Matthew Brett's delocate for OS X. Current WIP code, if anyone wants to help
/ throw peanuts, is here: https://github.com/rmcgibbo/deloc8.
It's currently fairly modest and can only list non-whitelisted external
shared library dependencies, and verify that sufficiently old versioned
symbols for glibc and its ilk are used.

That is super cool! Also, this week David C. @ Enthought
contributed the Docker image that they use to actually make compatible
builds, so I guess we have some momentum; let's make this happen :-).
I just made a repo and a mailing list to continue the discussion...

https://github.com/manylinux/manylinux
https://groups.google.com/forum/#!forum/manylinux-discuss

-n
--
Nathaniel J. Smith -- http://vorpus.org