Discussion:
[Numpy-discussion] Should I use pip install numpy in linux?
Yuxiang Wang
2016-01-08 02:18:45 UTC
Dear all,

I know that in Windows, we should use either Christoph's package or
Anaconda for MKL-optimized numpy. In Linux, the fortran compiler issue
is solved, so should I directly use pip install numpy to get numpy
with a reasonable BLAS library?

Thanks!

Shawn
--
Yuxiang "Shawn" Wang
Gerling Haptics Lab
University of Virginia
***@virginia.edu
+1 (434) 284-0836
https://sites.google.com/a/virginia.edu/yw5aj/
Nathaniel Smith
2016-01-08 03:01:53 UTC
Post by Yuxiang Wang
Dear all,
I know that in Windows, we should use either Christoph's package or
Anaconda for MKL-optimized numpy. In Linux, the fortran compiler issue
is solved, so should I directly use pip install numpy to get numpy
with a reasonable BLAS library?
pip install numpy should work fine; whether it gives you a reasonable
BLAS library will depend on whether you have the development files for
a reasonable BLAS library installed, and whether numpy's build system
is able to automatically locate them. Generally this means that if
you're on a regular distribution and remember to install a decent BLAS
-dev or -devel package, then you'll be fine.

On Debian/Ubuntu, 'apt install libopenblas-dev' is probably enough to
ensure something reasonable happens.
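
(As a quick sanity check after installing, numpy's own build-info dump will
show which BLAS/LAPACK it was actually built against -- np.show_config() is
a standard numpy API, though its output format varies across versions:)

```
import numpy as np

# Prints the blas_opt_info / lapack_opt_info sections recorded at build
# time, i.e. whether OpenBLAS, ATLAS, MKL, or only the fallback BLAS was
# found when pip compiled numpy.
np.show_config()
```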

Anaconda is also an option on Linux if you want MKL (or OpenBLAS).

-n
--
Nathaniel J. Smith -- http://vorpus.org
Yuxiang Wang
2016-01-08 16:28:04 UTC
Dear Nathaniel,

Gotcha. That's very helpful. Thank you so much!

Shawn
--
Yuxiang "Shawn" Wang
Gerling Haptics Lab
University of Virginia
***@virginia.edu
+1 (434) 284-0836
https://sites.google.com/a/virginia.edu/yw5aj/
Matthew Brett
2016-01-08 17:12:19 UTC
Hi,
I wrote a page on using pip with Debian / Ubuntu here:
https://matthew-brett.github.io/pydagogue/installing_on_debian.html

Cheers,

Matthew
Robert McGibbon
2016-01-08 19:07:32 UTC
Does anyone know if there's been any movement with the PyPI folks on
allowing Linux wheels to be uploaded?

I know you can never be certain what's provided by the distro, but it
seems like if Anaconda can solve the
cross-distro-binary-distribution-of-compiled-python-extensions problem,
there shouldn't be much that's technically different for Linux wheels.

-Robert
Oscar Benjamin
2016-01-08 20:12:06 UTC
Post by Robert McGibbon
Does anyone know if there's been any movement with the PyPI folks on
allowing Linux wheels to be uploaded?
I know you can never be certain what's provided by the distro, but it
seems like if Anaconda can solve the
cross-distro-binary-distribution-of-compiled-python-extensions problem,
there shouldn't be much that's technically different for Linux wheels.

Anaconda controls all of the dependent non-Python libraries which are
outside of the pip/pypi ecosystem. Pip/wheel doesn't have that option until
such libraries are packaged up for PyPI (e.g. pyopenblas).

--
Oscar
Robert McGibbon
2016-01-08 21:58:18 UTC
Well, it's always possible to copy the dependencies like libopenblas.so
into the wheel and fix up the RPATHs, similar to the way the Windows wheels
work.
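
A rough sketch of what that could look like for a hypothetical package (it
assumes the external `patchelf` tool is available; all file and package
names below are made up for illustration):

```
import os
import shutil
import subprocess

# Vendor the shared library inside the package, next to the extension.
pkg_libs = "mypkg/.libs"                  # hypothetical in-package lib dir
os.makedirs(pkg_libs, exist_ok=True)
shutil.copy("/usr/lib/libopenblas.so.0", pkg_libs)

# Point the extension's RPATH at that directory, relative to the extension
# itself, so the dynamic loader finds the bundled copy at import time.
subprocess.check_call(
    ["patchelf", "--set-rpath", "$ORIGIN/.libs", "mypkg/_extension.so"])
```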

I'm not sure if this is the right path for numpy or not, but it seems
like something that would be suitable for some projects with compiled
extensions. But it's categorically ruled out by the PyPI policy, IIUC.

Perhaps this is OT for this thread, and I should ask on distutils-sig.

-Robert
Chris Barker
2016-01-08 23:27:31 UTC
Post by Robert McGibbon
I'm not sure if this is the right path for numpy or not,
Probably not -- AFAICT, the PyPA folks aren't interested in solving the
problems we have in the scipy community -- we can tweak around the edges,
but we won't get there without a commitment to really solve the issues --
and if pip did that, it would essentially be conda -- no one wants to
re-implement conda.
Post by Robert McGibbon
Perhaps this is OT for this thread, and I should ask on distutils-sig.
There has been a lot of discussion of this issue on the distutils-sig in
the last couple of months (quiet lately). So yes, that's the place to go.

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Matthew Brett
2016-01-08 23:50:31 UTC
Hi,
Post by Chris Barker
Post by Robert McGibbon
I'm not sure if this is the right path for numpy or not,
Probably not -- AFAICT, the PyPA folks aren't interested in solving the
problems we have in the scipy community -- we can tweak around the edges,
but we won't get there without a commitment to really solve the issues --
and if pip did that, it would essentially be conda -- no one wants to
re-implement conda.
Well - as the OP was implying, it really should not be too difficult.

We (here in Berkeley) have discussed how to do this for Linux,
including (Nathaniel mainly) what would be sensible for pypi to do, in
terms of platform labels.

Both Anaconda and Canopy build on a base default Linux system so that
the built binaries will work on many Linux systems.

At the moment, Linux wheels have the platform tag of either linux_i686
(32-bit) or linux_x86_64 - example filenames:

numpy-1.9.2-cp27-none-linux_i686.whl
numpy-1.9.2-cp27-none-linux_x86_64.whl

Obviously these platform tags are rather useless, because they don't
tell you very much about whether this wheel will work on your own
system.
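
(To illustrate: the tag is more or less just the platform string of
whatever machine ran the build, with dashes and dots turned into
underscores, so it carries no information about required libraries:)

```
import distutils.util

# On the build machine this prints something like 'linux-x86_64', which
# bdist_wheel turns into the 'linux_x86_64' tag in the wheel filename --
# nothing about glibc, BLAS, or anything else the binary actually needs.
print(distutils.util.get_platform())
```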

If we started building Linux wheels on a base system like that of
Anaconda or Canopy we might like another platform tag that tells you
that this wheel is compatible with a wide range of systems. So the
job of negotiating with distutils-sig is trying to find a good name
for this base system - we thought that 'manylinux' was a good one -
and then put in a pull request to pip to recognize 'manylinux' as
compatible when running pip install from a range of Linux systems.

Cheers,

Matthew
Robert McGibbon
2016-01-09 00:31:12 UTC
Post by Matthew Brett
Both Anaconda and Canopy build on a base default Linux system so that
the built binaries will work on many Linux systems.
I think the base linux system is CentOS 5, and from my experience, it
seems like this approach has worked very well. Those packages are
compatible with essentially all Linuxes that are more recent than CentOS 5
(which is ancient). I have not heard of anyone complaining that the
packages they install through conda don't work on their CentOS 4 or Ubuntu
6.06 box. I assume Python / pip is probably used on a wider diversity of
linux flavors than conda is, so I'm sure that binaries built on CentOS 5
won't work for absolutely _every_ linux user, but it does seem to cover
the substantial majority of linux users.

Building redistributable linux binaries that work across a large number
of distros and distro versions is definitely tricky. If you run ``python
setup.py bdist_wheel`` on your Fedora Rawhide box, you can't really expect
the wheel to work for too many other linux users. So given that, I can see
why PyPI would want to be careful about accepting Linux wheels.

But it seems like, if they made the upload something like

```
twine upload numpy-1.9.2-cp27-none-linux_x86_64.whl \
  --yes-yes-i-know-this-is-dangerous-but-i-know-what-i'm-doing
```

this would potentially let packages like numpy serve their linux users
better without risking too much junk being uploaded to PyPI.

-Robert
Matthew Brett
2016-01-09 00:36:32 UTC
I could well understand it if the pypa folks thought that was a bad
idea. There are so many Linux distributions and therefore so many
ways for this to go wrong, that the most likely outcome would be a
relentless flood of people saying "ouch this doesn't work for me", and
therefore decreased trust for pip / Linux in general.

On the other hand, having a base build system and matching platform
tag seems like it is well within reach, and, if we provided a proof of
concept, I guess that pypa would agree.

Cheers,

Matthew
Nathaniel Smith
2016-01-09 03:13:56 UTC
Post by Robert McGibbon
Post by Matthew Brett
Both Anaconda and Canopy build on a base default Linux system so that
the built binaries will work on many Linux systems.
I think the base linux system is CentOS 5, and from my experience, it
seems like this approach has worked very well. Those packages are
compatible with essentially all Linuxes that are more recent than CentOS 5
(which is ancient). I have not heard of anyone complaining that the
packages they install through conda don't work on their CentOS 4 or
Ubuntu 6.06 box.
Right. There's a small problem which is that the base linux system
isn't just "CentOS 5", it's "CentOS 5 and here's the list of libraries
that you're allowed to link to: ...", where that list is empirically
chosen to include only stuff that really is installed on ~all linux
machines and for which the ABI really has been stable in practice over
multiple years and distros (so e.g. no OpenSSL).

So the key next step is for someone to figure out and write down that
list. Continuum and Enthought both have versions of it that we know
are good...

Does anyone know who maintains Anaconda's linux build environment?
Post by Robert McGibbon
I assume Python / pip is probably used on a wider diversity of linux
flavors than conda is, so I'm sure that binaries built on CentOS 5 won't
work for absolutely _every_ linux user, but it does seem to cover the
substantial majority of linux users.
Building redistributable linux binaries that work across a large number
of distros and distro versions is definitely tricky. If you run ``python
setup.py bdist_wheel`` on your Fedora Rawhide box, you can't really expect
the wheel to work for too many other linux users. So given that, I can see
why PyPI would want to be careful about accepting Linux wheels.
But it seems like, if they made the upload something like
```
twine upload numpy-1.9.2-cp27-none-linux_x86_64.whl \
  --yes-yes-i-know-this-is-dangerous-but-i-know-what-i'm-doing
```
this would potentially let packages like numpy serve their linux users
better without risking too much junk being uploaded to PyPI.
That will never fly. But like Matthew says, I think we can probably
get them to accept a PEP saying "here's a new well-specified platform
tag that means that this wheel works on all linux systems that meet the
following list of criteria: ...", and then allow that new platform tag
onto PyPI.

-n
--
Nathaniel J. Smith -- http://vorpus.org
Nathan Goldbaum
2016-01-09 03:17:51 UTC
Doesn't building on CentOS 5 also mean using a quite old version of gcc?

I've never tested this, but I've seen claims on the anaconda mailing list
of ~25% slowdowns compared to building from source or using system
packages, which was attributed to building using an older gcc that doesn't
optimize as well as newer versions.
Nathaniel Smith
2016-01-09 03:38:19 UTC
Post by Nathan Goldbaum
Doesn't building on CentOS 5 also mean using a quite old version of gcc?
Yes. IIRC CentOS 5 ships with gcc 4.4, and you can bump that up to gcc
4.8 by using the Redhat Developer Toolset release (which is gcc +
special backport libraries to let it generate RHEL5/CentOS5-compatible
binaries). (I might have one or both of those version numbers slightly
wrong.)
Post by Nathan Goldbaum
I've never tested this, but I've seen claims on the anaconda mailing list of
~25% slowdowns compared to building from source or using system packages,
which was attributed to building using an older gcc that doesn't optimize as
well as newer versions.
I'd be very surprised if that were a 25% slowdown in general, as
opposed to a 25% slowdown on some particular inner loop that happened
to neatly match some new feature in a new gcc (e.g. something where
the new autovectorizer kicked in). But yeah, in general this is just
an inevitable trade-off when it comes to distributing binaries: you're
always going to pay some penalty for achieving broad compatibility as
compared to artisanally hand-tuned binaries specialized for your
machine's exact OS version, processor, etc. Not much to be done,
really. At some point the baseline for compatibility will switch to
"compile everything on CentOS 6", and that will be better but it will
still be worse than binaries that target CentOS 7, and so on and so
forth.

-n
--
Nathaniel J. Smith -- http://vorpus.org
Robert McGibbon
2016-01-09 03:41:26 UTC
Post by Nathan Goldbaum
Doesn't building on CentOS 5 also mean using a quite old version of gcc?
I have had pretty good luck using the (awesomely named) Holy Build Box
<http://phusion.github.io/holy-build-box/>, which is a CentOS 5 docker
image with a newer gcc version installed (but I guess the same old libc).
I'm not 100% sure how it works, but it's quite nice. For example, you can
use c++11 and still keep all the binary compatibility benefits of CentOS 5.

-Robert
Nathaniel Smith
2016-01-09 04:03:44 UTC
Post by Robert McGibbon
Post by Nathan Goldbaum
Doesn't building on CentOS 5 also mean using a quite old version of gcc?
I have had pretty good luck using the (awesomely named) Holy Build Box,
which is a CentOS 5 docker image with a newer gcc version installed (but I
guess the same old libc). I'm not 100% sure how it works, but it's quite
nice. For example, you can use c++11 and still keep all the binary
compatibility benefits of CentOS 5.
They say they have gcc 4.8:
https://github.com/phusion/holy-build-box#isolated-build-environment-based-on-docker-and-centos-5
so I bet they're using RH's devtools gcc. This means that it works via
the labor of some unsung programmers at RH who went through all the
library changes between gcc 4.4 and 4.8, and put together a version of
4.8 that for every important symbol knows whether it's available in
the old 4.4 libraries or not; for the ones that are, it dynamically
links them; for the ones that aren't, it has a special static library
that it pulls them out of. Like sewer cleaning, it's the kind of very
impressive, incredibly valuable infrastructure work that I'm really
glad someone does. Someone else who's not me...

Continuum and Enthought both have a whole list of packages beyond
glibc that are safe enough to link to, including a bunch of ones that
would be big pains to statically link everywhere (libX11, etc.).
That's the useful piece of information that goes beyond just CentOS5 +
RH devtools + static linking -- can't tell if the "Holy Build Box" has
anything like that.

-n
--
Nathaniel J. Smith -- http://vorpus.org
Robert McGibbon
2016-01-09 04:08:01 UTC
Post by Nathaniel Smith
Continuum and Enthought both have a whole list of packages beyond
glibc that are safe enough to link to, including a bunch of ones that
would be big pains to statically link everywhere (libX11, etc.).
That's the useful piece of information that goes beyond just CentOS5 +
RH devtools + static linking -- can't tell if the "Holy Build Box" has
anything like that.

Probably-crazy Idea: One could reconstruct that list by downloading all of
https://repo.continuum.io/pkgs/free/linux-64/, untarring everything, and
running `ldd` on all of the binaries and .so files. Can't be that hard...
right?
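
Something like the following sketch, say (the directory layout and the
filtering are just placeholders, and the libraries conda itself ships would
still need to be excluded):

```
import os
import re
import subprocess
from collections import Counter

root = "unpacked-pkgs"   # hypothetical dir where the packages were untarred
needed = Counter()

for dirpath, _, filenames in os.walk(root):
    for name in filenames:
        if not (name.endswith(".so") or ".so." in name):
            continue
        path = os.path.join(dirpath, name)
        try:
            out = subprocess.check_output(["ldd", path])
        except subprocess.CalledProcessError:
            continue
        # Lines look like "libX11.so.6 => /usr/lib/libX11.so.6 (0x...)"
        for m in re.finditer(r"^\s*(\S+\.so\S*)\s+=>",
                             out.decode("utf-8", "replace"), re.MULTILINE):
            needed[m.group(1)] += 1

# Most common sonames across all packages -- a first cut at "the list".
for soname, count in needed.most_common(30):
    print(count, soname)
```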


-Robert
Nathaniel Smith
2016-01-09 04:19:29 UTC
Post by Robert McGibbon
Post by Nathaniel Smith
Continuum and Enthought both have a whole list of packages beyond
glibc that are safe enough to link to, including a bunch of ones that
would be big pains to statically link everywhere (libX11, etc.).
That's the useful piece of information that goes beyond just CentOS5 +
RH devtools + static linking -- can't tell if the "Holy Build Box" has
anything like that.
Probably-crazy Idea: One could reconstruct that list by downloading all of
https://repo.continuum.io/pkgs/free/linux-64/, untarring everything, and
running `ldd` on all of the binaries and .so files. Can't be that hard...
right?

You'd have to be slightly careful to not count libraries that they ship
themselves, and then use the resulting list to recreate a build environment
that contains those libraries (using a dockerfile, I guess). But yeah,
should work. Are you feeling inspired? :-)

-n
Julian Taylor
2016-01-09 12:12:11 UTC
Post by Nathaniel Smith
Post by Nathan Goldbaum
Doesn't building on CentOS 5 also mean using a quite old version of gcc?
Yes. IIRC CentOS 5 ships with gcc 4.4, and you can bump that up to gcc
4.8 by using the Redhat Developer Toolset release (which is gcc +
special backport libraries to let it generate RHEL5/CentOS5-compatible
binaries). (I might have one or both of those version numbers slightly
wrong.)
Post by Nathan Goldbaum
I've never tested this, but I've seen claims on the anaconda mailing list of
~25% slowdowns compared to building from source or using system packages,
which was attributed to building using an older gcc that doesn't optimize as
well as newer versions.
I'd be very surprised if that were a 25% slowdown in general, as
opposed to a 25% slowdown on some particular inner loop that happened
to neatly match some new feature in a new gcc (e.g. something where
the new autovectorizer kicked in). But yeah, in general this is just
an inevitable trade-off when it comes to distributing binaries: you're
always going to pay some penalty for achieving broad compatibility as
compared to artisanally hand-tuned binaries specialized for your
machine's exact OS version, processor, etc. Not much to be done,
really. At some point the baseline for compatibility will switch to
"compile everything on CentOS 6", and that will be better but it will
still be worse than binaries that target CentOS 7, and so on and so
forth.
I have over the years put in one gcc-specific optimization after the
other, so yes, using an ancient version will make many parts significantly
slower. Though that is not really a problem; updating a compiler is easy
even without Red Hat's devtoolset.

At least as far as numpy is concerned, linux binaries should not be a
very big problem. The only dependency where the version matters is glibc,
which has updated the interfaces we use (in a backward-compatible way)
many times.
But here, if we use an old enough baseline glibc (e.g. CentOS 5 or Ubuntu
10.04), we are fine at reasonable performance cost, basically only a
slower memcpy.
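
(For reference, one way to see which glibc symbol versions a built
extension actually ends up requiring -- and hence the oldest glibc it can
run on -- is to grep the dynamic symbol table; the path below is just an
example:)

```
import re
import subprocess

path = "numpy/core/multiarray.so"   # example: any built extension module
out = subprocess.check_output(["objdump", "-T", path]).decode("utf-8",
                                                              "replace")

# Collect the GLIBC_x.y version tags referenced by the binary and report
# the newest one, i.e. the minimum glibc the binary will load against.
versions = sorted(set(re.findall(r"GLIBC_(\d+(?:\.\d+)*)", out)),
                  key=lambda v: [int(x) for x in v.split(".")])
print("requires glibc >=", versions[-1] if versions else "none found")
```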

Scipy, on the other hand, is a larger problem, as it contains C++ code.
Linux systems are now transitioning to C++11, which is binary
incompatible in parts with the old standard. A lot of testing is necessary
there to check whether we are affected.
How does Anaconda deal with C++11?
David Cournapeau
2016-01-09 13:48:40 UTC
On Sat, Jan 9, 2016 at 12:12 PM, Julian Taylor wrote:
[...]
Scipy, on the other hand, is a larger problem, as it contains C++ code.
Linux systems are now transitioning to C++11, which is binary
incompatible in parts with the old standard. A lot of testing is necessary
there to check whether we are affected.
How does Anaconda deal with C++11?
For Canopy packages, we use the RH devtoolset w/ gcc 4.8.X, and statically
link the C++ stdlib.

It has worked so far for the few packages requiring C++11 and gcc > 4.4
(llvm/llvmlite/dynd), but that's not a solution I am a fan of myself, as
the implications are not always very clear.

David
Nathaniel Smith
2016-01-09 21:49:29 UTC
Post by Julian Taylor
Post by Nathaniel Smith
Post by Nathan Goldbaum
Doesn't building on CentOS 5 also mean using a quite old version of gcc?
Yes. IIRC CentOS 5 ships with gcc 4.4, and you can bump that up to gcc
4.8 by using the Redhat Developer Toolset release (which is gcc +
special backport libraries to let it generate RHEL5/CentOS5-compatible
binaries). (I might have one or both of those version numbers slightly
wrong.)
Post by Nathan Goldbaum
I've never tested this, but I've seen claims on the anaconda mailing list of
~25% slowdowns compared to building from source or using system packages,
which was attributed to building using an older gcc that doesn't optimize as
well as newer versions.
I'd be very surprised if that were a 25% slowdown in general, as
opposed to a 25% slowdown on some particular inner loop that happened
to neatly match some new feature in a new gcc (e.g. something where
the new autovectorizer kicked in). But yeah, in general this is just
an inevitable trade-off when it comes to distributing binaries: you're
always going to pay some penalty for achieving broad compatibility as
compared to artisanally hand-tuned binaries specialized for your
machine's exact OS version, processor, etc. Not much to be done,
really. At some point the baseline for compatibility will switch to
"compile everything on CentOS 6", and that will be better but it will
still be worse than binaries that target CentOS 7, and so on and so
forth.
I have over the years put in one gcc-specific optimization after the
other, so yes, using an ancient version will make many parts significantly
slower. Though that is not really a problem; updating a compiler is easy
even without Red Hat's devtoolset.
At least as far as numpy is concerned, linux binaries should not be a
very big problem. The only dependency where the version matters is glibc,
which has updated the interfaces we use (in a backward-compatible way)
many times.
But here, if we use an old enough baseline glibc (e.g. CentOS 5 or Ubuntu
10.04), we are fine at reasonable performance cost, basically only a
slower memcpy.
Are you saying that it's easy to use, say, gcc 5.3's C compiler to produce
binaries that will run on an out-of-the-box centos 5 install? I assumed
that there'd be issues with things like new symbol versions in libgcc, not
just glibc, but if not then that would be great...
Post by Julian Taylor
Scipy, on the other hand, is a larger problem, as it contains C++ code.
Linux systems are now transitioning to C++11, which is binary
incompatible in parts with the old standard. A lot of testing is necessary
there to check whether we are affected.
How does Anaconda deal with C++11?
IIUC the situation with the C++ stdlib changes in gcc 5 is that old
binaries will continue to work on new systems. The only thing that breaks
is that if two libraries want to pass objects of the affected types back
and forth (e.g. std::string), then either they both need to be compiled
with the old abi or they both need to be compiled with the new abi. (And
when using a new compiler it's still possible to choose the old abi with a
#define; old compilers of course only support the old abi.)

See: http://developerblog.redhat.com/2015/02/05/gcc5-and-the-c11-abi/

So the answer is that most python packages don't care, because even the
ones written in C++ don't generally talk C++ across package boundaries, and
for the ones that do care then the people making the binary packages will
have to coordinate to use the same abi. And for local builds on modern
systems that link against binary packages built using the old abi, people
might have to use -D_GLIBCXX_USE_CXX11_ABI=0.
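
(For concreteness, a minimal setup.py sketch of passing that define through
setuptools -- the package and file names here are hypothetical:)

```
from setuptools import setup, Extension

ext = Extension(
    "mypkg._cppmod",
    sources=["mypkg/_cppmod.cpp"],
    language="c++",
    # Force the pre-gcc-5 std::string / std::list ABI so this extension can
    # exchange those types with libraries built against the old ABI.
    define_macros=[("_GLIBCXX_USE_CXX11_ABI", "0")],
)

setup(name="mypkg", version="0.0", ext_modules=[ext])
```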

-n
Tony Kelman
2016-01-11 02:36:57 UTC
Post by Nathaniel Smith
Right. There's a small problem which is that the base linux system
isn't just "CentOS 5", it's "CentOS 5 and here's the list of libraries
that you're allowed to link to: ...", where that list is empirically
chosen to include only stuff that really is installed on ~all linux
machines and for which the ABI really has been stable in practice over
multiple years and distros (so e.g. no OpenSSL).
So the key next step is for someone to figure out and write down that
list. Continuum and Enthought both have versions of it that we know
are good...
Does anyone know who maintains Anaconda's linux build environment?
I strongly suspect it was originally set up by Aaron Meurer. Who
maintains it now that he is no longer at Continuum is a good question.

We build "generic Linux binaries" for Julia and I co-maintain that
environment. It's using CentOS 5 (for now, until we hit an issue that
is easiest to fix by upgrading the builders to CentOS 6 - I don't think
we have any real users on CentOS 5 any more so no one would notice the
bump), but we avoid using the Red Hat devtoolset. We tried it initially,
but had issues due to the way they attempt to statically link libstdc++
and libgfortran. The -static-libgfortran GCC flag has been broken since
GCC 4.5 because libgfortran now depends on libquadmath, and it requires
rather messy modification of every link line to change -lgfortran to an
explicit absolute path to libgfortran.a (and possibly also libquadmath.a)
to really get libraries like BLAS and LAPACK statically linked. And if
you want to statically link the GCC runtime libraries into a .so, your
distribution's copies of libstdc++.a, libgfortran.a, etc must have been
built with -fPIC, which many are not.

Some of the issues where this was discussed and worked out were:
https://github.com/JuliaLang/julia/issues/8397
https://github.com/JuliaLang/julia/issues/8433
https://github.com/JuliaLang/julia/pull/8442
https://github.com/JuliaLang/julia/pull/10043

So for Julia we wound up building our own copy of the latest GCC from
source on CentOS 5 buildbots, and shipping private shared-library copies
of libgfortran, libstdc++, libgcc_s, and a few others. This works on
pretty much any glibc-using Linux distribution as new or newer than the
buildbot. It sadly doesn't work on musl distributions, ah well. Rust has
been experimenting with truly static libraries/executables that statically
link musl libc in addition to everything else, I'm not sure how practical
that would be for numpy, scipy, etc.

If you go this route of building gcc from source and depending on private
copies of the shared runtime libraries, it ends up important that you
pick a newer version of gcc than any of your users. The reason is for
packages that require compilation on the user's machine. If you build and
distribute a set of shared libraries using GCC 4.8 but then someone on
Ubuntu 15.10 tries to build something else (say as an example, pyipopt
which needs a library that uses both C++ and Fortran) from source using
GCC 5.2, the GCC 4.8 runtime libraries that you built against will get
loaded first, and they won't contain newer-ABI symbols needed by the GCC
5.2-built user library. Sometimes this can be fixed by just deleting the
older bundled copies of the runtime libraries and relying on the users'
system copies, but for example libgfortran typically needs to be manually
installed first.
Robert McGibbon
2016-01-11 03:14:49 UTC
Post by Tony Kelman
Post by Nathaniel Smith
Right. There's a small problem which is that the base linux system
isn't just "CentOS 5", it's "CentOS 5 and here's the list of libraries
that you're allowed to link to: ...", where that list is empirically
chosen to include only stuff that really is installed on ~all linux
machines and for which the ABI really has been stable in practice over
multiple years and distros (so e.g. no OpenSSL).
Does anyone know who maintains Anaconda's linux build environment?
I strongly suspect it was originally set up by Aaron Meurer. Who
maintains it now that he is no longer at Continuum is a good question.

From looking at all of the external libraries referenced by binaries
included in Anaconda and the conda repos, I am not confident that they
have a totally strict policy here, or at least not one that is enforced by
tooling. The sonames I listed here
<https://mail.scipy.org/pipermail/numpy-discussion/2016-January/074602.html>
cover all of the external dependencies used by the latest Anaconda
release, but earlier releases and other conda-installable packages from
the default channel are not so strict.

-Robert
Travis Oliphant
2016-01-11 14:41:46 UTC
Anaconda "build environment" was setup by Ilan and me. Aaron helped to
build packages while he was at Continuum but spent most of his time on the
open-source conda project.

It is important to understand the difference between Anaconda and conda in
this respect. Anaconda is a particular dependency foundation that
Continuum supports and releases -- it will have a particular set of
expected libraries on each platform (we work to keep this fairly limited
and on Linux currently use CentOS 5 as the base).

conda is a general package manager that is open-source and that anyone can
use to produce a set of consistent binaries (there can be many conda-based
distributions). It solves the problem of multiple binary dependency
chains generally using the concept of "features". This concept of
"features" allows you to create environments with different base
dependencies.

What packages you install when you "conda install" depends on which
channels you are pointing to and which features you have "turned on" in the
environment. It's a general system that extends the notions that were
started by the PyPA.

-Travis
--
*Travis Oliphant*
*Co-founder and CEO*


@teoliphant
512-222-5440
http://www.continuum.io
Chris Barker
2016-01-11 18:25:05 UTC
Post by Nathaniel Smith
Post by Robert McGibbon
this would potentially let packages like numpy serve their linux users
better without risking too much junk being uploaded to PyPI.
That will never fly. But like Matthew says, I think we can probably
get them to accept a PEP saying "here's a new well-specified platform
tag that means that this wheel works on all linux systems that meet the
following list of criteria: ...", and then allow that new platform tag
onto PyPI.
The second step is a trick though -- how does pip know, when being run on a
client, that the system meets those requirements? Do we put a bunch of code
in that checks for those libs, etc???

If we get all that worked out, we still haven't made any progress toward
the non-standard libs that aren't python. This is the big "scipy problem"
-- fortran, BLAS, hdf, ad infinitum.

I argued for years that we could build binary wheels that hold each of
these, and other python packages could depend on them, but pypa never
seemed to like that idea. In the end, if you did all this right, you'd have
something like conda -- so why not just use conda?

All that being said, if you folks can get the core scipy stack set up to
pip install on OS X, Windows, and Linux, that would be pretty nice -- so
keep at it!

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Benjamin Root
2016-01-11 18:35:29 UTC
The other half of the fun is how to deal with weird binary issues with
libraries like libgeos_c, libhdf5 and such. You have to get all of the
compile options right for your build of those libraries to get your build
of GDAL and pyhdf working right. You also have packages like gdal and
netcdf4 that have diamond dependencies -- not only are they built and linked
against numpy binaries, some of their binary dependencies are built against
numpy binaries as well. Joys!

I don't envy anybody that tries to take on the packaging problem in any
language.

Ben Root
David Cournapeau
2016-01-11 19:02:46 UTC
Post by Chris Barker
Post by Nathaniel Smith
Post by Robert McGibbon
this would potentially let packages like numpy serve their linux users
better without risking too much junk being uploaded to PyPI.
That will never fly. But like Matthew says, I think we can probably
get them to accept a PEP saying "here's a new well-specified platform
tag that means that this wheel works on all linux systems that meet the
following list of criteria: ...", and then allow that new platform tag
onto PyPI.
The second step is a trick though -- how does pip know, when being run on
a client, that the system meets those requirements? Do we put a bunch of
code in that checks for those libs, etc???
You could make that option an opt-in at first, and gradually autodetect it
for the main distros.
Post by Chris Barker
If we get all that worked out, we still haven't made any progress toward
the non-standard libs that aren't python. This is the big "scipy problem"
-- fortran, BLAS, hdf, ad infinitum.
I argued for years that we could build binary wheels that hold each of
these, and other python packages could depend on them, but pypa never
seemed to like that idea.
I don't think that's an accurate statement. There are issues to solve
around this, but I did not encounter pushback, either on the ML or face to
face w/ various PyPA members at PyCon, etc. There may be pushback on a
particular detail, but making "pip install scipy" or "pip install
matplotlib" a reality on every platform is something everybody agrees on.
Chris Barker
2016-01-11 23:53:39 UTC
Post by David Cournapeau
Post by Chris Barker
If we get all that worked out, we still haven't made any progress toward
the non-standard libs that aren't python. This is the big "scipy problem"
-- fortran, BLAS, hdf, ad infinitum.
I argued for years that we could build binary wheels that hold each of
these, and other python packages could depend on them, but pypa never
seemed to like that idea.
I don't think that's an accurate statement. There are issues to solve
around this, but I did not encounter pushback, either on the ML or face to
face w/ various PyPA members at PyCon, etc. There may be pushback on a
particular detail, but making "pip install scipy" or "pip install
matplotlib" a reality on every platform is something everybody agrees on.
Sure, everyone wants that. But when it gets deeper, they don't want to
have a bunch of pip-installable binary wheels that are simply C libs
re-packaged as a dependency. And then you have the problem of those being
"binary wheel" dependencies, rather than "package" dependencies.

e.g.:

this particular build of Pillow depends on the libpng and libjpeg wheels,
but the Pillow package, in general, does not. And you would have different
dependencies on Windows, OS X, and Linux.

pip/wheel simply was not designed for that, and I didn't get any warm and
fuzzy feelings from distutils-sig that it ever would be. And again, then
you are re-designing conda.

So the only way to do all that is to statically link all the dependent libs
in with each binary wheel (or ship a dll). Somehow it always bothered me to
ship the same lib with multiple packages -- isn't that why shared libs
exist?

- In practice, maybe it doesn't matter, memory is cheap. But I also got
sick of building the damn things! I like that Anaconda comes with libs
someone else has built, and I can use them with my packages, too. And when
it comes to ugly stuff like HDF and GDAL, I'm really happy someone else has
built them!

Anyway -- carry on -- being able to pip install the scipy stack would be
very nice.

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Nathaniel Smith
2016-01-12 01:29:30 UTC
Post by Chris Barker
Sure, everyone wants that. But when it gets deeper, they don't want to
have a bunch of pip-installable binary wheels that are simply C libs
re-packaged as a dependency. And then you have the problem of those being
"binary wheel" dependencies, rather than "package" dependencies.
[...]
pip/wheel simply was not designed for that, and I didn't get any warm and
fuzzy feelings from distutils-sig that it ever would be. And again, then
you are re-designing conda.

I agree that talking about such things on distutils-sig tends to elicit a
certain amount of puzzled incomprehension, but I don't think it matters --
wheels already have everything you need to support this. E.g. wheels for
different platforms can trivially have different dependencies. (They even
go to some lengths to make sure this is possible for pure python packages
where the same wheel can be used on multiple platforms.) When distributing
a library-in-a-wheel then you need a little bit of hackishness to make sure
the runtime loader can find the library, which conda would otherwise handle
for you, but AFAICT it's like 10 lines of code or something.
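 
(For concreteness, a minimal sketch of that hack -- assuming a hypothetical
wheel that ships "libexample.so" inside its package directory, with extension
modules linking against it by soname -- could be a preload in the package's
__init__.py:)

    # __init__.py of the hypothetical library-in-a-wheel package
    import ctypes
    import os

    _here = os.path.dirname(__file__)
    # Preload the bundled shared library; RTLD_GLOBAL makes its symbols
    # visible to extension modules imported afterwards, so the runtime
    # loader can resolve them without RPATH tricks.
    _libexample = ctypes.CDLL(
        os.path.join(_here, "libexample.so"), mode=ctypes.RTLD_GLOBAL)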

And in any case we have lots of users who don't use conda and are thus
doomed to support both ecosystems regardless, so we might as well make the
best of it :-).

-n
Robert McGibbon
2016-01-12 01:54:08 UTC
Permalink
Post by Nathaniel Smith
And in any case we have lots of users who don't use conda and are thus
doomed to support both ecosystems regardless, so we might as well make the
best of it :-).

Yes, this is the key. Conda is a great tool for a lot of users / use cases,
but it's not for everyone.

Anyways, I think I've made a pretty good start on the tooling for a wheel
ABI tag for an LSB-style base system that represents a common set of shared
libraries and symbol versions provided by "many" linuxes (previously
discussed by Nathaniel here:
https://code.activestate.com/lists/python-distutils-sig/26272/)
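 
(As a rough illustration of the idea -- not the actual tooling -- a check
that an extension module only needs libraries from an assumed base set could
look like this; the whitelist and the file name below are illustrative, not
the proposed policy:)

    import subprocess

    # Illustrative whitelist of libraries assumed to be present on "many"
    # Linux distributions; a real policy would also pin symbol versions.
    ASSUMED_BASE_LIBS = {
        "libc.so.6", "libm.so.6", "libpthread.so.0", "libdl.so.2",
        "libgcc_s.so.1", "libstdc++.so.6",
    }

    def external_deps(path):
        """Return DT_NEEDED entries of `path` outside the assumed base set."""
        out = subprocess.check_output(["readelf", "-d", path],
                                      universal_newlines=True)
        needed = {line.split("[")[1].rstrip("]")
                  for line in out.splitlines() if "(NEEDED)" in line}
        return needed - ASSUMED_BASE_LIBS

    print(external_deps("_mymodule.cpython-35m-x86_64-linux-gnu.so"))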

-Robert
Post by Nathaniel Smith
Post by Chris Barker
Post by David Cournapeau
Post by Chris Barker
If we get all that worked out, we still haven't made any progress
toward the non-standard libs that aren't python. This is the big "scipy
problem" -- fortran, BLAS, hdf, ad infinitum.
Post by Chris Barker
Post by David Cournapeau
Post by Chris Barker
I argued for years that we could build binary wheels that hold each of
these, and other python packages could depend on them, but pypa never
seemed to like that idea.
Post by Chris Barker
Post by David Cournapeau
I don't think that's an accurate statement. There are issues to solve
around this, but I did not encounter push back, either on the ML or face to
face w/ various pypa members at Pycon, etc... There may be push backs for a
particular detail, but making "pip install scipy" or "pip install
matplotlib" a reality on every platform is something everybody agrees o
Post by Chris Barker
sure, everyone wants that. But when it gets deeper, they don't want to
have a bunc hof pip-installable binary wheels that are simply clibs
re-packaged as a dependency. And, then you have the problelm of those being
"binary wheel" dependencies, rather than "package" dependencies.
Post by Chris Barker
this particular build of pillow depends on the libpng and libjpeg
wheels, but the Pillow package, in general, does not. And you would have
different dependencies on Windows, and OS-X, and Linux.
Post by Chris Barker
pip/wheel simply was not designed for that, and I didn't get any warm
and fuzzy feelings from dist-utils sig that the it ever would. And again,
then you are re-designing conda.
I agree that talking about such things on distutils-sig tends to elicit a
certain amount of puzzled incomprehension, but I don't think it matters --
wheels already have everything you need to support this. E.g. wheels for
different platforms can trivially have different dependencies. (They even
go to some lengths to make sure this is possible for pure python packages
where the same wheel can be used on multiple platforms.) When distributing
a library-in-a-wheel then you need a little bit of hackishness to make sure
the runtime loader can find the library, which conda would otherwise handle
for you, but AFAICT it's like 10 lines of code or something.
And in any case we have lots of users who don't use conda and are thus
doomed to support both ecosystems regardless, so we might as well make the
best of it :-).
-n
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Chris Barker
2016-01-13 22:23:51 UTC
Permalink
Post by Nathaniel Smith
I agree that talking about such things on distutils-sig tends to elicit a
certain amount of puzzled incomprehension, but I don't think it matters --
wheels already have everything you need to support this.
well, that's what I figured -- and I started down that path a while back
and got no support whatsoever (OK, some from Matthew Brett -- thanks!). But
I know myself well enough to know I wasn't going to get the critical mass
required to make it useful by myself, so I've moved on to an ecosystem that
is doing most of the work already.

Also, you have the problem that there is one PyPi -- so where do you put
your nifty wheels that depend on other binary wheels? you may need to fork
every package you want to build :-(

But sure, let's get the core scipy stack pip-installable as much as
possible -- and maybe some folks with more energy than me can move this all
forward.

Sorry to be a downer -- keep up the good work and energy!

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Oscar Benjamin
2016-01-14 11:39:21 UTC
Permalink
Post by Nathaniel Smith
I agree that talking about such things on distutils-sig tends to elicit a
certain amount of puzzled incomprehension, but I don't think it matters --
wheels already have everything you need to support this.
well, that's what I figured -- and I started down that path a while back and
got no support whatsoever (OK, some from Matthew Brett -- thanks!). But I
know myself well enough to know I wasn't going to get the critical mass
required to make it useful by myself, so I've moved on to an ecosystem that
is doing most of the work already.
I think the problem with discussing these things on distutils-sig is
that the discussions are often very theoretical. In reality PyPA are
waiting for people to adopt the infrastructure that they have created
so far by uploading sets of binary wheels. Once that process really
kicks off then as issues emerge there will be real specific problems
to solve and a more concrete discussion of what changes are needed to
wheel/pip/PyPI can emerge.

The main exceptions to this are wheels for Linux and non-setuptools
build dependencies for sdists so it's definitely good to pursue those
problems and try to complete the basic infrastructure.
Also, you have the problem that there is one PyPi -- so where do you put
your nifty wheels that depend on other binary wheels? you may need to fork
every package you want to build :-(
Is this a real problem or a theoretical one? Do you know of some
situation where this wheel to wheel dependency will occur that won't
just be solved in some other way?

--
Oscar
Chris Barker - NOAA Federal
2016-01-14 17:14:19 UTC
Permalink
Post by Oscar Benjamin
Post by Chris Barker
Also, you have the problem that there is one PyPi -- so where do you put
your nifty wheels that depend on other binary wheels? you may need to fork
every package you want to build :-(
Is this a real problem or a theoretical one? Do you know of some
situation where this wheel to wheel dependency will occur that won't
just be solved in some other way?
It's real -- at least during the whole bootstrapping period. Say I
build a nifty hdf5 binary wheel -- I could probably just grab the name
"libhdf5" on PyPI. So far so good. But the goal here would be to have
netcdf and pytables and GDAL and who knows what else then link against
that wheel. But those projects are all supported by different people,
that all have their own distribution strategy. So where do I put
binary wheels of each of those projects that depend on my libhdf5
wheel? _maybe_ I would put it out there, and it would all grow
organically, but neither the culture nor the tooling support that
approach now, so I'm not very confident you could gather adoption.
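 
(To make the shape of the problem concrete, such a dependency would look
something like the following in a wrapper package's setup.py -- "libhdf5"
here is an imagined PyPI name and "netcdf_wrapper" an imagined package, not
existing projects:)

    from setuptools import setup, Extension

    setup(
        name="netcdf_wrapper",          # hypothetical wrapper package
        version="0.1",
        # imagined library-in-a-wheel dependency, which would also differ
        # per platform -- the part plain package metadata handles awkwardly
        install_requires=["libhdf5"],
        ext_modules=[
            Extension(
                "netcdf_wrapper._core",
                sources=["src/_core.c"],
                libraries=["hdf5"],     # link against the lib that wheel ships
            ),
        ],
    )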

Even beyond the adoption period, sometimes you need to do stuff in
more than one way -- look at the proliferation of channels on
Anaconda.org.

This is more likely to work if there is a good infrastructure for
third parties to build and distribute the binaries -- e.g.
Anaconda.org.

Or the Linux distro model -- for the most part, the people developing
a given library are not packaging it.

-CHB
Matthew Brett
2016-01-14 18:58:17 UTC
Permalink
On Thu, Jan 14, 2016 at 9:14 AM, Chris Barker - NOAA Federal
Post by Chris Barker - NOAA Federal
Post by Oscar Benjamin
Post by Chris Barker
Also, you have the problem that there is one PyPi -- so where do you put
your nifty wheels that depend on other binary wheels? you may need to fork
every package you want to build :-(
Is this a real problem or a theoretical one? Do you know of some
situation where this wheel to wheel dependency will occur that won't
just be solved in some other way?
It's real -- at least during the whole bootstrapping period. Say I
build a nifty hdf5 binary wheel -- I could probably just grab the name
"libhdf5" on PyPI. So far so good. But the goal here would be to have
netcdf and pytables and GDAL and who knows what else then link against
that wheel. But those projects are all supported be different people,
that all have their own distribution strategy. So where do I put
binary wheels of each of those projects that depend on my libhdf5
wheel? _maybe_ I would put it out there, and it would all grow
organically, but neither the culture nor the tooling support that
approach now, so I'm not very confident you could gather adoption.
I don't think there's a very large amount of cultural work - but some
to be sure.

We already have the following on OSX:

pip install numpy scipy matplotlib scikit-learn scikit-image pandas h5py

where all the wheels come from pypi. So, I don't think this is really
outside our range, even if the problem is a little more difficult for
Linux.
Post by Chris Barker - NOAA Federal
Even beyond the adoption period, sometimes you need to do stuff in
more than one way -- look at the proliferation of channels on
Anaconda.org.
This is more likely to work if there is a good infrastructure for
third parties to build and distribute the binaries -- e.g.
Anaconda.org.
I thought that Anaconda.org allows pypi channels as well?

Matthew
Chris Barker
2016-01-15 18:22:25 UTC
Permalink
Post by Matthew Brett
Post by Chris Barker - NOAA Federal
but neither the culture nor the tooling support that
approach now, so I'm not very confident you could gather adoption.
I don't think there's a very large amount of cultural work - but some
to be sure.
pip install numpy scipy matplotlib scikit-learn scikit-image pandas h5py
where all the wheels come from pypi. So, I don't think this is really
outside our range, even if the problem is a little more difficult for
Linux.
I'm actually less concerned about the Linux issue, I think that can be
solved reasonably with "manylinux" -- which would put us in a very similar
position to OS-X , and pretty similar to Windows -- i.e. a common platform
with the basic libs (libc, etc), but not a whole lot else.

I'm concerned about all the other libs various packages depend on. It's not
too bad if you want the core scipy stack -- a decent BLAS being the real
challenge there, but there is enough coordination between numpy and scipy
that at least efforts to solve that will be shared.

But we're still stuck with delivering dependencies on libs along with each
package -- usually statically linked.
Post by Matthew Brett
I thought that Anaconda.org allows pypi channels as well?
I think you can host pip-compatible wheels, etc on anaconda.org -- though
that may be deprecated... but anyway, I thought the goal here was a simple
"pip install", which will point only to PyPi -- I don't think there is a
way, ala conda, to add "channels" that will then get automatically searched
by pip. But I may be wrong there.
Post by Matthew Brett
pip install numpy scipy matplotlib scikit-learn scikit-image pandas h5py
last I checked, each of those is self-contained, except for python-level
dependencies, most notably on numpy. So it doesn't help me solve my
problem. For instance, I have my own C/C++ code that I'm wrapping that
requires netcdf (https://github.com/NOAA-ORR-ERD/PyGnome), and another that
requires image libs like libpng, libjpeg, etc.(
https://github.com/NOAA-ORR-ERD/py_gd)

netcdf is not too ugly itself, but it depends on hdf5, libcurl, zlib
(others?). So I have all these libs I need. As it happens, matplotlib
probably has the image libs I need, and h5py has hdf5 (and libcurl? and
zlib?). But even then, as far as I can tell, I need to build and provide
these libs myself for my code. Which is a pain in the @%$ and then I'm
shipping (and running) multiple copies of the same libs all over the place
-- will there be compatibility issues? apparently not, but it still
wastes the advantage of shared libs, and makes things a pain for all of us.

With conda, on the other hand, I get netcdf libs, hdf5 libs, libpng,
libjpeg, libtiff, and I can build my stuff against those and depend on them
-- saves me a lot of pain, and my users get a better system.

Oh, and add on the GIS stuff: GDAL, etc. (seriously a pain to build), and I
get a lot of value.

And, many of these libs (GDAL, netcdf) come with nifty command line
utilities -- I get those too.

So, pip+wheel _may_ be able to support all that, but AFAICT, no one is
doing it.

And it's really not going to support shipping cmake, and perl, and who
knows what else I might need in my toolchain that's not python or "just" a
library.

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Nathaniel Smith
2016-01-15 19:21:26 UTC
Permalink
On Jan 15, 2016 10:23 AM, "Chris Barker" <***@noaa.gov> wrote:
[...]
Post by Chris Barker
Post by Matthew Brett
pip install numpy scipy matplotlib scikit-learn scikit-image pandas h5py
last I checked, each of those is self-contained, except for python-level
dependencies, most notably on numpy. So it doesn't' help me solve my
problem. For instance, I have my own C/C++ code that I'm wrapping that
requires netcdf (https://github.com/NOAA-ORR-ERD/PyGnome), and another that
requires image libs like libpng, libjpeg, etc.(
https://github.com/NOAA-ORR-ERD/py_gd)
Post by Chris Barker
netcdf is not too ugly itself, but it depends on hdf5, libcurl, zlib
(others?). So I have all these libs I need. As it happens, matplotlib
probably has the image libs I need, and h5py has hdf5 (and libcurl? and
zlib?). But even then, as far as I can tell, I need to build and provide
these libs myself for my code. Which is a pain in the @%$ and then I'm
shipping (and running) multiple copies of the same libs all over the place
-- will there be compatibility issues? apparently not, but it's still
wastes the advantage of shared libs, and makes things a pain for all of us.
Post by Chris Barker
With conda, on the other hand, I get netcdf libs, hdf5 libs, libpng,
libjpeg, ibtiff, and I can build my stuff against those and depend on them
-- saves me a lot of pain, and my users get a better system.

Sure. Someone's already packaged those for conda, and no one has packaged
them for pypi, so it makes sense that conda is more convenient for you. If
someone does the work of packaging them for pypi, then that difference goes
away. I'm not planning to do that work myself :-). My focus in these
discussions around pip/pypi is selfishly focused on the needs of numpy.
pip/pypi is clearly the weakest link in the space of packaging and
distribution systems that our users care about, so improvements there raise
the minimum baseline we can assume. But if/when we sort out the hard
problems blocking numpy wheels (Linux issues, windows issues, etc.) then I
suspect that we'll start seeing people packaging up those dependencies that
you're worrying about and putting them on pypi, just because there won't be
any big road blocks anymore to doing so.

-n
Travis Oliphant
2016-01-15 19:56:30 UTC
Permalink
Post by Chris Barker
[...]
Post by Chris Barker
Post by Matthew Brett
pip install numpy scipy matplotlib scikit-learn scikit-image pandas
h5py
Post by Chris Barker
last I checked, each of those is self-contained, except for python-level
dependencies, most notably on numpy. So it doesn't' help me solve my
problem. For instance, I have my own C/C++ code that I'm wrapping that
requires netcdf (https://github.com/NOAA-ORR-ERD/PyGnome), and another
that requires image libs like libpng, libjpeg, etc.(
https://github.com/NOAA-ORR-ERD/py_gd)
Post by Chris Barker
netcdf is not too ugly itself, but it depends on hdf5, libcurl, zlib
(others?). So I have all these libs I need. As it happens, matplotlib
probably has the image libs I need, and h5py has hdf5 (and libcurl? and
zlib?). But even then, as far as I can tell, I need to build and provide
shipping (and running) multiple copies of the same libs all over the place
-- will there be compatibility issues? apparently not, but it's still
wastes the advantage of shared libs, and makes things a pain for all of us.
Post by Chris Barker
With conda, on the other hand, I get netcdf libs, hdf5 libs, libpng,
libjpeg, ibtiff, and I can build my stuff against those and depend on them
-- saves me a lot of pain, and my users get a better system.
Sure. Someone's already packaged those for conda, and no one has packaged
them for pypi, so it makes sense that conda is more convenient for you. If
someone does the work of packaging them for pypi, then that difference goes
away. I'm not planning to do that work myself :-). My focus in these
discussions around pip/pypi is selfishly focused on the needs of numpy.
pip/pypi is clearly the weakest link in the space of packaging and
distribution systems that our users care about, so improvements there raise
the minimum baseline we can assume. But if/when we sort out the hard
problems blocking numpy wheels (Linux issues, windows issues, etc.) then I
suspect that we'll start seeing people packaging up those dependencies that
you're worrying about and putting them on pypi, just because there won't be
any big road blocks anymore to doing so.
I still submit that this is not the best use of time. Conda *already*
solves the problem. My sadness is that people keep working to create an
ultimately inferior solution rather than just help make a better solution
more accessible. People mistakenly believe that wheels and conda
packages are equivalent. They are not. If they were we would not have
created conda. We could not do what was necessary with wheels and
contorting wheels to become conda packages was and still is a lot more
work. Now, obviously, it's just code and you can certainly spend effort
and time to migrate wheels so that they are functionally equivalent to conda
packages --- but what is the point, really?

Why don't we work together to make the open-source conda project and
open-source conda packages more universally accessible?

The other very real downside is that these efforts to promote numpy as
wheels further encourage people to not use the better solution that
already exists in conda. I have to deal with people that *think* pip will
solve all their problems all the time. It causes a lot of difficulty when
they end up with work-around after work-around that is actually all solved
already with conda. It's a weird situation to be in. I'm really
baffled by the resistance of this community to just help make conda *the*
solution for the scientific python community.

I think it would be better to spend time:

1) helping smooth out the pip/conda divide. There are many ways to do
this that have my full support:

* making sure pip install conda works well
* creating "shadow packages" in conda for things that pip has installed
* making it possible for pip to install conda packages directly (and
ignore the extra features that conda makes possible that pip does not
support). A pull-request to pip that did that would be far more useful
than trying to cram things into the wheel concept.

2) creating community conda packages to alleviate whatever concerns might
exist about the "control" of the packages that people can install with
conda.

* Continuum has volunteered resources to Numfocus so that it can be the
governing body of what goes into a "pycommunity" channel for conda that
will be as easy to get as a "conda install pycommunity"

I have not yet heard any really valid reason why this community should not
just adopt conda packages and encourage others to do the same. The only
thing I have heard are "chicken-and-egg" stories that come down to "we want
people to be able to use pip." So, good, then let's make it so that pip
can install conda packages and that conda packages with certain
restrictions can be hosted on pypi or anywhere else that you have an
"index". At least if there were valid reasons they could be addressed.
But, this head-in-the-sand attitude towards a viable technology that is
freely available is really puzzling to me.

There are millions of downloads of Anaconda and many millions of downloads
of conda packages each year. That is just with one company doing it.
There could be many millions more with other companies and organizations
hosting conda packages and indexes. The conda user-base is already very
large. A great benefit to the Python ecosystem would be to allow pip
users and conda users to share each other's work --- rather than to spend
time reproducing work that is already done and freely available.

-Travis
Post by Chris Barker
-n
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
--
*Travis Oliphant*
*Co-founder and CEO*


@teoliphant
512-222-5440
http://www.continuum.io
Robert McGibbon
2016-01-15 21:22:49 UTC
Permalink
Post by Travis Oliphant
I still submit that this is not the best use of time. Conda *already*
solves the problem. My sadness is that people keep working to create an
ultimately inferior solution rather than just help make a better solution
more accessible. People mistakenly believe that wheels and conda
packages are equivalent. They are not. If they were we would not have
created conda. We could not do what was necessary with wheels and
contorting wheels to become conda packages was and still is a lot more
work. Now, obviously, it's just code and you can certainly spend effort
and time to migrate wheels so that they functionally equivalently to conda
packages --- but what is the point, really?
Post by Travis Oliphant
Why don't we work together to make the open-source conda project and
open-source conda packages more universally accessible?

The factors that motivate my interest in making wheels for Linux (i.e. the
proposed manylinux tag) work on PyPI are

- All (new) Python installations come with pip. As a package author writing
documentation, I count on users having pip installed, but I can't count on
conda.
- I would like to see Linux have feature parity with OS X and Windows with
respect to pip and PyPI.
- I want the PyPA tools like pip to be as good as possible.
- I'm confident that the manylinux proposal will work, and it's very
straightforward.

-Robert
Nathaniel Smith
2016-01-16 07:11:34 UTC
Permalink
On Fri, Jan 15, 2016 at 11:56 AM, Travis Oliphant <***@continuum.io> wrote:
[...]
Post by Travis Oliphant
I still submit that this is not the best use of time. Conda *already*
solves the problem. My sadness is that people keep working to create an
ultimately inferior solution rather than just help make a better solution
more accessible. People mistakenly believe that wheels and conda
packages are equivalent. They are not. If they were we would not have
created conda. We could not do what was necessary with wheels and
contorting wheels to become conda packages was and still is a lot more work.
Now, obviously, it's just code and you can certainly spend effort and time
to migrate wheels so that they functionally equivalently to conda packages
--- but what is the point, really?
Sure, conda definitely solves problems that wheels don't. And I
recommend Anaconda to people all the time :-) But let me comment
specifically on the topic of numpy-and-pip. (I don't want to be all
"this is off-topic!" because there isn't really a better list of
general scientific-python-ecosystem discussion, but the part of the
conda-and-pip discussion that actually impacts on numpy development,
or where the numpy maintainers can affect anything, is very small, and
this is numpy-discussion, so...)

Last month, numpy had ~740,000 downloads from PyPI, and there are
probably hundreds of third-party projects that distribute via PyPI and
depend on numpy. So concretely our options as a project are:
1) drop support for pip/PyPI and abandon those users
2) continue to support those users fairly poorly, and at substantial
ongoing cost
3) spend some resources now to make pip/pypi work better, so we can
support them better and at lower ongoing cost

Option 1 would require overwhelming consensus of the community, which
for better or worse is presumably not going to happen while
substantial portions of that community are still using pip/PyPI. If
folks interested in pushing things forward can convince them to switch
to conda instead then the calculation changes, but that's just not the
kind of thing that numpy-the-project can do or not do.

So between the possible options, I've been spending some time trying
to drag pip/PyPI into working better for us because I like (3) better
than (2). It's not a referendum on anything else. I assume that others
have similar motives, though I won't try speaking for them.

I think beyond that there are also (currently) some unique benefits to
supporting pip/PyPI, like the social/political benefits of not forking
away from the broader Python community, and the fact that pip is
currently the only functioning way we have of distributing numpy
prereleases to users for testing. (The fine-grained numpy ABI tracking
in conda is neat, but for this particular use case it's actually a bit
of a problem :-).) But it doesn't much matter either way, because we
can't/won't just abandon all those users regardless.

-n
--
Nathaniel J. Smith -- http://vorpus.org
Chris Barker - NOAA Federal
2016-01-19 16:57:56 UTC
Permalink
Post by Nathaniel Smith
Last month, numpy had ~740,000 downloads from PyPI,
Hm, given that Windows and Linux wheels have not been available, then
that's mostly source installs anyway. Or failed installs -- no
shortage of folks trying to pip install numpy on Windows and then
having questions about why it doesn't work. Unfortunately, there is no
way to know if pip downloads are successful, or if people pip install
Numpy, then find out they need some other non-pip-installable
packages, and go find another system.
Post by Nathaniel Smith
and there are
probably hundreds of third-party projects that distribute via PyPI and
depend on numpy.
I'm not so sure -- see above -- as pip install has not been reliable for
Numpy for ages, I doubt it. Not that they aren't there, but I doubt
it's the primary distribution mechanism. There's been an explosion in
the use of conda, and there have been multiple other options for ages:
Canopy, python(x,y), Gohlke's builds, etc.

So at this point, I think the only people using pip are folks that are
set up to build -- mostly Linux. (though Matthew's efforts with the Mac
wheels may have created a different story on the Mac).
Post by Nathaniel Smith
1) drop support for pip/PyPI and abandon those users
There is no one to abandon -- except the Mac users -- we haven't
supported them yet.
Post by Nathaniel Smith
2) continue to support those users fairly poorly, and at substantial
ongoing cost
I'm curious what the cost is for this poor support -- throw the source
up on PyPi, and we're done. The cost comes in when trying to build
binaries...
Post by Nathaniel Smith
Option 1 would require overwhelming consensus of the community, which
for better or worse is presumably not going to happen while
substantial portions of that community are still using pip/PyPI.
Are they? Which community are we talking about? The community I'd like
to target are web developers that aren't doing what they think of as
"scientific" applications, but could use a little of the SciPy stack.
These folks are committed to pip, and are very reluctant to introduce
a difficult dependency. Binary wheels would help these folks, but
that is not a community that exists yet (or it's small, anyway)

All that being said, I'd be happy to see binary wheels for the core
SciPy stack on PyPi. It would be nice for people to be able to do a
bit with Numpy or pandas, or MPL, without having to jump ship to a
whole new way of doing things.

But we should be realistic about how far it can go.
Post by Nathaniel Smith
If
folks interested in pushing things forward can convince them to switch
to conda instead then the calculation changes, but that's just not the
kind of thing that numpy-the-project can do or not do.
We can't convince anybody, but we can decide where to expend our efforts.

-CHB
Ralf Gommers
2016-01-19 18:05:52 UTC
Permalink
On Tue, Jan 19, 2016 at 5:57 PM, Chris Barker - NOAA Federal <
Post by Chris Barker - NOAA Federal
Post by Nathaniel Smith
2) continue to support those users fairly poorly, and at substantial
ongoing cost
I'm curious what the cost is for this poor support -- throw the source
up on PyPi, and we're done. The cost comes in when trying to build
binaries...
I'm sure Nathaniel means the cost to users of failed installs and of numpy
losing users because of that, not the cost of building binaries.
Post by Chris Barker - NOAA Federal
Option 1 would require overwhelming consensus of the community, which
Post by Nathaniel Smith
for better or worse is presumably not going to happen while
substantial portions of that community are still using pip/PyPI.
Are they? Which community are we talking about? The community I'd like
to target are web developers that aren't doing what they think of as
"scientific" applications, but could use a little of the SciPy stack.
These folks are committed to pip, and are very reluctant to introduce
a difficult dependency. Binary wheels would help these folks, but
that is not a community that exists yet ( or it's small, anyway)
All that being said, I'd be happy to see binary wheels for the core
SciPy stack on PyPi. It would be nice for people to be able to do a
bit with Numpy or pandas, it MPL, without having to jump ship to a
whole new way of doing things.
This is indeed exactly why we need binary wheels. Efforts to provide those
will not change our strong recommendation to our users that they're better
off using a scientific Python distribution.

Ralf
Robert McGibbon
2016-01-21 09:42:44 UTC
Permalink
Hi all,

Just as a heads up: Nathaniel and I wrote a draft PEP on binary linux
wheels that is now being discussed on distutils-sig, so you can check that
out and participate in the conversation if you're interested.

- PEP on python.org: https://www.python.org/dev/peps/pep-0513/
- PEP on github with some typos fixed:
https://github.com/manylinux/manylinux/blob/master/pep-513.rst
- Email archive:
https://mail.python.org/pipermail/distutils-sig/2016-January/027997.html

-Robert
Post by Ralf Gommers
On Tue, Jan 19, 2016 at 5:57 PM, Chris Barker - NOAA Federal <
Post by Chris Barker - NOAA Federal
Post by Nathaniel Smith
2) continue to support those users fairly poorly, and at substantial
ongoing cost
I'm curious what the cost is for this poor support -- throw the source
up on PyPi, and we're done. The cost comes in when trying to build
binaries...
I'm sure Nathaniel means the cost to users of failed installs and of numpy
losing users because of that, not the cost of building binaries.
Post by Chris Barker - NOAA Federal
Option 1 would require overwhelming consensus of the community, which
Post by Nathaniel Smith
for better or worse is presumably not going to happen while
substantial portions of that community are still using pip/PyPI.
Are they? Which community are we talking about? The community I'd like
to target are web developers that aren't doing what they think of as
"scientific" applications, but could use a little of the SciPy stack.
These folks are committed to pip, and are very reluctant to introduce
a difficult dependency. Binary wheels would help these folks, but
that is not a community that exists yet ( or it's small, anyway)
All that being said, I'd be happy to see binary wheels for the core
SciPy stack on PyPi. It would be nice for people to be able to do a
bit with Numpy or pandas, it MPL, without having to jump ship to a
whole new way of doing things.
This is indeed exactly why we need binary wheels. Efforts to provide those
will not change our strong recommendation to our users that they're better
off using a scientific Python distribution.
Ralf
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Chris Barker
2016-01-15 20:09:19 UTC
Permalink
Post by Nathaniel Smith
Sure. Someone's already packaged those for conda, and no one has packaged
them for pypi, so it makes sense that conda is more convenient for you. If
someone does the work of packaging them for pypi, then that difference goes
away.
This is what I meant by "cultural" issues :-)

but pypi has been an option for Windows and OS-X for ages and those
platforms are the bigger problem anyway -- and no one has done it for those
platforms -- Linux is not the issue here. I really did try to get that
effort started a while back -- I got zero support, nothing, nada. Which
doesn't mean I couldn't have gone ahead and done more myself, but I only
have so much time, and I wanted to work on my actual problems, and if it
hadn't gained any support, it would have been a big waste.

Maybe the world is ready for it now...

But besides the cultural/community/critical mass issues, pip+wheel isn't
very well set up to support these use cases anyway -- so we have a 50%
solution now, and if we did this nifty binary-wheels-of-libs thing we'd
have a 90% solution -- and still struggle with the other 10%, until we
re-invented conda.

Anyway -- not trying to be a drag here -- just telling my story.

As for the manylinux idea -- that's a great one -- hope we can get that
working.

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Nathaniel Smith
2016-01-15 20:20:23 UTC
Permalink
Post by Chris Barker
Post by Nathaniel Smith
Sure. Someone's already packaged those for conda, and no one has packaged
them for pypi, so it makes sense that conda is more convenient for you. If
someone does the work of packaging them for pypi, then that difference goes
away.
This is what I meant by "cultural" issues :-)
but pypi has been an option for Windows and OS-X for ages and those
platforms are the bigger problem anyway -- and no one has done it for those
platforms
I think what's going on here is that Windows *hasn't* been an option
for numpy/scipy due to the toolchain issues, and on OS-X we've just
been using the platform BLAS (Accelerate), so the core packages
haven't had any motivation to sort out the whole library dependency
issue, and no-one else is motivated enough to do it. My prediction is
that after the core packages sort it out in order to solve their own
problems then we might see others picking it up.

-n
--
Nathaniel J. Smith -- http://vorpus.org
Benjamin Root
2016-01-15 21:08:06 UTC
Permalink
Travis -

I will preface the following by pointing out how valuable miniconda and
anaconda have been for our workplace because we were running into issues
with ensuring that everyone in our mixed platform office had access to all
the same tools, particularly GDAL, NetCDF4 and such. For the longest time,
we were all stuck on an ancient "Corporate Python" that our IT staff
managed to put together, but never had the time to update. So, I do
absolutely love conda for the problems that it solved for us.

That being said... I take exception to your assertion that anaconda is
*the* solution to the packaging problem. I still have a number of issues,
particularly with the interactions of GDAL, shapely, and Basemap (they all
seek out libgeos_c differently), and I have to use my own build of GDAL to
enable many of the features that we use (the vanilla GDAL put out by
Continuum just has the default options, and is quite limited). If I don't
set up my environment *just right* one of those packages will fail to
import in some way due to being unable to find their particular version of
libgeos_c. I haven't figured out exactly why this happens, but it is very
easy to break such an environment this way after an update.

But that problem is a solvable problem within the framework of conda, so I
am not too concerned about that. The bigger problem is Apache. In
particular, mod_wsgi. About a year ago, one of our developers was happily
developing a webtool that utilized numpy, NetCDF4 and libxml via conda
environments. All of the testing was done in flask and everything was
peachy. We figured that final deployment out to the Apache server would be
a cinch, right? Wrong. Because our mod_wsgi came in through RPMs and was
built against the system's python, it was completely incompatible with
the compiled numpy because of the differences in the python compile
options. In a clutch, we had our IT staff manually build mod_wsgi against
anaconda's python, but they weren't too happy about that, due to mod_wsgi
no longer getting updated via yum.
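 
(A quick way to see that kind of mismatch, purely as a diagnostic sketch:
compare the embedded interpreter's build configuration with the one the
conda packages were built against, e.g. from inside the WSGI app:)

    import sys
    import sysconfig

    # Any difference in these between the mod_wsgi-embedded python and the
    # python the conda packages were compiled against is a red flag.
    print("prefix:", sys.prefix)
    print("version:", sys.version)
    for var in ("SOABI", "Py_ENABLE_SHARED", "WITH_PYMALLOC"):
        print(var, "=", sysconfig.get_config_var(var))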

If anaconda was the end-all, be-all solution, then it should just be a
simple matter to do "conda install mod_wsgi". But the design of conda is
that it is intended to be a user-space package-manager. mod_wsgi is
installed via root/apache user, which is siloed off from the user. I would
have to (in theory) go and install conda for the apache user and likely
have to install a conda "apache" package and mod_wsgi package. I seriously
doubt this would be an acceptable solution for many IT administrators who
would rather depend upon the upstream distributions who are going to be
very quick about getting updates out the door and are updated automatically
through yum or apt.

So, again, I love conda for what it can do when it works well. I only take
exception to the notion that it can address *all* problems, because there
are some problems that it just simply isn't properly situated for.

Cheers!
Ben Root
Post by Oscar Benjamin
Post by Chris Barker
Post by Nathaniel Smith
Sure. Someone's already packaged those for conda, and no one has
packaged
Post by Chris Barker
Post by Nathaniel Smith
them for pypi, so it makes sense that conda is more convenient for you.
If
Post by Chris Barker
Post by Nathaniel Smith
someone does the work of packaging them for pypi, then that difference
goes
Post by Chris Barker
Post by Nathaniel Smith
away.
This is what I meant by "cultural" issues :-)
but pypi has been an option for Windows and OS-X for ages and those
platforms are the bigger problem anyway -- and no one has done it for
those
Post by Chris Barker
platforms
I think what's going on here is that Windows *hasn't* been an option
for numpy/scipy due to the toolchain issues, and on OS-X we've just
been using the platform BLAS (Accelerate), so the core packages
haven't had any motivation to sort out the whole library dependency
issue, and no-one else is motivated enough to do it. My prediction is
that after the core packages sort it out in order to solve their own
problems then we might see others picking it up.
-n
--
Nathaniel J. Smith -- http://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Steve Waterbury
2016-01-15 21:56:19 UTC
Permalink
Post by Benjamin Root
So, again, I love conda for what it can do when it works well. I only
take exception to the notion that it can address *all* problems, because
there are some problems that it just simply isn't properly situated for.
Actually, I would say you didn't mention any ... ;) The issue is
not that it "isn't properly situated for" (whatever that means)
the problems you describe, but that -- in the case you mention,
for example -- no one has conda-packaged those solutions yet.

FWIW, our sysadmins and I use conda for django / apache / mod_wsgi
sites and we are very happy with it. IMO, compiling mod_wsgi in
the conda environment and keeping it up is trivial compared to the
awkwardnesses introduced by using pip/virtualenv in those cases.

We also use conda for sites with nginx and the conda-packaged
uwsgi, which works great and even permits the use of a separate
env (with, if necessary, different versions of django, etc.)
for each application. No need to set up an entire VM for each app!
*My* sysadmins love conda -- as soon as they saw how much better
than pip/virtualenv it was, they have never looked back.

IMO, conda is by *far* the best packaging solution the python
community has ever seen (and I have been using python for more
than 20 years). I too have been stunned by some of the resistance
to conda that one sometimes sees in the python packaging world.
I've had a systems package maintainer tell me "it solves a
different problem [than pip]" ... hmmm ... I would say it
solves the same problem *and more*, *better*. I attribute
some of the conda-ignoring to "NIH" and, to some extent,
possibly defensiveness (I would be defensive too if I had been
working on pip as long as they had when conda came along ;).

Cheers,
Steve Waterbury
NASA/GSFC
Matthew Brett
2016-01-15 22:07:22 UTC
Permalink
Hi,

On Fri, Jan 15, 2016 at 1:56 PM, Steve Waterbury
Post by Steve Waterbury
Post by Benjamin Root
So, again, I love conda for what it can do when it works well. I only
take exception to the notion that it can address *all* problems, because
there are some problems that it just simply isn't properly situated for.
Actually, I would say you didn't mention any ... ;) The issue is
not that it "isn't properly situated for" (whatever that means)
the problems you describe, but that -- in the case you mention,
for example -- no one has conda-packaged those solutions yet.
FWIW, our sysadmins and I use conda for django / apache / mod_wsgi
sites and we are very happy with it. IMO, compiling mod_wsgi in
the conda environment and keeping it up is trivial compared to the
awkwardnesses introduced by using pip/virtualenv in those cases.
We also use conda for sites with nginx and the conda-packaged
uwsgi, which works great and even permits the use of a separate
env (with, if necessary, different versions of django, etc.)
for each application. No need to set up an entire VM for each app!
*My* sysadmins love conda -- as soon as they saw how much better
than pip/virtualenv it was, they have never looked back.
IMO, conda is by *far* the best packaging solution the python
community has ever seen (and I have been using python for more
than 20 years).
Yes, I think everyone would agree that, until recently, Python
packaging was in a mess.
Post by Steve Waterbury
I too have been stunned by some of the resistance
to conda that one sometimes sees in the python packaging world.
You can correct me if you see evidence to the contrary, but I think
all of the argument is that it is desirable that pip should work as
well.
Post by Steve Waterbury
I've had a systems package maintainer tell me "it solves a
different problem [than pip]" ... hmmm ... I would say it
solves the same problem *and more*, *better*.
I know what you mean, but I suppose the person you were talking to may
have been thinking that many of us already have Python distributions
that we are using, and for those, we want to use pip.
Post by Steve Waterbury
I attribute
some of the conda-ignoring to "NIH" and, to some extent,
possibly defensiveness (I would be defensive too if I had been
working on pip as long as they had when conda came along ;).
I must say, I don't personally recognize those reasons. For example,
I hadn't worked on pip at all before conda came along.

Best,

Matthew
Steve Waterbury
2016-01-15 22:15:23 UTC
Permalink
Post by Matthew Brett
Post by Steve Waterbury
I attribute
some of the conda-ignoring to "NIH" and, to some extent,
possibly defensiveness (I would be defensive too if I had been
working on pip as long as they had when conda came along ;).
I must say, I don't personally recognize those reasons. For example,
I hadn't worked on pip at all before conda came along.
By "working on pip", I was referring to *developers* of pip,
not those who *use* pip for packaging things. Are you
contributing to the development of pip, or merely using it
for creating packages?

Steve
Matthew Brett
2016-01-15 22:19:42 UTC
Permalink
On Fri, Jan 15, 2016 at 2:15 PM, Steve Waterbury
Post by Steve Waterbury
Post by Matthew Brett
Post by Steve Waterbury
I attribute
some of the conda-ignoring to "NIH" and, to some extent,
possibly defensiveness (I would be defensive too if I had been
working on pip as long as they had when conda came along ;).
I must say, I don't personally recognize those reasons. For example,
I hadn't worked on pip at all before conda came along.
By "working on pip", I was referring to *developers* of pip,
not those who *use* pip for packaging things. Are you
contributing to the development of pip, or merely using it
for creating packages?
Sorry - I assumed you were talking about us here on the list planning
to build Linux and Windows wheels.

It certainly doesn't seem surprising to me that the pip developers
would continue to develop pip rather than switch to conda. Has there
been any attempt to persuade the pip developers to do this?

Best,

Matthew
Steve Waterbury
2016-01-15 22:33:20 UTC
Permalink
Post by Matthew Brett
On Fri, Jan 15, 2016 at 2:15 PM, Steve Waterbury
Post by Steve Waterbury
Post by Matthew Brett
Post by Steve Waterbury
I attribute
some of the conda-ignoring to "NIH" and, to some extent,
possibly defensiveness (I would be defensive too if I had been
working on pip as long as they had when conda came along ;).
I must say, I don't personally recognize those reasons. For example,
I hadn't worked on pip at all before conda came along.
By "working on pip", I was referring to *developers* of pip,
not those who *use* pip for packaging things. Are you
contributing to the development of pip, or merely using it
for creating packages?
Sorry - I assumed you were taking about us here on the list planning
to build Linux and Windows wheels.
No, I was definitely *not* talking about those on the list
planning to build Linux and Windows wheels when I referred to
the folks "working on pip". However, that said, I *completely*
agree with Travis's remark:

"The other very real downside is that these efforts to promote
numpy as wheels further encourages people to not use the
better solution that already exists in conda."
Post by Matthew Brett
It certainly doesn't seem surprising to me that the pip developers
would continue to develop pip rather than switch to conda. Has there
been any attempt to persuade the pip developers to do this?
Not that I know of, but as I said, I have asked pip developers and
core python developers for their opinions on conda, and my
impression has always been one of, shall we say, a circling of the
wagons. Yes, even the nice guys in the python community (and I've
known some of them a *long* time) sometimes do that ... ;)

Cheers,
Steve
Matthew Brett
2016-01-15 22:38:35 UTC
Permalink
On Fri, Jan 15, 2016 at 2:33 PM, Steve Waterbury
Post by Steve Waterbury
Post by Matthew Brett
On Fri, Jan 15, 2016 at 2:15 PM, Steve Waterbury
Post by Steve Waterbury
Post by Matthew Brett
Post by Steve Waterbury
I attribute
some of the conda-ignoring to "NIH" and, to some extent,
possibly defensiveness (I would be defensive too if I had been
working on pip as long as they had when conda came along ;).
I must say, I don't personally recognize those reasons. For example,
I hadn't worked on pip at all before conda came along.
By "working on pip", I was referring to *developers* of pip,
not those who *use* pip for packaging things. Are you
contributing to the development of pip, or merely using it
for creating packages?
Sorry - I assumed you were taking about us here on the list planning
to build Linux and Windows wheels.
No, I was definitely *not* talking about those on the list
planning to build Linux and Windows wheels when I referred to
the folks "working on pip". However, that said, I *completely*
"The other very real downside is that these efforts to promote
numpy as wheels further encourages people to not use the
better solution that already exists in conda."
I think there's a distinction between 'promote numpy as wheels' and
'make numpy available as a wheel'. I don't think you'll see much
evidence of "promotion" here - it's not really the open-source way.
I'm not quite sure what you mean about 'circling the wagons', but the
general approach of staying on course and seeing how things shake out
seems to me entirely sensible.

Cheers too,

Matthew
Chris Barker
2016-01-16 03:22:51 UTC
Permalink
Post by Matthew Brett
I think there's a distinction between 'promote numpy as wheels' and
'make numpy available as a wheel'. I don't think you'll see much
evidence of "promotion" here - it's not really the open-source way.
Depends on how you define "promotion" I suppose. But I think that
supporting something is indeed promoting it.

I've been a fan of getting the scipy stack available from pip for a long
time -- I think it could be really useful for lots of folks not doing heavy
scipy-style work, but folks are very wary of introducing a new, hard to
install, dependency.

But now I'm not so sure -- the trick is what you tell newbies. When I teach
Python (not scipy), I start folks off with the python.org python. Then they
can pip install ipython, which is the only part of the "scipy stack" I want
them to have if they are not doing real numerical work.

But what if they are? Now it's pretty clear to me that anyone interested in
getting into data analysis, etc with python should just stating off with
Anaconda (or Canopy) -- or maybe Gohlke's binaries for Windows users. But
what if we have wheels on all platforms for the scipy stack (and a few
others?). Now they can learn python, pip install numpy, scipy, etc, learn
some more, get excited -- delve into some domain-specific work, and WHAM --
hit the wall of installation nightmares. NOW, they need to switch to
Anaconda or Canopy...

I think it's too bad to get that far into it and then have to switch and
learn something new. -- Again, is there really a point to a 90% solution?

So this is the point -- heading down this road takes us to a place where
people can get much farther before hitting the wall -- but hit the wall
they will, so we'll still need other solutions, so maybe it would be better
for us to put our energies into those other solutions.

By the way, I'm seeing more and more folks that are not scipy-focused
starting off with conda, even for web apps.
Post by Matthew Brett
I'm not quite sure what you mean about 'circling the wagons', but the
general approach of staying on course and seeing how things shake out
seems to me entirely sensible.
well, what I've observed in the PyPA community may not be circling the
wagons, but it has been a "we're not ever going to solve the scipy
problems" attitude. I'm pretty convinced that pip/wheel will never be even
a 90% solution without some modifications -- so why go there?

Indeed, it's not uncommon for folks on the distutils list to say "go use
conda" in response to issues that pip does not address well.

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Bryan Van de Ven
2016-01-16 03:48:17 UTC
Permalink
Indeed, it's not uncommon for folks on the distutils list to say "go use conda" in response to issues that pip does not address well.
I was in the room at the very first proto-PyData conference when Guido told the assembled crowd "if pip doesn't satisfy the needs of the SciPy community, they should build something that does, on their own." Well, here we are.

Bryan
David Cournapeau
2016-01-15 22:37:34 UTC
Permalink
Post by Steve Waterbury
Post by Benjamin Root
So, again, I love conda for what it can do when it works well. I only
take exception to the notion that it can address *all* problems, because
there are some problems that it just simply isn't properly situated for.
Actually, I would say you didn't mention any ... ;) The issue is
not that it "isn't properly situated for" (whatever that means)
the problems you describe, but that -- in the case you mention,
for example -- no one has conda-packaged those solutions yet.
FWIW, our sysadmins and I use conda for django / apache / mod_wsgi
sites and we are very happy with it. IMO, compiling mod_wsgi in
the conda environment and keeping it up is trivial compared to the
awkwardnesses introduced by using pip/virtualenv in those cases.
We also use conda for sites with nginx and the conda-packaged
uwsgi, which works great and even permits the use of a separate
env (with, if necessary, different versions of django, etc.)
for each application. No need to set up an entire VM for each app!
*My* sysadmins love conda -- as soon as they saw how much better
than pip/virtualenv it was, they have never looked back.
IMO, conda is by *far* the best packaging solution the python
community has ever seen (and I have been using python for more
than 20 years). I too have been stunned by some of the resistance
to conda that one sometimes sees in the python packaging world.
I've had a systems package maintainer tell me "it solves a
different problem [than pip]" ... hmmm ... I would say it
solves the same problem *and more*, *better*. I attribute
some of the conda-ignoring to "NIH" and, to some extent,
possibly defensiveness (I would be defensive too if I had been
working on pip as long as they had when conda came along ;).
Conda and pip solve some of the same problems, but pip also does quite a
bit more than conda (and vice versa, as conda also acts akin to
rvm-for-python). Conda works so well because it supports a subset of what
pip does: install things from binaries. This is the logical thing to do
when you want to distribute binaries because in the python world, for
historical reasons, package metadata are dynamic by the
very nature of setup.py.

For a long time, pip worked from sources instead of binaries (this is
actually the reason why it was started following easy_install), and thus
had to cope w/ those dynamic metadata. It also has to deal w/ building
packages and the whole distutils/setuptools interoperability mess. Conda
being solely a packaging solution can sidestep all this complexity (which
is again the logical and smart thing to do if what you care about is
deployment).

Having pip understand conda packages is a non-trivial endeavour: since
conda packages are relocatable, they are not compatible with the usual
python interpreters, and setuptools metadata is neither a subset nor a
superset of conda metadata. Regarding conda/pip interoperability, there are
things that conda could (and IMO should) do, such as writing the expected
metadata in site-packages (PEP 376). Currently, conda does not recognize
packages installed by pip (because it does not implement PEP 376 and co),
so if you do a "pip install ." of a package, it will likely break the
existing package if present.
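 
(To illustrate the PEP 376 point: pip-style tools decide what is installed by
scanning for *.dist-info metadata under site-packages, so if conda wrote that
metadata too, the two tools would at least see each other's packages. A
minimal sketch of that discovery step:)

    import os
    import site

    def dist_info_dirs():
        """List the .dist-info directories that pip-style tools discover."""
        found = []
        for sp in site.getsitepackages():
            if not os.path.isdir(sp):
                continue
            found.extend(os.path.join(sp, name) for name in os.listdir(sp)
                         if name.endswith(".dist-info"))
        return found

    for d in dist_info_dirs():
        print(d)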

David
Chris Barker
2016-01-16 03:00:29 UTC
Permalink
hmm -- didn't mean to rev this up quite so much -- sorry!

But it's a good conversation to have, so...
Post by Benjamin Root
That being said... I take exception to your assertion that anaconda is
*the* solution to the packaging problem.
I think we need to keep some things straight here:

"conda" is a binary package management system.

"Anaconda" is a python (and other stuff) distribution, built with conda.

In practice, everyone ( I know of ) uses the Anaconda distribution (or at
least the default conda channel) when using conda, but in theory, you could
maintain an entirely distinct distribution with conda as the tool.

Also in practice, conda is so easy because continuum has done the hard work
of building a lot of the packages we all need -- there are still a lot
being maintained by the community in various ways, but frankly, we do
depend on continuum for all the hard work. But working on/with conda does
not lock you into that if you think it's not serving your needs.

And this discussion, (for me anyway) is about tools and the way forward,
not existing packages.

So onward!
Post by Benjamin Root
I still have a number of issues, particularly with the interactions of
GDAL, shapely, and Basemap (they all seek out libgeos_c differently), and I
have to use my own build of GDAL to enable many of the features that we use
(the vanilla GDAL put out by Continuum just has the default options, and is
quite limited).
Yeah, GDAL/OGR is a F%$#ing nightmare -- and I do wish that Anaconda had a
better build, but frankly, there is no system that's going to make that any
easier -- do any of the Linux distros ship a really good compatible, up to
date set of these libs -- and OS-X and Windows? yow! (Though Chris Gohlke
is a wonder!)
Post by Benjamin Root
If I don't set up my environment *just right* one of those packages will
fail to import in some way due to being unable to find their particular
version of libgeos_c. I haven't figured out exactly why this happens, but
it is very easy to break such an environment this way after an update.
Maybe conda could be improved to make this easier, I don't know (though do
check out the IOOS channel on anaconda.org -- Filipe has done some nice
work on this)
Post by Benjamin Root
In a clutch, we had our IT staff manually build mod_wsgi against
anaconda's python, but they weren't too happy about that, due to mod_wsgi
no longer getting updated via yum.
I'm not sure how pip helps you out here, either. Sure, for easy-to-compile-
from-source packages you can just pip install, and you'll get a
package compatible with your (system) python. But binary wheels will give
you the same headaches -- so you're back to expecting your linux distro to
provide everything, which they don't :-(

I understand that the IT folks want everything to come from their OS vendor
-- they like that -- but it simply isn't practical for scipy-based web
services. And once you've got most of your stack coming from another
source, is it really a big deal for python to come from somewhere else also
(and apache, and ???) -- conda at least is a technology that _can_ provide
an integrated system that includes all this -- I don't think you're going
to be pip-installling apache anytime soon!
Post by Benjamin Root
But the design of conda is that it is intended to be a user-space
package-manager. mod_wsgi is installed via root/apache user, which is
siloed off from the user. I would have to (in theory) go and install conda
for the apache user and likely have to install a conda "apache" package and
mod_wsgi package.
This seems quite reasonable to me, frankly. Plus, you can install conda
centrally as well, and then the apache user can get access to it (but not
modify it).
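For example, a minimal sketch of a central install that services can read but
not write (the /opt/conda prefix, group name, and installer filename are
assumptions):

    # install Miniconda into a shared prefix as an admin
    sudo bash Miniconda-latest-Linux-x86_64.sh -b -p /opt/conda

    # let service accounts (e.g. apache) read/execute but not write
    sudo chgrp -R services /opt/conda
    sudo chmod -R g+rX,o+rX /opt/conda

    # the apache user can then run envs from it, but upgrades require sudo
    /opt/conda/bin/conda env list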
Post by Benjamin Root
I seriously doubt this would be an acceptable solution for many IT
administrators who would rather depend upon the upstream distributions who
are going to be very quick about getting updates out the door and are
updated automatically through yum or apt.
This is true -- but it has nothing to do with the technology -- it's a social
problem. And the auto updates? Ha! The real problem with admins who want
to use the system package managers is that they still insist you run python
2.6, for god's sake :-)
Post by Benjamin Root
So, again, I love conda for what it can do when it works well. I only take
exception to the notion that it can address *all* problems, because there
are some problems that it just simply isn't properly situated for.
Still true, of course, but it can address many more of them than pip can.

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Chris Barker
2016-01-19 17:03:23 UTC
Permalink
hmm -- didn't mean to rev this up quite so much -- sorry!

But it's a good conversation to have, so...
Post by Benjamin Root
That being said... I take exception to your assertion that anaconda is
*the* solution to the packaging problem.
I think we need to keep some things straight here:

"conda" is a binary package management system.

"Anaconda" is a python (and other stuff) distribution, built with conda.

In practice, everyone ( I know of ) uses the Anaconda distribution (or at
least the default conda channel) when using conda, but in theory, you could
maintain an entirely distinct distribution with conda as the tool.

Also in practice, conda is so easy because continuum has done the hard work
of building a lot of the packages we all need -- there are still a lot
being maintained by the community in various ways, but frankly, we do
depend on continuum for all the hard work. But working on/with conda does
not lock you into that if you think it's not serving your needs.

And this discussion, (for me anyway) is about tools and the way forward,
not existing packages.

So onward!
Post by Benjamin Root
I still have a number of issues, particularly with the interactions of
GDAL, shapely, and Basemap (they all seek out libgeos_c differently), and I
have to use my own build of GDAL to enable many of the features that we use
(the vanilla GDAL put out by Continuum just has the default options, and is
quite limited).
Yeah, GDAL/OGR is a F%$#ing nightmare -- and I do wish that Anaconda had a
better build, but frankly, there is no system that's going to make that any
easier -- do any of the Linux distros ship a really good compatible, up to
date set of these libs -- and OS-X and Windows? yow! (Though Chris Gohlke
is a wonder!)
Post by Benjamin Root
If I don't set up my environment *just right* one of those packages will
fail to import in some way due to being unable to find their particular
version of libgeos_c. I haven't figured out exactly why this happens, but
it is very easy to break such an environment this way after an update.
Maybe conda could be improved to make this easier, I don't know (though do
check out the IOOS channel on anaconda.org -- Filipe has done some nice
work on this)
Post by Benjamin Root
In a clutch, we had our IT staff manually build mod_wsgi against
anaconda's python, but they weren't too happy about that, due to mod_wsgi
no longer getting updated via yum.
I'm not sure how pip helps you out here, either. Sure, for easy-to-compile-
from-source packages you can just pip install, and you'll get a package
compatible with your (system) python. But binary wheels will give you the
same headaches -- so you're back to expecting your linux distro to provide
everything, which they don't :-(

I understand that the IT folks want everything to come from their OS vendor
-- they like that -- but it simply isn't practical for scipy-based web
services. And once you've got most of your stack coming from another
source, is it really a big deal for python to come from somewhere else also
(and apache, and ???) -- conda at least is a technology that _can_ provide
an integrated system that includes all this -- I don't think you're going to
be pip-installing apache anytime soon! (or node, or ???)
Post by Benjamin Root
If anaconda was the end-all, be-all solution, then it should just be a
simple matter to do "conda install mod_wsgi". But the design of conda is
that it is intended to be a user-space package-manager.
Then you can either install it as the web user (or apache user), or install
it with system-wide access. I haven't done this, but I don't think it's all
that hard -- you'll then need to sudo to install/upgrade anything new,
but that's expected.

Post by Benjamin Root
So, again, I love conda for what it can do when it works well. I only take
exception to the notion that it can address *all* problems, because there
are some problems that it just simply isn't properly situated for.
But pip isn't situated for any of these either -- I'm still confused as to
the point here.

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Travis Oliphant
2016-01-15 18:50:44 UTC
Permalink
Post by Matthew Brett
On Thu, Jan 14, 2016 at 9:14 AM, Chris Barker - NOAA Federal
Post by Chris Barker - NOAA Federal
Post by Oscar Benjamin
Post by Chris Barker
Also, you have the problem that there is one PyPi -- so where do you put
your nifty wheels that depend on other binary wheels? you may need to fork
every package you want to build :-(
Is this a real problem or a theoretical one? Do you know of some
situation where this wheel to wheel dependency will occur that won't
just be solved in some other way?
It's real -- at least during the whole bootstrapping period. Say I
build a nifty hdf5 binary wheel -- I could probably just grab the name
"libhdf5" on PyPI. So far so good. But the goal here would be to have
netcdf and pytables and GDAL and who knows what else then link against
that wheel. But those projects are all supported by different people,
that all have their own distribution strategy. So where do I put
binary wheels of each of those projects that depend on my libhdf5
wheel? _maybe_ I would put it out there, and it would all grow
organically, but neither the culture nor the tooling support that
approach now, so I'm not very confident you could gather adoption.
I don't think there's a very large amount of cultural work - but some,
to be sure. On OSX, for example, you can already do:
pip install numpy scipy matplotlib scikit-learn scikit-image pandas h5py
where all the wheels come from pypi. So, I don't think this is really
outside our range, even if the problem is a little more difficult for
Linux.
Post by Chris Barker - NOAA Federal
Even beyond the adoption period, sometimes you need to do stuff in
more than one way -- look at the proliferation of channels on
Anaconda.org.
This is more likely to work if there is a good infrastructure for
third parties to build and distribute the binaries -- e.g.
Anaconda.org.
I thought that Anaconda.org allows pypi channels as well?
It does: http://pypi.anaconda.org/
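For reference, a rough sketch of how pip can consume such a channel (USERNAME
and the package name are placeholders):

    # install from a pypi-style channel hosted on anaconda.org
    pip install --extra-index-url https://pypi.anaconda.org/USERNAME/simple some-package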

-Travis
Post by Matthew Brett
Matthew
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
--
*Travis Oliphant*
*Co-founder and CEO*


@teoliphant
512-222-5440
http://www.continuum.io
James E.H. Turner
2016-01-14 14:08:32 UTC
Permalink
Right. There's a small problem which is that the base linux system isn't just
"CentOS 5", it's "CentOS 5 and here's the list of libraries that you're
allowed to link to: ...", where that list is empirically chosen to include
only stuff that really is installed on ~all linux machines and for which the
ABI really has been stable in practice over multiple years and distros (so
e.g. no OpenSSL).
So the key next step is for someone to figure out and write down that list.
Continuum and Enthought both have versions of it that we know are good...
You mean something more empirical than
http://refspecs.linuxfoundation.org/lsb.shtml ? I tend to
cross-reference with that when adding stuff to Ureka and just err
on the side of including things where feasible, then of course test
it on the main target platforms. We have also been building on
CentOS 5-6 BTW (I believe the former is about to be unsupported).

Just skimming the thread...

Cheers,

James.
Oscar Benjamin
2016-01-09 11:39:22 UTC
Permalink
Post by Chris Barker
Post by Robert McGibbon
I'm not sure if this is the right path for numpy or not,
probably not -- AFAICT, the PyPA folks aren't interested in solving the
problems we have in the scipy community -- we can tweak around the edges,
but we won't get there without a commitment to really solve the issues -- and
if pip did that, it would essentially be conda -- no one wants to
re-implement conda.
I think that's a little unfair to the PyPA people. They would like to
solve all of these problems; it's just a question of priority and
expertise. As always in open source, you have to scratch your own itch,
and those guys are working on other things like the security,
stability, and scalability of the infrastructure, consistency of pip's
version handling and dependency resolution, etc.

Linux wheels is a problem that has been discussed on distutils-sig.
The reason it hasn't happened is that it's a lower priority than
wheels for OSX/Windows because:
1) Most distros already package this stuff i.e. apt-get numpy
2) On Linux it's much easier to get the appropriate compilers so that
pip can build e.g. numpy.
3) The average Linux user is more capable of solving these problems.
4) Getting binary distribution to work across all Linux distributions
is significantly harder than for Windows/OSX because of the myriad
different distros/versions.

Considering point 2, pip install numpy etc. already works for a lot of
Linux users even if it is slow because of the time taken to compile (3
minutes on my system). Depending on your use case that problem is
partially solved by wheel caching. So if Linux wheels were allowed but
didn't always work then that would be a regression for many users. On
OSX binary wheels for numpy are already available and work fine AFAIK.
The absence of binary numpy wheels for Windows is not down to PyPA.
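As a concrete sketch of the wheel-caching workaround mentioned above
(directory names are arbitrary):

    # build numpy once into a local wheelhouse (slow the first time)
    pip wheel numpy -w ./wheelhouse

    # later installs (e.g. into fresh virtualenvs) reuse the local wheel
    pip install --no-index --find-links=./wheelhouse numpy

    # recent pip versions also cache locally-built wheels automatically,
    # so a plain "pip install numpy" is only slow the first time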

Considering point 4, the idea of compiling on an old base Linux system
has been discussed on distutils-sig before and it seems likely to
work. The problem is really about the external non-libc dependencies
though. The reason progress there has stalled is not because the PyPA
folks don't want to solve it but rather because they have other
priorities and are hoping that people with more expertise in that area
will step up to address those problems. Most of the issues stem from
the scientific Python community so ideally someone from the scientific
Python community would address how to solve those problems.

Recently Nathaniel brought some suggestions to distutils-sig to
address the problem of build-requires which is a particular pain
point. I think that people there appreciated the effort from someone
who understands the needs of hard-to-build packages to improve the way
that pip/PyPI works in that area. There was a lot of confusion from
people not understanding each other's needs, but ultimately I thought
there was agreement on how to move forward. (Although what happened to
that in the end?)

The same can happen with other problems like Linux wheels. If you
guys here have a clear idea of how to solve the external dependency
problem then I'm sure they'll be receptive. Personally I think the
best approach is the pyopenblas approach: internalise the external
dependency so that pip can work with it. This is precisely what
Anaconda does and there's actually no need to make substantive changes
to the way pip/pypi/wheel works in order to achieve that. It just
needs someone to package the external dependencies as sdist/wheel (and
for PyPI to allow Linux wheels).
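To make the "internalise the dependency" idea concrete, a rough sketch of how
a downstream build might consume such a wheel -- the helper functions shown
(get_include / get_lib_dir) are hypothetical, so check the actual pyopenblas
package for its real interface:

    # a wheel that vendors libopenblas inside the package itself
    pip install pyopenblas

    # downstream builds ask the package where its headers/libraries live
    python -c "import pyopenblas; print(pyopenblas.get_include())"   # hypothetical API
    python -c "import pyopenblas; print(pyopenblas.get_lib_dir())"   # hypothetical API

    # those paths can then be fed to numpy's site.cfg or a build script,
    # so pip-installed packages no longer need a system-wide BLAS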

--
Oscar
Sandro Tosi
2016-01-09 12:44:30 UTC
Permalink
Post by Matthew Brett
https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really
appreciate it if you don't suggest using pip to install packages in
Debian, or at least not as the only solution.
--
Sandro "morph" Tosi
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi
Matthew Brett
2016-01-09 18:08:17 UTC
Permalink
Hi Sandro,
Post by Sandro Tosi
Post by Matthew Brett
https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really
appreciate if you dont suggest to use pip to install packages in
Debian, or at least not as the only solution.
I'm very happy to accept alternative suggestions or PRs.

I know what you mean, but I can't yet see how to write a page that
would be good for explaining the benefits / tradeoffs of using deb
packages vs mainly or only pip packages vs a mix of the two. Do you
have any thoughts?

Cheers,

Matthew
Sandro Tosi
2016-01-10 02:57:41 UTC
Permalink
Post by Matthew Brett
Hi Sandro,
Post by Sandro Tosi
Post by Matthew Brett
https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really
appreciate if you dont suggest to use pip to install packages in
Debian, or at least not as the only solution.
I'm very happy to accept alternative suggestions or PRs.
I know what you mean, but I can't yet see how to write a page that
would be good for explaining the benefits / tradeoffs of using deb
packages vs mainly or only pip packages vs a mix of the two. Do you
have any thoughts?
You can start by making it extremely clear that this is not the Debian-
supported way to install python modules on a Debian system; that if a
user uses pip to do it, it's very likely other applications or modules
will fail; and that if they have any problem with anything python-related,
they are on their own, as they "broke" their system on purpose. Thanks
for considering.
--
Sandro "morph" Tosi
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi
Matthew Brett
2016-01-10 03:55:28 UTC
Permalink
Post by Sandro Tosi
Post by Matthew Brett
Hi Sandro,
Post by Sandro Tosi
Post by Matthew Brett
https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really
appreciate if you dont suggest to use pip to install packages in
Debian, or at least not as the only solution.
I'm very happy to accept alternative suggestions or PRs.
I know what you mean, but I can't yet see how to write a page that
would be good for explaining the benefits / tradeoffs of using deb
packages vs mainly or only pip packages vs a mix of the two. Do you
have any thoughts?
you can start by making extremely clear that this is not the Debian
supported way to install python modules on a Debian system, that if a
user uses pip to do it, it's very likely other applications or modules
will fail, that if they have any problem with anything python related,
they are on their own as they "broke" their system on purpose. thanks
for considering
I updated the page with more on reasons to prefer Debian packages over
installing with pip:

https://matthew-brett.github.io/pydagogue/installing_on_debian.html

Is that enough to get the message across?

Cheers,

Matthew
Sandro Tosi
2016-01-10 11:40:49 UTC
Permalink
Post by Matthew Brett
I updated the page with more on reasons to prefer Debian packages over
https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Is that enough to get the message across?
That looks a lot better, thanks! I also kinda agree with all Nathaniel
said on the matter
--
Sandro "morph" Tosi
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi
Matthew Brett
2016-01-10 19:25:48 UTC
Permalink
Post by Sandro Tosi
Post by Matthew Brett
I updated the page with more on reasons to prefer Debian packages over
https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Is that enough to get the message across?
That looks a lot better, thanks! I also kinda agree with all Nathaniel
said on the matter
Good. You probably saw that I already removed the use of sudo on the page as well.

Matthew
Nathaniel Smith
2016-01-10 04:49:48 UTC
Permalink
Post by Matthew Brett
Hi Sandro,
Post by Sandro Tosi
Post by Matthew Brett
https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really
appreciate if you dont suggest to use pip to install packages in
Debian, or at least not as the only solution.
I'm very happy to accept alternative suggestions or PRs.
I know what you mean, but I can't yet see how to write a page that
would be good for explaining the benefits / tradeoffs of using deb
packages vs mainly or only pip packages vs a mix of the two. Do you
have any thoughts?
Why not replace all the "sudo pip" calls with "pip --user"? The trade-offs
between Debian-installed packages versus pip --user installed packages are
subtle, and both are good options. Personally, I'd generally recommend that
anyone actively developing python code skip straight to pip for most things,
since you'll eventually end up there anyway, but this is definitely
debatable and situation-dependent. On the other hand, "sudo pip"
specifically is something I'd never recommend, and indeed it has the potential
to totally break your system.
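A minimal sketch of the --user workflow, for comparison (the package name is
arbitrary):

    # installs into ~/.local instead of the system site-packages; no sudo needed
    pip install --user numpy

    # see where --user installs land for this interpreter
    python -m site --user-site

    # scripts go to ~/.local/bin, which may need adding to PATH
    export PATH="$HOME/.local/bin:$PATH"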

-n
Matthew Brett
2016-01-10 04:58:36 UTC
Permalink
Post by Nathaniel Smith
Post by Matthew Brett
Hi Sandro,
Post by Sandro Tosi
Post by Matthew Brett
https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really
appreciate if you dont suggest to use pip to install packages in
Debian, or at least not as the only solution.
I'm very happy to accept alternative suggestions or PRs.
I know what you mean, but I can't yet see how to write a page that
would be good for explaining the benefits / tradeoffs of using deb
packages vs mainly or only pip packages vs a mix of the two. Do you
have any thoughts?
Why not replace all the "sudo pip" calls with "pip --user"? The trade offs
between Debian-installed packages versus pip --user installed packages are
subtle, and both are good options. Personally I'd generally recommend anyone
actively developing python code to skip straight to pip for most things,
since you'll eventually end up there anyway, but this is definitely
debatable and situation dependent. On the other hand, "sudo pip"
specifically is something I'd never recommend, and indeed has the potential
to totally break your system.
Sure, but I don't think the page is suggesting doing ``sudo pip`` for
anything other than upgrading pip and virtualenv(wrapper) - and I
don't think that is likely to break the system.

Matthew
Nathaniel Smith
2016-01-10 06:22:42 UTC
Permalink
Post by Matthew Brett
Post by Nathaniel Smith
Post by Matthew Brett
Hi Sandro,
Post by Sandro Tosi
Post by Matthew Brett
https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really
appreciate if you dont suggest to use pip to install packages in
Debian, or at least not as the only solution.
I'm very happy to accept alternative suggestions or PRs.
I know what you mean, but I can't yet see how to write a page that
would be good for explaining the benefits / tradeoffs of using deb
packages vs mainly or only pip packages vs a mix of the two. Do you
have any thoughts?
Why not replace all the "sudo pip" calls with "pip --user"? The trade offs
between Debian-installed packages versus pip --user installed packages are
subtle, and both are good options. Personally I'd generally recommend anyone
actively developing python code to skip straight to pip for most things,
since you'll eventually end up there anyway, but this is definitely
debatable and situation dependent. On the other hand, "sudo pip"
specifically is something I'd never recommend, and indeed has the potential
to totally break your system.
Sure, but I don't think the page is suggesting doing ``sudo pip`` for
anything other than upgrading pip and virtualenv(wrapper) - and I
don't think that is likely to break the system.
It could... a quick glance suggests that currently installing
virtualenvwrapper like that will also pull in some random pypi
snapshot of stevedore, which will shadow the built-in package version.
And then stevedore is used by tons of different debian packages,
including large parts of openstack...
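A quick way to see whether a pip-installed copy is shadowing the
Debian-packaged one (stevedore used as the example; the dist-packages path
shown is the typical Debian location, adjust for your python version):

    # which stevedore does the interpreter actually pick up?
    python -c "import stevedore; print(stevedore.__file__)"

    # if the path is under /usr/local or ~/.local, it shadows the apt version;
    # the Debian-owned copy can be identified with dpkg
    dpkg -S /usr/lib/python2.7/dist-packages/stevedore/__init__.py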

But more to the point, the target audience for your page is hardly
equipped to perform that kind of analysis, never mind in the general
case of using 'sudo pip' for arbitrary Python packages, and your very
first example is one that demonstrates bad habits... So personally I'd
avoid mentioning the possibility of 'sudo pip', or better yet
explicitly warn against it.

-n
--
Nathaniel J. Smith -- http://vorpus.org