Ilhan Polat
2017-01-08 15:17:03 UTC
Hi everyone,
I was stalking the deprecating the numpy.matrix discussion on the other
thread and I wondered maybe the mailing list is a better place for the
discussion about something I've been meaning to ask the dev members. I
thought mailing lists are something we dumped using together with ICQ and
geocities stuff but apparently not :-)
Anyways, first thing is first: I have been in need of the ill-conditioned
warning behavior of matlab (and possibly other software suites) for my own
work. So I looked around in the numpy issues and found
https://github.com/numpy/numpy/issues/3755 some time ago. Then I've learned
from @rkern that there were C translations involved in the numpy source and
frankly I couldn't even find the entry point of how the project is
structured so I've switched to SciPy side where things are a bit more
convenient. Next to teaching me more about f2py machinery, I have noticed
that the linear algebra module is a bit less competitive than the usual
level of scipy though it is definitely a personal opinion.
So in order to get the ill-conditioning (or at least the condition number)
I've wrapped up a PR using the expert routines of LAPACK (which is I think
ready to merge) but still it is far from the contemporary software
convenience that you generally get.
https://github.com/scipy/scipy/pull/6775
The "assume_a" keyword introduced here is hopefully modular enough that
should there be any need for more structures we can simply keep adding to
the list without any backwards compatibility. It will be at least offering
more options than what we have currently. The part that I would like to
discuss requires a bit of intro so please bear with me. Let me copy/paste
the part from the old PR:
Around many places online, we can witness the rant about numpy/scipy not
letting the users know about the conditioning for example Mike Croucher's
blog <http://www.walkingrandomly.com/?p=5092> and numpy/numpy#3755
<https://github.com/numpy/numpy/issues/3755>
Since we don't have any central backslash function that optimizes
depending on the input data, should we create a function, let's name it
with the matlab equivalent for the time being linsolve such that it
automatically calls for the right solver? This way, we can avoid writing
new functions for each LAPACK driver . As a reference here is a SO thread
<http://stackoverflow.com/questions/18553210/how-to-implement-matlabs-mldivide-a-k-a-the-backslash-operator>
that summarizes the linsolve
<http://nl.mathworks.com/help/matlab/ref/linsolve.html> functionality.
I'm sure you are aware, but just for completeness, the linear equation
solvers are often built around the concept of polyalgorithm which is a
fancy way of saying that the array is tested consecutively for certain
structures and the checks are ordered in such a way that the simpler
structure is tested the sooner. E.g. first check for diagonal matrix, then
for upper/lower triangular then permuted triangular then symmetrical and so
on. Here is also another example from AdvanPix http://www.advanpix.com/2016/
10/07/architecture-of-linear-systems-solver/
Now, according to what I have coded and optimized as much as I can, a pure
Python is not acceptable as an overhead during these checks. It would
definitely be a noticeable slowdown if this was in place in the existing
linalg.solve however I think this is certainly doable in the low-level
C/FORTRAN level. CPython is certainly faster but I think only a straight
C/FORTRAN implementation would cut it. Note that we only need the discovery
of the structure then we can pass to the dedicated solver as is. Hence I'm
not saying that we should recode the existing solve functionality. We
already have the branching in place to ?GE/SY/HE/POSVX routines.
-------
The second issue about the current linalg.solve function is when trying to
solve for right inverse e.g. xA = b. Again with some copy/paste: The right
inversion is currently a bit annoying, that is to say if we would like to
compute, say, BA^{-1}, then the user has to explicitly transpose the
explicitly transposed equation to avoid using an explicit inv(whose use
should be discouraged anyways)
x = scipy.linalg.solve(A.T, B.T).T.
Since expert drivers come with a trans switch that can internally handle
whether to solve the transposed or the regular equation, these routines
avoid the A.T off-the-shelf. I am wondering what might be the best way to
add a "r_inv" keyword such that the B.T is also handled at the FORTRAN
level instead such that the user can simply write "solve(A,B, r_inv=True)".
Because we don't have a backslash operation we could at least provide this
much as convenience I guess.
I would love to have go at it but I'm definitely not competent enough in
C/FORTRAN at the production level so I was wondering whether I could get
some help about this. Anyways, I hope I could make my point with a rather
lengthy post. Please let me know if this is a plausible feature
ilhan
PS: In case gmail links won't be parsed, here are the inline links
MC blog: http://www.walkingrandomly.com/?p=5092
SO thread : http://stackoverflow.com/questions/18553210/how-to-
implement-matlabs-mldivide-a-k-a-the-backslash-operator
linsolve/mldivide page : http://nl.mathworks.com/help/
matlab/ref/mldivide.html
I was stalking the deprecating the numpy.matrix discussion on the other
thread and I wondered maybe the mailing list is a better place for the
discussion about something I've been meaning to ask the dev members. I
thought mailing lists are something we dumped using together with ICQ and
geocities stuff but apparently not :-)
Anyways, first thing is first: I have been in need of the ill-conditioned
warning behavior of matlab (and possibly other software suites) for my own
work. So I looked around in the numpy issues and found
https://github.com/numpy/numpy/issues/3755 some time ago. Then I've learned
from @rkern that there were C translations involved in the numpy source and
frankly I couldn't even find the entry point of how the project is
structured so I've switched to SciPy side where things are a bit more
convenient. Next to teaching me more about f2py machinery, I have noticed
that the linear algebra module is a bit less competitive than the usual
level of scipy though it is definitely a personal opinion.
So in order to get the ill-conditioning (or at least the condition number)
I've wrapped up a PR using the expert routines of LAPACK (which is I think
ready to merge) but still it is far from the contemporary software
convenience that you generally get.
https://github.com/scipy/scipy/pull/6775
The "assume_a" keyword introduced here is hopefully modular enough that
should there be any need for more structures we can simply keep adding to
the list without any backwards compatibility. It will be at least offering
more options than what we have currently. The part that I would like to
discuss requires a bit of intro so please bear with me. Let me copy/paste
the part from the old PR:
Around many places online, we can witness the rant about numpy/scipy not
letting the users know about the conditioning for example Mike Croucher's
blog <http://www.walkingrandomly.com/?p=5092> and numpy/numpy#3755
<https://github.com/numpy/numpy/issues/3755>
Since we don't have any central backslash function that optimizes
depending on the input data, should we create a function, let's name it
with the matlab equivalent for the time being linsolve such that it
automatically calls for the right solver? This way, we can avoid writing
new functions for each LAPACK driver . As a reference here is a SO thread
<http://stackoverflow.com/questions/18553210/how-to-implement-matlabs-mldivide-a-k-a-the-backslash-operator>
that summarizes the linsolve
<http://nl.mathworks.com/help/matlab/ref/linsolve.html> functionality.
I'm sure you are aware, but just for completeness, the linear equation
solvers are often built around the concept of polyalgorithm which is a
fancy way of saying that the array is tested consecutively for certain
structures and the checks are ordered in such a way that the simpler
structure is tested the sooner. E.g. first check for diagonal matrix, then
for upper/lower triangular then permuted triangular then symmetrical and so
on. Here is also another example from AdvanPix http://www.advanpix.com/2016/
10/07/architecture-of-linear-systems-solver/
Now, according to what I have coded and optimized as much as I can, a pure
Python is not acceptable as an overhead during these checks. It would
definitely be a noticeable slowdown if this was in place in the existing
linalg.solve however I think this is certainly doable in the low-level
C/FORTRAN level. CPython is certainly faster but I think only a straight
C/FORTRAN implementation would cut it. Note that we only need the discovery
of the structure then we can pass to the dedicated solver as is. Hence I'm
not saying that we should recode the existing solve functionality. We
already have the branching in place to ?GE/SY/HE/POSVX routines.
-------
The second issue about the current linalg.solve function is when trying to
solve for right inverse e.g. xA = b. Again with some copy/paste: The right
inversion is currently a bit annoying, that is to say if we would like to
compute, say, BA^{-1}, then the user has to explicitly transpose the
explicitly transposed equation to avoid using an explicit inv(whose use
should be discouraged anyways)
x = scipy.linalg.solve(A.T, B.T).T.
Since expert drivers come with a trans switch that can internally handle
whether to solve the transposed or the regular equation, these routines
avoid the A.T off-the-shelf. I am wondering what might be the best way to
add a "r_inv" keyword such that the B.T is also handled at the FORTRAN
level instead such that the user can simply write "solve(A,B, r_inv=True)".
Because we don't have a backslash operation we could at least provide this
much as convenience I guess.
I would love to have go at it but I'm definitely not competent enough in
C/FORTRAN at the production level so I was wondering whether I could get
some help about this. Anyways, I hope I could make my point with a rather
lengthy post. Please let me know if this is a plausible feature
ilhan
PS: In case gmail links won't be parsed, here are the inline links
MC blog: http://www.walkingrandomly.com/?p=5092
SO thread : http://stackoverflow.com/questions/18553210/how-to-
implement-matlabs-mldivide-a-k-a-the-backslash-operator
linsolve/mldivide page : http://nl.mathworks.com/help/
matlab/ref/mldivide.html