Discussion:
[Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve
Ilhan Polat
2017-01-08 15:17:03 UTC
Permalink
Hi everyone,

I was following the "deprecating numpy.matrix" discussion on the other
thread and I wondered whether the mailing list might be a better place for
something I've been meaning to ask the dev members. I thought mailing lists
were something we dumped together with ICQ and GeoCities, but apparently
not :-)

Anyways, first things first: I have been in need of the ill-conditioning
warning behavior of matlab (and possibly other software suites) for my own
work. So I looked around in the numpy issues and found
https://github.com/numpy/numpy/issues/3755 some time ago. Then I learned
from @rkern that there were C translations involved in the numpy source,
and frankly I couldn't even find the entry point of how the project is
structured, so I switched to the SciPy side where things are a bit more
convenient. Besides teaching me more about the f2py machinery, this made me
notice that the linear algebra module is a bit less competitive than the
usual SciPy level, though that is definitely a personal opinion.

So, in order to get the ill-conditioning reported (or at least the
condition number), I've wrapped up a PR using the expert routines of LAPACK
(which I think is ready to merge), but it is still far from the convenience
you generally get from contemporary software.

https://github.com/scipy/scipy/pull/6775
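For context, that PR adds an "assume_a" keyword to scipy.linalg.solve; a
rough usage sketch with the keyword values proposed there ('gen', 'sym',
'her', 'pos'), so treat the details as tentative until it is merged:

import numpy as np
from scipy.linalg import solve

n = 100
A = np.random.rand(n, n)
b = np.random.rand(n)

x_gen = solve(A, b)                    # no structure assumed, generic driver
S = A + A.T                            # symmetric coefficient matrix
x_sym = solve(S, b, assume_a='sym')    # symmetric (indefinite) driver
P = A @ A.T + n * np.eye(n)            # symmetric positive definite matrix
x_pos = solve(P, b, assume_a='pos')    # Cholesky-based driver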

The "assume_a" keyword introduced here is hopefully modular enough that
should there be any need for more structures we can simply keep adding to
the list without any backwards compatibility. It will be at least offering
more options than what we have currently. The part that I would like to
discuss requires a bit of intro so please bear with me. Let me copy/paste
the part from the old PR:

In many places online we can witness complaints about numpy/scipy not
letting the users know about the conditioning; see for example Mike
Croucher's blog <http://www.walkingrandomly.com/?p=5092> and numpy/numpy#3755
<https://github.com/numpy/numpy/issues/3755>

Since we don't have any central backslash function that optimizes
depending on the input data, should we create a function (let's name it
after the matlab equivalent for the time being, linsolve) such that it
automatically calls the right solver? This way, we can avoid writing new
functions for each LAPACK driver. As a reference, here is a SO thread
<http://stackoverflow.com/questions/18553210/how-to-implement-matlabs-mldivide-a-k-a-the-backslash-operator>
that summarizes the linsolve
<http://nl.mathworks.com/help/matlab/ref/linsolve.html> functionality.


I'm sure you are aware, but just for completeness: linear equation solvers
are often built around the concept of a polyalgorithm, which is a fancy way
of saying that the array is tested consecutively for certain structures,
with the checks ordered so that the simpler structures are tested sooner.
E.g. first check for a diagonal matrix, then upper/lower triangular, then
permuted triangular, then symmetric, and so on. Here is another example
from AdvanPix:
http://www.advanpix.com/2016/10/07/architecture-of-linear-systems-solver/
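
A minimal pure-NumPy sketch of such a check chain, just to fix ideas (the
names classify_structure/linsolve are made up here, not an existing API,
and both the ordering and the set of checks are only illustrative):

import numpy as np
from scipy import linalg

def classify_structure(A):
    """Cheap structure discovery, ordered from simplest to most general."""
    has_lower = np.tril(A, -1).any()
    has_upper = np.triu(A, 1).any()
    if not has_lower and not has_upper:
        return 'diagonal'
    if not has_lower:
        return 'upper_triangular'
    if not has_upper:
        return 'lower_triangular'
    if np.array_equal(A, A.T):
        return 'symmetric'
    return 'general'

def linsolve(A, b):
    """Dispatch a 1-D right-hand side to a dedicated solver."""
    kind = classify_structure(A)
    if kind == 'diagonal':
        return b / np.diag(A)
    if kind in ('upper_triangular', 'lower_triangular'):
        return linalg.solve_triangular(A, b, lower=(kind == 'lower_triangular'))
    if kind == 'symmetric':
        # would dispatch to a symmetric driver (e.g. via an assume_a-style switch)
        return linalg.solve(A, b)
    return linalg.solve(A, b)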

Now, according to what I have coded and optimized as much as I can, the
overhead of doing these checks in pure Python is not acceptable. It would
definitely be a noticeable slowdown if this were in place in the existing
linalg.solve; however, I think it is certainly doable at the low-level
C/FORTRAN level. Cython is certainly faster, but I think only a straight
C/FORTRAN implementation would cut it. Note that we only need the discovery
of the structure; then we can pass the array to the dedicated solver as is.
Hence I'm not saying that we should recode the existing solve
functionality. We already have the branching in place to the
?GE/SY/HE/POSVX routines.

-------

The second issue with the current linalg.solve function is when trying to
solve for the right inverse, e.g. xA = b. Again with some copy/paste: right
inversion is currently a bit annoying, that is to say, if we would like to
compute, say, BA^{-1}, then the user has to explicitly solve the transposed
equation to avoid using an explicit inv (whose use should be discouraged
anyway):
x = scipy.linalg.solve(A.T, B.T).T

Since the expert drivers come with a trans switch that can internally
handle whether to solve the transposed or the regular equation, these
routines avoid the explicit A.T off the shelf. I am wondering what might be
the best way to add an "r_inv" keyword such that the B.T is also handled at
the FORTRAN level, so that the user can simply write solve(A, B,
r_inv=True). Since we don't have a backslash operator, we could at least
provide this much convenience, I guess.
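
To make that concrete, here is a small sketch; the r_inv keyword in the
last line is the hypothetical part and does not exist in scipy.linalg.solve:

import numpy as np
from scipy.linalg import solve

A = np.random.rand(5, 5)
B = np.random.rand(3, 5)

# Today: solving x A = B (i.e. x = B A^{-1}) means transposing twice.
x = solve(A.T, B.T).T
assert np.allclose(x @ A, B)

# Proposed convenience (hypothetical keyword, shown for illustration only):
# x = solve(A, B, r_inv=True)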

I would love to have a go at it, but I'm definitely not competent enough
in C/FORTRAN at the production level, so I was wondering whether I could
get some help with this. Anyways, I hope I could make my point despite the
rather lengthy post. Please let me know if this is a plausible feature.

ilhan

PS: In case gmail links won't be parsed, here are the inline links

MC blog: http://www.walkingrandomly.com/?p=5092
SO thread : http://stackoverflow.com/questions/18553210/how-to-implement-matlabs-mldivide-a-k-a-the-backslash-operator
linsolve/mldivide page : http://nl.mathworks.com/help/matlab/ref/mldivide.html
Ralf Gommers
2017-01-09 10:33:56 UTC
Permalink
Post by Ilhan Polat
[...]
Note that you're proposing a new scipy feature (right?) on the numpy
list....

This sounds like a good idea to me. As a former heavy Matlab user I
remember a lot of things to dislike, but "\" behavior was quite nice.
Post by Ilhan Polat
Now, according to what I have coded and optimized as much as I can, a pure
Python is not acceptable as an overhead during these checks. It would
definitely be a noticeable slowdown if this was in place in the existing
linalg.solve however I think this is certainly doable in the low-level
C/FORTRAN level.
How much is a noticeable slowdown? Note that we still have the current
interfaces available for users who know what they need, so a nice
convenience function that is, say, 5-10% slower would not be the end of the
world.

Ralf
Ilhan Polat
2017-01-09 11:27:57 UTC
Permalink
Post by Ralf Gommers
Note that you're proposing a new scipy feature (right?) on the numpy
list....
Post by Ralf Gommers
This sounds like a good idea to me. As a former heavy Matlab user I
remember a lot of things to dislike, but "\" behavior was quite nice.

Correct, I am not sure where this might go in. It seemed like a NumPy
array operation (rapidly touching array elements, which could also be used
for functionality other than solve), hence the NumPy list. But of course it
can be pushed as an exclusive SciPy feature. I'm not sure what the outlook
on np.linalg.solve is.
Post by Ralf Gommers
How much is a noticeable slowdown? Note that we still have the current
interfaces available for users that know what they need, so a nice
convenience function that is say 5-10% slower would not be the end of the
world.

The fastest case was around 150-400% slower, but of course it might be
that I'm not using the fastest methods. It was mostly shuffling things
around and using np.any on them in the pure Python 3 case. I will cook up
something again for the baseline as soon as I have time.
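
(For reference, a rough baseline measurement could look like the sketch
below; the checks function is just a stand-in for whatever structure tests
end up being used, not the actual code I benchmarked.)

import timeit
import numpy as np
from scipy import linalg

n = 200
A = np.random.rand(n, n)
b = np.random.rand(n)

def checks(A):
    # representative pure-NumPy structure probes
    np.tril(A, -1).any()
    np.triu(A, 1).any()
    np.array_equal(A, A.T)

t_solve = timeit.timeit(lambda: linalg.solve(A, b), number=100)
t_checks = timeit.timeit(lambda: checks(A), number=100)
print("checks take %.0f%% of the solve time" % (100 * t_checks / t_solve))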
j***@gmail.com
2017-01-09 19:30:20 UTC
Permalink
Post by Ilhan Polat
the fastest case was around 150-400% slower but of course it might be the
case that I'm not using the fastest methods. It was mostly shuffling things
around and using np.any on them in the pure python3 case. I will cook up
something again for the baseline as soon as I have time.
All these checks sound a bit expensive if we almost always have completely
unstructured arrays that don't satisfy any special matrix pattern.

In analogy to the type proliferation used in Julia to handle those cases:
is there a way to attach information to numpy arrays that, for example,
signals that a 2d array is hermitian, banded, diagonal or ...?

(On second thought: maybe completely unstructured is not too expensive to
detect if the checks are short-circuited: one off-diagonal element nonzero
- not diagonal, two opposite off-diagonal elements different - not
symmetric, ...)
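
A sketch of such a short-circuited pre-check (probably_symmetric is a
hypothetical helper, not an existing function; the sampling heuristic is
only illustrative):

import numpy as np

def probably_symmetric(A, samples=8):
    """Compare a few random off-diagonal pairs before paying for the full
    O(n^2) symmetry test; bail out at the first mismatch."""
    n = A.shape[0]
    for i, j in zip(np.random.randint(0, n, samples),
                    np.random.randint(0, n, samples)):
        if i != j and A[i, j] != A[j, i]:
            return False                    # cheap counterexample found
    return bool(np.array_equal(A, A.T))     # full check only if the pre-check passes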

Josef
Ilhan Polat
2017-01-10 01:09:48 UTC
Permalink
Indeed, generic is the cheapest to discover, including the worst case
where only the last off-diagonal element is nonzero. The pseudo-code is:
first remove the diagonal and check the remaining parts for nonzeros, then
check the upper triangle, then the lower, then effective triangularness
from the zero structure (if any), then bandedness, and so on. If you have
access to matlab, you can set the sparse monitor to verbose mode with
spparms('spumoni', 1) and perform a backslash operation on sparse matrices.
It will spit out what it does during the checks.

A = sparse([0 2 0 1 0; 4 -1 -1 0 0; 0 0 0 3 -6; -2 0 0 0 2; 0 0 4 2 0]);
B = sparse([8; -1; -18; 8; 20]);
spparms('spumoni',1)
x = A\B

So every test in the polyalgorithm is cheaper than the next one. I'm not
exactly sure yet what the best strategy might be, hence the question. It's
really interesting that LAPACK doesn't have this type of fast check.
Robert Kern
2017-01-10 01:29:25 UTC
Permalink
Post by Ilhan Polat
So every test in the polyalgorithm is cheaper than the next one. I'm not
exactly sure what might be the best strategy yet hence the question. It's
really interesting that LAPACK doesn't have this type of fast checks.

In Fortran LAPACK, if you have a special structured matrix, you usually
explicitly use packed storage and call the appropriate function type on it.
It's only when you go to a system that only has a generic, unstructured
dense matrix data type that it makes sense to do those kinds of checks.
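
For illustration, this is roughly what "use the explicit structure" looks
like from the SciPy side: the caller builds the banded storage and calls
the dedicated routine directly (a sketch with a tridiagonal system):

import numpy as np
from scipy.linalg import solve_banded

n = 6
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
b = np.arange(1.0, n + 1)

# Banded storage for (l, u) = (1, 1): ab[u + i - j, j] = A[i, j]
ab = np.zeros((3, n))
ab[0, 1:] = off       # superdiagonal
ab[1, :] = main       # main diagonal
ab[2, :-1] = off      # subdiagonal

x = solve_banded((1, 1), ab, b)

# Same system assembled densely, for verification only.
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
assert np.allclose(A @ x, b)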
--
Robert Kern
Ilhan Polat
2017-01-10 03:10:13 UTC
Permalink
Yes, that's precisely the case but when we know the structure we can just
choose the appropriate solver anyhow with a little bit of overhead. What I
mean is that, to my knowledge, FORTRAN routines for checking for
triangularness etc. are absent.
Robert Kern
2017-01-10 03:16:33 UTC
Permalink
Post by Ilhan Polat
Yes, that's precisely the case but when we know the structure we can just
choose the appropriate solver anyhow with a little bit of overhead. What I
mean is that, to my knowledge, FORTRAN routines for checking for
triangularness etc. are absent.

I'm responding to that. The reason that they don't have those FORTRAN
routines for testing for structure inside of a generic dense matrix is that
in FORTRAN it's more natural (and efficient) to just use the explicit
packed structure and associated routines instead. You would only use a
generic dense matrix if you know that there isn't structure in the matrix.
So there are no routines for detecting that structure in generic dense
matrices.

--
Robert Kern
Ilhan Polat
2017-01-10 10:58:17 UTC
Permalink
I've done some benchmarking and it seems that packed storage comes with a
runtime penalty, which agrees with a few links I've found online:

https://blog.debroglie.net/2013/09/01/lapack-and-packed-storage/
http://stackoverflow.com/questions/8941678/lapack-are-operations-on-packed-storage-matrices-faster

Accessing individual elements of packed matrices is expected to be more
costly than in full storage because of the more complicated indexing
required. Hence, I am not sure that having dedicated solvers for prescribed
structures by itself justifies the absence of such checks.
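
To make the indexing point concrete, a small sketch of LAPACK-style lower
packed storage and the flat-index arithmetic it requires (0-based formula;
illustrative only):

import numpy as np

n = 4
A = np.random.rand(n, n)
A = np.tril(A) + np.tril(A, -1).T     # symmetric test matrix

# 'L' packed storage: the columns of the lower triangle stacked end to end.
ap = np.concatenate([A[j:, j] for j in range(n)])

def packed_index(i, j, n):
    """Flat position of A[i, j] (with i >= j) in lower packed storage."""
    return i + j * (2 * n - j - 1) // 2

assert ap[packed_index(3, 1, n)] == A[3, 1]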

The existence of these polyalgorithms in matlab and their absence in LAPACK
should not imply that FORTRAN users always know the structure of their
matrices. I will also ask on the LAPACK message board for some context.

But thanks though. As usual there is probably more to it than meets the eye,
ilhan
CJ Carey
2017-01-10 17:26:35 UTC
Permalink
I agree that this seems more like a scipy feature than a numpy feature.

Users with structured matrices often use a sparse matrix format, though the
API for using them in solvers could use some work. (I have a
work-in-progress PR along those lines here:
https://github.com/scipy/scipy/pull/6331)

Perhaps this polyalgorithm approach could be used to dispatch sparse
matrices to the appropriate solver, while optionally checking dense
matrices for structure before dispatching them as well. Usage might look
like:

# if A is sparse, use scipy.sparse.linalg.spsolve; otherwise use scipy.linalg.solve
scipy.linalg.generic_solve(A, b)

# converts A to banded representation and calls scipy.linalg.solveh_banded,
# regardless of whether A is sparse or dense
scipy.linalg.generic_solve(A, b, symmetric=True, banded=(-5, 5))

# runs possibly-expensive checks, then dispatches to the appropriate solver
scipy.linalg.generic_solve(A, b, detect_structure=True)


(I'm not advocating for "generic_solve" as the final name, I just needed a
placeholder.)
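
A rough sketch of how such a dispatcher could hang together; generic_solve
and detect_structure are placeholders as above, and the dense structure
probes are only illustrative:

import numpy as np
import scipy.sparse as sp
from scipy import linalg
from scipy.sparse.linalg import spsolve

def generic_solve(A, b, detect_structure=False):
    """Placeholder-name sketch: sparse inputs go to the sparse solver,
    dense inputs are optionally probed for structure first."""
    if sp.issparse(A):
        return spsolve(A.tocsc(), b)
    if detect_structure:
        if not np.tril(A, -1).any():
            return linalg.solve_triangular(A, b)              # upper triangular
        if not np.triu(A, 1).any():
            return linalg.solve_triangular(A, b, lower=True)  # lower triangular
    return linalg.solve(A, b)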