[Numpy-discussion] Change default order to Fortran order

Discussion:

[Numpy-discussion] Change default order to Fortran order

Kang Wang

2015-08-02 13:55:54 UTC

Hi,

I am an imaging researcher, and a new Python user. My first Python project is to somehow modify NumPy source code such that everything is Fortran column-major by default.

I read about the information in the link below, but for us, the fact is that we absolutely want to use Fortran column major, and we want to make it default. Explicitly writing " order = 'F' " all over the place is not acceptable to us.
http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues

I tried searching in this email list, as well as google search in general. However, I have not found anything useful. This must be a common request/need, I believe.

Can anyone provide any insight/help?

Thank you very much,

Kang

--
Kang Wang, Ph.D.
1111 Highland Ave., Room 1113
Madison, WI 53705-2275
----------------------------------------

Sturla Molden

2015-08-02 14:27:08 UTC

Post by Kang Wang
Can anyone provide any insight/help?

There is no "default order". There was before, but now all operators
control the order of their return arrays from the order of their input
array. The only thing that makes C order "default" is the keyword
argument to np.empty, np.ones and np.zeros. Just monkey patch those
functions and it should be fine.

Sturla

Sebastian Berg

2015-08-02 17:19:43 UTC

Well, numpy has a tendency to prefer C order. There is nothing you can do about that really. But you just cannot be sure what you get in some cases.
Often you need something specific for interfaceing other code. But in that case quite often you also do not need to fear the copy.

- Sebastian

Post by Sturla Molden

Post by Kang Wang
Can anyone provide any insight/help?

There is no "default order". There was before, but now all operators
control the order of their return arrays from the order of their input
array. The only thing that makes C order "default" is the keyword
argument to np.empty, np.ones and np.zeros. Just monkey patch those
functions and it should be fine.
Sturla
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Kang Wang

2015-08-02 20:14:15 UTC

Thank you all for replying!

I did a quick test, using python 2.6.6, and the original numpy package on my Linux computer without any change.==
x = np.zeros((2,3),dtype=np.int32,order='F')
print "x.strides ="
print x.strides

y = x + 1
print "y.strides ="
print y.strides

==

Output:
--------
x.strides =
(4, 8)
y.strides =
(12, 4)

--------

So, basically, "x" is Fortran-style column-major (because I explicitly write order='F'), but "y" is C-style row-major. This is going to be very annoying. What I really want is:
- I do not have to write order='F' explicitly when declaring "x"
- both "x" and "y" are Fortran-style column-major

Which file should I modify to achieve this goal?

Right now, I am just trying to get some basic stuff working with all arrays default to Fortran-style, and I can worry about interfacing with other code/libraries later.

Thanks,

Kang

Post by Sebastian Berg
Well, numpy has a tendency to prefer C order. There is nothing you can do about that really. But you just cannot be sure what you get in some cases.
Often you need something specific for interfaceing other code. But in that case quite often you also do not need to fear the copy.
- Sebastian

Post by Sturla Molden

Post by Kang Wang
Can anyone provide any insight/help?

There is no "default order". There was before, but now all operators
control the order of their return arrays from the order of their input
array. The only thing that makes C order "default" is the keyword
argument to np.empty, np.ones and np.zeros. Just monkey patch those
functions and it should be fine.
Sturla
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Kang Wang, Ph.D.
1111 Highland Ave., Room 1113
Madison, WI 53705-2275
TEL 608-263-0066
http://www.medphysics.wisc.edu/~kang/
----------------------------------------

Sturla Molden

2015-08-02 20:22:01 UTC

Post by Kang Wang
Thank you all for replying!
I did a quick test, using python 2.6.6, and the original numpy package
on my Linux computer without any change.
==
x = np.zeros((2,3),dtype=np.int32,order='F')
print "x.strides ="
print x.strides
y = x + 1
print "y.strides ="
print y.strides
==
--------
x.strides =
(4, 8)
y.strides =
(12, 4)
--------

Update NumPy. This is the behavior I talked about that has changed.

Now NumPy does this:

In [21]: x = np.zeros((2,3),dtype=np.int32,order='F')

In [22]: y = x + 1

In [24]: x.strides
Out[24]: (4, 8)

In [25]: y.strides
Out[25]: (4, 8)

Sturla

Bryan Van de Ven

2015-08-02 20:28:24 UTC

And to eliminate the order kwarg, use functools.partial to patch the zeros function (or any others, as needed):

In [26]: import numpy as np

In [27]: from functools import partial

In [28]: np.zeros = partial(np.zeros, order="F")

In [29]: x = np.zeros((2,3), dtype=np.int32)

In [30]: y = x + 1

In [31]: x.strides
Out[31]: (4, 8)

In [32]: y.strides
Out[32]: (4, 8)

In [33]: np.__version__
Out[33]: '1.9.2'

Bryan

Post by Sturla Molden

Post by Kang Wang
Thank you all for replying!
I did a quick test, using python 2.6.6, and the original numpy package
on my Linux computer without any change.
==
x = np.zeros((2,3),dtype=np.int32,order='F')
print "x.strides ="
print x.strides
y = x + 1
print "y.strides ="
print y.strides
==
--------
x.strides =
(4, 8)
y.strides =
(12, 4)
--------

Update NumPy. This is the behavior I talked about that has changed.
In [21]: x = np.zeros((2,3),dtype=np.int32,order='F')
In [22]: y = x + 1
In [24]: x.strides
Out[24]: (4, 8)
In [25]: y.strides
Out[25]: (4, 8)
Sturla
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Sturla Molden

2015-08-02 20:46:50 UTC

This will probably break code that depends on NumPy, like SciPy and
scikit-image. But if NumPy is all that matters, sure go ahead and monkey
patch. Otherwise keep the patched functions in another namespace.

:-)

Sturla

Chris Barker

2015-08-03 16:30:06 UTC

Post by Sturla Molden

Post by Bryan Van de Ven
And to eliminate the order kwarg, use functools.partial to patch the

This will probably break code that depends on NumPy, like SciPy and
scikit-image. But if NumPy is all that matters, sure go ahead and monkey
patch. Otherwise keep the patched functions in another namespace.

I"d be really careful about this -- sure it's annoying, but a kind of
global change of behavior could wreak havok.

I'd create a set of Fortran-order constructors -- if it were me, I do:

fzeros
fones,

etc.....

but you could, I suppose, create a namespace and ut hem all there, then
create a fnumpy that would write those over:

import numpy as np

and away you go -- but that wouldn't change any code that imports numpy in
the usual way.

-Chris
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov

Nathaniel Smith

2015-08-02 22:17:51 UTC

Post by Kang Wang
Thank you all for replying!
I did a quick test, using python 2.6.6,

There's pretty much no good reason these days to be using python 2.6 (which
was released in *2008*). I assume you're using it because you're using
redhat or some redhat derivative, and that's what they ship by default?
Even redhat engineers officially recommend that users *not* use the default
python -- it's basically only intended for use by their own built-in system
management scripts.

If you're just getting started with python, then at this point I'd
recommend starting with python 3.4.

Some easy ways to get this installed:
- Anaconda: the most popular scientific python distribution -- you pretty
much just download one file and get a full, up to date setup of python and
all the main scientific packages, in your home directory. Supported on all
popular platforms. Trivial to use and requires no special permissions.
http://continuum.io/downloads#py34
- One of Anaconda's competitors: http://www.scipy.org/install.html
- Software collections: redhat's official way to do things like this:
https://www.softwarecollections.org/en/

-n

Nathaniel Smith

2015-08-02 22:52:50 UTC

Post by Sturla Molden

Post by Kang Wang
Can anyone provide any insight/help?

There is no "default order". There was before, but now all operators
control the order of their return arrays from the order of their input
array.

This is... overoptimistic. I would not rely on this in code that I wrote.

It's true that many numpy operations do preserve the input order. But there
are also many operations that don't. And which are which often changes
between releases. (Not on purpose usually, but it's an easy bug to
introduce. And sometimes it is done intentionally, e.g. to make functions
faster. It sucks to have to make a function slower for everyone because
someone somewhere is depending on memory layout default details.) And there
are operations where it's not even clear what preserving order means
(indexing a C array with a Fortran array, add(C, fortran), ...), and even
lots of operations that intrinsically break contiguity/ordering (transpose,
diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings
one way or another in any non-trivial program.

Instead, it's better to explicitly specify order= just at the places where
you care. That way your code is *locally* correct (you can check it will
work by just reading the one function). The alternative is to try and
enforce a *global* property on your program ("everyone everywhere is very
careful to only use contiguity-preserving operations", where "everyone"
includes third party libraries like numpy and others). In software design,
local invariants invariants are always better than global invariants -- the
most well known example is local variables versus global variables, but the
principle is much broader.

-n

Kang Wang

2015-08-03 02:16:17 UTC

Thank you all for replying and providing useful insights and suggestions.

The reasons I really want to use column-major are:

I am image-oriented user (not matrix-oriented, as explained in http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues)
I am so used to read/write "I(x, y, z)" in textbook and code, and it is very likely that if the environment (row-major environment) forces me to write I(z, y, x), I will write a bug if I am not 100% focused. When this happens, it is difficult to debug, because everything compile and build fine. You will see run time error. Depending on environment, you may get useful error message (i.e. index out of range), but sometimes you just get bad image results.
It actually has not too much to do with the actual data layout in memory. In imaging processing, especially medical imaging where I am working in, if you have a 3D image, everyone will agree that in memory, the X index is the fasted changing index, and the Z dimension (we often call it the "slice" dimension) has the largest stride in memory. So, if data layout is like this in memory, and image-oriented users are so used to read/write "I(x,y,z)", the only storage order that makes sense is column-major
I also write code in MATLAB and C/C++. In MATLAB, matrix is column-major array. In C/C++, we often use ITK, which is also column-major (http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really prefer always read/write column-major code to minimize coding bugs related to storage order.
I also prefer index to be 0-based; however, there is nothing I can do about it for MATLAB (which is 1-based).

I can see that my original thought about "modifying NumPy source and re-compile" is probably a bad idea. The suggestions about using "fortran_zeros = partial(np.zeros(order='F'))" is probably the best way so far, in my opinion, and I am going to give it a try.

Again, thank you all for replying.

Kang

Post by Nathaniel Smith

Post by Sturla Molden

Post by Kang Wang
Can anyone provide any insight/help?

There is no "default order". There was before, but now all operators
control the order of their return arrays from the order of their input
array.

This is... overoptimistic. I would not rely on this in code that I wrote.
It's true that many numpy operations do preserve the input order. But there are also many operations that don't. And which are which often changes between releases. (Not on purpose usually, but it's an easy bug to introduce. And sometimes it is done intentionally, e.g. to make functions faster. It sucks to have to make a function slower for everyone because someone somewhere is depending on memory layout default details.) And there are operations where it's not even clear what preserving order means (indexing a C array with a Fortran array, add(C, fortran), ...), and even lots of operations that intrinsically break contiguity/ordering (transpose, diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings one way or another in any non-trivial program.
Instead, it's better to explicitly specify order= just at the places where you care. That way your code is *locally* correct (you can check it will work by just reading the one function). The alternative is to try and enforce a *global* property on your program ("everyone everywhere is very careful to only use contiguity-preserving operations", where "everyone" includes third party libraries like numpy and others). In software design, local invariants invariants are always better than global invariants -- the most well known example is local variables versus global variables, but the principle is much broader.
-n

--
Kang Wang, Ph.D.
1111 Highland Ave., Room 1113
Madison, WI 53705-2275
----------------------------------------

Jason Newton

2015-08-03 02:55:53 UTC

Just chiming in with my 2 cents, in direct response to your points...

- Image oriented processing is most typically done with row-major
storage layout. From hardware to general software implementations.
- Well really think of it as [slice,] row, column (logical)... you don't
actually have to be concerned about the layout unless you want higher
performance - in which case for a better access pattern you process a
fundamental image-line at a time. I also find it helps me avoid bugs with
xyz semantics by working with rows and columns only and remembering x=col,
y = row.
- I'm most familiar with having slice first like the above.
- ITK is stored as row-major actually, but it's index type has
dimensions specified as column,row, slice . Matlab does alot of things
column order and thus acts different from implementations which can result
in different outputs, but matlab seems perfectly happy living on an island
where it's the only implementation providing a specific answer given a
specific input.
- Numpy is 0 based...?

Good luck keeping it all sane though,

-Jason

Post by Kang Wang
Thank you all for replying and providing useful insights and suggestions.
- I am image-oriented user (not matrix-oriented, as explained in
http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues
)
- I am so used to read/write "I(x, y, z)" in textbook and code, and it
is very likely that if the environment (row-major environment) forces me to
write I(z, y, x), I will write a bug if I am not 100% focused. When this
happens, it is difficult to debug, because everything compile and build
fine. You will see run time error. Depending on environment, you may get
useful error message (i.e. index out of range), but sometimes you just get
bad image results.
- It actually has not too much to do with the actual data layout in
memory. In imaging processing, especially medical imaging where I am
working in, if you have a 3D image, everyone will agree that in memory, the
X index is the fasted changing index, and the Z dimension (we often call it
the "slice" dimension) has the largest stride in memory. So, if
data layout is like this in memory, and image-oriented users are so used to
read/write "I(x,y,z)", the only storage order that makes sense is
column-major
- I also write code in MATLAB and C/C++. In MATLAB, matrix is
column-major array. In C/C++, we often use ITK, which is also column-major (
http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really
prefer always read/write column-major code to minimize coding bugs related
to storage order.
- I also prefer index to be 0-based; however, there is nothing I can
do about it for MATLAB (which is 1-based).
I can see that my original thought about "modifying NumPy source and
re-compile" is probably a bad idea. The suggestions about using
"fortran_zeros = partial(np.zeros(order='F'))" is probably the best way so
far, in my opinion, and I am going to give it a try.
Again, thank you all for replying.
Kang

Post by Sturla Molden

Post by Kang Wang
Can anyone provide any insight/help?

There is no "default order". There was before, but now all operators
control the order of their return arrays from the order of their input
array.

This is... overoptimistic. I would not rely on this in code that I wrote.
It's true that many numpy operations do preserve the input order. But
there are also many operations that don't. And which are which often
changes between releases. (Not on purpose usually, but it's an easy bug to
introduce. And sometimes it is done intentionally, e.g. to make functions
faster. It sucks to have to make a function slower for everyone because
someone somewhere is depending on memory layout default details.) And there
are operations where it's not even clear what preserving order means
(indexing a C array with a Fortran array, add(C, fortran), ...), and even
lots of operations that intrinsically break contiguity/ordering (transpose,
diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings
one way or another in any non-trivial program.
Instead, it's better to explicitly specify order= just at the places where
you care. That way your code is *locally* correct (you can check it will
work by just reading the one function). The alternative is to try and
enforce a *global* property on your program ("everyone everywhere is very
careful to only use contiguity-preserving operations", where "everyone"
includes third party libraries like numpy and others). In software design,
local invariants invariants are always better than global invariants -- the
most well known example is local variables versus global variables, but the
principle is much broader.
-n
--
*Kang Wang, Ph.D.*
1111 Highland Ave., Room 1113
Madison, WI 53705-2275
----------------------------------------
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Daniel Sank

2015-08-03 04:27:53 UTC

Kang,

Thank you for explaining your motivation. It's clear from your last note,
as you said, that your desire for column-first indexing has nothing to do
with in-memory data layout. That being the case, I strongly urge you to
just use bare numpy and do not use the "fortran_zeros" function I
recommended before. Changing the in-memory layout via the "order" keyword
in numpy.zeros will not change the way indexing works at all. You gain
absolutely nothing by changing the in-memory order unless you are writing
some C or Fortran code which will interact with the data in memory.

To see what I mean, consider the following examples:

x = np.array([1, 2, 3], [4, 5, 6]])
x.shape

Post by Kang Wang

Post by Sturla Molden

(2, 3)

and

x = np.array([1, 2, 3], [4, 5, 6]], order='F')
x.shape

Post by Kang Wang

Post by Sturla Molden

(2, 3)

You see that changing the in-memory order has nothing whatsoever to do with
the array's shape or how you access it.

Post by Kang Wang
You will see run time error. Depending on environment, you may get useful

error message

Post by Kang Wang
(i.e. index out of range), but sometimes you just get bad image results.

Could you give a very simple example of what you mean? I can't think of how
this could ever happen and your fear here makes me think there's a
fundamental misunderstanding about how array operations in numpy and other
programming languages work. As an example, iteration in numpy goes through
the first index:

x = np.array([[1, 2, 3], [4, 5, 6]])
for foo in x:
...

Inside the for loop, foo takes on the values [1, 2, 3] on the first
iteration and [4, 5, 6] on the second. If you want to iterate through the
columns just do this instead

x = np.array([[1, 2, 3], [4, 5, 6]])
for foo in x.T:
...

If your complaint is that you want np.array([[1, 2, 3], [4, 5, 6]]) to
produce an array with shape (3, 2) then you should own up to the fact that
the array constructor expects it the other way around and do this

x = np.array([[1, 2, 3], [4, 5, 6]]).T

instead. This is infinity times better than trying to write a shim function
or patch numpy because with .T you're using (fast) built-in functionality
which other people your code will understand.

The real message here is that whether the first index runs over rows or
columns is actually meaningless. The only places the row versus column
issue has any meaning is when doing input/output (in which case you should
use the transpose if you actually need it), or when doing iteration. One
thing that would make sense if you're reading from a binary file format
which uses column-major format would be to write your own reader function:

def read_fortran_style_binary_file(file):
return np.fromfile(file).T

Note that if you do this then you already have a column major array in
numpy and you don't have to worry about any other transposes (except,
again, when doing more I/O or passing to something like a plotting
function).

Post by Kang Wang
Thank you all for replying and providing useful insights and suggestions.
- I am image-oriented user (not matrix-oriented, as explained in
http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues
)
- I am so used to read/write "I(x, y, z)" in textbook and code, and it
is very likely that if the environment (row-major environment) forces me to
write I(z, y, x), I will write a bug if I am not 100% focused. When this
happens, it is difficult to debug, because everything compile and build
fine. You will see run time error. Depending on environment, you may get
useful error message (i.e. index out of range), but sometimes you just get
bad image results.
- It actually has not too much to do with the actual data layout in
memory. In imaging processing, especially medical imaging where I am
working in, if you have a 3D image, everyone will agree that in memory, the
X index is the fasted changing index, and the Z dimension (we often call it
the "slice" dimension) has the largest stride in memory. So, if
data layout is like this in memory, and image-oriented users are so used to
read/write "I(x,y,z)", the only storage order that makes sense is
column-major
- I also write code in MATLAB and C/C++. In MATLAB, matrix is
column-major array. In C/C++, we often use ITK, which is also column-major (
http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really
prefer always read/write column-major code to minimize coding bugs related
to storage order.
- I also prefer index to be 0-based; however, there is nothing I can
do about it for MATLAB (which is 1-based).
I can see that my original thought about "modifying NumPy source and
re-compile" is probably a bad idea. The suggestions about using
"fortran_zeros = partial(np.zeros(order='F'))" is probably the best way so
far, in my opinion, and I am going to give it a try.
Again, thank you all for replying.
Kang

Post by Sturla Molden

Can anyone provide any insight/help?

There is no "default order". There was before, but now all operators
control the order of their return arrays from the order of their input
array.

This is... overoptimistic. I would not rely on this in code that I wrote.
It's true that many numpy operations do preserve the input order. But
there are also many operations that don't. And which are which often
changes between releases. (Not on purpose usually, but it's an easy bug to
introduce. And sometimes it is done intentionally, e.g. to make functions
faster. It sucks to have to make a function slower for everyone because
someone somewhere is depending on memory layout default details.) And there
are operations where it's not even clear what preserving order means
(indexing a C array with a Fortran array, add(C, fortran), ...), and even
lots of operations that intrinsically break contiguity/ordering (transpose,
diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings
one way or another in any non-trivial program.
Instead, it's better to explicitly specify order= just at the places where
you care. That way your code is *locally* correct (you can check it will
work by just reading the one function). The alternative is to try and
enforce a *global* property on your program ("everyone everywhere is very
careful to only use contiguity-preserving operations", where "everyone"
includes third party libraries like numpy and others). In software design,
local invariants invariants are always better than global invariants -- the
most well known example is local variables versus global variables, but the
principle is much broader.
-n
--
*Kang Wang, Ph.D.*
1111 Highland Ave., Room 1113
Madison, WI 53705-2275
----------------------------------------
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Daniel Sank

Juan Nunez-Iglesias

2015-08-03 04:54:31 UTC

Hi Kang,

Feel free to come chat about your application on the scikit-image list [1]!
I'll note that we've been through the array order discussion many times
there and even have a doc page about it [2].

The short version is that you'll save yourself a lot of pain by starting to
think of your images as (plane, row, column) instead of (x, y, z). The
syntax actually becomes friendlier too. For example, to do something to
each slice of data, you do:

for plane in image:
plane += foo

instead of

for z in image.shape[2]:
image[:, :, z] += foo

for example.

Juan.

[1] scikit-***@googlegroups.com
[2]
http://scikit-image.org/docs/dev/user_guide/numpy_images.html#coordinate-conventions

PS: As to the renamed Fortran-ordered numpy, may I suggest "funpy". The F
is for Fortran and the fun is for all the fun you'll have maintaining it. =P

Post by Daniel Sank
Kang,
Thank you for explaining your motivation. It's clear from your last note,
as you said, that your desire for column-first indexing has nothing to do
with in-memory data layout. That being the case, I strongly urge you to
just use bare numpy and do not use the "fortran_zeros" function I
recommended before. Changing the in-memory layout via the "order" keyword
in numpy.zeros will not change the way indexing works at all. You gain
absolutely nothing by changing the in-memory order unless you are writing
some C or Fortran code which will interact with the data in memory.
x = np.array([1, 2, 3], [4, 5, 6]])
x.shape

Post by Kang Wang

Post by Sturla Molden

(2, 3)

and
x = np.array([1, 2, 3], [4, 5, 6]], order='F')
x.shape

Post by Kang Wang

Post by Sturla Molden

(2, 3)

You see that changing the in-memory order has nothing whatsoever to do
with the array's shape or how you access it.

Post by Kang Wang
You will see run time error. Depending on environment, you may get

useful error message

Post by Kang Wang
(i.e. index out of range), but sometimes you just get bad image results.

Could you give a very simple example of what you mean? I can't think of
how this could ever happen and your fear here makes me think there's a
fundamental misunderstanding about how array operations in numpy and other
programming languages work. As an example, iteration in numpy goes through
x = np.array([[1, 2, 3], [4, 5, 6]])
...
Inside the for loop, foo takes on the values [1, 2, 3] on the first
iteration and [4, 5, 6] on the second. If you want to iterate through the
columns just do this instead
x = np.array([[1, 2, 3], [4, 5, 6]])
...
If your complaint is that you want np.array([[1, 2, 3], [4, 5, 6]]) to
produce an array with shape (3, 2) then you should own up to the fact that
the array constructor expects it the other way around and do this
x = np.array([[1, 2, 3], [4, 5, 6]]).T
instead. This is infinity times better than trying to write a shim
function or patch numpy because with .T you're using (fast) built-in
functionality which other people your code will understand.
The real message here is that whether the first index runs over rows or
columns is actually meaningless. The only places the row versus column
issue has any meaning is when doing input/output (in which case you should
use the transpose if you actually need it), or when doing iteration. One
thing that would make sense if you're reading from a binary file format
return np.fromfile(file).T
Note that if you do this then you already have a column major array in
numpy and you don't have to worry about any other transposes (except,
again, when doing more I/O or passing to something like a plotting
function).

Post by Kang Wang
Thank you all for replying and providing useful insights and suggestions.
- I am image-oriented user (not matrix-oriented, as explained in
http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues
)
- I am so used to read/write "I(x, y, z)" in textbook and code, and
it is very likely that if the environment (row-major environment) forces me
to write I(z, y, x), I will write a bug if I am not 100% focused. When
this happens, it is difficult to debug, because everything compile and
build fine. You will see run time error. Depending on environment, you may
get useful error message (i.e. index out of range), but sometimes you just
get bad image results.
- It actually has not too much to do with the actual data layout in
memory. In imaging processing, especially medical imaging where I am
working in, if you have a 3D image, everyone will agree that in memory, the
X index is the fasted changing index, and the Z dimension (we often call it
the "slice" dimension) has the largest stride in memory. So, if
data layout is like this in memory, and image-oriented users are so used to
read/write "I(x,y,z)", the only storage order that makes sense is
column-major
- I also write code in MATLAB and C/C++. In MATLAB, matrix is
column-major array. In C/C++, we often use ITK, which is also column-major (
http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really
prefer always read/write column-major code to minimize coding bugs related
to storage order.
- I also prefer index to be 0-based; however, there is nothing I can
do about it for MATLAB (which is 1-based).
I can see that my original thought about "modifying NumPy source and
re-compile" is probably a bad idea. The suggestions about using
"fortran_zeros = partial(np.zeros(order='F'))" is probably the best way so
far, in my opinion, and I am going to give it a try.
Again, thank you all for replying.
Kang

Post by Sturla Molden

Can anyone provide any insight/help?

There is no "default order". There was before, but now all operators
control the order of their return arrays from the order of their input
array.

This is... overoptimistic. I would not rely on this in code that I wrote.
It's true that many numpy operations do preserve the input order. But
there are also many operations that don't. And which are which often
changes between releases. (Not on purpose usually, but it's an easy bug to
introduce. And sometimes it is done intentionally, e.g. to make functions
faster. It sucks to have to make a function slower for everyone because
someone somewhere is depending on memory layout default details.) And there
are operations where it's not even clear what preserving order means
(indexing a C array with a Fortran array, add(C, fortran), ...), and even
lots of operations that intrinsically break contiguity/ordering (transpose,
diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings
one way or another in any non-trivial program.
Instead, it's better to explicitly specify order= just at the places
where you care. That way your code is *locally* correct (you can check it
will work by just reading the one function). The alternative is to try and
enforce a *global* property on your program ("everyone everywhere is very
careful to only use contiguity-preserving operations", where "everyone"
includes third party libraries like numpy and others). In software design,
local invariants invariants are always better than global invariants -- the
most well known example is local variables versus global variables, but the
principle is much broader.
-n
--
*Kang Wang, Ph.D.*
1111 Highland Ave., Room 1113
Madison, WI 53705-2275
----------------------------------------
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Daniel Sank
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Kang Wang

2015-08-03 06:02:42 UTC

This is very good discussion. Thank you all for replying.

I can see the fundamental difference is that I always think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I am coming from MRI (magnetic resonance imaging) research, and I can assure you that the entire MRI community is using (x, y, z), including books, journal papers, conference abstracts, presentations, everything. We even talk about what we called "logical x/y/z" and "physical x/y/z", and the rotation matrix that converts the two coordinate systems. The radiologists are also used to (x, y, z). For example, we always say "my image is 256 by 256 by 20 slices", and we never say "20 by 256 by 256".

So, basically, at least in MRI, we always talk about an image as I(x, y, z), and we always assume that "x" is the fastest changing index. That's why I prefer column-major (because it is more natural).

Of course, I can totally get my work done by using row-major, I just have to always remind myself "write last dimension index first" when coding. I actually have done this before, and I found it would be so much easier if just using column-major.

Kang

Post by Juan Nunez-Iglesias
Hi Kang,
Feel free to come chat about your application on the scikit-image list [1]! I'll note that we've been through the array order discussion many times there and even have a doc page about it [2].
plane += foo
instead of
image[:, :, z] += foo
for example.
Juan.
[2] http://scikit-image.org/docs/dev/user_guide/numpy_images.html#coordinate-conventions
PS: As to the renamed Fortran-ordered numpy, may I suggest "funpy". The F is for Fortran and the fun is for all the fun you'll have maintaining it. =P

Kang,
Thank you for explaining your motivation. It's clear from your last note, as you said, that your desire for column-first indexing has nothing to do with in-memory data layout. That being the case, I strongly urge you to just use bare numpy and do not use the "fortran_zeros" function I recommended before. Changing the in-memory layout via the "order" keyword in numpy.zeros will not change the way indexing works at all. You gain absolutely nothing by changing the in-memory order unless you are writing some C or Fortran code which will interact with the data in memory.
x = np.array([1, 2, 3], [4, 5, 6]])
x.shape

Post by Kang Wang

Post by Nathaniel Smith

(2, 3)

and
x = np.array([1, 2, 3], [4, 5, 6]], order='F')
x.shape

Post by Kang Wang

Post by Nathaniel Smith

(2, 3)

You see that changing the in-memory order has nothing whatsoever to do with the array's shape or how you access it.

Post by Kang Wang
You will see run time error. Depending on environment, you may get useful error message
(i.e. index out of range), but sometimes you just get bad image results.

x = np.array([[1, 2, 3], [4, 5, 6]])
...
Inside the for loop, foo takes on the values [1, 2, 3] on the first iteration and [4, 5, 6] on the second. If you want to iterate through the columns just do this instead
x = np.array([[1, 2, 3], [4, 5, 6]])
...
If your complaint is that you want np.array([[1, 2, 3], [4, 5, 6]]) to produce an array with shape (3, 2) then you should own up to the fact that the array constructor expects it the other way around and do this
x = np.array([[1, 2, 3], [4, 5, 6]]).T
instead. This is infinity times better than trying to write a shim function or patch numpy because with .T you're using (fast) built-in functionality which other people your code will understand.
return np.fromfile(file).T
Note that if you do this then you already have a column major array in numpy and you don't have to worry about any other transposes (except, again, when doing more I/O or passing to something like a plotting function).

Post by Kang Wang
Thank you all for replying and providing useful insights and suggestions.
I am image-oriented user (not matrix-oriented, as explained in http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues)
I am so used to read/write "I(x, y, z)" in textbook and code, and it is very likely that if the environment (row-major environment) forces me to write I(z, y, x), I will write a bug if I am not 100% focused. When this happens, it is difficult to debug, because everything compile and build fine. You will see run time error. Depending on environment, you may get useful error message (i.e. index out of range), but sometimes you just get bad image results.
It actually has not too much to do with the actual data layout in memory. In imaging processing, especially medical imaging where I am working in, if you have a 3D image, everyone will agree that in memory, the X index is the fasted changing index, and the Z dimension (we often call it the "slice" dimension) has the largest stride in memory. So, if data layout is like this in memory, and image-oriented users are so used to read/write "I(x,y,z)", the only storage order that makes sense is column-major
I also write code in MATLAB and C/C++. In MATLAB, matrix is column-major array. In C/C++, we often use ITK, which is also column-major (http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really prefer always read/write column-major code to minimize coding bugs related to storage order.
I also prefer index to be 0-based; however, there is nothing I can do about it for MATLAB (which is 1-based).
I can see that my original thought about "modifying NumPy source and re-compile" is probably a bad idea. The suggestions about using "fortran_zeros = partial(np.zeros(order='F'))" is probably the best way so far, in my opinion, and I am going to give it a try.
Again, thank you all for replying.
Kang

Post by Nathaniel Smith

Post by Kang Wang
Can anyone provide any insight/help?

There is no "default order". There was before, but now all operators
control the order of their return arrays from the order of their input
array.

This is... overoptimistic. I would not rely on this in code that I wrote.
It's true that many numpy operations do preserve the input order. But there are also many operations that don't. And which are which often changes between releases. (Not on purpose usually, but it's an easy bug to introduce. And sometimes it is done intentionally, e.g. to make functions faster. It sucks to have to make a function slower for everyone because someone somewhere is depending on memory layout default details.) And there are operations where it's not even clear what preserving order means (indexing a C array with a Fortran array, add(C, fortran), ...), and even lots of operations that intrinsically break contiguity/ordering (transpose, diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings one way or another in any non-trivial program.
Instead, it's better to explicitly specify order= just at the places where you care. That way your code is *locally* correct (you can check it will work by just reading the one function). The alternative is to try and enforce a *global* property on your program ("everyone everywhere is very careful to only use contiguity-preserving operations", where "everyone" includes third party libraries like numpy and others). In software design, local invariants invariants are always better than global invariants -- the most well known example is local variables versus global variables, but the principle is much broader.
-n

--
Kang Wang, Ph.D.
1111 Highland Ave., Room 1113
Madison, WI 53705-2275
----------------------------------------
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Daniel Sank
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Kang Wang, Ph.D.
1111 Highland Ave., Room 1113
Madison, WI 53705-2275
----------------------------------------

Nathaniel Smith

2015-08-03 07:09:00 UTC

Post by Kang Wang
This is very good discussion. Thank you all for replying.
I can see the fundamental difference is that I always

think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) .
I am coming from MRI (magnetic resonance imaging) research, and I can
assure you that the entire MRI community is using (x, y, z), including
books, journal papers, conference abstracts, presentations, everything. We
even talk about what we called "logical x/y/z" and "physical x/y/z", and
the rotation matrix that converts the two coordinate systems. The
radiologists are also used to (x, y, z). For example, we always say "my
image is 256 by 256 by 20 slices", and we never say "20 by 256 by 256".

Post by Kang Wang
So, basically, at least in MRI, we always talk about an image as I(x, y,

z), and we always assume that "x" is the fastest changing index. That's why
I prefer column-major (because it is more natural).

Post by Kang Wang
Of course, I can totally get my work done by using row-major, I just have

to always remind myself "write last dimension index first" when coding. I
actually have done this before, and I found it would be so much easier if
just using column-major.

Why not just use I[x, y, z] like you're used to, and let the computer worry
about the physical layout in memory? Sometimes this will be Fortran order
and sometimes C order and sometimes something else, but you don't have to
know or care; 99% of the time it doesn't matter. The worst case is that
when you use a python wrapper to call into a library that can only handle
Fortran order, then the wrapper will quietly have to convert the memory
order around and it will be slightly slower than if you had used Fortran
order in the first place. But in practice you'll barely ever notice this,
and when you do, *then* you can tell numpy explicitly what memory layout
you want in the situation where it matters.

General principle: do what's easiest for the programmer, not what's easiest
for the computer.

-n

Matthew Brett

2015-08-03 08:49:35 UTC

Hi,

Post by Nathaniel Smith

Post by Kang Wang
This is very good discussion. Thank you all for replying.
I can see the fundamental difference is that I always
think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I
am coming from MRI (magnetic resonance imaging) research, and I can assure
you that the entire MRI community is using (x, y, z), including books,
journal papers, conference abstracts, presentations, everything. We even
talk about what we called "logical x/y/z" and "physical x/y/z", and the
rotation matrix that converts the two coordinate systems. The radiologists
are also used to (x, y, z). For example, we always say "my image is 256 by
256 by 20 slices", and we never say "20 by 256 by 256".
So, basically, at least in MRI, we always talk about an image as I(x, y,
z), and we always assume that "x" is the fastest changing index. That's why
I prefer column-major (because it is more natural).
Of course, I can totally get my work done by using row-major, I just have
to always remind myself "write last dimension index first" when coding. I
actually have done this before, and I found it would be so much easier if
just using column-major.

Why not just use I[x, y, z] like you're used to, and let the computer worry
about the physical layout in memory? Sometimes this will be Fortran order
and sometimes C order and sometimes something else, but you don't have to
know or care; 99% of the time it doesn't matter. The worst case is that when
you use a python wrapper to call into a library that can only handle Fortran
order, then the wrapper will quietly have to convert the memory order around
and it will be slightly slower than if you had used Fortran order in the
first place. But in practice you'll barely ever notice this, and when you
do, *then* you can tell numpy explicitly what memory layout you want in the
situation where it matters.

Yes - if you are using numpy, you really have to look numpy in the eye and say:

"I will let you worry about the array element order in memory, and in
return, you promise to make indexing work as I would expect"

Just for example, let's say you loaded an MRI image into memory:

In [1]: import nibabel
In [2]: img = nibabel.load('my_mri.nii')
In [3]: data = img.get_data()

Because NIfTI images are Fortran memory layout, this happens to be the
memory layout you get for your array:

In [4]: data.flags
Out[4]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False

But now - in Python - all I care about is what data I have on the
first, second, third axes. For example, I could do this:

In [5]: data_copy = data.copy()

This has exactly the same values as the original array, and at the
same index positions:

In [7]: import numpy as np
In [8]: np.all(data == data)
Out[8]: memmap(True, dtype=bool)

but I now have a C memory layout array.

In [9]: data_copy.flags
Out[9]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False

Worse than that, if I slice my original data array, then I get an
array that is neither C- or Fortran- compatible in memory:

In [10]: data_view = data[:, :, ::2]
In [11]: data_view.flags
Out[11]:
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False

So - if you want every array to be Fortran-contiguous in memory, I
would not start with numpy at all, I would write your own array
library.

The alternative - or "the numpy way" - is to give up on enforcing a
particular layout in memory, until you need to pass an array to some C
or C++ or Fortran code that needs some particular layout, in which
case you get your extension code to copy the array into the required
layout on entry. Of course this is what numpy itself has to do when
interfacing with external libraries like BLAS or LAPACK.

Cheers,

Matthew

Sebastian Berg

2015-08-03 12:02:15 UTC

Post by Matthew Brett
Hi,

Post by Nathaniel Smith

Post by Kang Wang
This is very good discussion. Thank you all for replying.
I can see the fundamental difference is that I always
think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I
am coming from MRI (magnetic resonance imaging) research, and I can assure
you that the entire MRI community is using (x, y, z), including books,
journal papers, conference abstracts, presentations, everything. We even
talk about what we called "logical x/y/z" and "physical x/y/z", and the
rotation matrix that converts the two coordinate systems. The radiologists
are also used to (x, y, z). For example, we always say "my image is 256 by
256 by 20 slices", and we never say "20 by 256 by 256".
So, basically, at least in MRI, we always talk about an image as I(x, y,
z), and we always assume that "x" is the fastest changing index. That's why
I prefer column-major (because it is more natural).
Of course, I can totally get my work done by using row-major, I just have
to always remind myself "write last dimension index first" when coding. I
actually have done this before, and I found it would be so much easier if
just using column-major.

Why not just use I[x, y, z] like you're used to, and let the computer worry
about the physical layout in memory? Sometimes this will be Fortran order
and sometimes C order and sometimes something else, but you don't have to
know or care; 99% of the time it doesn't matter. The worst case is that when
you use a python wrapper to call into a library that can only handle Fortran
order, then the wrapper will quietly have to convert the memory order around
and it will be slightly slower than if you had used Fortran order in the
first place. But in practice you'll barely ever notice this, and when you
do, *then* you can tell numpy explicitly what memory layout you want in the
situation where it matters.

"I will let you worry about the array element order in memory, and in
return, you promise to make indexing work as I would expect"
In [1]: import nibabel
In [2]: img = nibabel.load('my_mri.nii')
In [3]: data = img.get_data()
Because NIfTI images are Fortran memory layout, this happens to be the
In [4]: data.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
But now - in Python - all I care about is what data I have on the
In [5]: data_copy = data.copy()
This has exactly the same values as the original array, and at the
In [7]: import numpy as np
In [8]: np.all(data == data)
Out[8]: memmap(True, dtype=bool)
but I now have a C memory layout array.
In [9]: data_copy.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False

Yeah, I would like to second those arguments. Most of the time, there is no need to worry about layout. For large chunks you allocate, it may make sense for speed, etc. So you can alias creation functions. Generally, I would suggest to simply not worry about the memory layout. Also do not *trust* the layout for most function returns. If you need a specific layout to interface other code, always check what you got it.

-Sebastian

Post by Matthew Brett
Worse than that, if I slice my original data array, then I get an
In [10]: data_view = data[:, :, ::2]
In [11]: data_view.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
So - if you want every array to be Fortran-contiguous in memory, I
would not start with numpy at all, I would write your own array
library.
The alternative - or "the numpy way" - is to give up on enforcing a
particular layout in memory, until you need to pass an array to some C
or C++ or Fortran code that needs some particular layout, in which
case you get your extension code to copy the array into the required
layout on entry. Of course this is what numpy itself has to do when
interfacing with external libraries like BLAS or LAPACK.
Cheers,
Matthew
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Gregory Lee

2015-08-03 14:13:56 UTC

I agree that often you don't need to worry about the memory order.
However, it is not uncommon in medical imaging to go back and forth between
a 2D or 3D image representation and a 1D array representation (e.g. as
often used in image reconstruction algorithms). I found that the main time
it was necessary to pay careful attention to the memory layout was when
converting Matlab scripts that involve reshaping operations.

Post by Kang Wang

Post by Matthew Brett
Hi,

Post by Nathaniel Smith

Post by Kang Wang
This is very good discussion. Thank you all for replying.
I can see the fundamental difference is that I always
think/talk/read/write a 3D image as I(x, y, z), not (plane, row,

column) . I

Post by Matthew Brett

Post by Nathaniel Smith

Post by Kang Wang
am coming from MRI (magnetic resonance imaging) research, and I can

assure

Post by Matthew Brett

Post by Nathaniel Smith

Post by Kang Wang
you that the entire MRI community is using (x, y, z), including books,
journal papers, conference abstracts, presentations, everything. We

even

Post by Matthew Brett

Post by Nathaniel Smith

Post by Kang Wang
talk about what we called "logical x/y/z" and "physical x/y/z", and

the

Post by Matthew Brett

Post by Nathaniel Smith

Post by Kang Wang
rotation matrix that converts the two coordinate systems. The

radiologists

Post by Matthew Brett

Post by Nathaniel Smith

Post by Kang Wang
are also used to (x, y, z). For example, we always say "my image is

256 by

Post by Matthew Brett

Post by Nathaniel Smith

Post by Kang Wang
256 by 20 slices", and we never say "20 by 256 by 256".
So, basically, at least in MRI, we always talk about an image as I(x,

y,

Post by Matthew Brett

Post by Nathaniel Smith

Post by Kang Wang
z), and we always assume that "x" is the fastest changing index.

That's why

Post by Matthew Brett

Post by Nathaniel Smith

Post by Kang Wang
I prefer column-major (because it is more natural).
Of course, I can totally get my work done by using row-major, I just

have

Post by Matthew Brett

Post by Nathaniel Smith

Post by Kang Wang
to always remind myself "write last dimension index first" when

coding. I

Post by Matthew Brett

Post by Nathaniel Smith

Post by Kang Wang
actually have done this before, and I found it would be so much

easier if

Post by Matthew Brett

Post by Nathaniel Smith

Post by Kang Wang
just using column-major.

Why not just use I[x, y, z] like you're used to, and let the computer

worry

Post by Matthew Brett

Post by Nathaniel Smith
about the physical layout in memory? Sometimes this will be Fortran

order

Post by Matthew Brett

Post by Nathaniel Smith
and sometimes C order and sometimes something else, but you don't have

to

Post by Matthew Brett

Post by Nathaniel Smith
know or care; 99% of the time it doesn't matter. The worst case is

that when

Post by Matthew Brett

Post by Nathaniel Smith
you use a python wrapper to call into a library that can only handle

Fortran

Post by Matthew Brett

Post by Nathaniel Smith
order, then the wrapper will quietly have to convert the memory order

around

Post by Matthew Brett

Post by Nathaniel Smith
and it will be slightly slower than if you had used Fortran order in

the

Post by Matthew Brett

Post by Nathaniel Smith
first place. But in practice you'll barely ever notice this, and when

you

Post by Matthew Brett

Post by Nathaniel Smith
do, *then* you can tell numpy explicitly what memory layout you want

in the

Post by Matthew Brett

Post by Nathaniel Smith
situation where it matters.

Yes - if you are using numpy, you really have to look numpy in the eye
"I will let you worry about the array element order in memory, and in
return, you promise to make indexing work as I would expect"
In [1]: import nibabel
In [2]: img = nibabel.load('my_mri.nii')
In [3]: data = img.get_data()
Because NIfTI images are Fortran memory layout, this happens to be the
In [4]: data.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
But now - in Python - all I care about is what data I have on the
In [5]: data_copy = data.copy()
This has exactly the same values as the original array, and at the
In [7]: import numpy as np
In [8]: np.all(data == data)
Out[8]: memmap(True, dtype=bool)
but I now have a C memory layout array.
In [9]: data_copy.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False

Yeah, I would like to second those arguments. Most of the time, there is
no need to worry about layout. For large chunks you allocate, it may make
sense for speed, etc. So you can alias creation functions. Generally, I
would suggest to simply not worry about the memory layout. Also do not
*trust* the layout for most function returns. If you need a specific layout
to interface other code, always check what you got it.
-Sebastian

Post by Matthew Brett
Worse than that, if I slice my original data array, then I get an
In [10]: data_view = data[:, :, ::2]
In [11]: data_view.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
So - if you want every array to be Fortran-contiguous in memory, I
would not start with numpy at all, I would write your own array
library.
The alternative - or "the numpy way" - is to give up on enforcing a
particular layout in memory, until you need to pass an array to some C
or C++ or Fortran code that needs some particular layout, in which
case you get your extension code to copy the array into the required
layout on entry. Of course this is what numpy itself has to do when
interfacing with external libraries like BLAS or LAPACK.
Cheers,
Matthew
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Matthew Brett

2015-08-03 14:26:10 UTC

Hi,

I agree that often you don't need to worry about the memory order. However,
it is not uncommon in medical imaging to go back and forth between a 2D or
3D image representation and a 1D array representation (e.g. as often used in
image reconstruction algorithms). I found that the main time it was
necessary to pay careful attention to the memory layout was when converting
Matlab scripts that involve reshaping operations.

Yes, good point. A typical example would be this kind of thing:

# data is a 4D array with time / volume axis last
data_2d = data.reshape((-1, data.shape[-1])

For MATLAB, the columns of this array would (by default) have the
values on the first axis fastest changing, then the second, then the
third, whereas numpy's default is the other way round.

I find I usually don't have to worry about this, because I'm later going to do:

data_processed_4d = data_2d.reshape(data.shape)

which will reverse the previous reshape in the correct way.

But in any case - this is not directly to do with the array memory
layout. You will get the same output from reshape whether the memory
layout of `data` was Fortran or C.

Cheers,

Matthew

Sebastian Berg

2015-08-03 14:53:23 UTC

Post by Matthew Brett
Hi,

I agree that often you don't need to worry about the memory order. However,
it is not uncommon in medical imaging to go back and forth between a 2D or
3D image representation and a 1D array representation (e.g. as often used in
image reconstruction algorithms). I found that the main time it was
necessary to pay careful attention to the memory layout was when converting
Matlab scripts that involve reshaping operations.

# data is a 4D array with time / volume axis last
data_2d = data.reshape((-1, data.shape[-1])
For MATLAB, the columns of this array would (by default) have the
values on the first axis fastest changing, then the second, then the
third, whereas numpy's default is the other way round.
data_processed_4d = data_2d.reshape(data.shape)
which will reverse the previous reshape in the correct way.
But in any case - this is not directly to do with the array memory
layout. You will get the same output from reshape whether the memory
layout of `data` was Fortran or C.

Just as a remark. Reshape has an (iteration not really memory) order parameter, thou it may do more copies if those do not match.

- Sebastian

Post by Matthew Brett
Cheers,
Matthew
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Sturla Molden

2015-08-03 14:55:04 UTC

Post by Juan Nunez-Iglesias
The short version is that you'll save yourself a lot of pain by starting to
think of your images as (plane, row, column) instead of (x, y, z).

There are several things to consider here.

1. The vertices in computer graphics (OpenGL) are (x,y,z).

2. OpenGL rotation matrices and projection matrice are stored in column
major order.

3. OpenGL frame buffers are indexed (x,y) in column major order with (0,0)
being lower left.

4. ITK and VTK depends on OpenGL and are thus using column major order.

5. Those who use Matlab or Fortran in addition to Python prefer column
major order.

6. BLAS and LAPACK use column major order.

7. The common notation in image prorcessing (as opposed to computer
graphics in geberal) is indexing (row, column), in row major order, with
(0,0) being upper left.

All in all, this is a strong case for prefering column major order and the
common mathematical notation (x,y,z).

Also notice how the ususal notation in image pricessing differs from
OpenGL.

Sturla

Matthew Brett

2015-08-03 15:16:02 UTC

Post by Sturla Molden

Post by Juan Nunez-Iglesias
The short version is that you'll save yourself a lot of pain by starting to
think of your images as (plane, row, column) instead of (x, y, z).

There are several things to consider here.
1. The vertices in computer graphics (OpenGL) are (x,y,z).
2. OpenGL rotation matrices and projection matrice are stored in column
major order.
3. OpenGL frame buffers are indexed (x,y) in column major order with (0,0)
being lower left.
4. ITK and VTK depends on OpenGL and are thus using column major order.
5. Those who use Matlab or Fortran in addition to Python prefer column
major order.
6. BLAS and LAPACK use column major order.
7. The common notation in image prorcessing (as opposed to computer
graphics in geberal) is indexing (row, column), in row major order, with
(0,0) being upper left.
All in all, this is a strong case for prefering column major order and the
common mathematical notation (x,y,z).
Also notice how the ususal notation in image pricessing differs from
OpenGL.

Sure, but to avoid confusion, maybe move the discussion of image
indexing order to another thread?

I think this thread is about memory layout, which is a different issue.

Cheers,

Matthew

Sturla Molden

2015-08-03 16:01:09 UTC

Post by Matthew Brett
Sure, but to avoid confusion, maybe move the discussion of image
indexing order to another thread?
I think this thread is about memory layout, which is a different issue.

It is actually a bit convoluted and not completely orthogonal. Memory
layout does not matter for 2d ndexing, i.e. (x,y) vs. (row, column), if you
are careful when iterating, but it does matter for Nd indexing. There is a
reason to prefer (x,y,z,t,r) in column major order or (recording, time,
slice, row, column) in row major order. Otherwise you can get very
inefficient memory traversals. Then if you work with visualization
libraries that expects (x,y,z) and column major order, e.g. ITK, VTK and
OpenGL, this is really what you want to use. And the choise of indexing
(x,y,z) cannot be seen as independent of the memory layout. Remember, it is
not just a matter of mapping coordinates to pixels. The data sets are so
large in MRI processing that memory layout does matter.

Sturla

Matthew Brett

2015-08-03 16:24:51 UTC

Post by Sturla Molden

Post by Matthew Brett
Sure, but to avoid confusion, maybe move the discussion of image
indexing order to another thread?
I think this thread is about memory layout, which is a different issue.

It is actually a bit convoluted and not completely orthogonal. Memory
layout does not matter for 2d ndexing, i.e. (x,y) vs. (row, column), if you
are careful when iterating, but it does matter for Nd indexing. There is a
reason to prefer (x,y,z,t,r) in column major order or (recording, time,
slice, row, column) in row major order. Otherwise you can get very
inefficient memory traversals. Then if you work with visualization
libraries that expects (x,y,z) and column major order, e.g. ITK, VTK and
OpenGL, this is really what you want to use. And the choise of indexing
(x,y,z) cannot be seen as independent of the memory layout. Remember, it is
not just a matter of mapping coordinates to pixels. The data sets are so
large in MRI processing that memory layout does matter.

I completely agree that memory layout can affect performance, and this
can be important.

On the other hand, I think you agree that the relationship of axis
order to memory layout is just a matter of mapping coordinates to
pixels.

So you can change the axis ordering without changing the memory layout
and the memory layout without changing the axis ordering.

Of course you could argue that it would be simpler to fuse the two
issues, and enforce one memory layout - say Fortran. The result
might well be easier think about, but it wouldn't be much like numpy,
and it would have lots of performance and memory disadvantages.

Cheers,

Matthew

Charles R Harris

2015-08-03 17:11:02 UTC

Post by Matthew Brett

Post by Sturla Molden

Post by Matthew Brett
Sure, but to avoid confusion, maybe move the discussion of image
indexing order to another thread?
I think this thread is about memory layout, which is a different issue.

It is actually a bit convoluted and not completely orthogonal. Memory
layout does not matter for 2d ndexing, i.e. (x,y) vs. (row, column), if

you

Post by Sturla Molden
are careful when iterating, but it does matter for Nd indexing. There is

a

Post by Sturla Molden
reason to prefer (x,y,z,t,r) in column major order or (recording, time,
slice, row, column) in row major order. Otherwise you can get very
inefficient memory traversals. Then if you work with visualization
libraries that expects (x,y,z) and column major order, e.g. ITK, VTK and
OpenGL, this is really what you want to use. And the choise of indexing
(x,y,z) cannot be seen as independent of the memory layout. Remember, it

is

Post by Sturla Molden
not just a matter of mapping coordinates to pixels. The data sets are so
large in MRI processing that memory layout does matter.

I completely agree that memory layout can affect performance, and this
can be important.
On the other hand, I think you agree that the relationship of axis
order to memory layout is just a matter of mapping coordinates to
pixels.
So you can change the axis ordering without changing the memory layout
and the memory layout without changing the axis ordering.
Of course you could argue that it would be simpler to fuse the two
issues, and enforce one memory layout - say Fortran. The result
might well be easier think about, but it wouldn't be much like numpy,
and it would have lots of performance and memory disadvantages.
Cheers,

I would also strongly suggest that once you have decided on a convention
that you thoroughly document it somewhere. That will not only help you, but
anyone who later needs to maintain the code will bless you rather than d*nm
you to eternal torture by the seven demons of Ipsos.

Chuck

Nathaniel Smith

2015-08-02 22:08:19 UTC

Post by Kang Wang
Hi,
I am an imaging researcher, and a new Python user. My first Python

project is to somehow modify NumPy source code such that everything is
Fortran column-major by default.

Post by Kang Wang
I read about the information in the link below, but for us, the fact is

that we absolutely want to use Fortran column major, and we want to make
it default. Explicitly writing " order = 'F' " all over the place is not
acceptable to us.
http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues

Post by Kang Wang
I tried searching in this email list, as well as google search in

general. However, I have not found anything useful. This must be a common
request/need, I believe.

It isn't, I'm afraid. Basically what you're signing up for is to maintain
your own copy of numpy all by yourself. You're totally within your rights
to do this, but it isn't something I would really recommend as a first
python project (to put it mildly).

And unfortunately, there are plenty of libraries out there that use numpy
and assume they will get C order by default, so your version of numpy will
create lots of obscure errors, segfaults, etc. as you start using it with
other packages. Obviously this will be a problem for you -- basically you
may find yourself having to maintain your own copy of lots of libraries.
Less obviously, this would also create a big problem for us, because your
users will start filling bug reports on numpy, or on these random third
party packages, and it will be massively confusing and a big waste of time
because the problem will be with your package, not with any of our code. So
if you do do this, please either (a) change the name of your package
somehow ('import numpyfortran' or similar) so that everyone using it is
clear that it's a non-standard product, or else (b) make sure that you only
use it within your own team, don't allow anyone else to use it, and make a
rule that no one is allowed to file bug reports, or ask or answer questions
on mailing lists or stackoverflow, unless they have first double checked
*every* time that what they're saying is also valid when using regular
numpy.

Again, I strongly recommend you not do this.

There are literally millions of users who are using numpy as it currently
is, and able to get stuff done. I don't know your specific situation, but
maybe if you describe a bit more what it is you're doing and why you think
you need all-Fortran-all-the-time, then people will be able to suggest
strategies to work around things on your end, or find smaller tweaks to
numpy that could go into the standard version.

-n

Daniel Sank

2015-08-02 22:24:51 UTC

Could you please explain why you need 'F' ordering? It's pretty unlikely
that you actually care about the internal memory layout, and you'll get
better advice if you explain why you think you do care.

Post by Kang Wang
My first Python project is to somehow modify NumPy source
code such that everything is Fortran column-major by default.

This is the road to pain. You'll have to maintain your own fork and will
probably inject bugs when trying to rewrite. Nobody will want to help fix
them because everyone else just uses numpy as is.

Post by Kang Wang
And to eliminate the order kwarg, use functools.partial to patch the

Instead of monkey patching, why not just define your own shims:

fortran_zeros = partial(np.zeros(order='F'))

Seems like this would lead to a lot less confusion (although until you tell
us why you care about the in-memory layout I don't know the point of doing
this at all).

SMRUTI RANJAN SAHOO

2015-08-03 15:00:27 UTC

well its really great idea. i can help on python but i don't have any
knowledge on fortran.

Post by Kang Wang
Hi,
I am an imaging researcher, and a new Python user. My first Python project
is to somehow modify NumPy source code such that everything is Fortran
column-major by default.
I read about the information in the link below, but for us, the fact is
that *we absolutely want to use Fortran column major, and we want to
make it default. Explicitly writing " order = 'F' " all over the place is
not acceptable to us*.
http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues
I tried searching in this email list, as well as google search in general.
However, I have not found anything useful. This must be a common
request/need, I believe.
Can anyone provide any insight/help?
Thank you very much,
Kang
--
*Kang Wang, Ph.D.*
1111 Highland Ave., Room 1113
Madison, WI 53705-2275
----------------------------------------
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Sturla Molden

2015-08-03 15:42:03 UTC

Post by SMRUTI RANJAN SAHOO
well its really great idea. i can help on python but i don't have any
knowledge on fortran.

I have been thinking in these lines too. But I have always thought it would
be too much work for very little in return, and it might not interop
properly with libraries written for NumPy (though PEP3118 might have
changed that). I am not sure using Fortran in addition to Cython is a good
idea, but it might be. At least if we limit the number of dimenstions to,
say, 4 or less, ot would be easy to implement most of the code in
vectorized Fortran.

Sturla

29 Replies
4 Views
Permalink to this page
Disable enhanced parsing

Thread Navigation

Kang Wang 2015-08-02 13:55:54 UTC

Sturla Molden 2015-08-02 14:27:08 UTC

Sebastian Berg 2015-08-02 17:19:43 UTC

Kang Wang 2015-08-02 20:14:15 UTC

Sturla Molden 2015-08-02 20:22:01 UTC

Bryan Van de Ven 2015-08-02 20:28:24 UTC

Sturla Molden 2015-08-02 20:46:50 UTC

Chris Barker 2015-08-03 16:30:06 UTC

Nathaniel Smith 2015-08-02 22:17:51 UTC

Nathaniel Smith 2015-08-02 22:52:50 UTC

Kang Wang 2015-08-03 02:16:17 UTC

Jason Newton 2015-08-03 02:55:53 UTC

Daniel Sank 2015-08-03 04:27:53 UTC

Juan Nunez-Iglesias 2015-08-03 04:54:31 UTC

Kang Wang 2015-08-03 06:02:42 UTC

Nathaniel Smith 2015-08-03 07:09:00 UTC

Matthew Brett 2015-08-03 08:49:35 UTC

Sebastian Berg 2015-08-03 12:02:15 UTC

Gregory Lee 2015-08-03 14:13:56 UTC

Matthew Brett 2015-08-03 14:26:10 UTC

Sebastian Berg 2015-08-03 14:53:23 UTC

Sturla Molden 2015-08-03 14:55:04 UTC

Matthew Brett 2015-08-03 15:16:02 UTC

Sturla Molden 2015-08-03 16:01:09 UTC

Matthew Brett 2015-08-03 16:24:51 UTC

Charles R Harris 2015-08-03 17:11:02 UTC

Nathaniel Smith 2015-08-02 22:08:19 UTC

Daniel Sank 2015-08-02 22:24:51 UTC

SMRUTI RANJAN SAHOO 2015-08-03 15:00:27 UTC

Sturla Molden 2015-08-03 15:42:03 UTC

about - legalese

Loading...