Kang,
Thank you for explaining your motivation. It's clear from your last note,
as you said, that your desire for column-first indexing has nothing to do
with in-memory data layout. That being the case, I strongly urge you to
just use bare numpy and do not use the "fortran_zeros" function I
recommended before. Changing the in-memory layout via the "order" keyword
in numpy.zeros will not change the way indexing works at all. You gain
absolutely nothing by changing the in-memory order unless you are writing
some C or Fortran code which will interact with the data in memory.
To see what I mean, consider the following examples:
x = np.array([1, 2, 3], [4, 5, 6]])
x.shape
and
x = np.array([1, 2, 3], [4, 5, 6]], order='F')
x.shape
You see that changing the in-memory order has nothing whatsoever to do with
the array's shape or how you access it.
Post by Kang WangYou will see run time error. Depending on environment, you may get useful
error message
Post by Kang Wang(i.e. index out of range), but sometimes you just get bad image results.
Could you give a very simple example of what you mean? I can't think of how
this could ever happen and your fear here makes me think there's a
fundamental misunderstanding about how array operations in numpy and other
programming languages work. As an example, iteration in numpy goes through
the first index:
x = np.array([[1, 2, 3], [4, 5, 6]])
for foo in x:
...
Inside the for loop, foo takes on the values [1, 2, 3] on the first
iteration and [4, 5, 6] on the second. If you want to iterate through the
columns just do this instead
x = np.array([[1, 2, 3], [4, 5, 6]])
for foo in x.T:
...
If your complaint is that you want np.array([[1, 2, 3], [4, 5, 6]]) to
produce an array with shape (3, 2) then you should own up to the fact that
the array constructor expects it the other way around and do this
x = np.array([[1, 2, 3], [4, 5, 6]]).T
instead. This is infinity times better than trying to write a shim function
or patch numpy because with .T you're using (fast) built-in functionality
which other people your code will understand.
The real message here is that whether the first index runs over rows or
columns is actually meaningless. The only places the row versus column
issue has any meaning is when doing input/output (in which case you should
use the transpose if you actually need it), or when doing iteration. One
thing that would make sense if you're reading from a binary file format
which uses column-major format would be to write your own reader function:
def read_fortran_style_binary_file(file):
return np.fromfile(file).T
Note that if you do this then you already have a column major array in
numpy and you don't have to worry about any other transposes (except,
again, when doing more I/O or passing to something like a plotting
function).
Post by Kang WangThank you all for replying and providing useful insights and suggestions.
- I am image-oriented user (not matrix-oriented, as explained in
http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues
)
- I am so used to read/write "I(x, y, z)" in textbook and code, and it
is very likely that if the environment (row-major environment) forces me to
write I(z, y, x), I will write a bug if I am not 100% focused. When this
happens, it is difficult to debug, because everything compile and build
fine. You will see run time error. Depending on environment, you may get
useful error message (i.e. index out of range), but sometimes you just get
bad image results.
- It actually has not too much to do with the actual data layout in
memory. In imaging processing, especially medical imaging where I am
working in, if you have a 3D image, everyone will agree that in memory, the
X index is the fasted changing index, and the Z dimension (we often call it
the "slice" dimension) has the largest stride in memory. So, if
data layout is like this in memory, and image-oriented users are so used to
read/write "I(x,y,z)", the only storage order that makes sense is
column-major
- I also write code in MATLAB and C/C++. In MATLAB, matrix is
column-major array. In C/C++, we often use ITK, which is also column-major (
http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really
prefer always read/write column-major code to minimize coding bugs related
to storage order.
- I also prefer index to be 0-based; however, there is nothing I can
do about it for MATLAB (which is 1-based).
I can see that my original thought about "modifying NumPy source and
re-compile" is probably a bad idea. The suggestions about using
"fortran_zeros = partial(np.zeros(order='F'))" is probably the best way so
far, in my opinion, and I am going to give it a try.
Again, thank you all for replying.
Kang
Post by Sturla MoldenCan anyone provide any insight/help?
There is no "default order". There was before, but now all operators
control the order of their return arrays from the order of their input
array.
This is... overoptimistic. I would not rely on this in code that I wrote.
It's true that many numpy operations do preserve the input order. But
there are also many operations that don't. And which are which often
changes between releases. (Not on purpose usually, but it's an easy bug to
introduce. And sometimes it is done intentionally, e.g. to make functions
faster. It sucks to have to make a function slower for everyone because
someone somewhere is depending on memory layout default details.) And there
are operations where it's not even clear what preserving order means
(indexing a C array with a Fortran array, add(C, fortran), ...), and even
lots of operations that intrinsically break contiguity/ordering (transpose,
diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings
one way or another in any non-trivial program.
Instead, it's better to explicitly specify order= just at the places where
you care. That way your code is *locally* correct (you can check it will
work by just reading the one function). The alternative is to try and
enforce a *global* property on your program ("everyone everywhere is very
careful to only use contiguity-preserving operations", where "everyone"
includes third party libraries like numpy and others). In software design,
local invariants invariants are always better than global invariants -- the
most well known example is local variables versus global variables, but the
principle is much broader.
-n
--
*Kang Wang, Ph.D.*
1111 Highland Ave., Room 1113
Madison, WI 53705-2275
----------------------------------------
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion
--
Daniel Sank