Discussion:
[Numpy-discussion] About the npz format
onefire
2014-04-16 18:26:37 UTC
Permalink
Hi all,

I have been playing with the idea of using Numpy's binary format as a
lightweight alternative to HDF5 (which I believe is the "right" way to go
if one does not have a problem with the dependency).

I am pretty happy with the npy format, but the npz format seems to be
broken as far as performance is concerned (or I am missing something
obvious!). The following IPython session illustrates the issue:

In [1]: import numpy as np

In [2]: x = np.linspace(1, 10, 50000000)

In [3]: %time np.save("x.npy", x)
CPU times: user 40 ms, sys: 230 ms, total: 270 ms
Wall time: 488 ms

In [4]: %time np.savez("x.npz", data = x)
CPU times: user 657 ms, sys: 707 ms, total: 1.36 s
Wall time: 7.7 s

I can inspect the files to verify that they contain the same data, and I
can change the example, but this seems to always hold (I am running Arch
Linux, but I've done the test on other machines too): for bigger arrays,
the npz format seems to add an unbelievable amount of overhead.

Looking at Numpy's code, it looks like the real work is being done by
Python's zipfile module, and I suspect that all the extra time is spent
computing the crc32. Am I correct in my assumption (I am not familiar with
zipfile's internals)? Or perhaps I am doing something really dumb and there
is an easy way to speed things up?

Assuming that I am correct, my next question is: why compute the crc32 at
all? I mean, I know that it is part of what defines a "zip file", but is it
really necessary for a npz file to be a (compliant) zip file? If, for
example, I open the resulting npz file with a hex editor and insert a
bogus crc32, np.load will happily load the file anyway (Gnome's Archive
Manager will do the same). To me this suggests that the fact that npz files
are zip files is not that important.
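As an aside, the zip-level metadata, including each member's stored CRC, can be inspected with the standard zipfile module. A minimal sketch:

```python
import zipfile
import zlib
import numpy as np

x = np.arange(10.0)
np.savez("x.npz", data=x)

with zipfile.ZipFile("x.npz") as zf:
    for info in zf.infolist():
        # CRC recorded in the zip metadata vs. one recomputed from the bytes
        stored_crc = info.CRC
        actual_crc = zlib.crc32(zf.read(info.filename))
        print(info.filename, hex(stored_crc), hex(actual_crc))
```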

Perhaps, people think that the ability to browse arrays and extract
individual ones like they would do with a regular zip file is really
important, but reading the little documentation that I found, I got the
impression that npz files are zip files just because this was the easiest
way to have multiple arrays in the same file. But my main point is: it
should be fairly simple to make npz files much more efficient with simple
changes like not computing checksums (or using a different algorithm like
adler32).

Let me know what you think about this. I've searched around the internet,
and on places like Stackoverflow, it seems that the standard answer is: you
are doing it wrong, forget Numpy's format and start using hdf5! Please do
not give that answer. Like I said in the beginning, I am well aware of
hdf5 and I use it on my "production code" (on C++). But I believe that
there should be a lightweight alternative (right now, to use hdf5 I need to
have installed the C library, the C++ wrappers, and the h5py library to
play with the data using Python, that is a bit too heavy for my needs). I
really like Numpy's format (if anything, it makes me feel better knowing
that it is
so easy to reverse engineer it, while the hdf5 format is very complicated),
but the (apparent) poor performance of npz files is a deal breaker.

Gilberto
Valentin Haenel
2014-04-16 20:57:30 UTC
Permalink
Hi Gilberto,
Post by onefire
I have been playing with the idea of using Numpy's binary format as a
lightweight alternative to HDF5 (which I believe is the "right" way to do
if one does not have a problem with the dependency).
I am pretty happy with the npy format, but the npz format seems to be
broken as far as performance is concerned (or I am missing obvious!). The
In [1]: import numpy as np
In [2]: x = np.linspace(1, 10, 50000000)
In [3]: %time np.save("x.npy", x)
CPU times: user 40 ms, sys: 230 ms, total: 270 ms
Wall time: 488 ms
In [4]: %time np.savez("x.npz", data = x)
CPU times: user 657 ms, sys: 707 ms, total: 1.36 s
Wall time: 7.7 s
If it is just serialization speed, you may want to look at Bloscpack:

https://github.com/Blosc/Bloscpack

It only has blosc/python-blosc and Numpy as dependencies.

You can use it on Numpy arrays like so:

https://github.com/Blosc/Bloscpack#numpy

(those are instructions for master you are looking at)

And it can certainly be faster than NPZ and sometimes faster than NPY --
depending of course on your system and the type of data -- and also more
lightweight than HDF5.

I wrote an article about it with some benchmarks, also vs NPY/NPZ here:

https://github.com/euroscipy/euroscipy_proceedings/tree/master/papers/23_haenel

Since it is not yet officially published, you can find a compiled PDF
draft I just made at:

http://fldmp.zetatech.org/haenel_bloscpack_euroscipy2013_ac25c19cb6.pdf

Perhaps it is interesting for you.
Post by onefire
I can inspect the files to verify that they contain the same data, and I
can change the example, but this seems to always hold (I am running Arch
Linux, but I've done the test on other machines too): for bigger arrays,
the npz format seems to add an unbelievable amount of overhead.
You mean time or space wise? In my experience NPZ is fairly slow but
can yield some good compression ratios, depending on the LZ-complexity
of the input data. In fact, AFAIK, NPZ uses the DEFLATE algorithm as
implemented by ZLIB, which is fairly slow and not optimized for
compression/decompression speed. FYI: if you really want ZLIB, Blosc
also supports using it internally, which is nice.
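To get a rough feel for DEFLATE's cost on numeric data, you can time zlib directly on the raw array bytes (a sketch; absolute timings vary by machine and data):

```python
import time
import zlib
import numpy as np

x = np.linspace(1, 10, 1000000)
raw = x.tobytes()

t0 = time.perf_counter()
compressed = zlib.compress(raw, 6)  # level 6 is zlib's default DEFLATE level
dt = time.perf_counter() - t0

print("compressed %d -> %d bytes in %.3f s" % (len(raw), len(compressed), dt))
# round-trip check: DEFLATE is lossless
assert zlib.decompress(compressed) == raw
```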
Post by onefire
Looking at Numpy's code, it looks like the real work is being done by
Python's zipfile module, and I suspect that all the extra time is spent
computing the crc32. Am I correct in my assumption (I am not familiar with
zipfile's internals)? Or perhaps I am doing something really dumb and there
is an easy way to speed things up?
I am guessing here, but a checksum *should* be fairly fast. I would
guess it is at least in part due to use of DEFLATE.
Post by onefire
Assuming that I am correct, my next question is: why compute the crc32 at
all? I mean, I know that it is part of what defines a "zip file", but is it
really necessary for a npz file to be a (compliant) zip file? If, for
example, I open the resulting npz file with a hex editor, and insert a
bogus crc32, np.load will happily load the file anyway (Gnome's Archive
Manager will do the same). To me this suggests that the fact that npz files
are zip files is not that important.
Well, the good news here is that Bloscpack supports adding checksums to
secure the integrity of the compressed data. You can choose between
many, including CRC32, ADLER32 and even sha512.
Post by onefire
Perhaps, people think that the ability to browse arrays and extract
individual ones like they would do with a regular zip file is really
important, but reading the little documentation that I found, I got the
impression that npz files are zip files just because this was the easiest
way to have multiple arrays in the same file. But my main point is: it
should be fairly simple to make npz files much more efficient with simple
changes like not computing checksums (or using a different algorithm like
adler32)
Ah, so you want to store multiple arrays in a single file. I must
disappoint you there, Bloscpack doesn't support that right now. Although
it is in principle possible to achieve this.
Post by onefire
Let me know what you think about this. I've searched around the internet,
and on places like Stackoverflow, it seems that the standard answer is: you
are doing it wrong, forget Numpy's format and start using hdf5! Please do
not give that answer. Like I said in the beginning, I am well aware of
hdf5 and I use it on my "production code" (on C++). But I believe that
there should be a lightweight alternative (right now, to use hdf5 I need to
have installed the C library, the C++ wrappers, and the h5py library to
play with the data using Python, that is a bit too heavy for my needs). I
really like Numpy's format (if anything, it makes me feel better knowing
that it is
so easy to reverse engineer it, while the hdf5 format is very complicated),
but the (apparent) poor performance of npz files if a deal breaker.
Well, I hope that Bloscpack is lightweight enough for you. As I said the
only dependency is blosc/python-blosc which can be compiled using
a C compiler (C++ if you want all the additional codecs) and the Python
headers.

Hope it helps and let me know what you think!

V-
Nathaniel Smith
2014-04-16 21:03:55 UTC
Permalink
crc32 is extremely fast, and I think zip might use adler32 instead which is
even faster. OTOH compression is incredibly slow, unless you're using one
of the 'just a little bit of compression' formats like blosc or lzo1. If
your npz files are compressed then this is certainly the culprit.

The zip format supports storing files without compression. Maybe what you
want is an option to use this with .npz?
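As it turns out, np.savez already stores members uncompressed by default, and np.savez_compressed is the DEFLATE variant. A quick check, sketched with the stdlib zipfile module:

```python
import zipfile
import numpy as np

x = np.arange(1000.0)
np.savez("stored.npz", data=x)            # default: no compression
np.savez_compressed("deflated.npz", data=x)  # DEFLATE via zlib

for fname in ("stored.npz", "deflated.npz"):
    with zipfile.ZipFile(fname) as zf:
        info = zf.infolist()[0]
        # compress_type: 0 == ZIP_STORED, 8 == ZIP_DEFLATED
        print(fname, info.compress_type)
```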

-n
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion
onefire
2014-04-17 00:57:36 UTC
Permalink
Valentin Haenel, Bloscpack definitely looks interesting but I need to take
a careful look first. I will let you know if I like it. Thanks for the
suggestion!

I think you and Nathaniel Smith misunderstood my questions (my fault, since
I did not explain myself well!).
First, Numpy's savez will not do any compression by default. It will simply
store the npy file normally. The documentation suggests so and I can open
the resulting file to confirm it.
Also, if you run the commands that I specified in my previous post, you can
see that the resulting files have sizes 400000080 (x.npy) and 400000194
(x.npz). The npy header takes 80 bytes (it actually needs less than that,
but it is padded to be divisible by 16). The npz file that saves the same
array takes 114 extra bytes (for the zip file metadata), so the space
overhead is pretty small.
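These size claims are easy to check on a smaller array (a sketch; the exact header size is version-dependent, since newer NumPy versions pad the npy header to a larger alignment):

```python
import os
import numpy as np

x = np.linspace(1, 10, 100000)
np.save("x.npy", x)
np.savez("x.npz", data=x)

npy_size = os.path.getsize("x.npy")
npz_size = os.path.getsize("x.npz")

# npy header size (padded; 80 bytes in older NumPy, larger in newer versions)
header_bytes = npy_size - x.nbytes
# zip metadata wrapped around the single npy member
zip_overhead = npz_size - npy_size
print(header_bytes, zip_overhead)
```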
What I cannot understand is why savez takes more than 10 times longer than
saving the data to a npy file. The only reason that I could come up with
was the computation of the crc32.
BUT it might be more than this...
This afternoon I found out about this Julia package (
https://github.com/fhs/NPZ.jl) to manipulate Numpy files. I did a few tests
and it seems to work correctly. It becomes interesting when I do the
npy-npz comparison using Julia.
Here is the code that I used:

using NPZ

function write_npy(x)
    tic()
    npzwrite("data.npy", x)
    toc()
end

function write_npz(x)
    tic()
    npzwrite("data.npz", (ASCIIString => Any)["data" => x])
    toc()
end

x = linspace(1, 10, 50000000)

write_npy(x)  # this prints: elapsed time: 0.417742163 seconds
write_npz(x)  # this prints: elapsed time: 0.882226675 seconds

The Julia timings (tested with Julia 0.3) are closer to what I would
expect. Notice that the time to save the npy file is very similar to the
one that I got with Numpy's save function (see my previous post), but the
"npz overhead" only adds half a second.

So now I think there are two things going on:
1) It is wasteful to compute the crc32. At a minimum I would like to either
have the option to choose a different, faster checksum (like adler32) or to
turn that off (I prefer the second option, because if I am worried about
the integrity of the data, I will likely compute the sha512sum of the
entire file anyway).
2) The Python implementation is inefficient (to be honest, I just found out
about the Julia package and I cannot guarantee anything about its quality,
but if I compute a crc32 of 0.5 GB of data from C code, it takes less than
a second!). My guess is that the problem is in the zipfile module, but like
I said before, I do not know the details of what it is doing.
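Point 1) can be tested in isolation by timing the stdlib checksums over a large buffer (a rough sketch; absolute numbers depend on the zlib build):

```python
import time
import zlib
import numpy as np

buf = np.linspace(1, 10, 5000000).tobytes()  # ~40 MB of float64 data

for name, fn in (("crc32", zlib.crc32), ("adler32", zlib.adler32)):
    t0 = time.perf_counter()
    checksum = fn(buf)
    dt = time.perf_counter() - t0
    print(name, hex(checksum), "%.4f s" % dt)
```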

Let me know what you think.

Gilberto
Nathaniel Smith
2014-04-17 09:23:07 UTC
Permalink
Post by onefire
What I cannot understand is why savez takes more than 10 times longer
than saving the data to a npy file. The only reason that I could come up
with was the computation of the crc32.

We can all make guesses but the solution is just to profile it :-). %prun
in ipython (and then if you need more granularity installing line_profiler
is useful).
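Outside IPython, the equivalent of %prun is a few lines of cProfile (a sketch):

```python
import cProfile
import io
import pstats
import numpy as np

x = np.linspace(1, 10, 1000000)

# profile only the savez call, then print the hottest functions
pr = cProfile.Profile()
pr.enable()
np.savez("x.npz", data=x)
pr.disable()

out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats("tottime").print_stats(10)
print(out.getvalue())
```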

-n
onefire
2014-04-17 19:30:59 UTC
Permalink
Hi Nathaniel,

Thanks for the suggestion. I did profile the program before, just not using
Python.

But following your suggestion, I used %prun. Here's (part of) the output
(when I use savez):

195503 function calls in 4.466 seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function)
2 2.284 1.142 2.284 1.142 {method 'close' of
'_io.BufferedWriter' objects}
1 0.918 0.918 0.918 0.918 {built-in method remove}
48841 0.568 0.000 0.568 0.000 {method 'write' of
'_io.BufferedWriter' objects}
48829 0.379 0.000 0.379 0.000 {built-in method crc32}
48830 0.148 0.000 0.148 0.000 {method 'read' of
'_io.BufferedReader' objects}
1 0.090 0.090 0.993 0.993 zipfile.py:1315(write)
1 0.072 0.072 0.072 0.072 {method 'tostring' of
'numpy.ndarray' objects}
48848 0.005 0.000 0.005 0.000 {built-in method len}
1 0.001 0.001 0.270 0.270 format.py:362(write_array)
3 0.000 0.000 0.000 0.000 {built-in method open}
1 0.000 0.000 4.466 4.466 npyio.py:560(_savez)
2 0.000 0.000 0.000 0.000 zipfile.py:1459(close)
1 0.000 0.000 4.466 4.466 {built-in method exec}

Here's the output when I use save to save to a npy file:

39 function calls in 0.266 seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function)
4 0.196 0.049 0.196 0.049 {method 'write' of
'_io.BufferedWriter' objects}
1 0.069 0.069 0.069 0.069 {method 'tostring' of
'numpy.ndarray' objects}
1 0.001 0.001 0.266 0.266 format.py:362(write_array)
1 0.000 0.000 0.000 0.000 {built-in method open}
1 0.000 0.000 0.266 0.266 npyio.py:406(save)
1 0.000 0.000 0.000 0.000
format.py:261(write_array_header_1_0)
1 0.000 0.000 0.000 0.000 {method 'close' of
'_io.BufferedWriter' objects}
1 0.000 0.000 0.266 0.266 {built-in method exec}
1 0.000 0.000 0.000 0.000 format.py:154(magic)
1 0.000 0.000 0.000 0.000
format.py:233(header_data_from_array_1_0)
1 0.000 0.000 0.266 0.266 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 numeric.py:462(asanyarray)
1 0.000 0.000 0.000 0.000 py3k.py:28(asbytes)

The calls to close and the built-in method remove seem to be
responsible for the inefficiency of the Numpy implementation (compared to
the Julia package that I mentioned before). This was tested using Python
3.4 and Numpy 1.8.1.
However, if I do the tests with Python 3.3.5 and Numpy 1.8.0, savez becomes
much faster, so I think there is something wrong with the combination
Python 3.4/Numpy 1.8.1.
Also, if I use Python 2.4 and Numpy 1.2 (from my school's cluster) I get
that np.save takes about 3.5 seconds and np.savez takes about 7 seconds, so
all these timings seem to be hugely dependent on the system/version (maybe
this explains David Palao's results?).

However, they all point out that a significant amount of time is spent
computing the crc32. Notice that prun reports that it takes 0.379 seconds
to compute the crc32 of an array that takes 0.2 seconds to save to a npy
file. I believe this is too much! And it gets worse if you try to save
bigger arrays.
Julian Taylor
2014-04-17 19:51:32 UTC
Permalink
Post by onefire
Hi Nathaniel,
Thanks for the suggestion. I did profile the program before, just not
using Python.
one problem of npz is that the zipfile module does not support streaming
data in (or if it does now, we aren't using it).
So numpy writes the file uncompressed to disk and then zips it, which is
horrible for performance and disk usage.

It would be nice if we could add support for different compression
modules like gzip or xz which allow streaming data directly into a file
without an intermediate.
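As an illustration of the streaming idea (not a proposed numpy API): ``numpy.lib.format.write_array`` accepts any file-like object, so the npy stream can go straight through a gzip compressor without an intermediate file:

```python
import gzip
import numpy as np
from numpy.lib import format as npformat

x = np.linspace(1, 10, 100000)

# gzip.GzipFile is file-like, so the npy header and data are
# streamed straight into the compressor, no temporary file needed.
with gzip.open("x.npy.gz", "wb") as f:
    npformat.write_array(f, x)

with gzip.open("x.npy.gz", "rb") as f:
    y = npformat.read_array(f)
```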
Valentin Haenel
2014-04-17 20:26:35 UTC
Permalink
Hi,
Post by Julian Taylor
Post by onefire
Hi Nathaniel,
Thanks for the suggestion. I did profile the program before, just not
using Python.
one problem of npz is that the zipfile module does not support streaming
data in (or if it does now we aren't using it).
So numpy writes the file uncompressed to disk and then zips it which is
horrible for performance and disk usage.
As a workaround it may also be possible to write the temporary NPY files to
cStringIO instances and then use ``ZipFile.writestr`` with the
``getvalue()`` of the cStringIO object. However, that approach may
require some memory. In Python 2.7, for each array: one copy inside the
cStringIO instance and then another copy when calling getvalue on the
cStringIO, I believe.
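In Python 3 terms the same workaround can be sketched with io.BytesIO (the helper name here is illustrative, not numpy API):

```python
import io
import zipfile
import numpy as np
from numpy.lib import format as npformat

def savez_via_buffer(fname, **arrays):
    # Serialize each array into an in-memory buffer, then writestr()
    # the bytes into the zip -- no temporary .npy file on disk.
    with zipfile.ZipFile(fname, "w", zipfile.ZIP_STORED) as zf:
        for name, arr in arrays.items():
            buf = io.BytesIO()
            npformat.write_array(buf, np.asanyarray(arr))
            zf.writestr(name + ".npy", buf.getvalue())

savez_via_buffer("buffered.npz", data=np.arange(10.0))
loaded = np.load("buffered.npz")
print(loaded["data"])
```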

best,

V-
Valentin Haenel
2014-04-17 20:56:27 UTC
Permalink
Post by Valentin Haenel
As a workaround may also be possible to write the temporary NPY files to
cStringIO instances and then use ``ZipFile.writestr`` with the
``getvalue()`` of the cStringIO object. However that approach may
require some memory. In python 2.7, for each array: one copy inside the
cStringIO instance and then another copy of when calling getvalue on the
cString, I believe.
There is a proof-of-concept implementation here:

https://github.com/esc/numpy/compare/feature;npz_no_temp_file

Here are the timings, again using ``sync()`` from bloscpack (but it's
just a ``os.system('sync')``, in case you want to run your own
benchmarks):

In [1]: import numpy as np

In [2]: import bloscpack.sysutil as bps

In [3]: x = np.linspace(1, 10, 50000000)

In [4]: %timeit np.save("x.npy", x) ; bps.sync()
1 loops, best of 3: 1.93 s per loop

In [5]: %timeit np.savez("x.npz", x) ; bps.sync()
1 loops, best of 3: 7.88 s per loop

In [6]: %timeit np._savez_no_temp("x.npy", [x], {}, False) ; bps.sync()
1 loops, best of 3: 3.22 s per loop

Not too bad, but still slower than plain NPY, memory copies would be my
guess.

V-

PS: Running Python 2.7.6 :: Anaconda 1.9.2 (64-bit) and Numpy master
Valentin Haenel
2014-04-17 21:18:09 UTC
Permalink
Also, in case you were wondering, here is the profiler output:

In [2]: %prun -l 10 np._savez_no_temp("x.npy", [x], {}, False)
943 function calls (917 primitive calls) in 1.139 seconds

Ordered by: internal time
List reduced from 99 to 10 due to restriction <10>

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.386 0.386 0.386 0.386 {zlib.crc32}
8 0.234 0.029 0.234 0.029 {method 'write' of 'file' objects}
27 0.162 0.006 0.162 0.006 {method 'write' of 'cStringIO.StringO' objects}
1 0.158 0.158 0.158 0.158 {method 'getvalue' of 'cStringIO.StringO' objects}
1 0.091 0.091 0.091 0.091 {method 'close' of 'file' objects}
24 0.064 0.003 0.064 0.003 {method 'tobytes' of 'numpy.ndarray' objects}
1 0.022 0.022 1.119 1.119 npyio.py:608(_savez_no_temp)
1 0.019 0.019 1.139 1.139 <string>:1(<module>)
1 0.002 0.002 0.227 0.227 format.py:362(write_array)
1 0.001 0.001 0.001 0.001 zipfile.py:433(_GenerateCRCTable)

V-
Valentin Haenel
2014-04-17 21:35:37 UTC
Permalink
Hi,
And, to shed some more light on this, the kernprof (line-by-line)
output (of a slightly modified version):

zsh» cat mp.py
import numpy as np
x = np.linspace(1, 10, 50000000)
np._savez_no_temp("x.npy", [x], {}, False)

zsh» ./kernprof.py -v -l mp.py
Wrote profile results to mp.py.lprof
Timer unit: 1e-06 s

File: numpy/lib/npyio.py
Function: _savez_no_temp at line 608
Total time: 1.16438 s

Line # Hits Time Per Hit % Time Line Contents
==============================================================
608 @profile
609 def _savez_no_temp(file, args, kwds, compress):
610 # Import is postponed to here since zipfile depends on gzip, an optional
611 # component of the so-called standard library.
612 1 5655 5655.0 0.5 import zipfile
613
614 1 6 6.0 0.0 from cStringIO import StringIO
615
616 1 2 2.0 0.0 if isinstance(file, basestring):
617 1 2 2.0 0.0 if not file.endswith('.npz'):
618 1 1 1.0 0.0 file = file + '.npz'
619
620 1 1 1.0 0.0 namedict = kwds
621 2 4 2.0 0.0 for i, val in enumerate(args):
622 1 6 6.0 0.0 key = 'arr_%d' % i
623 1 1 1.0 0.0 if key in namedict.keys():
624 raise ValueError(
625 "Cannot use un-named variables and keyword %s" % key)
626 1 1 1.0 0.0 namedict[key] = val
627
628 1 0 0.0 0.0 if compress:
629 compression = zipfile.ZIP_DEFLATED
630 else:
631 1 1 1.0 0.0 compression = zipfile.ZIP_STORED
632
633 1 42734 42734.0 3.7 zipf = zipfile_factory(file, mode="w", compression=compression)
634 # reusable memory buffer
635 1 5 5.0 0.0 sio = StringIO()
636 2 10 5.0 0.0 for key, val in namedict.items():
637 1 3 3.0 0.0 fname = key + '.npy'
638 1 4 4.0 0.0 sio.seek(0) # reset buffer
639 1 219843 219843.0 18.9 format.write_array(sio, np.asanyarray(val))
640 1 156962 156962.0 13.5 array_bytes = sio.getvalue(True)
641 1 625162 625162.0 53.7 zipf.writestr(fname, array_bytes)
642
643 1 113977 113977.0 9.8 zipf.close()

So it would appear that >50% of the time is spent in
``zipfile.writestr``.
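To get a feel for how much of that ``writestr`` time is just the checksum, one can time ``zlib.crc32`` (which is what zipfile uses internally) on the raw bytes directly. A minimal sketch, using a smaller array than the thread's 50M example so it runs quickly:

```python
import time
import zlib

import numpy as np

# Lower-bound estimate of the CRC32 overhead that ZipFile.writestr pays:
# zipfile computes zlib.crc32 over the full payload of every entry.
x = np.linspace(1, 10, 5_000_000)  # smaller than the 50M-element example
payload = x.tobytes()

start = time.perf_counter()
checksum = zlib.crc32(payload)
elapsed = time.perf_counter() - start
print("crc32 over %.0f MB: %.1f ms" % (len(payload) / 1e6, elapsed * 1000))
```

This only bounds the checksum cost from below; the rest of the ``writestr`` time goes into copying the payload into the archive.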

V-
Valentin Haenel
2014-04-18 16:29:27 UTC
Permalink
Hi,
Post by Valentin Haenel
Post by Valentin Haenel
Post by Julian Taylor
Post by onefire
Thanks for the suggestion. I did profile the program before, just not
using Python.
one problem of npz is that the zipfile module does not support streaming
data in (or if it does now we aren't using it).
So numpy writes the file uncompressed to disk and then zips it which is
horrible for performance and disk usage.
As a workaround it may also be possible to write the temporary NPY files to
cStringIO instances and then use ``ZipFile.writestr`` with the
``getvalue()`` of the cStringIO object. However, that approach may
require some extra memory. In Python 2.7, for each array: one copy inside
the cStringIO instance and then another copy when calling getvalue on the
cStringIO, I believe.
https://github.com/esc/numpy/compare/feature;npz_no_temp_file
Anybody interested in me fixing this up (unit tests, API, etc..) for
inclusion?

V-
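For reference, the core of the linked no-temp-file patch can be sketched in a few lines. This uses modern ``io.BytesIO`` in place of the Python 2 ``cStringIO`` from the thread, and ``savez_in_memory`` is a hypothetical name, not the patch's actual API:

```python
import io
import zipfile

import numpy as np
from numpy.lib import format as npformat

def savez_in_memory(filename, **arrays):
    # Serialize each array to an in-memory buffer and hand the bytes to
    # ZipFile.writestr, so no temporary .npy files ever touch the disk.
    with zipfile.ZipFile(filename, mode="w",
                         compression=zipfile.ZIP_STORED) as zipf:
        for name, arr in arrays.items():
            buf = io.BytesIO()
            npformat.write_array(buf, np.asanyarray(arr))
            zipf.writestr(name + ".npy", buf.getvalue())

savez_in_memory("x.npz", data=np.linspace(1, 10, 1000))
```

The price, as noted in the quoted message, is holding at least one full serialized copy of each array in memory while it is written.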
Julian Taylor
2014-04-18 17:20:33 UTC
Permalink
Post by Valentin Haenel
Hi,
https://github.com/esc/numpy/compare/feature;npz_no_temp_file
Anybody interested in me fixing this up (unit tests, API, etc..) for
inclusion?
I wonder if it would be better to instead use a fifo to avoid the memory
doubling. Windows probably hasn't got them (exposed via python) but one
can slap a platform check in front.
attached a proof of concept without proper error handling (which is
unfortunately the tricky part)
Valentin Haenel
2014-07-04 13:49:54 UTC
Permalink
sorry, for the top-post, but should we add this as an issue on the
github tracker? I'd like to revisit it this summer.

V-
Post by Julian Taylor
I wonder if it would be better to instead use a fifo to avoid the memory
doubling. Windows probably hasn't got them (exposed via python) but one
can slap a platform check in front.
attached a proof of concept without proper error handling (which is
unfortunately the tricky part)
Sturla Molden
2014-07-06 07:52:53 UTC
Permalink
There is no os.mkfifo on Windows.

Sturla
Post by Valentin Haenel
sorry, for the top-post, but should we add this as an issue on the
github tracker? I'd like to revisit it this summer.
V-
David Palao
2014-04-17 09:17:37 UTC
Permalink
Post by onefire
Hi all,
I have been playing with the idea of using Numpy's binary format as a
lightweight alternative to HDF5 (which I believe is the "right" way to do if
one does not have a problem with the dependency).
I am pretty happy with the npy format, but the npz format seems to be broken
as far as performance is concerned (or I am missing something obvious!). The
following ipython session illustrates the issue:
In [1]: import numpy as np
In [2]: x = np.linspace(1, 10, 50000000)
In [3]: %time np.save("x.npy", x)
CPU times: user 40 ms, sys: 230 ms, total: 270 ms
Wall time: 488 ms
In [4]: %time np.savez("x.npz", data = x)
CPU times: user 657 ms, sys: 707 ms, total: 1.36 s
Wall time: 7.7 s
Hi,
In my case (python-2.7.3, numpy-1.6.1):

In [23]: %time save("xx.npy", x)
CPU times: user 0.00 s, sys: 0.23 s, total: 0.23 s
Wall time: 4.07 s

In [24]: %time savez("xx.npz", data = x)
CPU times: user 0.42 s, sys: 0.61 s, total: 1.02 s
Wall time: 4.26 s

In my case I don't see the "unbelievable amount of overhead" of the npz thing.

Best
Valentin Haenel
2014-04-17 20:01:04 UTC
Permalink
Hi again,
Post by David Palao
Hi,
In [23]: %time save("xx.npy", x)
CPU times: user 0.00 s, sys: 0.23 s, total: 0.23 s
Wall time: 4.07 s
In [24]: %time savez("xx.npz", data = x)
CPU times: user 0.42 s, sys: 0.61 s, total: 1.02 s
Wall time: 4.26 s
In my case I don't see the "unbelievable amount of overhead" of the npz thing.
When profiling IO operations, there are many factors that can influence
measurements. In my experience on Linux these include: the filesystem
cache, the CPU governor, the system load, power-saving features (e.g.
laptop-mode tools), the type of hard drive and how it is connected, and
any cron jobs that might be running (e.g. updating the locate DB).

So for example when measuring the time it takes to write something to
disk on Linux, I always at least include a call to ``sync``
which will ensure that all kernel filesystem buffers will be written to
disk. Even then, you may still have a lot of variability.
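On Python 3.3+ the same effect is available from the standard library via ``os.sync()``, without needing a wrapper. A minimal sketch of the measurement:

```python
import os
import time

import numpy as np

# Time a write *including* the flush of the kernel's filesystem buffers.
# os.sync() blocks until dirty buffers are committed, so buffered writes
# cannot hide the real IO cost the way the raw %time numbers can.
x = np.linspace(1, 10, 1_000_000)

start = time.perf_counter()
np.save("x.npy", x)
os.sync()
print("save + sync: %.3f s" % (time.perf_counter() - start))
```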

As part of bloscpack.sysutil I have wrapped this to be available from
Python (needs root though). So, to re-run the benchmarks, doing each
one twice:

In [1]: import numpy as np

In [2]: import bloscpack.sysutil as bps

In [3]: x = np.linspace(1, 10, 50000000)

In [4]: %time np.save("x.npy", x)
CPU times: user 12 ms, sys: 356 ms, total: 368 ms
Wall time: 1.41 s

In [5]: %time np.save("x.npy", x)
CPU times: user 0 ns, sys: 368 ms, total: 368 ms
Wall time: 811 ms

In [6]: %time np.savez("x.npz", data = x)
CPU times: user 540 ms, sys: 864 ms, total: 1.4 s
Wall time: 4.74 s

In [7]: %time np.savez("x.npz", data = x)
CPU times: user 580 ms, sys: 808 ms, total: 1.39 s
Wall time: 9.47 s

In [8]: bps.sync()

In [9]: %time np.save("x.npy", x) ; bps.sync()
CPU times: user 0 ns, sys: 368 ms, total: 368 ms
Wall time: 2.2 s

In [10]: %time np.save("x.npy", x) ; bps.sync()
CPU times: user 0 ns, sys: 356 ms, total: 356 ms
Wall time: 2.16 s

In [11]: bps.sync()

In [12]: %time np.savez("x.npz", x) ; bps.sync()
CPU times: user 564 ms, sys: 816 ms, total: 1.38 s
Wall time: 8.21 s

In [13]: %time np.savez("x.npz", x) ; bps.sync()
CPU times: user 588 ms, sys: 772 ms, total: 1.36 s
Wall time: 6.83 s

As you can see, even when using ``sync`` the values might vary, so in
addition it might be worth using %timeit, which will at least run it
three times and select the best one in its default setting:

In [14]: %timeit np.save("x.npy", x)
1 loops, best of 3: 2.4 s per loop

In [15]: %timeit np.savez("x.npz", x)
1 loops, best of 3: 7.1 s per loop

In [16]: %timeit np.save("x.npy", x) ; bps.sync()
1 loops, best of 3: 3.11 s per loop

In [17]: %timeit np.savez("x.npz", x) ; bps.sync()
1 loops, best of 3: 7.36 s per loop

So, anyway, given these readings, I would tend to support the claim
that there is something slowing down writing when using plain NPZ w/o
compression.

FYI: when reading, the kernel keeps files that were recently read in the
filesystem buffers and so when measuring reads, I tend to drop those
caches using ``drop_caches()`` from bloscpack.sysutil (which wraps using
the linux proc fs).
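For completeness, a ``drop_caches()`` helper along those lines can be approximated by writing to the proc interface directly. This is a sketch of the idea, not bloscpack's actual implementation, and the write requires root:

```python
def drop_caches():
    # Ask the Linux kernel to evict the page cache (plus dentries and
    # inodes) so that a subsequent read benchmark actually hits the disk.
    # Writing "3" to this proc file requires root; report failure
    # instead of raising so benchmarks can degrade gracefully.
    try:
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")
        return True
    except OSError:
        return False
```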

best,

V-
Valentin Haenel
2014-04-17 22:45:53 UTC
Permalink
Hello,
Post by Valentin Haenel
As part of bloscpack.sysutil I have wrapped this to be available from
Python (needs root though). So, to re-run the benchmarks, doing each
Actually, I just realized, that doing a ``sync`` doesn't require root.

my bad,

V-
onefire
2014-04-18 00:09:59 UTC
Permalink
Interesting! Using sync() as you suggested makes every write slower, and
it decreases the time difference between save and savez,
so maybe I was observing the 10 times difference because the file system
buffers were being flushed immediately after a call to savez, but not right
after a call to np.save.

I think your workaround might help, but a better solution would be to not
use Python's zipfile module at all. This would make it possible to, say,
let the user choose the checksum algorithm or to turn that off.
Or maybe the compression stuff makes this route too complicated to be worth
the trouble? (after all, the zip format is not that hard to understand)

Gilberto
Post by Valentin Haenel
Hello,
Post by Valentin Haenel
As part of bloscpack.sysutil I have wrapped this to be available from
Python (needs root though). So, to re-run the benchmarks, doing each
Actually, I just realized, that doing a ``sync`` doesn't require root.
my bad,
V-
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion
onefire
2014-04-18 00:12:43 UTC
Permalink
I found this github issue (https://github.com/numpy/numpy/pull/3465) where
someone mentions the idea of forking the zip library.

Gilberto
Post by onefire
Interesting! Using sync() as you suggested makes every write slower, and
it decreases the time difference between save and savez,
so maybe I was observing the 10 times difference because the file system
buffers were being flushed immediately after a call to savez, but not right
after a call to np.save.
I think your workaround might help, but a better solution would be to not
use Python's zipfile module at all. This would make it possible to, say,
let the user choose the checksum algorithm or to turn that off.
Or maybe the compression stuff makes this route too complicated to be
worth the trouble? (after all, the zip format is not that hard to
understand)
Gilberto
Valentin Haenel
2014-04-18 10:16:32 UTC
Permalink
Hi Gilberto,
Post by onefire
Interesting! Using sync() as you suggested makes every write slower, and
it decreases the time difference between save and savez,
so maybe I was observing the 10 times difference because the file system
buffers were being flushed immediately after a call to savez, but not right
after a call to np.save.
I am happy that you found my suggestion useful! Given that the current
savez implementation first writes temporary arrays to disk and then
copies them from their temporary location to the zipfile, one might
argue that this is what causes the buffers to be flushed, since it does
more IO than the save implementation. Then again, I don't really know the
gory details of how the filesystem buffers behave and how they can
be configured.

best,

V-
Valentin Haenel
2014-04-18 11:01:09 UTC
Permalink
Hi again,
Post by onefire
I think your workaround might help, but a better solution would be to not
use Python's zipfile module at all. This would make it possible to, say,
let the user choose the checksum algorithm or to turn that off.
Or maybe the compression stuff makes this route too complicated to be worth
the trouble? (after all, the zip format is not that hard to understand)
Just to give you an idea of what my aforementioned Bloscpack library can
do in the case of linspace:

In [1]: import numpy as np

In [2]: import bloscpack as bp

In [3]: import bloscpack.sysutil as bps

In [4]: x = np.linspace(1, 10, 50000000)

In [5]: %timeit np.save("x.npy", x) ; bps.sync()
1 loops, best of 3: 2.12 s per loop

In [6]: %timeit bp.pack_ndarray_file(x, 'x.blp') ; bps.sync()
1 loops, best of 3: 627 ms per loop

In [7]: %timeit -n 3 -r 3 np.save("x.npy", x) ; bps.sync()
3 loops, best of 3: 1.92 s per loop

In [8]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x.blp') ; bps.sync()
3 loops, best of 3: 564 ms per loop

In [9]: ls -lah x.npy x.blp
-rw-r--r-- 1 root root 49M Apr 18 12:53 x.blp
-rw-r--r-- 1 root root 382M Apr 18 12:52 x.npy

However, this is a bit of a special case, since Blosc does extremely well
-- both speed- and size-wise -- on the linspace data; your mileage may
vary.

best,

V-
Francesc Alted
2014-04-18 12:03:00 UTC
Permalink
Post by Valentin Haenel
Hi again,
Post by onefire
I think your workaround might help, but a better solution would be to not
use Python's zipfile module at all. This would make it possible to, say,
let the user choose the checksum algorithm or to turn that off.
Or maybe the compression stuff makes this route too complicated to be worth
the trouble? (after all, the zip format is not that hard to understand)
Just to give you an idea of what my aforementioned Bloscpack library can
In [1]: import numpy as np
In [2]: import bloscpack as bp
In [3]: import bloscpack.sysutil as bps
In [4]: x = np.linspace(1, 10, 50000000)
In [5]: %timeit np.save("x.npy", x) ; bps.sync()
1 loops, best of 3: 2.12 s per loop
In [6]: %timeit bp.pack_ndarray_file(x, 'x.blp') ; bps.sync()
1 loops, best of 3: 627 ms per loop
In [7]: %timeit -n 3 -r 3 np.save("x.npy", x) ; bps.sync()
3 loops, best of 3: 1.92 s per loop
In [8]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x.blp') ; bps.sync()
3 loops, best of 3: 564 ms per loop
In [9]: ls -lah x.npy x.blp
-rw-r--r-- 1 root root 49M Apr 18 12:53 x.blp
-rw-r--r-- 1 root root 382M Apr 18 12:52 x.npy
However, this is a bit of a special case, since Blosc does extremely well
-- both speed- and size-wise -- on the linspace data; your mileage may
vary.
Exactly, and besides, Blosc can use different codecs inside it. Just for
completeness, here is a small benchmark of what you can expect from
them (my laptop does not have an SSD, so my figures are a bit slow
compared with Valentin's):

In [50]: %timeit -n 3 -r 3 np.save("x.npy", x) ; bps.sync()
3 loops, best of 3: 5.7 s per loop

In [51]: cargs = bp.args.DEFAULT_BLOSC_ARGS

In [52]: cargs['cname'] = 'blosclz'

In [53]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-blosclz.blp',
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 1.12 s per loop

In [54]: cargs['cname'] = 'lz4'

In [55]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-lz4.blp',
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 985 ms per loop

In [56]: cargs['cname'] = 'lz4hc'

In [57]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-lz4hc.blp',
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 1.95 s per loop

In [58]: cargs['cname'] = 'snappy'

In [59]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-snappy.blp',
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 1.11 s per loop

In [60]: cargs['cname'] = 'zlib'

In [61]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-zlib.blp',
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 3.12 s per loop

So all the codecs can make the storage go faster than a plain np.save(),
most especially blosclz, lz4 and snappy. However, lz4hc and zlib
achieve the best compression ratios:

In [62]: ls -lht x*.*
-rw-r--r-- 1 faltet users 7,0M 18 abr 13:49 x-zlib.blp
-rw-r--r-- 1 faltet users 54M 18 abr 13:48 x-snappy.blp
-rw-r--r-- 1 faltet users 7,0M 18 abr 13:48 x-lz4hc.blp
-rw-r--r-- 1 faltet users 48M 18 abr 13:47 x-lz4.blp
-rw-r--r-- 1 faltet users 49M 18 abr 13:47 x-blosclz.blp
-rw-r--r-- 1 faltet users 382M 18 abr 13:42 x.npy

But again, we are talking about an especially nice compression case.
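A quick way to see why linspace is such a friendly case: Blosc applies a byte-wise shuffle before compressing, and for smoothly varying doubles that shuffle makes the byte stream extremely repetitive. A small illustration of the effect using plain zlib (not Blosc itself):

```python
import zlib

import numpy as np

x = np.linspace(1, 10, 1_000_000)
raw = x.tobytes()

# Byte-wise shuffle: collect byte 0 of every float64, then byte 1, etc.
# This is (roughly) the transform Blosc applies before its codecs run.
shuffled = x.view(np.uint8).reshape(-1, 8).T.copy().tobytes()

print("plain    ratio: %.3f" % (len(zlib.compress(raw)) / len(raw)))
print("shuffled ratio: %.3f" % (len(zlib.compress(shuffled)) / len(shuffled)))
```

The shuffled stream groups the nearly constant high-order bytes together, which is exactly the regularity the codec comparison above exploits.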
--
Francesc Alted