Discussion:
[Numpy-discussion] A regression in numpy 1.10: VERY slow memory mapped file generation
Nadav Horesh
2015-10-14 05:23:48 UTC
I have binary files ranging in size from a few MB to 1 GB, which I read and process as memory-mapped files (via np.memmap). Up to numpy 1.9, creating a recarray on an existing file (without reading its content) was instantaneous; now it takes ~6 seconds (system: Arch Linux on Sandy Bridge). Profiling with IPython's %prun gives this at the top of the list:


   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       21    3.037    0.145    4.266    0.203 _internal.py:372(_check_field_overlap)
  3713431    1.663    0.000    1.663    0.000 _internal.py:366(<genexpr>)
  3713750    0.790    0.000    0.790    0.000 {range}
  3713709    0.406    0.000    0.406    0.000 {method 'update' of 'set' objects}
      322    0.320    0.001    1.984    0.006 {method 'extend' of 'list' objects}
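%prun is IPython's front end to Python's cProfile module, so a comparable table can be produced outside IPython with cProfile and pstats. The sketch below is a hypothetical stand-in: the record layout frame_type, the count nframes and the temporary file are invented for illustration, and only the memmap-plus-field-selection step is profiled. Sorting by tottime gives the same ordering as the table above.

```python
import cProfile
import io
import os
import pstats
import tempfile

import numpy as np

# Hypothetical record layout and record count standing in for the real ones.
frame_type = np.dtype([("timestamp", "<f8"), ("data", "<u2", (64, 64))])
nframes = 100

# Write a small zero-filled file so np.memmap has something to map.
path = os.path.join(tempfile.mkdtemp(), "frames.bin")
np.zeros(nframes, dtype=frame_type).tofile(path)


def open_frames():
    mm = np.memmap(path, dtype=frame_type, mode="r", shape=(nframes,))
    return mm["data"]  # field selection on the structured memmap


profiler = cProfile.Profile()
data = profiler.runcall(open_frames)

# Render the five most expensive entries, sorted by internal time (tottime).
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("tottime").print_stats(5)
report = buf.getvalue()
print(report)
```

On a numpy build without the regression, the _internal.py entries would not be expected to appear near the top of this report.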

Nadav.
Allan Haldane
2015-10-14 15:59:57 UTC

Hi Nadav,

The slowdown is due to a problem in a PR I introduced to add safety checks
to views of structured arrays (to prevent segfaults involving object
fields). It will hopefully be fixed quickly, and is being discussed here:

https://github.com/numpy/numpy/issues/6467
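The safety checks described above must decide whether the fields of a requested view overlap the bytes occupied by object fields, since exposing an object pointer's bytes as plain data and writing to them can lead to a segfault. The profile's millions of range calls feeding set.update hint at byte-level bookkeeping. The sketch below is not numpy's _check_field_overlap; it is a hypothetical illustration, with fields reduced to (offset, size) pairs, of why a per-byte set gets slow for large fields and how comparing intervals avoids that cost.

```python
import time
from typing import List, Tuple

# Hypothetical representation: each field is (byte_offset, size_in_bytes).
Field = Tuple[int, int]


def overlaps_bytewise(a: List[Field], b: List[Field]) -> bool:
    """Per-byte check: collect every byte covered by `a` in a set, then probe
    every byte covered by `b`. Work grows with the number of bytes."""
    covered = set()
    for off, size in a:
        covered.update(range(off, off + size))
    return any(byte in covered
               for off, size in b
               for byte in range(off, off + size))


def overlaps_by_interval(a: List[Field], b: List[Field]) -> bool:
    """Interval check: non-empty half-open ranges [s1, s1+n1) and [s2, s2+n2)
    overlap iff s1 < s2 + n2 and s2 < s1 + n1. Work grows with field count only."""
    return any(s1 < s2 + n2 and s2 < s1 + n1
               for s1, n1 in a if n1 > 0
               for s2, n2 in b if n2 > 0)


if __name__ == "__main__":
    object_fields = [(0, 8)]        # one object pointer, assuming 8-byte pointers
    view_fields = [(8, 1_000_000)]  # one large field placed right after it
    for check in (overlaps_bytewise, overlaps_by_interval):
        t0 = time.perf_counter()
        result = check(object_fields, view_fields)
        print(f"{check.__name__}: overlap={result}, {time.perf_counter() - t0:.4f} s")
```

Both functions give the same answer; only the interval version's cost is independent of how many bytes a field spans.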

Also, I do not think the problem is with memmap; as far as I have
tested, memmap is still fast. Most likely what is slowing your script
down is the subsequent access to the fields of the array, which is what has
regressed. Is that right?

Allan
Nadav Horesh
2015-10-15 05:10:48 UTC
You're right, the delay is not in the memmap:
...
_data = N.memmap(filename, dtype=frame_type, mode=mode, offset=fh_size, shape=nframes)
data = _data['data']

The delay is in the second line, which selects a field from the recarray.

I also use a common drawing application, mypaint, which uses numpy, and I think it suffers from the same delay.

Thank you,
Nadav