Martin Spacek
2010-02-26 22:41:25 UTC
I have a 1D structured ndarray with several different fields in the dtype. I'm
using multiprocessing.Pool.map() to iterate over this structured ndarray,
passing one entry (of type numpy.void) at a time to the function to be called by
each process in the pool. After much confusion about why this wasn't working, I
finally realized that unpickling a previously pickled numpy.void results in
garbage:

>>> import numpy as np
>>> x = np.zeros((2,), dtype=('i4,f4,a10'))
>>> x[:] = [(1,2.,'Hello'), (2,3.,"World")]
>>> x
array([(1, 2.0, 'Hello'), (2, 3.0, 'World')],
      dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', '|S10')])
>>> x[0]
(1, 2.0, 'Hello')
>>> type(x[0])
<type 'numpy.void'>
>>> import pickle
>>> s = pickle.dumps(x[0])
>>> newx0 = pickle.loads(s)
>>> newx0
(30917960, 1.6904535998413144e-38, '\xd0\xef\x1c\x1eZ\x03\x00d')
>>> type(newx0)
<type 'numpy.void'>
>>> newx0.dtype
dtype([('f0', '<i4'), ('f1', '<f4'), ('f2', '|S10')])
>>> x[0].dtype
dtype([('f0', '<i4'), ('f1', '<f4'), ('f2', '|S10')])
>>> np.version.version
'1.4.0'
This also seems to be the case for recarrays with their numpy.record entries.
I've tried using pickle and cPickle, with both the oldest and the newest
pickling protocol. This is in numpy 1.4 on win32 and win64, and numpy 1.3 on
32-bit linux. I'm using Python 2.6.4 in all cases. I also just tried it on
Python 2.5.2 with numpy 1.0.4. All have the same result, although the garbage
data is different each time.
I suppose numpy.void is, as its name suggests, a pointer to a specific place in
memory. I'm just surprised that this pointer isn't dereferenced before pickling.
Or is it? I'm not skilled in interpreting the strings returned by
pickle.dumps(). I do
see the word "Hello" in the string, so maybe the problem is during unpickling.
I've tried doing a copy, and even a deepcopy of a structured array numpy.void
entry, with no luck.
Is this a known limitation? Any suggestions on how I might get around this?
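One thing that does seem to round-trip reliably is a plain Python tuple made
from the record via .item(), so maybe I can convert each entry before it gets
pickled. A minimal sketch (written for current NumPy/Python 3, where 'S10' is
the spelling of 'a10' and the string field comes back as bytes):

```python
import pickle

import numpy as np

# Same structured array as above ('S10' is the modern spelling of 'a10').
x = np.zeros((2,), dtype='i4,f4,S10')
x[:] = [(1, 2.0, b'Hello'), (2, 3.0, b'World')]

# Convert the numpy.void record to a plain Python tuple before pickling;
# tuples of Python scalars round-trip through pickle reliably.
rec = x[0].item()                        # -> (1, 2.0, b'Hello')
restored = pickle.loads(pickle.dumps(rec))
print(restored)                          # (1, 2.0, b'Hello'), no garbage
```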
Pool.map() pickles each numpy.void entry as it iterates over the structured
array, before sending it to the next available process. My structured array only
needs to be read from by my multiple processes (one per core), so perhaps
there's a better way than sending copies of entries. Multithreading (using an
implementation of a ThreadPool I found somewhere) doesn't work because I'm
calling scipy.optimize.leastsq, which doesn't seem to release the GIL.
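Since the array only needs to be read, one pattern that would avoid pickling
records altogether is to keep the array at module level and map over integer
indices, letting fork()ed workers inherit it. A sketch, using the Python 3
get_context() API for illustration (the same fork-inheritance idea applies to a
plain Pool on Python 2/Linux); process_record here is a hypothetical stand-in
for the real per-record scipy.optimize.leastsq call, and the 'fork' start
method is not available on Windows:

```python
import multiprocessing as mp

import numpy as np

# Module-level array: with the 'fork' start method each worker inherits
# a copy-on-write view of it, so only the integer index is ever pickled.
x = np.zeros((4,), dtype='i4,f4,S10')
x[:] = [(1, 1.0, b'a'), (2, 2.0, b'b'), (3, 3.0, b'c'), (4, 4.0, b'd')]

def process_record(i):
    # Hypothetical stand-in for the real fit on the fields of x[i]
    # (e.g. scipy.optimize.leastsq); here it just sums two fields.
    rec = x[i]
    return float(rec['f0']) + float(rec['f1'])

ctx = mp.get_context('fork')   # Linux/macOS only; not available on Windows
with ctx.Pool(2) as pool:
    results = pool.map(process_record, range(len(x)))
print(results)
```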
Thanks!
Martin