Burlen Loring
2016-06-14 00:23:53 UTC
Hi All,
I'm working on a threaded pipeline where we want the end user to be able
to code up Python functions to do numerical work. Threading is all done
in C++11, and in each thread we acquire the GIL before we invoke the
user-provided Python callback and release it only when the callback
returns. We've used SWIG to expose bindings to C++ objects in Python.
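For reference, the per-thread call into the user's Python code looks roughly
like this (a minimal sketch with illustrative names, assuming
PyGILState_Ensure/PyGILState_Release; the real pipeline code differs):

#include <Python.h>

// sketch: each C++11 worker thread takes the GIL before touching any
// Python/numpy objects and releases it only after the callback returns
void invoke_callback(PyObject *callback, PyObject *args)
{
    PyGILState_STATE state = PyGILState_Ensure();  // acquire the GIL
    PyObject *ret = PyObject_CallObject(callback, args);
    if (!ret)
        PyErr_Print();  // report, but don't propagate, callback errors
    Py_XDECREF(ret);
    PyGILState_Release(state);                     // release the GIL
}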
When run with more than 1 thread I get inconsistent segv's in various
numpy routines, and occasionally see a *** Reference count error
detected: an attempt was made to deallocate 11 (f) ***.
To pass data from C++ to the Python callback as a numpy array, we have:
// ****************************************************************************
template <typename NT>
PyArrayObject *new_object(teca_variant_array_impl<NT> *varrt)
{
    // allocate a buffer
    npy_intp n_elem = varrt->size();
    size_t n_bytes = n_elem*sizeof(NT);
    NT *mem = static_cast<NT*>(malloc(n_bytes));
    if (!mem)
    {
        PyErr_Format(PyExc_RuntimeError,
            "failed to allocate %lu bytes", n_bytes);
        return nullptr;
    }

    // copy the data
    memcpy(mem, varrt->get(), n_bytes);

    // put the buffer in to a new numpy object
    PyArrayObject *arr = reinterpret_cast<PyArrayObject*>(
        PyArray_SimpleNewFromData(1, &n_elem, numpy_tt<NT>::code, mem));
    PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);

    return arr;
}
This is the only place where we create numpy objects on the C++ side.
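For completeness: PyArray_SimpleNewFromData and the other PyArray_* calls
require the numpy C API to be initialized once per extension module. Our SWIG
wrapper does this at module load, roughly as sketched below (names here are
illustrative, not the actual wrapper code):

#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#include <Python.h>
#include <numpy/arrayobject.h>

// sketch: _import_array() is what the import_array() macro wraps;
// it returns a negative value and sets a Python exception on failure
static int initialize_numpy()
{
    if (_import_array() < 0)
    {
        PyErr_Print();
        return -1;
    }
    return 0;
}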
In my demo the Python callback is as follows:
33 def get_execute(rank, var_names):
34     def execute(port, data_in, req):
35         sys.stderr.write('descriptive_stats::execute MPI %d\n'%(rank))
36
37         mesh = as_teca_cartesian_mesh(data_in[0])
38
39         table = teca_table.New()
40         table.copy_metadata(mesh)
41
42         table.declare_columns(['step','time'], ['ul','d'])
43         table << mesh.get_time_step() << mesh.get_time()
44
45         for var_name in var_names:
46
47             table.declare_columns(['min '+var_name, 'avg '+var_name, \
48                 'max '+var_name, 'std '+var_name, 'low_q '+var_name, \
49                 'med '+var_name, 'up_q '+var_name], ['d']*7)
50
51             var = mesh.get_point_arrays().get(var_name).as_array()
52
53             table << float(np.min(var)) << float(np.average(var)) \
54                 << float(np.max(var)) << float(np.std(var)) \
55                 << map(float, np.percentile(var, [25.,50.,75.]))
56
57         return table
58     return execute
This callback is the only spot where numpy is used. The as_array call is
implemented by the new_object template above.
Further, if I remove our use of PyArray_SimpleNewFromData by replacing
line 51 in the Python code above with var = np.array(range(1, 1100),
'f'), the problem disappears. It must have something to do with the use of
PyArray_SimpleNewFromData.
I'm at a loss to see why things are going south. I'm using the GIL, and I
thought that would serialize the Python code. I suspect that numpy is
using global or static variables somewhere internally and that it's
inherently thread unsafe. Can anyone confirm/deny? Maybe point me in the
right direction?