Discussion:
[Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB)
Malcolm Reynolds
2012-01-18 14:59:26 UTC
Hi,

I've built a system which allocates numpy arrays and processes them in
C++ code (I'm building a native code module using boost.python, and it
makes sense to use numpy data storage so the outputs can be handled in
Python without any copying). Everything seems fine except when I
parallelise the main loop (OpenMP and TBB give the same results), in
which case I see a whole bunch of messages saying

"reference count error detected: an attempt was made to deallocate 12 (d)"

sometimes during the running of the program, sometimes all at the end
(presumably when all the destructors in my program run).

To clarify, the loop I am now running in parallel takes read-only
parameters (enforced by the C++ compiler using 'const'), and as far as
I can tell there are no race conditions with multiple threads writing
to the same numpy arrays at once, or anything obvious like that.

I recompiled numpy (I'm using 1.6.1 from the official git repository)
to print out some extra information with the reference count message,
namely a pointer to the thing being erroneously deallocated.
Surprisingly, it is always the same address for any run of the
program, even though the message is printed hundreds of times.

I've looked into this a little with GDB and as far as I can see the
object which the message pertains to is an "array descriptor", or at
least that's what I conclude from backtraces similar to the following:

Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
1501 fprintf(stderr, "*** Reference count error detected: \n" \
(gdb) bt
#0 arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
#1 0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271
#2 0x0000000103e592d7 in
boost::detail::sp_counted_impl_p<garf::multivariate_normal<double>
const>::dispose (this=<value temporarily unavailable, due to
optimizations>) at refcount.hpp:36
#3 .... my code

Obviously I can turn off the parallelism to make this problem go away,
but since my underlying algorithm is trivially parallelisable I was
counting on being able to achieve linear speedup across cores.
Currently I can, and as far as I know no actually incorrect results
are being produced by the program. However, in my field (Machine
Learning) it's difficult enough to know whether the numbers calculated
are sensible even without these kinds of warnings, so I'd like to get
a handle on at least why this is happening, so I know whether I can
safely ignore it.

My guess at what might be happening is that multiple threads are
dealing with some object concurrently and the updates to its reference
count are not applied atomically, meaning that too many DECREFs happen
later on. I had presumed that allocating different numpy matrices in
different threads, while all threads read from central numpy matrices,
would work fine, but apparently there is something I missed pertaining
to descriptors.
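The lost-update failure mode I have in mind can be simulated in pure
Python. This is only an illustration of a non-atomic read-modify-write
on a made-up stand-in object (in real CPython the GIL serialises
refcount updates, so the actual cause here may well be elsewhere):

```python
import threading

class FakeObject:
    """Stand-in for a PyObject with a plain (non-atomic) refcount."""
    def __init__(self):
        self.refcnt = 1

obj = FakeObject()
barrier = threading.Barrier(8)

def incref():
    # Non-atomic read-modify-write: every thread reads the same stale
    # value, so 7 of the 8 increments are lost.
    n = obj.refcnt
    barrier.wait()          # force all reads to happen before any write
    obj.refcnt = n + 1

threads = [threading.Thread(target=incref) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(obj.refcnt)  # 2, not 9: the later DECREFs would then underflow
```

The barrier makes the race deterministic for the demo; in real code the
interleaving would be intermittent, which matches errors that appear
"sometimes during the run, sometimes at the end".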

Can anyone offer any guidance, or at least tell me this is safe to
ignore? I can reproduce the problem reliably, so if you need me to do
some digging with GDB at the point the error takes place I can do
that.

Many thanks,

Malcolm
Robert Kern
2012-01-18 15:15:31 UTC
On Wed, Jan 18, 2012 at 14:59, Malcolm Reynolds
I suspect there is some problem with the reference counting that you
are doing at the C++ level that is causing you to do too many
Py_DECREFs to the numpy objects, and this is being identified by the
arraydescr_dealloc() routine. (By the way, arraydescrs are the C-level
implementation of dtype objects.) Reading the comments just before
descriptor.c:1501 points out that this warning is being printed
because something is trying to deallocate the builtin np.dtype('d') ==
np.dtype('float64') dtype. This should never happen. The refcount for
these objects should always be > 0 because numpy itself holds
references to them.
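For what it's worth, the singleton nature of the builtin dtypes is easy
to see from the Python side (a quick sketch; requires numpy):

```python
import sys
import numpy as np

# Builtin dtypes are cached singletons: every spelling of float64
# hands back the very same arraydescr object.
d = np.dtype('d')
assert d is np.dtype('float64')
assert d is np.dtype(np.float64)

# numpy itself holds references to the singleton, so its refcount sits
# well above the 2 that sys.getrefcount() reports for a fresh object.
print(sys.getrefcount(d))
```

An extra Py_DECREF on any float64 array therefore eventually drains the
refcount of this one shared descriptor, which is why the same address
shows up in every error message.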

I suspect that you are obtaining the numpy object (1 Py_INCREF) before
you split into multiple threads but releasing them in each thread
(multiple Py_DECREFs). This is probably being hidden from you by the
boost.python interface and/or the boost::detail::sp_counted_impl_p<>
smart(ish) pointer. Check the backtrace where your code starts to
verify if this looks to be the case.
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
Malcolm Reynolds
2012-01-18 16:14:32 UTC
Post by Robert Kern
I suspect that you are obtaining the numpy object (1 Py_INCREF) before
you split into multiple threads but releasing them in each thread
(multiple Py_DECREFs). This is probably being hidden from you by the
boost.python interface and/or the boost::detail::sp_counted_impl_p<>
smart(ish) pointer. Check the backtrace where your code starts to
verify if this looks to be the case.
Thank you for your quick reply. This makes a lot of sense; I'm just
having trouble seeing where this could be happening, as everything I
pass into each parallel computation strand is passed down as either
pointer-to-const or reference-to-const. The only things that need to
be modified (for example random number generator objects) are created
uniquely inside each iteration of the for loop, so it can't be that.

This information about which object has the reference count problem
helps, though; I will keep digging. I'm vaguely planning to track
every incref and decref so I can pin down which object has an
unbalanced count. To do this I want to know the address of the array,
rather than that of the associated datatype descriptor. I assume I
should pay attention to the (self=0x117e0e850) in this line, and that
is the address of the array I am mishandling?

#1 0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271

Malcolm
_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Robert Kern
2012-01-18 16:54:53 UTC
On Wed, Jan 18, 2012 at 16:14, Malcolm Reynolds
Post by Malcolm Reynolds
Thank you for your quick reply. This makes a lot of sense; I'm just
having trouble seeing where this could be happening, as everything I
pass into each parallel computation strand is passed down as either
pointer-to-const or reference-to-const. The only things that need to
be modified (for example random number generator objects) are created
uniquely inside each iteration of the for loop, so it can't be that.
My C++-fu is fairly weak, so I'm never really sure what the smart
pointers are doing when. If there are tracing features that you can
turn on, try that. Is this deallocation of the smart pointer to the
"garf::multivariate_normal<double> const" being done inside the loop
or outside back in the main thread? Where did it get created?
Post by Malcolm Reynolds
This information about which object has the reference count problem
helps, though; I will keep digging. I'm vaguely planning to track
every incref and decref so I can pin down which object has an
unbalanced count. To do this I want to know the address of the array,
rather than that of the associated datatype descriptor. I assume I
should pay attention to the (self=0x117e0e850) in this line, and that
is the address of the array I am mishandling?
#1  0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271
Yes.
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco