[Numpy-discussion] constructing record dtypes from the c-api
Jason Newton
2015-07-24 02:55:28 UTC
Hi folks,

The moderator for the ML approved my subscription so I can now post
this back in the numpy list rather than scipy. Apologies for the
duplicate/cross posting.


I was trying to figure out how to make a dtype for a C struct on the
C side and store it in some Boost.Python libraries I'm making.

Imagine the following c-struct, greatly simplified of course from the
real ones I need to expose:

struct datum {
    double position[3];
    float velocity[3];
    int32_t d0;
    uint64_t time_of_receipt;
};


How would you make the dtype/PyArray_Descr for this?

I have as a point of reference compound types in HDF for similar
struct descriptions (h5py makes these nearly 1:1 and converts back and
forth to dtypes and hdf5 types, it uses Cython to accomplish this) -
but I don't want to bring in hdf for this task - I'm not sure how well
the offsets would go over in that translation to h5py too.

Proper/best solutions would make use of offsetof as we insert entries
to the dtype/PyArray_Descr. It's fine if you do this in straight C -
I just need to figure out how to accomplish this in a practical way.
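
To make the offsetof idea concrete, here is a minimal sketch that only
uses the C++ standard library. FieldInfo and datum_fields are
hypothetical names made up for illustration (they are not part of
numpy), and the format strings assume the usual numpy codes ("3f8" for
three float64 values, and so on). It only collects the per-field
information a dtype would need; it does not build a PyArray_Descr.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// The struct from the question.
struct datum {
    double   position[3];
    float    velocity[3];
    int32_t  d0;
    uint64_t time_of_receipt;
};

// Hypothetical helper type: one row per struct member.
struct FieldInfo {
    std::string name;    // field name to use in the dtype
    std::string format;  // numpy format code, e.g. "3f8" = three float64
    std::size_t offset;  // byte offset as laid out by the compiler
};

// Hypothetical helper: describe datum, taking offsets from offsetof so
// they always match what the compiler actually produced.
inline std::vector<FieldInfo> datum_fields() {
    return {
        {"position",        "3f8", offsetof(datum, position)},
        {"velocity",        "3f4", offsetof(datum, velocity)},
        {"d0",              "i4",  offsetof(datum, d0)},
        {"time_of_receipt", "u8",  offsetof(datum, time_of_receipt)},
    };
}
```

Calling datum_fields() gives one entry per member, in declaration
order, which is the information each dtype field entry would carry.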

The language I'm working in is C++11. The end goal is probably to
build helper infrastructure that generates the dtype more or less
automatically, given an implementation of a [static] visitor pattern
in C++. The idea is to make numpy-compatible C++ POD types rather
than use Boost.Python wrapped proxies for every object, which will
cut down on complicated and time-consuming code (in both computer and
developer time) when ndarrays are what's called for.

Related - how would one work with record dtypes passed to C? How would
one look up fields and offsets within a structure?

Thanks for any advisement!

-Jason
Jason Newton
2015-07-25 04:26:01 UTC
After drilling through the sources a second time, I found that
numpy/core/src/multiarray/descriptor.c is the file to consult. The
primary routine is PyArray_DescrConverter, and the _convert_from_*
functions are the most interesting to read to glean its
capabilities.

So in particular, it looks like the _convert_from_dict based path is
the way to go: it allows custom field offsets, so C POD structs can be
mapped safely and transparently using offset information generated at
compile time. That should keep dtypes in perfect sync with the C
sources, versus declaring them on the Python side. I plan on building
a helper class to generate the dictionaries for this routine, since
something akin to the list dtype specification is more user-friendly
(even for me).
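
A rough sketch of what such a helper might produce, under some
assumptions: Field, dtype_dict_spec and datum_dtype_spec are
hypothetical names, and the dictionary is rendered as text (using the
'names', 'formats', 'offsets' and 'itemsize' keys of the dict form of
a dtype specification) purely to keep the example free of Python
headers. A real extension would presumably build an actual Python
dict and hand it to PyArray_DescrConverter instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct datum {
    double   position[3];
    float    velocity[3];
    int32_t  d0;
    uint64_t time_of_receipt;
};

// Hypothetical description of one field.
struct Field {
    std::string name;    // field name
    std::string format;  // numpy format code (assumed, e.g. "3f8")
    std::size_t offset;  // byte offset inside the struct
};

// Hypothetical helper: render the dict form of a dtype specification
// as text, with explicit offsets and an explicit itemsize.
inline std::string dtype_dict_spec(const std::vector<Field>& fields,
                                   std::size_t itemsize) {
    std::ostringstream names, formats, offsets;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        const char* sep = (i == 0) ? "" : ", ";
        names   << sep << "'" << fields[i].name << "'";
        formats << sep << "'" << fields[i].format << "'";
        offsets << sep << fields[i].offset;
    }
    std::ostringstream out;
    out << "{'names': [" << names.str() << "], "
        << "'formats': [" << formats.str() << "], "
        << "'offsets': [" << offsets.str() << "], "
        << "'itemsize': " << itemsize << "}";
    return out.str();
}

// The spec for datum; offsets and size come from the compiler.
inline std::string datum_dtype_spec() {
    return dtype_dict_spec({
        {"position",        "3f8", offsetof(datum, position)},
        {"velocity",        "3f4", offsetof(datum, velocity)},
        {"d0",              "i4",  offsetof(datum, d0)},
        {"time_of_receipt", "u8",  offsetof(datum, time_of_receipt)},
    }, sizeof(datum));
}
```

Passing sizeof(datum) as the itemsize means any trailing padding in
the struct is carried over as well, not just the field offsets.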

-Jason
Allan Haldane
2015-07-25 17:31:19 UTC
Hi Jason,

As I understand it, numpy is set up to mirror C structs as long as you
use the 'align' flag. For example, your struct can be represented as

    np.dtype('3f8,3f4,i4,u8', align=True)

(assuming 32-bit floats). The offsets of the fields should be exactly
the offsets of the elements of the struct. In C you can create the dtype
using PyArray_DescrAlignConverter.
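
As a sanity check of that claim, here is a small sketch of the usual
C layout rule that align=True is documented to mimic: each field goes
at the next multiple of its alignment, and the total size is padded
to the largest alignment. Member, Layout, aligned_layout and
datum_layout are hypothetical names; this is not numpy's actual code,
and it assumes the compiler follows that conventional rule.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct datum {
    double   position[3];
    float    velocity[3];
    int32_t  d0;
    uint64_t time_of_receipt;
};

// Hypothetical input: total size and alignment of one member.
struct Member {
    std::size_t size;
    std::size_t align;
};

// Hypothetical result: per-member offsets plus the padded record size.
struct Layout {
    std::vector<std::size_t> offsets;
    std::size_t itemsize;
};

// Round n up to the next multiple of a.
inline std::size_t round_up(std::size_t n, std::size_t a) {
    return (n + a - 1) / a * a;
}

// Conventional aligned layout: pad before each member and at the end.
inline Layout aligned_layout(const std::vector<Member>& members) {
    Layout out{};
    std::size_t pos = 0;
    std::size_t max_align = 1;
    for (const Member& m : members) {
        pos = round_up(pos, m.align);   // padding before the member
        out.offsets.push_back(pos);
        pos += m.size;
        if (m.align > max_align) max_align = m.align;
    }
    out.itemsize = round_up(pos, max_align);  // trailing padding
    return out;
}

// The members of datum, in declaration order.
inline Layout datum_layout() {
    return aligned_layout({
        {sizeof(double) * 3, alignof(double)},
        {sizeof(float) * 3,  alignof(float)},
        {sizeof(int32_t),    alignof(int32_t)},
        {sizeof(uint64_t),   alignof(uint64_t)},
    });
}
```

On a typical 64-bit platform the computed offsets and itemsize should
agree with offsetof and sizeof for datum, which is the property the
aligned dtype relies on.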

If you want to get the field offsets given a numpy dtype in C, you need
to iterate through the 'names' and 'fields' attributes of the dtype, e.g.:


    PyObject *key, *tup, *offobj;
    Py_ssize_t i, n, offset;

    n = PyTuple_Size(descr->names);
    for (i = 0; i < n; i++) {
        key = PyTuple_GetItem(descr->names, i);    /* field name */
        tup = PyDict_GetItem(descr->fields, key);  /* (dtype, offset[, title]) */
        offobj = PyTuple_GetItem(tup, 1);
        offset = PyInt_AsSsize_t(offobj);  /* PyLong_AsSsize_t on Python 3 */
        /* do something with offset */
    }

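(Error checks are needed in real code. PyTuple_GetItem and
PyDict_GetItem return borrowed references, so key, tup and offobj
need no DECREFs.)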
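
One possible way to "do something with offset" is to copy a field out
of a raw record. The sketch below assumes you already have a pointer
to the start of one record and a byte offset such as the one obtained
from the dtype; read_field is a hypothetical helper, not a numpy
function. It uses memcpy so it also works when the record is packed
and the field is not naturally aligned.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct datum {
    double   position[3];
    float    velocity[3];
    int32_t  d0;
    uint64_t time_of_receipt;
};

// Hypothetical helper: copy a value of type T out of a raw record,
// starting `offset` bytes from the beginning of the record.
template <typename T>
T read_field(const void* record, std::size_t offset) {
    T value;
    std::memcpy(&value,
                static_cast<const unsigned char*>(record) + offset,
                sizeof(T));
    return value;
}
```

For example, read_field<int32_t>(ptr, offset_of_d0) would return the
d0 member of the record at ptr. For array fields such as velocity,
add i * sizeof(float) to the field offset to reach element i.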

Allan