[Numpy-discussion] ANN: xarray v0.9 released

Discussion:

Stephan Hoyer

2017-02-01 04:19:08 UTC

I'm pleased to announce the release of the latest major version of xarray,
v0.9.

xarray is an open source project and Python package that provides a toolkit
and data structures for N-dimensional labeled arrays. Its approach combines
an API inspired by pandas with the Common Data Model for self-described
scientific data.

This release includes five months' worth of enhancements and bug fixes from
24 contributors, including some significant enhancements to the data model
that are not fully backwards compatible.

Highlights include:
- Coordinates are now optional in the xarray data model, even for
dimensions.
- Changes to caching, lazy loading and pickling to improve xarray's
experience for parallel computing.
- Improvements for accessing and manipulating pandas.MultiIndex levels.
- Many new methods and functions, including quantile(), cumsum(),
cumprod(), combine_first(), set_index(), reset_index(), reorder_levels(),
full_like(), zeros_like(), ones_like(), open_dataarray(), compute(),
Dataset.info(), testing.assert_equal(), testing.assert_identical(), and
testing.assert_allclose().
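For readers skimming the highlights, here is a short sketch that exercises a few of them: a DataArray with dimensions but no coordinates, some of the new methods, and set_index() building a pandas.MultiIndex. The array values and the names `site`, `year` and `obs` are made up for illustration, and it assumes a current xarray install rather than v0.9 exactly.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Named dimensions but no coordinate labels: coordinates are optional,
# even for dimensions.
da = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "y"))
print(da.coords)  # no coordinates were supplied

# A few of the methods listed above.
running = da.cumsum("x")            # cumulative sum along "x"
median = da.quantile(0.5, dim="y")  # median across "y"
zeros = xr.zeros_like(da)           # same shape and dims, filled with 0

# combine_first() fills missing values from another object.
gappy = da.where(da > 2)            # NaN wherever da <= 2
filled = gappy.combine_first(zeros)

# set_index() turns existing coordinates into a pandas.MultiIndex level set.
ds = xr.Dataset(
    {"v": ("obs", [1.0, 2.0, 3.0, 4.0])},
    coords={
        "site": ("obs", ["a", "a", "b", "b"]),
        "year": ("obs", [2015, 2016, 2015, 2016]),
    },
)
indexed = ds.set_index(obs=["site", "year"])
print(isinstance(indexed.indexes["obs"], pd.MultiIndex))

# The testing helpers raise AssertionError on a mismatch.
xr.testing.assert_equal(filled, da.where(da > 2, 0.0))
```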

For more details, read the full release notes:
http://xarray.pydata.org/en/latest/whats-new.html

You can install xarray with pip or conda:
pip install xarray
conda install -c conda-forge xarray

Best,
Stephan

Marmaduke Woodman

2017-02-01 08:55:31 UTC


Post by Stephan Hoyer
This release includes five months' worth of enhancements and bug fixes from 24 contributors, including some significant enhancements to the data model that are not fully backwards compatible.

Looks very nice; is the API stable or are you waiting for a v1.0 release?

Is there significant overhead compared to plain ndarray?

Stephan Hoyer

2017-02-01 17:33:51 UTC


Post by Marmaduke Woodman
Looks very nice; is the API stable or are you waiting for a v1.0 release?

We are pretty close to full API stability but not quite there yet. Enough
people are using xarray in production that breaking changes are made with
serious caution (and deprecation cycles whenever feasible).

The only major backwards-incompatible change planned is an overhaul of
indexing to use labeled broadcasting and alignment:
https://github.com/pydata/xarray/issues/974
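To make "labeled broadcasting" in indexing concrete, the sketch below uses DataArray objects as indexers, the style of indexing that issue 974 proposes. This is an assumption about where the proposal led: it relies on behavior in later xarray releases and will not run on v0.9 itself. The array, its coordinates and the dimension name `points` are hypothetical.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(12).reshape(3, 4),
    dims=("x", "y"),
    coords={"x": [10, 20, 30], "y": [0.1, 0.2, 0.3, 0.4]},
)

# Indexers that are DataArrays carry their own dimension name. Both share
# the new dimension "points", so they broadcast against each other and
# select element-wise (x, y) pairs instead of an outer product.
ix = xr.DataArray([0, 2, 1], dims="points")
iy = xr.DataArray([3, 0, 1], dims="points")
picked = da.isel(x=ix, y=iy)

# Plain lists still give orthogonal (outer) indexing: a 2 x 2 block.
block = da.isel(x=[0, 2], y=[3, 0])
print(picked)
print(block)
```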

There are a few other "nice to have" features for v1.0 but that's the only
one that has the potential to change functionality in a way that we can't
cleanly deprecate.

Post by Marmaduke Woodman
Is there significant overhead compared to plain ndarray?

Xarray is implemented in Python (not C), so it does have significant
overhead for every operation. Adding two small arrays takes roughly 100 µs,
versus under 1 µs in NumPy. So you don't want to use it in your inner loop.

That said, the overhead is independent of the size of the array. So if you
work with large arrays, where the arithmetic itself takes much longer, it is
negligible.
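A rough way to check the fixed-overhead claim on your own machine is to time the same addition in NumPy and xarray at two very different sizes. This is a minimal sketch with `timeit`; the sizes and repeat count are arbitrary choices, and the absolute numbers will vary with hardware and xarray version, so none are quoted here. What to look for is that the "extra" column stays roughly the same while the NumPy time grows with n.

```python
import timeit

import numpy as np
import xarray as xr


def per_call_seconds(fn, number):
    """Average wall-clock seconds per call of fn over `number` calls."""
    return timeit.timeit(fn, number=number) / number


results = {}
for size in (10, 1_000_000):
    a = np.random.rand(size)
    b = np.random.rand(size)
    xa = xr.DataArray(a, dims="x")
    xb = xr.DataArray(b, dims="x")
    # The lambdas are timed immediately, so they see this iteration's arrays.
    t_np = per_call_seconds(lambda: a + b, number=200)
    t_xr = per_call_seconds(lambda: xa + xb, number=200)
    results[size] = (t_np, t_xr)
    print(
        f"n={size:>9,}: numpy {t_np * 1e6:9.1f} µs, "
        f"xarray {t_xr * 1e6:9.1f} µs, extra {(t_xr - t_np) * 1e6:9.1f} µs"
    )
```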