Discussion:
[Numpy-discussion] [Suggestion] Labelled Array
Sérgio
2016-02-12 14:40:37 UTC
Permalink
Hello,

This is my first e-mail; I will try to keep the idea simple.

Similar to a masked array, it would be interesting to use a label array to
guide operations.
>>> x
labelled_array(data =
 [[0 1 2]
  [3 4 5]
  [6 7 8]],
label =
 [[0 1 2]
  [0 1 2]
  [0 1 2]])
>>> sum(x)
array([9, 12, 15])

The operations would create a new axis for label indexing.

You could think of it as a collection of masks, one for each label.

I don't know a way to make something like this efficiently without a loop.
Just wondering...
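
(One loop-free possibility, as a minimal sketch -- labelled_sum is an
illustrative name, not an existing numpy function, and it assumes
non-negative integer labels:

import numpy as np

def labelled_sum(data, label):
    # np.bincount sums data.ravel() grouped by label.ravel() in one pass
    return np.bincount(label.ravel(), weights=data.ravel())

data = np.arange(9).reshape(3, 3)
label = np.tile(np.arange(3), (3, 1))
labelled_sum(data, label)            # -> array([ 9., 12., 15.])
)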

Sérgio.
Benjamin Root
2016-02-12 14:49:51 UTC
Permalink
Seems like you are talking about xarray: https://github.com/pydata/xarray

Cheers!
Ben Root
Benjamin Root
2016-02-12 14:52:54 UTC
Permalink
Re-reading your post, I see you are talking about something different. Not
exactly sure what your use-case is.

Ben Root
Lluís Vilanova
2016-02-15 21:28:12 UTC
Permalink
Post by Benjamin Root
Seems like you are talking about xarray: https://github.com/pydata/xarray
Oh, I wasn't aware of xarray, but there's also this:

https://people.gso.ac.upc.edu/vilanova/doc/sciexp2/user_guide/data.html#basic-indexing
https://people.gso.ac.upc.edu/vilanova/doc/sciexp2/user_guide/data.html#dimension-oblivious-indexing


Cheers,
Lluis
Paul Hobson
2016-02-15 22:31:12 UTC
Permalink
Just for posterity -- any future readers of this thread who need to do
pandas-like operations on record arrays should look at matplotlib's mlab
submodule.

I've been in situations (::cough:: Esri production ::cough::) where I've
had one hand tied behind my back and unable to install pandas. mlab was a
big help there.

https://goo.gl/M7Mi8B
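
(A minimal sketch of the sort of thing the mlab rec functions could do,
from memory of the old API -- rec_groupby was later deprecated and removed
from matplotlib, so this assumes an old version, roughly matplotlib <= 2.x:

import numpy as np
from matplotlib import mlab

r = np.rec.fromarrays(
    [[0, 1, 2, 0, 1, 2], [0., 1., 2., 3., 4., 5.]], names='c,v')

# group rows by 'c' and sum 'v' within each group;
# stats entries are (column, reducer, output_name) tuples
out = mlab.rec_groupby(r, ('c',), (('v', np.sum, 'v_sum'),))
)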

-paul
Benjamin Root
2016-02-19 18:44:16 UTC
Permalink
matplotlib would be more than happy if numpy could take those functions off
our hands! They don't get nearly the visibility they should in matplotlib
because no one expects them to be in a plotting library, and they
don't have any useful unit tests. None of us wrote them, so we are very
hesitant to update them because of that.

Cheers!
Ben Root
Post by j***@gmail.com
I also want to add a historical note here: 'groupby' has been
discussed a couple of times before.
Travis Oliphant even made a NEP for it, and Wes McKinney lightly hinted
at adding it to numpy.
http://thread.gmane.org/gmane.comp.python.numeric.general/37480/focus=37480
http://thread.gmane.org/gmane.comp.python.numeric.general/38272/focus=38299
http://docs.scipy.org/doc/numpy-1.10.1/neps/groupby_additions.html
Travis's idea for a ufunc method 'reduceby' is more along the lines of
what I was originally thinking. Just musing about it, it might cover a few
small cases pandas groupby might not: it could work on arbitrary ufuncs,
and over particular axes of multidimensional data, e.g. to sum over
pixels from NxNx3 image data. But maybe pandas can cover the
multidimensional case through additional index columns or with Panel.
xarray is now covering that area.
There are also the recfunctions in numpy.lib that never got a lot of
attention and expansion.
There were plans to cover more of the matplotlib versions in numpy, but I
have no idea and didn't check what happened to that...
Josef
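
(A minimal sketch of that NxNx3 case in plain numpy, using np.add.at;
the names and shapes here are illustrative, not from the NEP:

import numpy as np

img = np.arange(4 * 4 * 3).reshape(4, 4, 3)      # toy (N, N, 3) image
labels = np.random.randint(0, 5, size=(4, 4))    # one label per pixel
flat = img.reshape(-1, 3)                        # (N*N, 3)
sums = np.zeros((labels.max() + 1, 3))
np.add.at(sums, labels.ravel(), flat)            # per-label, per-channel sums
)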
Allan Haldane
2016-02-13 17:11:07 UTC
Permalink
I've had a pretty similar idea for a new indexing function
'split_classes' which would help in your case, which essentially does

import numpy as np

def split_classes(c, v):
    # one boolean mask per unique label
    return [v[c == u] for u in np.unique(c)]

Your example could be coded as

>>> [sum(c) for c in split_classes(label, data)]
[9, 12, 15]

I feel I've come across the need for such a function often enough that
it might be generally useful to people as part of numpy. The
implementation of split_classes above has pretty poor performance
because it creates many temporary boolean arrays, so my plan for a PR
was to have a speedy version of it that uses a single pass through v.
(I often wanted to use this function on large datasets).

If anyone has any comments on the idea (good idea? bad idea?) I'd love
to hear them.

I have some further notes and examples here:
https://gist.github.com/ahaldane/1e673d2fe6ffe0be4f21

Allan
Allan Haldane
2016-02-13 18:01:53 UTC
Permalink
Sorry to reply to myself here, but looking at it with fresh eyes, maybe
the performance of the naive version isn't too bad. Here's a comparison
of the naive version vs. a better implementation:

import numpy as np

def split_classes_naive(c, v):
    # one boolean mask per unique label
    return [v[c == u] for u in np.unique(c)]

def split_classes(c, v):
    # sort once, then slice out each label's contiguous run
    perm = c.argsort()
    csrt = c[perm]
    div = np.where(csrt[1:] != csrt[:-1])[0] + 1
    return [v[x] for x in np.split(perm, div)]
>>> c = np.random.randint(0, 32, size=100000)
>>> v = np.arange(100000)
>>> %timeit split_classes_naive(c, v)
100 loops, best of 3: 8.4 ms per loop
>>> %timeit split_classes(c, v)
100 loops, best of 3: 4.79 ms per loop

In any case, maybe it is useful to Sergio or others.

Allan
j***@gmail.com
2016-02-13 18:29:44 UTC
Permalink
The use cases I recently started to target for similar things are 1 million
or more rows and 10000 uniques in the labels.
The second version should be faster for a large number of uniques, I guess.

Overall numpy is falling far behind pandas in terms of simple groupby
operations. bincount and histogram (IIRC) worked for some cases but are
rather limited.

reduceat looks nice for cases where it applies.
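
(A minimal sketch of where it applies -- a grouped sum, assuming the rows
are first sorted by label:

import numpy as np

c = np.random.randint(0, 32, size=100000)   # labels
v = np.arange(100000)                       # values

perm = c.argsort()                          # reduceat needs contiguous groups
csrt = c[perm]
starts = np.r_[0, np.flatnonzero(csrt[1:] != csrt[:-1]) + 1]
group_sums = np.add.reduceat(v[perm], starts)   # one sum per unique label
)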

In contrast to the full-sized labels in the original post, I only know of
applications where the labels are 1-D, corresponding to rows or columns.

Josef
Jeff Reback
2016-02-13 18:39:34 UTC
Permalink
In [10]: pd.options.display.max_rows=10

In [13]: np.random.seed(1234)

In [14]: c = np.random.randint(0,32,size=100000)

In [15]: v = np.arange(100000)

In [16]: df = DataFrame({'v' : v, 'c' : c})

In [17]: df
Out[17]:
c v
0 15 0
1 19 1
2 6 2
3 21 3
4 12 4
... .. ...
99995 7 99995
99996 2 99996
99997 27 99997
99998 28 99998
99999 7 99999

[100000 rows x 2 columns]

In [19]: df.groupby('c').count()
Out[19]:
v
c
0 3136
1 3229
2 3093
3 3121
4 3041
.. ...
27 3128
28 3063
29 3147
30 3073
31 3090

[32 rows x 1 columns]

In [20]: %timeit df.groupby('c').count()
100 loops, best of 3: 2 ms per loop

In [21]: %timeit df.groupby('c').mean()
100 loops, best of 3: 2.39 ms per loop

In [22]: df.groupby('c').mean()
Out[22]:
v
c
0 49883.384885
1 50233.692165
2 48634.116069
3 50811.743992
4 50505.368629
.. ...
27 49715.349425
28 50363.501469
29 50485.395933
30 50190.155223
31 50691.041748

[32 rows x 1 columns]
Jeff Reback
2016-02-13 18:42:20 UTC
Permalink
These operations get slower as the number of groups increases, but with a
faster function (e.g. the standard ones, which are cythonized), the
constant on the increase is pretty low.

In [23]: c = np.random.randint(0,10000,size=100000)

In [24]: df = DataFrame({'v' : v, 'c' : c})

In [25]: %timeit df.groupby('c').count()
100 loops, best of 3: 3.18 ms per loop

In [26]: len(df.groupby('c').count())
Out[26]: 10000

In [27]: df.groupby('c').count()
Out[27]:
v
c
0 9
1 11
2 7
3 8
4 16
... ..
9995 11
9996 13
9997 13
9998 7
9999 10

[10000 rows x 1 columns]
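
(For comparison, a plain-numpy sketch of the same count and mean for
integer labels, using bincount -- no timing claims, and it assumes every
label occurs at least once:

import numpy as np

c = np.random.randint(0, 10000, size=100000)
v = np.arange(100000)

counts = np.bincount(c, minlength=10000)                      # groupby-count
means = np.bincount(c, weights=v, minlength=10000) / counts   # groupby-mean
)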
j***@gmail.com
2016-02-13 18:51:44 UTC
Permalink
One other difference across use cases is whether this is a single operation,
or whether we want to optimize the data format for a large number of
different calculations. (We have both cases in statsmodels.)

In the latter case it's worth spending some extra computational effort on
rearranging the data to be either sorted or in lists of arrays (I guess,
without having done any timings).
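
(A sketch of that latter case -- pay the argsort once, then reuse the
sorted order for many different reductions; the names are illustrative:

import numpy as np

rng = np.random.RandomState(0)
c = rng.randint(0, 10000, size=1000000)     # labels
v1 = rng.rand(1000000)
v2 = rng.rand(1000000)

perm = c.argsort()                          # the one-time rearrangement cost
csrt = c[perm]
starts = np.r_[0, np.flatnonzero(csrt[1:] != csrt[:-1]) + 1]

sums = np.add.reduceat(v1[perm], starts)    # each statistic is then cheap
maxs = np.maximum.reduceat(v2[perm], starts)
)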

Josef
Allan Haldane
2016-02-14 03:41:13 UTC
Permalink
Impressive!

Possibly there's still a case for including a 'groupby' function in
numpy itself since it's a generally useful operation, but I do see less
of a need given the nice pandas functionality.

At least, next time someone asks a stackoverflow question like the ones
below, someone should tell them to use pandas!

(copied from my gist for future list reference).

http://stackoverflow.com/questions/4373631/sum-array-by-number-in-numpy
http://stackoverflow.com/questions/31483912/split-numpy-array-according-to-values-in-the-array-a-condition/31484134#31484134
http://stackoverflow.com/questions/31863083/python-split-numpy-array-based-on-values-in-the-array
http://stackoverflow.com/questions/28599405/splitting-an-array-into-two-smaller-arrays-in-python
http://stackoverflow.com/questions/7662458/how-to-split-an-array-according-to-a-condition-in-numpy

Allan
Nathaniel Smith
2016-02-13 18:16:19 UTC
Permalink
I believe this is basically a groupby, which is one of pandas's core
competencies... even if numpy were to add some utilities for this kind of
thing, I doubt we'd do as well as they do, so you might check whether
pandas works for you first :-)
Sérgio
2016-02-16 14:05:51 UTC
Permalink
>>> image
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])
>>> label
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7]])
>>> dt = pd.DataFrame(np.vstack((label.ravel(), image.reshape(3, 20))).T)
>>> labelled_image = dt.groupby(0)
>>> labelled_image.mean().values
array([[ 0, 20, 40],
       [ 3, 23, 43],
       [ 6, 26, 46],
       [ 9, 29, 49],
       [10, 30, 50],
       [13, 33, 53],
       [16, 36, 56],
       [19, 39, 59]])

Sergio