[Numpy-discussion] How to find indices of values in an array (indirect in1d) ?

Discussion:

Nicolas P. Rougier

2015-12-30 14:45:40 UTC

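I’m scratching my head over a small problem but I can’t find a vectorized solution.
I have 2 arrays A and B and I would like to get the indices (relative to B) of the A values that are in B: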

A = np.array([2,0,1,4])
B = np.array([1,2,0])
print (some_function(A,B))

[1,2,0]

# A[0] == 2 is in B and 2 == B[1] -> 1
# A[1] == 0 is in B and 0 == B[2] -> 2
# A[2] == 1 is in B and 1 == B[0] -> 0

Any ideas? I tried numpy.in1d with no luck.

Nicolas

Andy Ray Terrel

2015-12-30 15:02:00 UTC

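import numpy as np
import pandas as pd

A = np.array([2,0,1,4])
B = np.array([1,2,0])
s = pd.Series(range(len(B)), index=B)
s.reindex(A).values  # recent pandas raises KeyError for s[A] because 4 is not in the index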

array([ 1., 2., 0., nan])

Benjamin Root

2015-12-30 15:31:00 UTC

Maybe use searchsorted()? I will note that I have needed to do something
like this once before, and I found that the list comprehension form of
calling .index() for each item was faster than jumping through hoops to
vectorize it using searchsorted (needing to sort and then map the sorted
indices to the original indices), and was certainly clearer, but that might
depend upon the problem size.

Cheers!
Ben Root
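
A minimal sketch of the two approaches compared above, using the arrays from the original post. The function names index_listcomp and index_searchsorted are illustrative and do not come from the thread; the searchsorted version assumes B has no duplicates and that values of A missing from B should simply be dropped.

```python
import numpy as np

def index_listcomp(A, B):
    """Positions in B of the values of A that occur in B (plain Python)."""
    B_list = list(B)
    return [B_list.index(item) for item in A if item in B_list]

def index_searchsorted(A, B):
    """Same result via sorting: search in sorted B, then map back."""
    order = np.argsort(B)                       # order[k] = original position of the k-th smallest
    pos = np.searchsorted(B, A, sorter=order)   # positions within the sorted view of B
    pos = np.clip(pos, 0, len(B) - 1)           # values above B.max() would land at len(B)
    candidate = order[pos]                      # back to positions in the original B
    return candidate[B[candidate] == A]         # keep only genuine matches

A = np.array([2, 0, 1, 4])
B = np.array([1, 2, 0])
print(index_listcomp(A, B))       # [1, 2, 0]
print(index_searchsorted(A, B))   # [1 2 0]
```

The extra argsort, clip and comparison steps are the "hoops" of the vectorized form; the list comprehension says the same thing in one line.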

_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Nicolas P. Rougier

2015-12-30 16:12:40 UTC

Thanks for the quick answers. I think I will go with .index and a list comprehension.
But if someone finds a vectorised solution for the numpy 100 exercises...

Nicolas

Sebastian Berg

2015-12-30 16:47:44 UTC

Yeah, I doubt you can get very pretty, though maybe there is some great
trick. This is one way:

In [67]: A = np.array([2,0,1,4])
In [68]: B = np.array([1,2,0])
In [69]: B_sorter = np.argsort(B)
In [70]: B_index = np.searchsorted(B, A, sorter=B_sorter)
In [71]: invalid = B[B_sorter].take(B_index, mode='clip') != A
In [72]: B_index[invalid] = -1 # mark invalids with -1
In [73]: B_index
Out[73]: array([ 2, 0, 1, -1])

Anyway, I guess the arrays would likely have to be quite large for this
to beat list comprehension. And maybe doing the searchsorted the other
way around could be faster, no idea.

- Sebastian
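
To check the claim about problem size, a rough timing sketch along these lines could be used. The array sizes, the repeat count and the helper names are arbitrary choices made here, not taken from the thread, and the numbers will vary by machine, so no timings are quoted.

```python
import timeit
import numpy as np

def listcomp_lookup(A, B):
    B_list = B.tolist()
    return [B_list.index(item) for item in A.tolist()]

def searchsorted_lookup(A, B):
    order = np.argsort(B)
    return order[np.searchsorted(B, A, sorter=order)]

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    B = rng.permutation(n)            # unique values 0..n-1 in random order
    A = rng.integers(0, n, size=n)    # every value of A is present in B
    # Both approaches must agree before their timings are worth comparing.
    assert listcomp_lookup(A, B) == searchsorted_lookup(A, B).tolist()
    t_list = timeit.timeit(lambda: listcomp_lookup(A, B), number=20)
    t_sort = timeit.timeit(lambda: searchsorted_lookup(A, B), number=20)
    print(f"n={n:5d}  list comprehension: {t_list:.4f}s  searchsorted: {t_sort:.4f}s")
```

Because every value of A is drawn from B here, no handling of missing values is needed in either helper.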

Nicolas P. Rougier

2015-12-30 18:14:46 UTC

Thanks, I will run some benchmarks and post the results.

Mark Miller

2015-12-30 17:42:37 UTC

I'm not 100% sure that I get the question, but does this help at all?

import numpy

a = numpy.array([3,2,8,7])
b = numpy.array([1,3,2,4,5,7,6,8,9])
c = set(a) & set(b)
c  # contains elements of a that are in b (and vice versa)

set([8, 2, 3, 7])

indices = numpy.where([x in c for x in b])[0]
indices #indices of b where the elements of a in b occur

array([1, 2, 5, 7], dtype=int64)

-Mark

Nicolas P. Rougier

2015-12-30 18:17:16 UTC

Yes, it is the expected result. Thanks.
Maybe the set(a) & set(b) can be replaced by np.where(np.in1d(a,b)), no?

Mark Miller

2015-12-30 18:40:15 UTC

I was not familiar with the .in1d function. That's pretty handy.

Yes...it looks like numpy.where(numpy.in1d(b, a)) does what you need.

numpy.where(numpy.in1d(b, a))

(array([1, 2, 5, 7], dtype=int64),)
It would be interesting to see the benchmarks.

Nicolas P. Rougier

2015-12-30 18:51:02 UTC

Unfortunately, this does not handle repeated entries in a.
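
A small illustration of this limitation, with made-up arrays. A membership test over b yields one boolean per element of b, so the positions returned by np.where come out once each and in the order of b, regardless of how often or in what order the values occur in a. The sketch uses np.isin, which performs the same test as np.in1d for 1-D input.

```python
import numpy as np

a = np.array([7, 3, 3, 2])                  # 3 is repeated, and the order differs from b
b = np.array([1, 3, 2, 4, 5, 7, 6, 8, 9])

mask = np.isin(b, a)                        # one boolean per element of b
positions = np.where(mask)[0]
print(positions)                            # [1 2 5]

# What the original question asks for instead: one position per element of a.
b_list = b.tolist()
wanted = [b_list.index(value) for value in a.tolist()]
print(wanted)                               # [5, 1, 1, 2]
```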

Nicolas P. Rougier

2015-12-30 19:21:48 UTC

In the end, I’ve only got the list comprehension to work as expected:

A = [0,0,1,3]
B = np.arange(8)
np.random.shuffle(B)
I = [list(B).index(item) for item in A if item in B]

But Mark's and Sebastian's methods do not seem to work...
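
If the list comprehension is kept, one possible refinement (not proposed in the thread) is to build a value-to-position dictionary once, so that each lookup avoids rescanning B. This sketch assumes the values are hashable; when B contains duplicates it keeps the first position, matching list.index. The name index_dict is illustrative, and a fixed B is used instead of a shuffled one so that the result is reproducible.

```python
import numpy as np

def index_dict(A, B):
    """Positions in B of the items of A that occur in B, in the order of A."""
    first_position = {}
    for i, value in enumerate(np.asarray(B).tolist()):
        first_position.setdefault(value, i)   # keep the first occurrence only
    return [first_position[item] for item in np.asarray(A).tolist()
            if item in first_position]

A = [0, 0, 1, 3]
B = np.array([5, 3, 0, 7, 1, 6, 2, 4])   # one fixed permutation of 0..7
print(index_dict(A, B))   # [2, 2, 4, 1]
```

Repeated entries in A are handled naturally, since each item of A is looked up independently.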

Sebastian Berg

2015-12-30 20:13:39 UTC

Yeah, sorry had a mind slip with the sorter since it returns the sorted
version. I think this should do the correct thing (throws away invalid
ones as default, though I think it is a bad idea in general).

import numpy as np

def index(A, B, fill_invalid=None):
    B_sorter = np.argsort(B)
    B_sorted = B[B_sorter]
    B_sorted_index = np.searchsorted(B_sorted, A)
    # Go back into the original index. Clipping is needed because values
    # larger than B.max() get position len(B), which is out of bounds:
    B_index = B_sorter.take(B_sorted_index, mode='clip')

    if fill_invalid is None:
        valid = B.take(B_index, mode='clip') == A
        return B_index[valid]
    else:
        invalid = B.take(B_index, mode='clip') != A
        B_index[invalid] = fill_invalid
        return B_index
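
One detail the function above leaves open is what happens when B itself contains repeated values. A possible variant, sketched here under the assumption that the first occurrence in B is wanted (as with list.index), combines a stable sort with the default side='left' of searchsorted. The name first_index and the default fill value of -1 are illustrative choices, and B is assumed to be non-empty.

```python
import numpy as np

def first_index(A, B, fill_invalid=-1):
    """For each item of A, the first position in B holding it, else fill_invalid."""
    A = np.asarray(A)
    B = np.asarray(B)
    order = np.argsort(B, kind='stable')        # equal values keep their original order
    pos = np.searchsorted(B, A, sorter=order)   # side='left': leftmost match in sorted B
    pos = np.clip(pos, 0, len(B) - 1)           # guard against values above B.max()
    result = order[pos]                         # fancy indexing returns a fresh array
    result[B[result] != A] = fill_invalid       # mark values of A that are absent from B
    return result

A = [0, 0, 1, 3, 9]
B = [1, 0, 3, 0, 1]
print(first_index(A, B))   # [ 1  1  0  2 -1]
```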

Peter Creasey

2015-12-30 20:08:51 UTC

The function you want is also in the open source astronomy package
iccpy ( https://github.com/Lowingbn/iccpy ), which essentially does a
variant of Sebastian’s code (which I also couldn’t quite get working),
and handles a few things like old numpy versions (pre 1.4) and allows
you to specify if B is already sorted.

from iccpy.utils import match
print(match(A, B))

[ 1 2 0 -1]

Peter