Numpy array masking(Python) [duplicate] - python

This question already has answers here:
check for identical rows in different numpy arrays
(7 answers)
Closed 19 days ago.
I have a question about NumPy array masking.
For instance, given the array below:
a b
1 2
3 4
5 6
6 5
I have a second array:
a b
1 2
3 4
I want to compare the two arrays and find the indices of the rows in the first array that match the rows of the second array.
For instance, the solution should be index=[0,1]
I have tried
np.where(~(np.abs(a - b[:,None]).sum(-1)==0).any(0))
but it does not give me the expected result.
Thanks for any suggestions!

A possible solution, based on broadcasting, where ar1 and ar2 are the first and second arrays, respectively:
np.nonzero(np.any(np.all(ar1 == ar2[:,None], axis=2), axis=0))[0]
Output:
array([0, 1])

import numpy as np
a = np.array([[1,2],[3,4],[5,6],[6,5]])
b = np.array([[1,2],[3,4]])
np.where(np.all(a == b[:,None], axis=2))[1] # np.array([0,1])
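To see why both answers work, it may help to look at the intermediate shapes. The sketch below reuses the arrays a and b from the answer above; the names eq, row_match, idx_sorted, and idx_by_b are introduced here only for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [6, 5]])
b = np.array([[1, 2], [3, 4]])

# b[:, None] has shape (2, 1, 2); broadcasting against a (4, 2) gives (2, 4, 2).
# eq[i, j, k] is True when b[i, k] == a[j, k].
eq = a == b[:, None]

# A row matches only if every column matches: shape (2, 4).
row_match = eq.all(axis=2)

# Indices of rows of a that equal some row of b, in ascending order.
idx_sorted = np.nonzero(row_match.any(axis=0))[0]

# Indices of matching rows of a, listed in the order of the rows of b.
idx_by_b = np.where(row_match)[1]

print(eq.shape, row_match.shape)
print(idx_sorted, idx_by_b)
```

The two results agree here, but they can differ when the rows of b appear in a different order than in a: the first form always returns ascending indices, while the second follows the row order of b. Note also that the comparison builds a temporary array of shape (len(b), len(a), ncols), which may be costly for large inputs.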

Related

Vector from other two vectors pandas python [duplicate]

This question already has answers here:
Creating an element-wise minimum Series from two other Series in Python Pandas
(10 answers)
Closed 11 months ago.
I can't find a solution to my problem. I have two vectors of type pandas.Series:
T = [a1, a2, a3, ..., an]
M = [b1, b2, b3, ..., bn]
I need to create a new vector in which every element is the minimum of the corresponding elements of the two given vectors. It should look like:
new_vector = [min(a1, b1), min(a2, b2), ..., min(an, bn)]
Is this possible with the functions in pandas?
Yes, you can combine the two Series into a DataFrame and use the pandas min() function to return the lowest value at each element position.
Minimum Value Comparing Each Element of Pandas Series
import pandas as pd
T = pd.Series([6,7,4,1,4,1,6,8,0])
M = pd.Series([5,3,8,1,3,7,1,7,1])
new_vector = pd.DataFrame([T,M]).min()
print(new_vector)
Results:
idx minValue
0 5
1 3
2 4
3 1
4 3
5 1
6 1
7 7
8 0
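As an alternative sketch, NumPy's element-wise minimum can be applied to the two Series directly. This assumes both Series share the same index and contain no missing values; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

T = pd.Series([6, 7, 4, 1, 4, 1, 6, 8, 0])
M = pd.Series([5, 3, 8, 1, 3, 7, 1, 7, 1])

# np.minimum compares the two Series element by element
# and returns the smaller value at each position.
new_vector = np.minimum(T, M)
print(new_vector.tolist())
```

This avoids building an intermediate DataFrame, which may matter for long Series.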

How to slice numpy array starting from x-n elements? [duplicate]

This question already has answers here:
Numpy slicing from variable
(2 answers)
Closed 1 year ago.
If I have a numpy array:
arr = np.array([1,2,3,4,5,6,7,8,9,10])
x = 3 # index
n = 5
m = 2
Is there a way to get an output like this?
output: np.array([1,2,3,4,5,6])
We start at 4 which is index x=3. The output consists of n=5 elements before said index, but does not wrap around (doesn't go beyond the 1 in this case). And also consists of m=2 elements after said index.
Thank you.
You can use this:
import numpy as np
arr = np.array([1,2,3,4,5,6,7,8,9,10])
x = 3
n = 5
m = 2
arr[max(0, x-n):x+m+1]
# array([1, 2, 3, 4, 5, 6])
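The max(0, ...) clamp is the important part: without it, a negative start index would count from the end of the array instead of stopping at the first element. A minimal sketch, using a hypothetical helper named window that is not part of the original answer:

```python
import numpy as np

def window(arr, x, n, m):
    """Return up to n elements before index x, arr[x] itself, and up to m after."""
    start = max(0, x - n)  # clamp so a negative start does not wrap around
    stop = x + m + 1       # a stop past the end is simply truncated by slicing
    return arr[start:stop]

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

clamped = window(arr, 3, 5, 2)
unclamped = arr[3 - 5:3 + 2 + 1]  # arr[-2:6], which is empty

print(clamped)
print(unclamped)
```

No clamp is needed on the upper bound, since a slice that runs past the end of the array just stops at the last element.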

Python three numpy arrays, combine columns [duplicate]

This question already has answers here:
Stacking arrays in numpy
(2 answers)
Closed 2 years ago.
I have three arrays:
a = array([1,2,3,4])
b = array([5,6,7,8])
c = array([9,10,11,12])
I would like a single array:
result = array([[1,5,9],
[2,6,10],
[3,7,11],
[4,8,12]])
i.e. take the first element of every array and make them the first row, and so on.
I know it might sound trivial, but I have been scratching my head.
Use the numpy module:
import numpy as np
a = np.array([1,2,3,4])
b = np.array([5,6,7,8])
c = np.array([9,10,11,12])
result = np.stack((a,b,c), axis = 1) # axis = 1 places a, b and c side by side as columns
print(result)
The code above gives the following output:
[[ 1 5 9]
[ 2 6 10]
[ 3 7 11]
[ 4 8 12]]
Which is what you wanted.
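For comparison, here is a short sketch of two other ways to get the same result for 1-D inputs; the variable names by_stack, by_column_stack, and by_transpose are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([9, 10, 11, 12])

# Stack along a new second axis: each input becomes a column.
by_stack = np.stack((a, b, c), axis=1)

# column_stack treats 1-D inputs as columns of a 2-D array.
by_column_stack = np.column_stack((a, b, c))

# Stack as rows first, then transpose.
by_transpose = np.vstack((a, b, c)).T

print(by_stack)
```

All three produce a (4, 3) array with the same contents, so the choice is mostly a matter of readability.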

Reverse sort of matrix using numpy based on a specific column [duplicate]

This question already has answers here:
Is it possible to use argsort in descending order?
(10 answers)
Closed 4 years ago.
import numpy as np
mat = np.array([[1,21,3],[5,4,2],[56,12,4]])
mat_sort = mat[mat[:,2].argsort()]
print(mat_sort)
Output:
[[ 5 4 2]
[56 12 4]
[ 1 21 3]]
If I wish to get the reverse sorting based on any column, say the 3rd, what changes do I make to the code? That is, I wish to get:
[[56 12 4]
[ 1 21 3]
[ 5 4 2]]
P.S. Yes, I understand this is an easy question, but I couldn't find an answer that I understood and that applied to a matrix rather than an array or vector. TIA :)
Just reverse the argsort indices:
mat_sort = mat[mat[:, 2].argsort()[::-1]]
print(mat_sort) # already in descending order; reversing again would undo the sort
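A self-contained sketch of the descending sort on the third column, with one alternative that negates the sort key instead of reversing the indices. The names order, desc, and desc_neg are illustrative; the negation variant assumes a signed numeric column.

```python
import numpy as np

mat = np.array([[1, 21, 3], [5, 4, 2], [56, 12, 4]])

# Ascending argsort on column 2, then reverse the index order.
order = mat[:, 2].argsort()[::-1]
desc = mat[order]

# Alternative: sort on the negated key (signed numeric columns only).
desc_neg = mat[np.argsort(-mat[:, 2])]

print(desc)
```

When the key column contains ties, the two variants may order the tied rows differently, since reversing the indices also reverses the relative order of equal keys.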

Mapping rows of a Pandas dataframe to numpy array

Sorry, I know there are so many questions relating to indexing, and it's probably staring me in the face, but I'm having a little trouble with this. I am familiar with the .loc, .iloc, and .index methods and slicing in general. The method .reset_index may not have been (and may not be able to be) called on our dataframe, and therefore index labels may not be in order. The dataframe and numpy array(s) are actually different-length subsets of the dataframe, but for this example I'll keep them the same size (I can handle offsetting once I have an example).
Here is a picture that shows what I'm looking for:
I can pull columns of rows from the dataframe based on some search criteria.
idxlbls = df.index[df['timestamp'] == dt]
stuff = df.loc[idxlbls, 'col3':'col5']
But how do I map that to row number (array indices, not label indices) to be used as an array index in numpy (assuming same row length)?
stuffprime = array[?, ?]
The reason I need it is because the dataframe is much larger and more complete and contains the column searching criteria, but the numpy arrays are subsets that have been extracted and modified prior in the pipeline (and do not have the same searching criteria in them). I need to search the dataframe and pull the equivalent data from the numpy arrays. Basically I need to correlate specific rows from a dataframe to the corresponding rows of a numpy array.
I would map pandas indices to numpy indices:
keys_dict = dict(zip(idxlbls, range(len(idxlbls))))
Then you can use the dictionary keys_dict to address the array elements by a pandas index: array[keys_dict[some_df_index], :]
I believe you need get_indexer to get positions from the filtered column names. For the index you can use it the same way, or use numpy.where to get positions from a boolean mask:
df = pd.DataFrame({'timestamp':list('abadef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4]}, index=list('ABCDEF'))
print (df)
timestamp B C D E
A a 4 7 1 5
B b 5 8 3 3
C a 4 9 5 6
D d 5 4 7 9
E e 5 2 1 2
F f 4 3 0 4
idxlbls = df.index[df['timestamp'] == 'a']
stuff = df.loc[idxlbls, 'C':'E']
print (stuff)
C D E
A 7 1 5
C 9 5 6
a = df.index.get_indexer(stuff.index)
Or get positions by boolean mask:
a = np.where(df['timestamp'] == 'a')[0]
print (a)
[0 2]
b = df.columns.get_indexer(stuff.columns)
print (b)
[2 3 4]
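Once the row positions a and the column positions b are known, they can be used to index the NumPy array. The sketch below is an assumption about how the pieces fit together: arr is a hypothetical stand-in for the array from the pipeline and is assumed to have the same row and column order as df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'timestamp': list('abadef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4]}, index=list('ABCDEF'))

# Hypothetical stand-in for the array extracted earlier in the pipeline;
# assumed to share df's row and column order.
arr = df.to_numpy()

a = np.where(df['timestamp'] == 'a')[0]      # row positions
b = df.columns.get_indexer(['C', 'D', 'E'])  # column positions

# np.ix_ builds an open mesh so every selected row is paired
# with every selected column.
stuffprime = arr[np.ix_(a, b)]
print(stuffprime)
```

Plain arr[a, b] would not work here, because the two index arrays have different lengths and cannot be broadcast together; np.ix_ selects the full rows-by-columns block instead.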
