Compare and store elements of multidimensional array to two new arrays - python

Assume I have the following simple array:
my_array = np.array([[1,2],[2,4],[3,6],[2,1]])
which corresponds to another parent array:
parent_array = np.array([0,1,2,3])
Of course, there is a function that maps parent_array to my_array, but it is not important what that function is.
Goal:
I want to use my_array to create two new arrays, A and B, by iterating over each row of my_array: for row i, if the value in the first column of my_array[i] is greater than the value in the second column, I will store parent_array[i] in A. Otherwise (if the value in the second column of my_array[i] is bigger), I will store parent_array[i] in B.
So for the case above the result would be:
A = [3]
because only in the 4th row of my_array does the first column have the greater value, and
B = [0,1,2]
because in the first three rows the second column has the greater value.
Now, although I know how to save the greater element in a row of columns to a new array, the fact that each row in my_array is associated with a row in parent_array is confusing me. I don't know how to correlate them.
Summary:
In short, I need to associate each row of parent_array with the corresponding row of my_array and then check the latter row by row: if the value in the first column of my_array[i] is greater, I save parent_array[i] in A, while if the value in the second column is greater, I save parent_array[i] in B.

Use boolean array indexing for this: create a boolean condition array by comparing the values in the first and second columns of my_array, then use it to select values from parent_array:
cond = my_array[:,0] > my_array[:,1]
A, B = parent_array[cond], parent_array[~cond]
A
# [3]
B
# [0 1 2]
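Note that with ~cond, rows where both columns are equal also end up in B. If ties should go to neither array, a minimal sketch could use two strict comparisons instead. The tie row [5, 5] and the names first_greater and second_greater are added here purely for illustration:

```python
import numpy as np

# Data from the question, plus one hypothetical tie row [5, 5].
my_array = np.array([[1, 2], [2, 4], [3, 6], [2, 1], [5, 5]])
parent_array = np.array([0, 1, 2, 3, 4])

# One strict mask per case, so a tie matches neither.
first_greater = my_array[:, 0] > my_array[:, 1]
second_greater = my_array[:, 0] < my_array[:, 1]

A = parent_array[first_greater]
B = parent_array[second_greater]

print(A.tolist())  # [3]
print(B.tolist())  # [0, 1, 2]
```

The tie row (parent index 4) appears in neither A nor B.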

Related

Python: grab subset of array where value of first column is equal to specific value

I have an array like the following
[[1,20,25],[1,45,16],[2,67,81],[3,1,1],[3,23,22]]
I want to create a new array from the first array above but taking only the rows where the value of the first column is 1. How can I loop through the entire array checking if the first column of each row is 1 and then adding that row to a new array so that it will look like the following:
[[1,20,25],[1,45,16]]
Another, not so fancy, way would be this:
arr = [[1,20,25],[1,45,16],[2,67,81],[3,1,1],[3,23,22]]
new_arr = []
for sub_arr in arr:
    # check if the first element in sub_arr is 1
    if sub_arr[0] == 1:
        # if so append it to the new array
        new_arr.append(sub_arr)
print(new_arr)
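If the data is (or can be converted to) a NumPy array, the same selection can probably be done without an explicit loop. A minimal sketch, assuming all rows have the same length; the name mask is introduced here for illustration:

```python
import numpy as np

arr = np.array([[1, 20, 25], [1, 45, 16], [2, 67, 81], [3, 1, 1], [3, 23, 22]])

# Boolean mask: True for rows whose first column equals 1.
mask = arr[:, 0] == 1

# Indexing with the mask keeps only the matching rows.
new_arr = arr[mask]

print(new_arr.tolist())  # [[1, 20, 25], [1, 45, 16]]
```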

Sort np array based on summed selected values of each row

I have a 2D numpy array, filled with floats.
I want to take a selected chunk of each row (say, the 2nd to 3rd items), sum these values, and sort all the rows by that sum in descending order.
For example:
array([[0.80372444, 0.35468653, 0.9081662 , 0.69995566],
[0.53712474, 0.90619077, 0.69068265, 0.73794143],
[0.14056974, 0.34685164, 0.87505744, 0.56927803]])
Here's what I tried:
a = np.array(sorted(a, key = sum))
But that just sums all values from each row, rather than, say, only 2nd to 6th element.
You can start by using take to get the elements at indices [1,2] from each row (axis = 1). Then sum across those elements for each row (again axis = 1), and use argsort to get the ascending order of the sums. This gives a set of row indices, which you can use to index the array in that order.
import numpy as np
a = np.array([[0.80372444, 0.35468653, 0.9081662 , 0.69995566],
[0.53712474, 0.90619077, 0.69068265, 0.73794143],
[0.14056974, 0.34685164, 0.87505744, 0.56927803]])
a[a.take([1, 2], axis=1).sum(axis=1).argsort()]
# returns:
array([[0.14056974, 0.34685164, 0.87505744, 0.56927803],
[0.80372444, 0.35468653, 0.9081662 , 0.69995566],
[0.53712474, 0.90619077, 0.69068265, 0.73794143]])
Replace key with the function you actually want:
a = np.array(sorted(a, key = lambda v : sum(v[1:3])))
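Both approaches sort in ascending order, while the question asks for descending. One way to get that is to reverse the index array returned by argsort. A minimal sketch; the names sums, order, and a_desc are introduced here for illustration:

```python
import numpy as np

a = np.array([[0.80372444, 0.35468653, 0.9081662, 0.69995566],
              [0.53712474, 0.90619077, 0.69068265, 0.73794143],
              [0.14056974, 0.34685164, 0.87505744, 0.56927803]])

# Sum the 2nd and 3rd items (indices 1 and 2) of every row.
sums = a[:, 1:3].sum(axis=1)

# argsort is ascending, so reverse it to get descending order.
order = np.argsort(sums)[::-1]
a_desc = a[order]

print(order.tolist())  # [1, 0, 2]
```

With sorted, passing reverse=True should have the same effect.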

NumPy - Finding and printing non-zero elements in each column of a n-d array

Suppose I have the following Numpy nd array:
array([[['a',0,0,0],
[0,'b','c',0],
['e','d',0,0]]])
Now I would like to define 'double connections' of elements as follows:
We consider each column in this array as a time instant, and all elements in this instant are considered to happen at the same time. 0 means nothing happens. For example, a and e happen at the first time instant, b and d happen at the second time instant, and c happens by itself at the third time instant.
If a column contains two elements, I consider them to have a 'double connection', and I would like to print the connections like this (if there is no such pair in a column, just move on to the next column until the end):
('a','e')
('e','a')
('b','d')
('d','b')
I tried to come up with solutions that iterate over all the columns, but they did not work. Can anyone share some tips on this?
You can recreate the original array by the following commands
array = np.array([['a',0,0,0],
                  [0,'b','c',0],
                  ['e','d',0,0]], dtype=object)
You could count how many non-zero elements you have in each column, select the columns with exactly two non-zero elements, repeat each resulting pair, and reverse every second pair:
pairs = np.repeat(array[(array[:, (array != 0).sum(axis=0) == 2]).nonzero()].reshape((2, -1)).T, 2, axis=0)
pairs[1::2] = pairs[1::2, ::-1]
If you want to convert these to tuples like in your desired output you could just do a list comprehension:
output = [tuple(pair) for pair in pairs]
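The one-liner above appears to rely on where the qualifying columns sit in the array and on the row-major order of nonzero, so it may not generalize to other inputs. A more explicit, loop-based sketch that walks the columns one by one is shown here; it is an alternative added for illustration, and the names column, events, first, and second are hypothetical:

```python
import numpy as np

array = np.array([['a', 0, 0, 0],
                  [0, 'b', 'c', 0],
                  ['e', 'd', 0, 0]], dtype=object)

output = []
# Iterating over the transpose yields one column (time instant) at a time.
for column in array.T:
    # Keep only the entries where something happens.
    events = [x for x in column if x != 0]
    if len(events) == 2:
        first, second = events
        # Record the connection in both directions.
        output.append((first, second))
        output.append((second, first))

for pair in output:
    print(pair)
# ('a', 'e')
# ('e', 'a')
# ('b', 'd')
# ('d', 'b')
```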

Array reclassification with numpy

I have a large (50000 x 50000) 64-bit integer NumPy array containing 10-digit numbers. There are about 250,000 unique numbers in the array.
I have a second reclassification table which maps each unique value from the first array to an integer between 1 and 100. My hope would be to reclassify the values from the first array to the corresponding values in the second.
I've tried two methods of doing this, and while they work, they are quite slow. In both methods I create a blank (zeros) array of the same dimensions.
new_array = np.zeros(old_array.shape)
First method:
for old_value, new_value in lookup_array:
    new_array[old_array == old_value] = new_value
Second method, where the lookup table is a pandas DataFrame named lookup_table with the headings "Old" and "New":
for new_value, old_values in lookup_table.groupby("New"):
    new_array[np.in1d(old_array, old_values)] = new_value
Is there a faster way to reclassify values?
Store the lookup table as a 250,000 element array where for each index you have the mapped value. For example, if you have something like:
lookups = [(old_value_1, new_value_1), (old_value_2, new_value_2), ...]
Then you can do:
idx, val = np.asarray(lookups).T
lookup_array = np.zeros(idx.max() + 1)
lookup_array[idx] = val
When you get that, you can get your transformed array simply as:
new_array = lookup_array[old_array]
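With 10-digit keys, the direct index table above needs idx.max() + 1 entries, which may not fit in memory. One alternative, swapped in here rather than taken from the answer, is to look up positions with np.searchsorted against the sorted old values. A minimal sketch, assuming every value in old_array appears in the lookup table; the sample values and the names sorted_old, sorted_new, and positions are hypothetical:

```python
import numpy as np

# Small hypothetical stand-ins for the real data.
old_array = np.array([[1000000007, 2000000011],
                      [2000000011, 3000000019]], dtype=np.int64)
lookups = [(3000000019, 9), (1000000007, 5), (2000000011, 7)]

old_values, new_values = np.asarray(lookups, dtype=np.int64).T

# Sort the lookup table by old value so binary search can be used.
sort_order = np.argsort(old_values)
sorted_old = old_values[sort_order]
sorted_new = new_values[sort_order]

# For each element of old_array, find its position in sorted_old.
# This is only valid if every element is actually present in sorted_old.
positions = np.searchsorted(sorted_old, old_array)
new_array = sorted_new[positions]

print(new_array.tolist())  # [[5, 7], [7, 9]]
```

The lookup table here has only as many entries as there are unique values, regardless of how large those values are.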

Python Numpy: Coalesce and return first nonzero observation

I am currently new to NumPy, but very proficient with SQL.
I used a function called coalesce in SQL, which I was disappointed not to find in NumPy. I need this function to create a third array, want, from two arrays, array1 and array2, where zero or missing observations in array1 are replaced by the observations in array2 at the same position. I can't figure out how to use np.where for this.
Once this task is accomplished, I would like to take the lower triangle of this array want and then populate a final array want2 with the first non-zero observation from each column. If all observations in a column of coalesce(array1, array2) are missing or 0, then want2 should default to zero for that column.
I have written an example demonstrating the desired behavior.
import numpy as np
array1= np.array(([-10,0,20],[-1,0,0],[0,34,-50]))
array2= np.array(([10,10,50],[10,0,25],[50,45,0]))
# Coalesce array1,array2 i.e. get the first non-zero value from array1, then from array2.
# if array1 is empty or zero, then populate table want with values from array2 under same address
want=np.tril(np.array(([-10,10,20],[-1,0,25],[50,34,-50])))
print(array1)
print(array2)
print(want)
# print first instance of nonzero observation from each column of table want
want2=np.array([-10,34,-50])
print(want2)
"Coalesce": use putmask to replace values equal to zero with values from array2:
want = array1.copy()
# putmask modifies its first argument in place, so pass want itself
np.putmask(want, array1 == 0, array2)
First nonzero element of each column of np.tril(want):
where_nonzero = np.where(np.tril(want) != 0)
# For the where array, get the indices of only
# the first index for each column
first_indices = np.unique(where_nonzero[1], return_index=True)[1]
# Get the values from want for those indices
want2 = want[(where_nonzero[0][first_indices], where_nonzero[1][first_indices])]
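Since the question asks about np.where, here is a minimal sketch of the same two steps using it, which also fills in zero for any column that has no non-zero entry in the lower triangle. This is an alternative added for illustration, not the answer's method; the names lower, nonzero, first_row, and columns are hypothetical:

```python
import numpy as np

array1 = np.array([[-10, 0, 20], [-1, 0, 0], [0, 34, -50]])
array2 = np.array([[10, 10, 50], [10, 0, 25], [50, 45, 0]])

# "Coalesce": take array1 where it is non-zero, otherwise array2.
want = np.where(array1 != 0, array1, array2)

# Keep the lower triangle and mark its non-zero entries.
lower = np.tril(want)
nonzero = lower != 0

# argmax on a boolean array gives the first True per column (0 if none).
first_row = nonzero.argmax(axis=0)
columns = np.arange(lower.shape[1])

# Columns without any non-zero entry default to 0.
want2 = np.where(nonzero.any(axis=0), lower[first_row, columns], 0)

print(want2.tolist())  # [-10, 34, -50]
```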
