Is there a way to do a binary tree search on an unsorted matrix? If yes, could you explain it, as I am new to programming? I have tried implementing the search with nested for-i / for-j loops, but was wondering whether there is a faster way.
import numpy as np
matrix = [[3, 6, 7], [9, 1, 2], [8, 4, 5]]
matrix = np.array(matrix)
matrix
array([[3, 6, 7],
[9, 1, 2],
[8, 4, 5]]) # how does one perform a binary tree search on an unsorted matrix?
You can use np.where for this.
matrix = np.array([[3,6,7],[9,1,2],[8, 8, 8]])
dim_1, dim_2 = np.where(matrix == 8)
#dim_1 = array([2, 2, 2], dtype=int64)
#dim_2 = array([0, 1, 2], dtype=int64)
#dim_1, dim_2, dim_3 = np.where(matrix == 8) if matrix were 3-dimensional
ret = np.where(matrix == 8)
num_8 = len(ret[0]) #total number of 8's
np.where returns a tuple of index arrays, one per dimension of your array. If you have a 3D array you will get 3 arrays in the tuple.
ret = (array([2, 2, 2], dtype=int64), array([0, 1, 2], dtype=int64))
ret[0] corresponds to the row values, and ret[1] corresponds to the column values.
So this means that the element 8 is present in matrix[2][0], matrix[2][1], matrix[2][2]
Does that help? You won't have to write your own routine for this. It is almost certainly faster than any search routine you would implement in pure Python, because NumPy's built-in functions are highly optimized. Prefer NumPy methods for NumPy arrays wherever possible.
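For a rough side-by-side (a small sketch with purely illustrative variable names), here is the nested-loop search from the question next to the np.where call. Both find the same coordinates; np.where just does the scan in compiled code instead of the Python interpreter.
import numpy as np
matrix = np.array([[3, 6, 7], [9, 1, 2], [8, 4, 5]])
# Nested-loop search (the approach described in the question): visit every cell.
hits_loop = [(i, j)
             for i in range(matrix.shape[0])
             for j in range(matrix.shape[1])
             if matrix[i, j] == 8]
# [(2, 0)]
# Vectorized search: np.where tests the whole array at once.
rows, cols = np.where(matrix == 8)
hits_where = list(zip(rows.tolist(), cols.tolist()))
# [(2, 0)]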
Related
I have a set of indices stored in two NumPy arrays, and my goal is to slice a given input array based on the corresponding indices in those index arrays. For example:
index_arr1 = np.asarray([2,3,4])
index_arr2 = np.asarray([5,5,6])
input_arr = np.asarray([1,2,3,4,4,5,7,2])
The output to my code should be [[3,4,4],[4,4],[4,5]] which is basically [input_arr[2:5], input_arr[3:5], input_arr[4:6]]
Can anybody suggest a way to solve this problem using NumPy functions, avoiding for loops as far as possible for efficiency?
Do you mean:
[input_arr[x:y] for x,y in zip(index_arr1, index_arr2)]
Output:
[array([3, 4, 4]), array([4, 4]), array([4, 5])]
Or if you really want list of lists:
[input_arr[x:y].tolist() for x,y in zip(index_arr1, index_arr2)]
Output:
[[3, 4, 4], [4, 4], [4, 5]]
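As an aside, if all your slices happened to have the same length you could drop the Python loop entirely with broadcasting and fancy indexing. A sketch of that special case (in your example the lengths are 3, 2 and 2, so it would not apply directly; length here is an assumed constant):
import numpy as np
input_arr = np.asarray([1, 2, 3, 4, 4, 5, 7, 2])
starts = np.asarray([2, 3, 4])
length = 2  # assumes every slice has this same length
# Build an (n_slices, length) index matrix via broadcasting, then index once.
idx = starts[:, None] + np.arange(length)  # [[2, 3], [3, 4], [4, 5]]
out = input_arr[idx]                       # [[3, 4], [4, 4], [4, 5]]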
I am trying to find an efficient way to multiply specific values within a matrix by a given scalar. A quick example:
Given a matrix M of values between 1 and 10, I want to multiply every cell whose value is smaller than 3 by 2. I know I can find the coordinates of all such cells in TensorFlow with tf.where(M < 3), but I am struggling to find a good, scalable way to apply the transformation.
How can I leverage this info to multiply only the cells at the given coordinates by 2?
Keep in mind that my matrices might be much bigger than 3x3.
M = np.array([[1, 5, 8],[2, 2, 2], [9, 7, 6]])
M[M < 3] *= 2
print(M)
[[2 5 8]
 [4 4 4]
 [9 7 6]]
I found out how to do this in tensorflow without having to do any transformation from numpy to tensorflow and vice-versa.
my_matrix = tf.constant([[1, 5, 8], [2, 2, 2], [9, 7, 6]])
result = tf.where(
tf.less(my_matrix, tf.constant(3)),
tf.scalar_mul(2, my_matrix),
my_matrix
)
@Josh's answer helped point me in the right direction.
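For reference, the same select-then-multiply idea can be written directly in NumPy with np.where (a sketch of the equivalent pattern, not code from the question):
import numpy as np
M = np.array([[1, 5, 8], [2, 2, 2], [9, 7, 6]])
# Where the condition holds, take M * 2; elsewhere keep M unchanged.
result = np.where(M < 3, M * 2, M)
# [[2 5 8]
#  [4 4 4]
#  [9 7 6]]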
I have a NumPy array. I want to rescale the elements in the array so that the smallest number in the array is represented by 1 and the largest number is represented by the number of unique elements in the array.
For example
A=[ [2,8,8],[3,4,5] ]
would become
[ [1,5,5],[2,3,4] ]
Use np.unique with its return_inverse param -
np.unique(A, return_inverse=1)[1].reshape(A.shape)+1
Sample run -
In [10]: A
Out[10]:
array([[2, 8, 8],
[3, 4, 5]])
In [11]: np.unique(A, return_inverse=1)[1].reshape(A.shape)+1
Out[11]:
array([[1, 5, 5],
[2, 3, 4]])
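To see why this works, it may help to break down what return_inverse gives you (a small sketch, assuming A is already a NumPy array so that reshape works):
import numpy as np
A = np.array([[2, 8, 8], [3, 4, 5]])
uniq, inv = np.unique(A, return_inverse=True)
# uniq -> array([2, 3, 4, 5, 8])       sorted unique values
# inv  -> array([0, 4, 4, 1, 2, 3])    position of each element of A.ravel() within uniq
# inv is 0-based, so reshape it to A's shape and add 1 to get ranks starting at 1.
ranks = inv.reshape(A.shape) + 1
# array([[1, 5, 5],
#        [2, 3, 4]])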
If you're not opposed to using scipy, you could use rankdata, with method='dense' (judging by the tags on your question):
from scipy.stats import rankdata
rankdata(A, 'dense').reshape(A.shape)
array([[1, 5, 5],
[2, 3, 4]])
Note that in your case method='min' would give the same result, because the only repeated value is also the largest one; see the linked documentation for more details.
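To illustrate the difference when it does matter (a small made-up example, not from the question): 'dense' keeps tied ranks consecutive, while 'min' assigns ties the lowest ordinal rank and can therefore skip numbers:
from scipy.stats import rankdata
a = [2, 2, 8]
rankdata(a, method='dense')  # array([1, 1, 2])
rankdata(a, method='min')    # array([1, 1, 3])
In the question's A the only repeated value is the largest one, so the two methods happen to agree there.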
I have a 2D array and a 1D array. I am looking to find the rows that contain, at least once, the values from the 1D array, as follows:
import numpy as np
A = np.array([[0, 3, 1],
[9, 4, 6],
[2, 7, 3],
[1, 8, 9],
[6, 2, 7],
[4, 8, 0]])
B = np.array([0,1,2,3])
results = []
for elem in B:
results.append(np.where(A==elem)[0])
This works and results in the following array:
[array([0, 5], dtype=int64),
array([0, 3], dtype=int64),
array([2, 4], dtype=int64),
array([0, 2], dtype=int64)]
But this is probably not the best way of proceeding. Following the answers given in this question (Search Numpy array with multiple values) I tried the following solutions:
out1 = np.where(np.in1d(A, B))
num_arr = np.sort(B)
idx = np.searchsorted(B, A)
idx[idx==len(num_arr)] = 0
out2 = A[A == num_arr[idx]]
But these give me incorrect values:
In [36]: out1
Out[36]: (array([ 0, 1, 2, 6, 8, 9, 13, 17], dtype=int64),)
In [37]: out2
Out[37]: array([0, 3, 1, 2, 3, 1, 2, 0])
Thanks for your help
If you only need to know whether each row of A contains ANY element of array B, without caring which particular element of B it is, you can use the following:
input:
np.isin(A,B).sum(axis=1)>0
output:
array([ True, False, True, True, True, True])
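If you also want the row indices rather than the boolean mask, a small follow-up sketch:
import numpy as np
A = np.array([[0, 3, 1], [9, 4, 6], [2, 7, 3],
              [1, 8, 9], [6, 2, 7], [4, 8, 0]])
B = np.array([0, 1, 2, 3])
mask = np.isin(A, B).any(axis=1)  # same as .sum(axis=1) > 0
rows = np.where(mask)[0]          # array([0, 2, 3, 4, 5])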
Since you're dealing with a 2D array*, you can use broadcasting to compare B with a raveled version of A. This gives you the matching indices in flattened form; you can then convert them back to indices in the original array with np.unravel_index.
In [50]: d = np.where(B[:, None] == A.ravel())[1]
In [51]: np.unravel_index(d, A.shape)
Out[51]: (array([0, 5, 0, 3, 2, 4, 0, 2]), array([0, 2, 2, 0, 0, 1, 1, 2]))
The first array holds the row indices (the expected result); the second holds the corresponding column indices.
* From documentation: For 3-dimensional arrays this is certainly efficient in terms of lines of code, and, for small data sets, it can also be computationally efficient. For large data sets, however, the creation of the large 3-d array may result in sluggish performance.
Also, Broadcasting is a powerful tool for writing short and usually intuitive code that does its computations very efficiently in C. However, there are cases when broadcasting uses unnecessarily large amounts of memory for a particular algorithm. In these cases, it is better to write the algorithm's outer loop in Python. This may also produce more readable code, as algorithms that use broadcasting tend to become more difficult to interpret as the number of dimensions in the broadcast increases.
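If what you are after is the same per-value grouping that the original loop produced (one array of row indices per element of B), one possible sketch combines the broadcast comparison with a reduction over the column axis:
import numpy as np
A = np.array([[0, 3, 1], [9, 4, 6], [2, 7, 3],
              [1, 8, 9], [6, 2, 7], [4, 8, 0]])
B = np.array([0, 1, 2, 3])
# hits[k, i] is True when B[k] occurs somewhere in row i of A.
hits = (A[None, :, :] == B[:, None, None]).any(axis=2)
results = [np.where(h)[0] for h in hits]
# [array([0, 5]), array([0, 3]), array([2, 4]), array([0, 2])]
The memory caveat quoted above applies here too: the intermediate comparison array has shape (len(B),) + A.shape, so this suits moderately sized inputs.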
Is something like this what you are looking for?
import numpy as np
from itertools import combinations
A = np.array([[0, 3, 1],
[9, 4, 6],
[2, 7, 3],
[1, 8, 9],
[6, 2, 7],
[4, 8, 0]])
B = np.array([0,1,2,3])
for i in combinations(A, 2):
    if np.all(np.isin(B, np.hstack(i))):
        print(i[0], ' ', i[1])
which prints the following:
[0 3 1] [2 7 3]
[0 3 1] [6 2 7]
Note: this solution does NOT require the rows to be consecutive. Please let me know if that is required.
So, I have been browsing Stack Overflow for quite some time now, but I can't seem to find the solution to my problem.
Consider this:
import numpy as np
coo = np.array([[1, 2], [2, 3], [3, 4], [3, 4], [1, 2], [5, 6], [1, 2]])
values = np.array([1, 2, 4, 2, 1, 6, 1])
The coo array contains the (x, y) coordinate positions
x = (1, 2, 3, 3, 1, 5, 1)
y = (2, 3, 4, 4, 2, 6, 2)
and the values array some sort of data for this grid point.
Now I want to get the average of all values for each unique grid point.
For example, the coordinate (1, 2) occurs at positions (0, 4, 6), so for this point I want the average of values[[0, 4, 6]].
How could I get this for all unique grid points?
You can sort coo with np.lexsort to bring the duplicate ones in succession. Then run np.diff along the rows to get a mask of starts of unique XY's in the sorted version. Using that mask, you can create an ID array that would have the same ID for the duplicates. The ID array can then be used with np.bincount to get the summation of all values with the same ID and also their counts and thus the average values, as the final output. Here's an implementation to go along those lines -
# Use lexsort to bring duplicate coo XY's in succession
sortidx = np.lexsort(coo.T)
sorted_coo = coo[sortidx]
# Get mask of start of each unique coo XY
unqID_mask = np.append(True,np.any(np.diff(sorted_coo,axis=0),axis=1))
# Tag/ID each coo XY based on their uniqueness among others
ID = unqID_mask.cumsum()-1
# Get unique coo XY's
unq_coo = sorted_coo[unqID_mask]
# Finally use bincount to get the summation of all coo within same IDs
# and their counts and thus the average values
average_values = np.bincount(ID,values[sortidx])/np.bincount(ID)
Sample run -
In [65]: coo
Out[65]:
array([[1, 2],
[2, 3],
[3, 4],
[3, 4],
[1, 2],
[5, 6],
[1, 2]])
In [66]: values
Out[66]: array([1, 2, 4, 2, 1, 6, 1])
In [67]: unq_coo
Out[67]:
array([[1, 2],
[2, 3],
[3, 4],
[5, 6]])
In [68]: average_values
Out[68]: array([ 1., 2., 3., 6.])
You can use where:
>>> values[np.where((coo == [1, 2]).all(1))].mean()
1.0
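The same idea extends to every unique coordinate, at the cost of a Python loop over the unique points (a readability sketch rather than the fastest option):
import numpy as np
coo = np.array([[1, 2], [2, 3], [3, 4], [3, 4], [1, 2], [5, 6], [1, 2]])
values = np.array([1, 2, 4, 2, 1, 6, 1])
unq = np.unique(coo, axis=0)
means = np.array([values[(coo == c).all(1)].mean() for c in unq])
# unq   -> [[1, 2], [2, 3], [3, 4], [5, 6]]
# means -> [1., 2., 3., 6.]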
It is very likely going to be faster to flatten your indices, i.e.:
flat_index = coo[:, 0] * (np.max(coo[:, 1]) + 1) + coo[:, 1]  # +1 so that distinct (x, y) pairs cannot collide
then use np.unique on it:
unq, unq_idx, unq_inv, unq_cnt = np.unique(flat_index,
return_index=True,
return_inverse=True,
return_counts=True)
unique_coo = coo[unq_idx]
unique_mean = np.bincount(unq_inv, values) / unq_cnt
than the similar approach using lexsort.
But under the hood the method is virtually the same.
This is a simple one-liner using the numpy_indexed package (disclaimer: I am its author):
import numpy_indexed as npi
unique, mean = npi.group_by(coo).mean(values)
Should be comparable to the currently accepted answer in performance, as it does similar things under the hood; but all in a well tested package with a nice interface.
Another way to do it is using JAX unique and grad. This approach might be particularly fast because it allows you to run on an accelerator (CPU, GPU, or TPU).
import functools
import jax
import jax.numpy as jnp
@jax.grad
def _unique_sum(unique_values: jnp.ndarray, unique_inverses: jnp.ndarray, values: jnp.ndarray):
    errors = unique_values[unique_inverses] - values
    return -0.5*jnp.dot(errors, errors)

@functools.partial(jax.jit, static_argnames=['size'])
def unique_mean(indices, values, size):
    unique_indices, unique_inverses, unique_counts = jnp.unique(indices, axis=0, return_inverse=True, return_counts=True, size=size)
    unique_values = jnp.zeros(unique_indices.shape[0], dtype=float)
    return unique_indices, _unique_sum(unique_values, unique_inverses, values) / unique_counts
coo = jnp.array([[1, 2], [2, 3], [3, 4], [3, 4], [1, 2], [5, 6], [1, 2]])
values = jnp.array([1, 2, 4, 2, 1, 6, 1])
unique_coo, unique_mean = unique_mean(coo, values, size=4)
print(unique_mean.block_until_ready())
The only weird thing is the size argument, since JAX requires all array sizes to be fixed / known beforehand. If you make size too small it will drop valid results; if it is too large it will return NaNs.