numpy unique always the same? - python

I'm using the following code snippet to get a list of unique arrays, but it reorders the list in a strange way. Is uniquecoords bound to be in the same order every time or is there any random factor?
coords = []
for c in coordinates:
    coords.extend(c)
a = np.array(coords)
uniquecoords = np.unique(
    a.view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
).view(a.dtype).reshape(-1, a.shape[1])

According to the documentation of numpy.unique(), the function "Returns the sorted unique elements of an array." So the order will always be the same.
If you want to keep the original order instead, you can do
_, idx = np.unique(your_array_of_views, return_index=True)
uniquecoords = a[np.sort(idx)]
(idx holds the first occurrence of each unique value, ordered by sorted value; sorting idx restores the original row order.)
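Putting it together, a minimal runnable sketch (the sample coordinates are made up, since the original data isn't shown):

```python
import numpy as np

# hypothetical sample data standing in for the original `coordinates` list
coordinates = [[(1, 2), (3, 4)], [(3, 4), (1, 2)]]
coords = []
for c in coordinates:
    coords.extend(c)
a = np.array(coords)

# view each row as one opaque void item so np.unique compares whole rows
row_view = a.view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, idx = np.unique(row_view, return_index=True)

# sort the first-occurrence indices to keep the original row order
uniquecoords = a[np.sort(idx)]
print(uniquecoords)  # [[1 2]
                     #  [3 4]]
```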

Related

How can I assign strings to different elements in an array, sort the array, then display the strings based on the sort in Python/Numpy?

I have an array P = np.array([2,3,1]) and I want to assign three strings to each element respectively.
So, conceptually (pseudocode, not valid Python):
"str1", "str2", "str3" = P
and after sorting the array:
In[]: P = -np.sort(-P)
Out[]: [3,2,1]
I want to then be able to display the strings based on this sort, as:
Out[]: "str2","str1","str3",
Tried assigning variable names to the elements but it won't display on output as intended.
Tried defining an array of objects with the strings as elements but have trouble assigning them to the numerical values of P.
You can use numpy.argsort.
import numpy as np
P = np.array([2,3,1])
S = np.array(["str1", "str2", "str3"])
sort_idx = np.argsort(-P)
print(S[sort_idx])
# ['str2' 'str1' 'str3']

Appending numpy array of arrays

I am trying to append an array to another array, but it's appending them as if they were just one array. What I would like is to have each array appended at its own index (without having to use a list; I want to use np arrays), i.e.
temp = np.array([])
for i in my_items:
    m = get_item_ids(i.color)  # returns an array like [1,4,20,5,3] (always the same number of items, but different ids)
    temp = np.append(temp, m, axis=0)
On the second iteration, let's suppose I get [5,4,15,3,10];
then I would like to have temp as
array([[1,4,20,5,3], [5,4,15,3,10]])
But instead I keep getting [1,4,20,5,3,5,4,15,3,10]
I am new to Python, but I am sure there is probably a way to concatenate in this way with numpy without using lists?
You have to reshape m to give it a second dimension:
m = m.reshape(1, -1)
Then you can concatenate the rows along axis=0 (note that concatenate takes a tuple of arrays):
temp = np.concatenate((temp, m), axis=0)
For this to work, temp must start out 2D as well, e.g. temp = np.empty((0, 5)).
List append is much better - faster and easier to use correctly.
temp = []
for i in my_items:
    m = get_item_ids(i.color)  # returns an array like [1,4,20,5,3]
    temp.append(m)
Look at the list to see what it created. Then make an array from that:
arr = np.array(temp)
# or np.vstack(temp)
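A runnable sketch of the list-append approach; the `get_item_ids` stand-in below is hypothetical, since the original function isn't shown:

```python
import numpy as np

# hypothetical stand-in for get_item_ids: always returns 5 ids
def get_item_ids(color):
    ids = {"red": [1, 4, 20, 5, 3], "blue": [5, 4, 15, 3, 10]}
    return np.array(ids[color])

temp = []
for color in ["red", "blue"]:
    temp.append(get_item_ids(color))

# one conversion at the end gives a 2D array, one row per item
arr = np.array(temp)
print(arr.shape)  # (2, 5)
```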

Searching an array for all matches and returning indices of the match

I am trying to create an array that holds all of the rows where one (very large) array matches a set of unique values. The problem is that the large array will have multiple matching rows per value, and I need all of them stored in the same row of the new array.
Using a for loop to loop through each of the unique values works but is way too slow to be usable. I have been searching for a vectorized solution but have not been successful. Any help would be greatly appreciated!
arrStart = []
startRavel = startInforce['pol_id'].ravel()
for policy in unique_policies:
    arrStart.append(np.argwhere(startRavel == policy))
The new array would have the same length as the unique values array but each element would be a list of all of the rows that match that unique value in the large array.
Sample Input would be something like this:
startRavel = [1,2,2,2,3,3]
unique_policies = [1,2,3]
Output:
arrStart = [[0], [1,2,3],[4,5]]
One possible option with NumPy, similar to yours but flattened inside a list comprehension:
startRavel = np.array([1,2,2,2,3,3])
unique_policies = np.array([1,2,3])
[np.argwhere(startRavel == policy).flatten() for policy in unique_policies]
#=> [array([0]), array([1, 2, 3]), array([4, 5])]
Alternative, using flatnonzero():
[np.flatnonzero(startRavel == policy) for policy in unique_policies]
Generator version:
def matches_indexes(startRavel, unique_policies):
    for policy in unique_policies:
        yield np.flatnonzero(startRavel == policy)
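If even the list comprehension is too slow, one sort can replace the repeated scans of the large array. A sketch, assuming every policy id actually occurs in `startRavel` (the groups then come out in the same order as np.unique(startRavel)):

```python
import numpy as np

startRavel = np.array([1, 2, 2, 2, 3, 3])

# argsort gives the row positions grouped by value; split at each value change
order = np.argsort(startRavel, kind="stable")
sorted_vals = startRavel[order]
boundaries = np.flatnonzero(sorted_vals[1:] != sorted_vals[:-1]) + 1
arrStart = np.split(order, boundaries)
print(arrStart)  # [array([0]), array([1, 2, 3]), array([4, 5])]
```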

appending indices to numpy array

I make a variable corr_matrix by iterating over rows and columns and correlating values.
import numpy as np
import random
enc_dict = {k: int(random.uniform(1,24)) for k in range(24)}
ret_dict = {k: int(random.uniform(1,24)) for k in range(24)}
corr_matrix=np.zeros((24,24))
ind_matrix = np.zeros((24,24))
data = np.random.rand(24,24)
for enc_row in range(0,24):
    for ret_col in range(0,24):
        corr_matrix[enc_row, ret_col] = np.corrcoef(data[enc_row,:], data[ret_col,:])[0,1]
        if enc_dict[enc_row] == ret_dict[ret_col]:
            ind_matrix = np.append(ind_matrix, [[enc_row, ret_col]])
I want to store the indices in the matrix where enc_dict[enc_row] == ret_dict[ret_col] as a variable to use for indexing corr_matrix. I can print the values, but I can't figure out how to store them in a variable in a way that allows me to use them for indexing later.
I want to:
make a variable, ind_matrix, that contains the indices where the above statement is true.
I want to use ind_matrix to index within my correlation matrix. I want to be able to index the whole row as well as the exact value where the above statement is true (enc_dict[enc_row] == ret_dict[ret_col])
I tried ind_matrix = np.append(ind_matrix, [[enc_row, ret_col]]) which gives me the correct values but it has a lot of 0s before the #s for some reason. Also it doesn't allow me to call each pair of points together to use for indexing. I want to be able to do something like corr_matrix[ind_matrix[1]]
Here is a modified version of your code containing a couple of suggestions and comments:
import numpy as np
# when indices are 0, 1, 2, ... don't use dictionary
# also for integer values use randint
enc_ = np.random.randint(1, 24, (24,))
ret_ = np.random.randint(1, 24, (24,))
data = np.random.rand(24,24)
# np.corrcoef is vectorized, no need to loop:
corr_matrix = np.corrcoef(data)
# the following is the clearest, but maybe not the fastest way of generating
# your index array:
ind_matrix = np.argwhere(np.equal.outer(enc_, ret_))
# this can't be used for indexing directly, you'll have to choose
# one of the following idioms
# EITHER spread to two index arrays
I, J = ind_matrix.T
# or directly I, J = np.where(np.equal.outer(enc_, ret_))
# single index
print(corr_matrix[I[1], J[1]])
# multiple indices
print(corr_matrix[I[[1,2,0]], J[[1,2,0]]])
# whole row
print(corr_matrix[I[1]])
# OR use tuple conversion
ind_matrix = np.array(ind_matrix)
# single index
print(corr_matrix[(*ind_matrix[1],)])
# multiple indices
print(corr_matrix[(*zip(*ind_matrix[[1,2,0]],),)])
# whole row
print(corr_matrix[ind_matrix[1, 0]])
# OR if you do not plan to use multiple indices
as_tuple = list(map(tuple, ind_matrix))
# single index
print(corr_matrix[as_tuple[1]])
# whole row
print(corr_matrix[as_tuple[1][0]])
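To see what the `np.equal.outer` + `np.argwhere` step produces, a tiny sketch with made-up values:

```python
import numpy as np

enc_ = np.array([1, 2, 3])
ret_ = np.array([2, 2, 1])

# pairwise[i, j] is True exactly when enc_[i] == ret_[j]
pairwise = np.equal.outer(enc_, ret_)
ind_matrix = np.argwhere(pairwise)
print(ind_matrix)  # [[0 2]
                   #  [1 0]
                   #  [1 1]]
```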

Python & Numpy - create dynamic, arbitrary subsets of ndarray

I am looking for a general way to do this:
raw_data = np.array(somedata)
filterColumn1 = raw_data[:,1]
filterColumn2 = raw_data[:,3]
cartesian_product = itertools.product(np.unique(filterColumn1), np.unique(filterColumn2))
for val1, val2 in cartesian_product:
    fixed_mask = (filterColumn1 == val1) & (filterColumn2 == val2)
    subset = raw_data[fixed_mask]
I want to be able to use any amount of filterColumns. So what I want is this:
filterColumns = [filterColumn1, filterColumn2, ...]
uniqueValues = map(np.unique, filterColumns)
cartesian_product = itertools.product(*uniqueValues)
for combination in cartesian_product:
    variable_mask = ????
    subset = raw_data[variable_mask]
Is there a simple syntax to do what I want? Otherwise, should I try a different approach?
Edit: This seems to be working
cartesian_product = itertools.product(*uniqueValues)
for combination in cartesian_product:
    variable_mask = True
    for idx, fc in enumerate(filterColumns):
        variable_mask &= (fc == combination[idx])
    subset = raw_data[variable_mask]
You could use numpy.all and index broadcasting for this
filter_matrix = np.array(filterColumns)    # shape (n_filters, n_rows)
combination_array = np.array(combination)  # shape (n_filters,)
bool_matrix = filter_matrix == combination_array[:, np.newaxis]
variable_mask = np.all(bool_matrix, axis=0)  # a row matches when every filter matches
subset = raw_data[variable_mask]
There are, however, simpler ways of doing the same thing if your filters are within the matrix, notably through numpy argsort and numpy roll over an axis. First you roll axes until your filter columns come first, then you sort on them and slice the array vertically to get the rest of the matrix.
In general, if a for loop can be avoided in Python, it is better to avoid it.
Update:
Here is the full code without a for loop:
import numpy as np
# select filtering indexes
filter_indexes = [1, 3]
# generate the test data
raw_data = np.random.randint(0, 4, size=(50,5))
# create a column that we would use for indexing
index_columns = raw_data[:, filter_indexes]
# sort the index columns in lexicographic order over all the indexing columns
argsorts = np.lexsort(index_columns.T)
# sort both the index and the data column
sorted_index = index_columns[argsorts, :]
sorted_data = raw_data[argsorts, :]
# in each indexing column, find if number in row and row-1 are identical
# then group to check if all numbers in corresponding positions in row and row-1 are identical
autocorrelation = np.all(sorted_index[1:, :] == sorted_index[:-1, :], axis=1)
# find out the breakpoints: these are the positions where row and row-1 are not identical
breakpoints = np.nonzero(np.logical_not(autocorrelation))[0]+1
# finally find the desired subsets
subsets = np.split(sorted_data, breakpoints)
An alternative implementation would be to transform the indexing matrix into a string matrix, sum row-wise, get an argsort over the now unique indexing column and split as above.
For convenience, it might be more interesting to first roll the indexing columns to the beginning of the matrix, so that the sorting done above is clearer.
Something like this?
variable_mask = np.ones(raw_data.shape[0], dtype=bool)  # select all rows initially; boolean dtype so &= stays an element-wise AND
for column, val in zip(filterColumns, combination):
variable_mask &= (column == val)
subset = raw_data[variable_mask]
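End to end, the mask loop might look like this on made-up data (filtering on columns 1 and 3, as in the question):

```python
import itertools
import numpy as np

# hypothetical data: columns 1 and 3 are the filter columns
raw_data = np.array([
    [0, 1, 9, 1],
    [0, 1, 8, 2],
    [0, 2, 7, 1],
    [0, 1, 6, 1],
])
filterColumns = [raw_data[:, 1], raw_data[:, 3]]
uniqueValues = [np.unique(fc) for fc in filterColumns]

sizes = {}
for combination in itertools.product(*uniqueValues):
    # start from a boolean mask so &= stays an element-wise AND
    variable_mask = np.ones(raw_data.shape[0], dtype=bool)
    for column, val in zip(filterColumns, combination):
        variable_mask &= (column == val)
    subset = raw_data[variable_mask]
    sizes[tuple(map(int, combination))] = len(subset)
print(sizes)  # {(1, 1): 2, (1, 2): 1, (2, 1): 1, (2, 2): 0}
```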
