ordering an array based on values of another array - python

This question is probably basic for some of you, but I am new to Python. I have an initial array:
initial_array = np.array([1, 6, 3, 4])
I have another array:
value_array = np.array([10, 2, 3, 15])
I want an output array that reorders initial_array according to the values in value_array, from largest to smallest. My result should look like this:
output_array = np.array([4, 1, 3, 6])
Does anyone know if this is possible to do in Python?
So far I have tried:
for i in range(4):
    # find position of element

You can use numpy.argsort to get the indices that would sort value_array, then rearrange initial_array with those indices reversed via [::-1] for descending order.
>>> idx_sort = value_array.argsort()
>>> initial_array[idx_sort[::-1]]
array([4, 1, 3, 6])
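A shorter variant of the same idea (assuming value_array is numeric, so negation is meaningful): argsort on the negated values sorts descending directly, with no [::-1] step.

```python
import numpy as np

initial_array = np.array([1, 6, 3, 4])
value_array = np.array([10, 2, 3, 15])

# argsort of the negated values yields descending sort indices
output_array = initial_array[np.argsort(-value_array)]
print(output_array)  # [4 1 3 6]
```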

You could use np.stack to put the arrays together, basically adding value_array as a column next to initial_array, then sort the rows by that column and take the first column as the result.
import numpy as np
initial_array = np.array([1, 6, 3, 4])
value_array = np.array([10, 2, 3, 15])
stacked = np.stack((initial_array, value_array), axis=1)
stacked = stacked[stacked[:, 1].argsort()][::-1]
output_array = stacked[:, 0]
print(output_array)  # [4 1 3 6]
The [::-1] part gives descending order; remove it for ascending.
I am assuming initial_array and value_array have the same length.


need to grab entire first element (array) in a multi-dimensional list of arrays python3

Apologies if this has already been asked, but I searched quite a bit and couldn't find quite the right solution. I'm new to Python, but I'll try to be as clear as possible. In short, I have a list of arrays in the following format, resulting from joining a multiprocessing pool:
array = [[[1,2,3], 5, 47, 2515], ..., [[4,5,6], 3, 35, 2096]]
and I want to get all values from the first array element to form a new array in the following form:
print(new_array)
[1,2,3,4,5,6]
In my code, I was trying to get the first value through this function:
new_array = array[0][0]
but this only returns the first value as such:
print(new_array)
[1,2,3]
I also tried np.take after converting the list into a NumPy array:
array = np.array(array)
new_array = np.take(array, 0)
print(new_array)
[1,2,3]
I have tried a number of np functions (concatenate, take, etc.) to iterate this over the list, but I get back the following error (presumably because the sizes of the inner arrays differ):
ValueError: autodetected range of [[], [1445.0, 1445.0, -248.0, 638.0, -108.0, 649.0]] is not finite
Thanks for any help!
You can achieve it without numpy using reduce:
from functools import reduce
l = [[[1,2,3], 5, 47, 2515], [[4,5,6], 3, 35, 2096]]
res = reduce(lambda a, b: [*a, *b], [x[0] for x in l])
Output
[1, 2, 3, 4, 5, 6]
Maybe it is worth mentioning that [*a, *b] is a way to concatenate lists in Python, for example:
[*[1, 2, 3], *[4, 5, 6]] # [1, 2, 3, 4, 5, 6]
You could also use itertools' chain() function to flatten an extraction of the first sub-array from each element of the list:
from itertools import chain
result = list(chain(*[sub[0] for sub in array]))
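Two closely related variants may also be worth noting: chain.from_iterable avoids the explicit * unpacking, and np.concatenate does the flattening in NumPy if an ndarray result is preferred. A small sketch using the sample data from the question:

```python
from itertools import chain

import numpy as np

array = [[[1, 2, 3], 5, 47, 2515], [[4, 5, 6], 3, 35, 2096]]

# chain.from_iterable takes the iterable of sub-lists directly
flat = list(chain.from_iterable(sub[0] for sub in array))
print(flat)  # [1, 2, 3, 4, 5, 6]

# NumPy equivalent, returning an ndarray instead of a list
flat_np = np.concatenate([sub[0] for sub in array])
```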

numpy.delete not removing column from array

I'm attempting to remove each column one at a time from an array and, based on the documentation and this question, thought the following should work:
print(all_input_data.shape)
for n in range(9):
    print(n)
    testArray = all_input_data.copy()
    print(testArray.shape)
    np.delete(testArray, [n], axis=1)
    print(testArray.shape)
    print(testArray[0:1][:])
The original matrix is all_input_data.
This is not causing any columns to be deleted or generating any other change to the array. The initial output for the snippet above is:
(682120, 9)
0
(682120, 9)
(682120, 9)
[[ 2.37000000e+02 1.60000000e+01 9.90000000e+01 1.04910000e+03
9.29000000e-01 9.86000000e-01 8.43000000e-01 4.99290000e+01
1.97000000e+00]]
The delete command is not changing the shape of the matrix at all.
np.delete returns a copy of the input array with elements removed.
Return a new array with sub-arrays along an axis deleted.
There is no in place deletion of array elements in numpy.
Because np.delete returns a copy and does not modify its input, there is no need to manually make a copy of all_input_data:
import numpy as np
all_input_data = np.random.rand(100, 9)
for n in range(9):
    print(n)
    testArray = np.delete(all_input_data, [n], axis=1)
    print(testArray.shape)
    print(testArray[0:1][:])
From the linked question, consider this:
In [2]: a = np.arange(12).reshape(3,4)
In [3]: np.delete(a, [1,3], axis=1)
Out[3]:
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
In [4]: a
Out[4]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In other words, if you want to keep the result you should assign it to a new variable, but considering the size of your matrix this may not be practical. What you could do instead is use slice notation indexing. It is explained here.
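One way to read that suggestion: instead of deleting column n, select every other column with a boolean mask in a single indexing step. A sketch, using random data of the same shape in place of the original all_input_data:

```python
import numpy as np

all_input_data = np.random.rand(100, 9)  # stand-in for the original data

# A boolean mask over the columns keeps everything except column n,
# producing the "all columns but one" result without np.delete.
n = 0
mask = np.arange(all_input_data.shape[1]) != n
testArray = all_input_data[:, mask]
print(testArray.shape)  # (100, 8)
```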

Replace values in a large list of arrays (performance)

I have a performance problem with replacing values of a list of arrays using a dictionary.
Let's say this is my dictionary:
# Create a sample dictionary
keys = [1, 2, 3, 4]
values = [5, 6, 7, 8]
dictionary = dict(zip(keys, values))
And this is my list of arrays:
# import numpy as np
# List of arrays
listvalues = []
arr1 = np.array([1, 3, 2])
arr2 = np.array([1, 1, 2, 4])
arr3 = np.array([4, 3, 2])
listvalues.append(arr1)
listvalues.append(arr2)
listvalues.append(arr3)
listvalues
>[array([1, 3, 2]), array([1, 1, 2, 4]), array([4, 3, 2])]
I then use the following function to replace all values in an nD numpy array using a dictionary:
# Replace function
def replace(arr, rep_dict):
    rep_keys, rep_vals = np.array(list(zip(*sorted(rep_dict.items()))))
    idces = np.digitize(arr, rep_keys, right=True)
    return rep_vals[idces]
This function is really fast, however I need to iterate over my list of arrays to apply this function to each array:
replaced = []
for i in range(len(listvalues)):
    replaced.append(replace(listvalues[i], dictionary))
This is the bottleneck of the process, as it needs to iterate over thousands of arrays.
How could I achieve the same result without the for-loop? It is important that the result is in the same format as the input (a list of arrays with replaced values).
Many thanks guys!!
This will do the trick efficiently, using the numpy_indexed package. It can be further simplified if all values in listvalues are guaranteed to be present in keys, but I'll leave that as an exercise to the reader.
import numpy_indexed as npi
arr = np.concatenate(listvalues)
idx = npi.indices(keys, arr, missing='mask')
remap = np.logical_not(idx.mask)
arr[remap] = np.array(values)[idx[remap]]
replaced = np.array_split(arr, np.cumsum([len(a) for a in listvalues][:-1]))
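If pulling in numpy_indexed is not an option, the same concatenate-once/split-back pattern works with the question's own digitize-based replacement logic, applied in a single vectorized pass over the flattened data. A plain-NumPy sketch, assuming (as the replace function above does) that every value has an entry in the dictionary:

```python
import numpy as np

keys = [1, 2, 3, 4]
values = [5, 6, 7, 8]
dictionary = dict(zip(keys, values))
listvalues = [np.array([1, 3, 2]), np.array([1, 1, 2, 4]), np.array([4, 3, 2])]

# Concatenate the list into one flat array, translate all values at once,
# then split back into pieces of the original lengths.
flat = np.concatenate(listvalues)
rep_keys, rep_vals = np.array(sorted(dictionary.items())).T
idx = np.digitize(flat, rep_keys, right=True)
replaced_flat = rep_vals[idx]
splits = np.cumsum([len(a) for a in listvalues])[:-1]
replaced = np.array_split(replaced_flat, splits)
print(replaced)  # [array([5, 7, 6]), array([5, 5, 6, 8]), array([8, 7, 6])]
```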

How to return all the minimum indices in numpy

I am a little bit confused by the documentation of the argmin function in numpy.
It looks like it should do the job:
Reading this
Return the indices of the minimum values along an axis.
I might assume that
np.argmin([5, 3, 2, 1, 1, 1, 6, 1])
would return an array of all the minimum's indices: [3, 4, 5, 7]
But instead it returns only 3. Where is the catch, and what should I do to get my result?
That documentation makes more sense when you think about multidimensional arrays.
>>> x = numpy.array([[0, 1],
...                  [3, 2]])
>>> x.argmin(axis=0)
array([0, 0])
>>> x.argmin(axis=1)
array([0, 1])
With an axis specified, argmin takes one-dimensional subarrays along the given axis and returns the first index of each subarray's minimum value. It doesn't return all indices of a single minimum value.
To get all indices of the minimum value, you could do
numpy.where(x == x.min())
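For a multidimensional array, np.argwhere gives the same information as the where expression but as coordinate pairs rather than a tuple of per-axis index arrays. A small sketch with a repeated minimum:

```python
import numpy as np

x = np.array([[0, 1],
              [3, 0]])

# np.argwhere returns one (row, col) pair per occurrence of the minimum
min_coords = np.argwhere(x == x.min())
print(min_coords)
# [[0 0]
#  [1 1]]
```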
See the documentation for numpy.argmax (which is referred to by the docs for numpy.argmin):
In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.
The phrasing of the documentation ("indices" instead of "index") refers to the multidimensional case when axis is provided.
So, you can't do it with np.argmin. Instead, this will work:
np.where(arr == arr.min())
I would like to quickly add that, as user grofte mentioned, np.where returns a tuple; its documentation notes that it is shorthand for nonzero, which has a companion function flatnonzero that returns an array directly.
So, the cleanest version seems to be
my_list = np.array([5, 3, 2, 1, 1, 1, 6, 1])
np.flatnonzero(my_list == my_list.min())
=> array([3, 4, 5, 7])
Assuming that you want the indices of a list, not a numpy array, try
import numpy as np
my_list = [5, 3, 2, 1, 1, 1, 6, 1]
np.where(np.array(my_list) == min(my_list))[0]
The [0] index is needed because numpy returns a tuple containing one array per dimension of the input.
The way recommended by the NumPy documentation to get all indices of the minimum value is:
x = np.array([5, 3, 2, 1, 1, 1, 6, 1])
a, = np.nonzero(x == x.min()) # a=>array([3, 4, 5, 7])

extracting indices from sorted list

Suppose you have an unsorted list and you sort it using np.sort. Is there a way to get the indices of the sorted list from the original list using numpy?
The easiest way is to augment the array with the position indices and then sort the 2-D array. That gives you both the sorted data and its original position indices at the same time.
If you only want the indices (not the sorted data), then use argsort:
>>> from numpy import array
>>> arr = array([10, 5, 80, 20, 70, 18])
>>> arr.argsort()
array([1, 0, 5, 3, 4, 2])
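Indexing the original array with the argsort result also recovers the sorted data, so one call gives you both the order and the original positions:

```python
import numpy as np

arr = np.array([10, 5, 80, 20, 70, 18])
idx = arr.argsort()

# idx holds the original positions in sorted order;
# arr[idx] applies that order to reproduce np.sort(arr)
print(idx)       # [1 0 5 3 4 2]
print(arr[idx])  # [ 5 10 18 20 70 80]
```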
