Python equivalent to Matlab [a,b] = sort(y) - python

I am pretty new to the Python language and want to know how to do the following:
(1) y = [some vector]
(2) z = [some other vector]
(3) [ynew,indx] = sort(y)
(4) znew = z(indx)
I can do lines 1, 2 and 4, but line 3 is giving me fits. Any suggestions? What I am looking for is not a user-written function but something intrinsic to the language itself.
Thanks

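Using NumPy for line 3, assuming y is a 2-D row vector (for a column vector use axis=0; for a 1-D array omit the axis argument):
ynew = np.sort(y, axis=1)  # note: y.sort(axis=1) sorts in place and returns None
indx = y.argsort(axis=1)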
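For a plain 1-D array, the MATLAB pattern [ynew, indx] = sort(y); znew = z(indx) could be sketched as follows. The sample values are made up for illustration, and note that NumPy indices start at 0 where MATLAB's start at 1.

```python
import numpy as np

# Illustrative data (not from the question)
y = np.array([3.0, 1.0, 2.0])
z = np.array([30, 10, 20])

indx = np.argsort(y, kind="stable")  # indices that would sort y (0-based)
ynew = y[indx]                       # same values as np.sort(y)
znew = z[indx]                       # z reordered the same way as y

print(indx.tolist())  # [1, 2, 0]
print(ynew.tolist())  # [1.0, 2.0, 3.0]
print(znew.tolist())  # [10, 20, 30]
```

Passing kind="stable" keeps equal values in their original relative order, which is an optional choice here rather than a requirement.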

I had the same problem, and the solution provided didn't work for me. I found one that does, and I'm sharing it here in case anyone else runs into the same issue:
My goal was to sort x in ascending order and reorder y in the same way.
x = np.array([0, 44, 22, 33, 11]) # the array to sort
y = np.array([0, 11, 22, 33, 44]) # another array to reorder the same way as x
x_sorted = x[x.argsort()] # x sorted in ascending order
y_sorted = y[x.argsort()] # y reordered to follow x
The difference is that the sort indices are not kept in a separate variable, but in my case that wasn't a problem, since I only wanted to reorder several arrays to follow one. x.argsort() already gives the positions the elements have to move to, so I think this achieves the same result.
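If the index vector is needed later (as with MATLAB's second output), one option is to compute argsort once and keep it. The sketch below also shows how the stored indices can undo the sort; the variable names order and inverse are just illustrative.

```python
import numpy as np

x = np.array([0, 44, 22, 33, 11])
y = np.array([0, 11, 22, 33, 44])

order = x.argsort()      # computed once and kept for reuse
x_sorted = x[order]
y_sorted = y[order]

# Undo the sort: inverse[k] is the position original element k moved to
inverse = np.empty_like(order)
inverse[order] = np.arange(len(order))
x_restored = x_sorted[inverse]

print(order.tolist())       # [0, 4, 2, 3, 1]
print(y_sorted.tolist())    # [0, 44, 22, 33, 11]
print(x_restored.tolist())  # [0, 44, 22, 33, 11]
```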

You could try to do something like the following:
import numpy as np
y = [1,3,2]
z = [3,2,1]
indx = [i[0] for i in sorted(enumerate(y), key=lambda x:x[1])]
print(indx)
#convert z to numpy array in order to use np.ix_ function
z = np.asarray(z)
znew = z[np.ix_(indx)]
print(znew)
Results:
#the indx is
[0, 2, 1]
#the znew is
[3 1 2]
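Since the question asked for something intrinsic to the language, the same reordering can also be sketched with built-ins only, without NumPy. This is one possible approach, not the only one:

```python
y = [1, 3, 2]
z = [3, 2, 1]

# Indices that would sort y, using only built-in functions
indx = sorted(range(len(y)), key=lambda i: y[i])
ynew = [y[i] for i in indx]
znew = [z[i] for i in indx]

print(indx)  # [0, 2, 1]
print(ynew)  # [1, 2, 3]
print(znew)  # [3, 1, 2]
```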

Related

How to filter two numpy arrays?

Edit: I fixed y so that x,y have the same length
I don't understand much about programming, but I have a giant mass of data to analyze and it has to be done in Python.
Say I have two arrays:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20,80,45])
and say I want to choose the values in y which are greater than 17, and keep only the values in x which have the same index as the remaining values in y. For example, I want to erase the first value of y (25) and accordingly the matching value in x (1).
I tried this:
filter=np.where(y>17, 0, y)
but I don't know how to filter the x values accordingly (the actual data are much longer arrays, so doing it "by hand" is basically impossible).
Solution: using @mozway's tip, now that x and y have the same length, the code needed is:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20,80,45])
x_filtered=x[y>17]
As your question is not fully clear and you did not provide the expected output, here are two possibilities:
filtering
NumPy arrays can be indexed by an array (iterable) of booleans.
If the two arrays were the same length you could do:
x[y>17]
Here, x is longer than y, so we first need to make it the same length:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20])
x[:len(y)][y>17]
Output: array([1, 2, 4, 5, 8])
replacement
To select between x and y based on a condition, use where:
np.where(y>17, x[:len(y)], y)
Output:
array([ 1, 2, 16, 4, 5, 5, 9, 8])
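With the equal-length arrays from the edited question, one boolean mask can filter both arrays so they stay aligned. A minimal sketch (the names mask, x_filtered and y_filtered are illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([25, 18, 16, 19, 30, 5, 9, 20, 80, 45])

mask = y > 17          # boolean array, True where the condition holds
x_filtered = x[mask]   # x values at the positions where y > 17
y_filtered = y[mask]   # the matching y values

print(x_filtered.tolist())  # [1, 2, 4, 5, 8, 9, 10]
print(y_filtered.tolist())  # [25, 18, 19, 30, 20, 80, 45]
```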
As someone with little experience in NumPy specifically, I wrote this answer before seeing @mozway's excellent answer for filtering. My answer works on more generic containers than NumPy's arrays, though it uses more concepts as a result. I'll attempt to explain each concept in enough detail for the answer to make sense.
TL;DR:
Please, definitely read the rest of the answer, it'll help you understand what's going on.
import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9,10])
y = np.array([25,18,16,19,30,5,9,20])
filtered_x_list = []
filtered_y_list = []
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
        filtered_x_list.append(x[i])
filtered_x = np.array(filtered_x_list)
filtered_y = np.array(filtered_y_list)
# These lines are just for us to see what happened
print(filtered_x) # prints [1 2 4 5 8]
print(filtered_y) # prints [25 18 19 30 20]
Pre-requisite Knowledge
Python containers (lists, arrays, and a bunch of other stuff I won't get into)
Let's take a look at the line:
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
What's Python doing?
The first thing it's doing is creating a list:
[1, 2, 3] # and so on
Lists in Python have a few features that are useful for us in this solution:
Accessing elements:
x_list = [ 1, 2, 3 ]
print(x_list[0]) # prints 1
print(x_list[1]) # prints 2, and so on
Adding elements to the end:
x_list = [ 1, 2, 3 ]
x_list.append(4)
print(x_list) # prints [1, 2, 3, 4]
Iteration:
x_list = [ 1, 2, 3 ]
for x in x_list:
    print(x)
# prints:
# 1
# 2
# 3
NumPy arrays are slightly different: we can still access and iterate over their elements, and we can change individual values, but once an array is created its size is fixed. Arrays have no .append method, and adding or deleting elements (for example with np.append or np.delete) produces a new array rather than changing the existing one.
So the filtered_x_list and the filtered_y_list are empty lists we're creating, but we're going to modify them by adding the values we care about to the end.
The second thing Python is doing is creating a numpy array, using the list to define its contents. The array constructor can take a list expressed as [...], or a list defined by x_list = [...], which we're going to take advantage of later.
A little more on iteration
In your question, for every x element, there is a corresponding y element. We want to test something for each y element, then act on the corresponding x element, too.
Since we can access the same element in both arrays using an index - x[0], for instance - instead of iterating over one list or the other, we can iterate over all indices needed to access the lists.
First, we need to figure out how many indices we're going to need, which is just the length of the lists. len(x) lets us do that - in this case, it returns 10.
What if x and y are different lengths? In this case, I chose the smallest of the two - first, do len(x) and len(y), then pass those to the min() function, which is what min(len(x), len(y)) in the code above means.
Finally, we want to actually iterate through the indices, starting at 0 and ending at len(x) - 1 or len(y) - 1, whichever is smallest. The range sequence lets us do exactly that:
for i in range(10):
    print(i)
# prints:
# 0
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
So range(min(len(x), len(y))) gets us the indices to iterate over, and this line finally makes sense:
for i in range(min(len(x), len(y))):
Inside this for loop, i now gives us an index we can use for both x and y.
Now, we can do the comparison in our for loop:
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
Then, including xs for the corresponding ys is a simple case of just appending the same x value to the x list:
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
        filtered_x_list.append(x[i])
The filtered lists now contain the numbers you're after. The last two lines, outside the for loop, just create numpy arrays from the results:
filtered_x = np.array(filtered_x_list)
filtered_y = np.array(filtered_y_list)
Which you might want to do, if certain numpy functions expect arrays.
While there are, in my opinion, better ways to do this (I would probably write custom iterators that produce the intended results without creating new lists), they require a somewhat more advanced understanding of programming, so I opted for something simpler.
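As one somewhat more compact variant of the loop above (not the custom-iterator approach mentioned, just a sketch using zip), the two sequences can be walked in pairs. zip stops at the shorter input, so no explicit min(len(x), len(y)) is needed. The name kept is illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([25, 18, 16, 19, 30, 5, 9, 20])

# Pair up corresponding elements and keep the pairs where y > 17
kept = [(xi, yi) for xi, yi in zip(x, y) if yi > 17]

filtered_x = np.array([xi for xi, _ in kept])
filtered_y = np.array([yi for _, yi in kept])

print(filtered_x.tolist())  # [1, 2, 4, 5, 8]
print(filtered_y.tolist())  # [25, 18, 19, 30, 20]
```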

Condition in numpy array

I just learned about NumPy and arrays today and I am quite confused about something. I don’t get line 3, as I thought np.array() should have a list in the (). Can someone explain that line to me? And for line 5, I know it is comparing arrays x and y, but can someone explain to me how it works? And what does x[y] mean? Thank you so much.
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array(x<4)
print(y)
print(x[y])
What this code snippet does is masking the x array with a condition. x[y] is the masked array, which shows only the elements of x where y is True (in that case where x < 4).
y = np.array(x < 4) has a useless np.array call, as x < 4 is already a NumPy array. That being said, you can give many kinds of objects to np.array(), such as lists, tuples, other arrays...
The whole thing should be simply:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
print(x[x < 4])
# [1 2 3]
Line three tells you the places in x where the value is less than four. You don't need the "np.array" constructor out front though - y = x<4 will return you the same array.
Line five filters the x array, using the values in y that are 'True'. This is often referred to as a 'mask'.
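To make the mask visible, the sketch below prints the boolean array itself and shows two common follow-ups: combining conditions with & (each comparison needs its own parentheses) and inverting a mask with ~. The names mask and both are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

mask = x < 4                 # element-wise comparison gives a boolean array
print(mask.tolist())         # [True, True, True, False, False]
print(x[mask].tolist())      # [1, 2, 3]

both = (x > 1) & (x < 4)     # element-wise AND of two conditions
print(x[both].tolist())      # [2, 3]
print(x[~mask].tolist())     # [4, 5]  (~ inverts the mask)
```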

Python - change multiple elements in list based on indices

I'm new to Python. My question: I have a list rt, and a list of indices rtInd, which varies in size and content.
For all the indices in rtInd, I want to change the corresponding elements in rt.
Example: rt = [10,20,30,40]; rtInd = [2,3]
What I want:
rt[2] = 30 + x
rt[3] = 40 + x
Can someone help me?
Thanks in advance!
Amy
If you have a list of indices, and you have x, you can loop through and update the values at the corresponding indices:
rt = [10,20,30,40]
rtInd = [2,3]
x = 10
for i in rtInd:
    rt[i] += x
# result:
# rt = [10, 20, 40, 50]
If rt is a NumPy array, you could index with the list (or numpy.array) of indices itself. To be clear, it must not be a tuple, because tuples are used for multidimensional access, not as a set of indices to modify:
# Setup
import numpy as np
rt = np.array([10, 20, 30, 40], dtype=np.int64)
rtInd = [2, 3] # Or np.array([2, 3])
x = 123 # Arbitrary value picked
# Actual work
rt[rtInd] += x
and that will push the work to a single operation inside numpy (potentially improving performance).
That said, numpy is a pretty heavyweight dependency; if you're not already relying on numpy, I'd stick to the simple loop described in M Z's answer.
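One caveat with the NumPy form, in case the index list can contain the same index more than once: rt[rtInd] += x applies the increment only once per position, whereas np.add.at accumulates repeated indices. The duplicate index below is made up purely to illustrate the difference.

```python
import numpy as np

rt = np.array([10, 20, 30, 40])
rtInd = [2, 3, 2]   # hypothetical: index 2 appears twice
x = 10

buffered = rt.copy()
buffered[rtInd] += x             # repeated index 2 is incremented once

unbuffered = rt.copy()
np.add.at(unbuffered, rtInd, x)  # repeated index 2 is incremented twice

print(buffered.tolist())    # [10, 20, 40, 50]
print(unbuffered.tolist())  # [10, 20, 50, 50]
```

The plain Python loop over rtInd behaves like np.add.at here, adding x once per occurrence.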

Storing these vectors but which data structure to use in Python

Each iteration of the following loop generates a vector of dimension 50x1.
I'd like to store all the vectors from the loop collectively in a single data structure.
def get_y_hat(y_bar, x_train, theta_Ridge_Matrix):
    print(theta_Ridge_Matrix.shape)
    print(theta_Ridge_Matrix.shape[0])
    for i in range(theta_Ridge_Matrix.shape[0]):
        yH = np.dot(x_train, theta_Ridge_Matrix[i].T)
        print(yH)
Which data structure should I use? I'm new to Python, but based on what I've researched online there are 2 options: a numpy array and a list of lists.
I will need to access each vector of 50 elements later outside this method. There could be 200 to 500 vectors I will be storing.
Could someone give me sample code of such a data structure as well?
Thanks
I think storing the data from your loop in a dict and then converting it to a pandas.DataFrame (which is built on top of numpy arrays) should be an efficient solution, allowing you to further process your data as a whole or as single vectors.
As an example:
import pandas as pd
import numpy as np
data = {}
# this would be your loop
for i in range(50):
    data['run_%02d' % i] = np.random.randn(50)
data = pd.DataFrame(data) # the dict keys become the columns
You can access single vectors as an attribute or via the key:
print(data['run_42'].describe()) # or data.run_42.describe()
count 50.000000
mean 0.021426
std 1.027607
min -2.472225
25% -0.601868
50% 0.014949
75% 0.641488
max 2.391289
or further analyse the whole data:
print(data.mean())
run_00 -0.015224
run_01 -0.006971
..
run_48 -0.115935
run_49 0.147738
or have a look at your data using matplotlib (as you are tagging your question with matplotlib):
import matplotlib.pyplot as plt
data.boxplot(rot=90)
plt.tight_layout()
I suggest using numpy; for that you need to install it.
On Windows, from this site:
http://sourceforge.net/projects/numpy/files/NumPy/
Some examples of how you can use it:
import numpy as np
We will create an array and name it mat:
>>> mat = np.random.randn(2,3)
>>> mat
array([[ 1.02063865, 1.52885147, 0.45588211],
[-0.82198131, 0.20995583, 0.31997462]])
The array is transposed using the T attribute:
>>> mat.T
array([[ 1.02063865, -0.82198131],
[ 1.52885147, 0.20995583],
[ 0.45588211, 0.31997462]])
The shape of any array is changed by using the reshape method:
>>> mat = np.random.randn(3,6)
>>> mat
array([[ 2.01139326, 1.33267072, 1.2947112 , 0.07492725, 0.49765694,
0.01757505],
[ 0.42309629, 0.95921276, 0.55840131, -1.22253606, -0.91811118,
0.59646987],
[ 0.19714104, -1.59446001, 1.43990671, -0.98266887, -0.42292461,
-1.2378431 ]])
>>> mat.reshape(2,9)
array([[ 2.01139326, 1.33267072, 1.2947112 , 0.07492725, 0.49765694,
0.01757505, 0.42309629, 0.95921276, 0.55840131],
[-1.22253606, -0.91811118, 0.59646987, 0.19714104, -1.59446001,
1.43990671, -0.98266887, -0.42292461, -1.2378431 ]])
We can also change the shape of a variable in place by assigning to its shape attribute:
>>> mat = np.random.randn(4,3)
>>> mat.shape
(4, 3)
>>> mat
array([[-1.47446507, -0.46316836, 0.44047531],
[-0.21275495, -1.16089705, -1.14349478],
[-0.83299338, 0.20336677, 0.13460515],
[-1.73323076, -0.66500491, 1.13514327]])
>>> mat.shape = 2,6
>>> mat.shape
(2, 6)
>>> mat
array([[-1.47446507, -0.46316836, 0.44047531, -0.21275495, -1.16089705,
-1.14349478],
[-0.83299338, 0.20336677, 0.13460515, -1.73323076, -0.66500491,
1.13514327]])
I can't comment on a numpy array as I haven't used one before, but for using a list of lists Python already has built in support.
For example to do so:
AList = [1, 2, 3]
BList = [4, 5, 6]
CList = [7, 8, 9]
List_of_Lists = []
List_of_Lists.append(AList)
List_of_Lists.append(BList)
List_of_Lists.append(CList)
print(List_of_Lists)
which would yield:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
There are also other ways you can go about creating the lists instead of initializing them all from the start, for instance:
ListCreator = int(input('Input how many lists are needed: '))
ListofLists = [[] for index in range(ListCreator)]
There are more ways to go about it but I don't know how you plan on implementing it.
You could simply do
import numpy as np
def get_y_hat(y_bar, x_train, theta_Ridge_Matrix):
    print(theta_Ridge_Matrix.shape)
    print(theta_Ridge_Matrix.shape[0])
    # np.empty takes the shape as a tuple: one row per theta vector,
    # one column per row of x_train (assuming each dot product is 1-D)
    yH = np.empty((theta_Ridge_Matrix.shape[0], x_train.shape[0]))
    for i in range(theta_Ridge_Matrix.shape[0]):
        yH[i, :] = np.dot(x_train, theta_Ridge_Matrix[i].T)
    print(yH)
    return yH
If you store the theta_Ridge_Matrix in a 3D array, you can also let np.dot do the work by using yH = np.dot(x_train, theta_Ridge_Matrix), which would sum over the second-to-last dimension of the matrix.
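As a rough sketch of both storage options, the example below collects each vector in a list and stacks the list into a 2-D array, then compares that with a single matrix product. The shapes (50 samples, 3 features, 4 theta vectors) and the random data are assumptions for illustration, not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.standard_normal((50, 3))   # assumed: 50 samples, 3 features
theta = rng.standard_normal((4, 3))      # assumed: 4 theta vectors, one per row

# Loop version: collect each 50-element vector in a list, then stack
rows = []
for i in range(theta.shape[0]):
    rows.append(np.dot(x_train, theta[i]))
y_hat_loop = np.vstack(rows)             # shape (4, 50), one vector per row

# Vectorised version: one matrix product gives the same array
y_hat = theta @ x_train.T                # shape (4, 50)

print(y_hat.shape)                       # (4, 50)
print(np.allclose(y_hat, y_hat_loop))    # True
print(y_hat[2].shape)                    # (50,)  a single stored vector
```

Returning the stacked array from the function makes every vector available to the caller as y_hat[i].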

Get the position of the largest value in a multi-dimensional NumPy array

How can I get the position (indices) of the largest value in a multi-dimensional NumPy array?
The argmax() method should help.
Update
(After reading comment) I believe the argmax() method would work for multi dimensional arrays as well. The linked documentation gives an example of this:
>>> from numpy import array
>>> a = array([[10,50,30],[60,20,40]])
>>> maxindex = a.argmax()
>>> maxindex
3
Update 2
(Thanks to KennyTM's comment) You can use unravel_index(a.argmax(), a.shape) to get the index as a tuple:
>>> from numpy import unravel_index
>>> unravel_index(a.argmax(), a.shape)
(1, 0)
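The same pattern should carry over to arrays with more than two dimensions. A small sketch with a 3-D array, where the maximum is planted at a known position so the result can be checked:

```python
import numpy as np

a = np.zeros((2, 3, 4))
a[1, 2, 3] = 5.0   # plant a known maximum (illustrative data)

# argmax returns an index into the flattened array;
# unravel_index converts it back to one index per dimension
pos = np.unravel_index(np.argmax(a), a.shape)

print(tuple(int(i) for i in pos))  # (1, 2, 3)
print(a[pos])                      # 5.0
```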
(edit) I was referring to an old answer which had been deleted. And the accepted answer came after mine. I agree that argmax is better than my answer.
Wouldn't it be more readable/intuitive to do like this?
numpy.nonzero(a.max() == a)
(array([1]), array([0]))
Or,
numpy.argwhere(a.max() == a)
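One practical difference between the two approaches shows up when the maximum occurs more than once: argmax reports only the first occurrence, while argwhere lists every position. A sketch with a made-up array containing a tie:

```python
import numpy as np

a = np.array([[1, 9],
              [9, 2]])   # illustrative: the maximum 9 appears twice

first = np.unravel_index(a.argmax(), a.shape)   # first occurrence only
all_max = np.argwhere(a == a.max())             # every occurrence, one row each

print(tuple(int(i) for i in first))  # (0, 1)
print(all_max.tolist())              # [[0, 1], [1, 0]]
```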
You can simply write a function (that works only in 2D):
def argmax_2d(matrix):
    maxN = np.argmax(matrix)  # index into the flattened array
    (xD, yD) = matrix.shape   # xD rows, yD columns
    x = maxN // yD            # row: divide by the number of columns
    y = maxN % yD             # column: remainder
    return (x, y)
An alternative way (for a 1-D array) is to convert the numpy array to a list and use max and the index method:
List = np.array([34, 7, 33, 10, 89, 22, -5])
_max = List.tolist().index(max(List))
print(_max)
# 4
