replace element by element different arrays - python

I have an array :
a = np.array([1,2,3,4,5,6,7,8])
The array may be reshaped to a = np.array([[1,2,3,4],[5,6,7,8]]), whatever is more convenient.
Now, I have an array :
b = np.array([[11,22,33,44], [55,66,77,88]])
I want to replace each of these elements with the corresponding element from a.
The a array will always hold as many elements as b has.
So, array b will be :
[1,2,3,4], [5,6,7,8]
Note that I must keep each b subarray's shape as (4,); I don't want to change it. So idx will take values from 0 to 3. I want to make a fit to every four values.
I am struggling with reshape, split, mask, etc., and I can't figure out a way to do it.
import numpy as np
#a = np.array([[1,2,3,4],[5,6,7,8]])
a = np.array([1,2,3,4,5,6,7,8])
b = np.array([[11,22,33,44], [55,66,77,88]])
for arr in b:
    for idx, x in enumerate(arr):
        pass  # replace every arr[idx] with the corresponding a value

For your current case, what you want is probably:
b, c = list(a.reshape(2, -1))
This isn't the cleanest, but it is a one-liner. Turn your 1D array into a 2D array with the first dimension as 2 using reshape(2, -1); list then splits it along the first dimension so you can directly assign the pieces to b, c.
You can also do it with the specialty function numpy.split
b, c = np.split(a, 2)
EDIT: Based on accepted solution, vectorized way to do this is
b = a.reshape(b.shape)
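A minimal sketch of the reshape approach, using the arrays from the question. The in-place variant with b[...] is an addition that does not appear in the answers here; it may be useful when other code already holds a reference to b and the existing array should be overwritten rather than rebound.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
b = np.array([[11, 22, 33, 44], [55, 66, 77, 88]])

# Option 1: a new 2D array object with b's shape, holding a's values
b_new = a.reshape(b.shape)

# Option 2: overwrite the existing b in place, keeping its (2, 4) shape
b[...] = a.reshape(b.shape)

print(b_new)
print(b)
```

Each row of b still has shape (4,), so a per-row loop such as "for arr in b" keeps working unchanged.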

The following worked for me:
i = 0
for arr in b:
    for idx, x in enumerate(arr):
        arr[idx] = a[i]
        print(arr[idx])
        i += 1
Output (arr[idx]): 1 2 3 4 5 6 7 8
If you type print(b) it'll output [[1 2 3 4] [5 6 7 8]]

b = a[:len(a)//2]
c = a[len(a)//2:]

Well, I'm quite new to Python but this worked for me:
for i in range(0, len(a)//2):
    b[i] = a[i]
for i in range(len(a)//2, len(a)):
    c[i-4] = a[i]
by printing the 3 arrays I have the following output:
[1 2 3 4 5 6 7 8]
[1 2 3 4]
[5 6 7 8]
But I would go for Daniel's solution (the split one): 1 liner, using numpy API, ...

Related

Creating submatrix in python

Given a matrix S and a binary matrix W, I want to create a submatrix of S corresponding to the non zero coordinates of W.
For example:
S = [[1,1],[1,2],[1,3],[1,4],[1,5]]
W = [[1,0,0],[1,1,0],[1,1,1],[0,1,1],[0,0,1]]
I want to get matrices
S_1 = [[1,1],[1,2],[1,3]]
S_2 = [[1,2],[1,3],[1,4]]
S_3 = [[1,3],[1,4],[1,5]]
I couldn't figure out a slick way to do this in python. The best I could do for each S_i is
S_1 = S[0,:]
for i in range(np.shape(W)[0]):
    if W[i, 0] == 1:
        S_1 = np.vstack((S_1, S[i, :]))
but if I want to change the dimensions of the problem and have, say, 100 S_i's, writing a for loop for each one seems a bit ugly. (Side note: S_1 should be initialized to some empty 2d array but I couldn't get that to work, so I initialized it to S[0,:] as a placeholder).
EDIT: To clarify what I mean:
I have a matrix S
1 1
1 2
1 3
1 4
1 5
and I have a binary matrix
1 0 0
1 1 0
1 1 1
0 1 1
0 0 1
Given the first column of the binary matrix W
1
1
1
0
0
The 1's are in the first, second, and third positions. So I want to create a corresponding submatrix of S with just the first, second and third positions of every column, so S_1 (corresponding to the 1st column of W) is
1 1
1 2
1 3
Similarly, if we look at the third column of W
0
0
1
1
1
The 1's are in the last three coordinates and so I want a submatrix of S with just the last three coordinates of every column, called S_3
1 3
1 4
1 5
So given any ith column of the binary matrix, I'm looking to generate a submatrix S_i where the columns of S_i contain the columns of S, but only the entries corresponding to the positions of the 1's in the ith column of the binary matrix.
It probably is more useful to work with the transpose of W rather than W itself, both for human-readability and to facilitate writing the code. This means that the entries that affect each S_i are grouped together in one of the inner parentheses of W, i.e. in a row of W rather than a column as you have it now.
Then, S_i = np.array([S[j,:] for j in range(np.shape(S)[0]) if W_T[i,j] == 1]), where W_T is the transpose of W. If you need/want to stick with W as is, you need to reverse the indices i and j.
As for the outer loop, you could try to nest this in another similar comprehension without an if statement--however this might be awkward since you aren't actually building one output matrix (the S_i can easily be different dimensions, unless you're somehow guaranteed to have the same number of 1s in every column of W). This in fact raises the question of what you want--a list of these arrays S_i? Otherwise if they are separate variables as you have it written, there's no good way to refer to them in a generalizable way as they don't have indices.
Numpy can do this directly.
import numpy as np
S = np.array([[1,1],[1,2],[1,3],[1,4],[1,5]])
W = np.array([[1,0,0],[1,1,0],[1,1,1],[0,1,1],[0,0,1]])
for col in range(W.shape[1]):  # one pass per column of W
    print(S[W[:,col]==1])
Output:
[[1 1]
[1 2]
[1 3]]
[[1 2]
[1 3]
[1 4]]
[[1 3]
[1 4]
[1 5]]
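If the S_i are needed as objects rather than printed, one possible sketch is to collect them in a list, which also addresses the point raised above about referring to them by index. The name submatrices is an assumption made for this illustration.

```python
import numpy as np

S = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]])
W = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 1, 1], [0, 0, 1]])

# Iterating over W.T yields the columns of W; each one becomes a boolean
# mask that selects the matching rows of S.
submatrices = [S[col.astype(bool)] for col in W.T]

for k, S_k in enumerate(submatrices, start=1):
    print(f"S_{k} =\n{S_k}")
```

A list is used instead of a 3D array because the S_i may have different numbers of rows when the columns of W contain different numbers of 1's.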

Sorting a random array using permutation

I tried to sort an array by permuting it with itself
(the array contains all the numbers from 0 to its length - 1).
To test it I used random.shuffle, but it gave some unexpected results:
a = np.array(range(10))
random.shuffle(a)
a = a[a]
a = a[a]
print(a)
# not a sorted array
# [9 5 2 3 1 7 6 8 0 4]
a = np.array([2,1,4,7,6,5,0,3,8,9])
a = a[a]
a = a[a]
print(a)
# [0 1 2 3 4 5 6 7 8 9]
So for some reason the second example, an unsorted array, returns the sorted array as expected, but the shuffled array doesn't behave the same way.
Does anyone know why? Or if there is an easier way to sort using a permutation or something similar, that would be great.
TL;DR
There is no reason to expect a = a[a] to sort the array. In most cases it won't. In case of a coincidence it might.
What is the operation c = b[a]? or Applying a permutation
When you use an array a obtained by shuffling range(n) as an index array for an array b of the same size n, you are applying a permutation, in the mathematical sense, to the elements of b. For instance:
a = [2,0,1]
b = np.array(['Alice','Bob','Charlie'])
print(b[a])
# ['Charlie' 'Alice' 'Bob']
In this example, array a represents the permutation (2 0 1), which is a cycle of length 3. Since the length of the cycle is 3, if you apply it three times, you will end up where you started:
a = [2,0,1]
b = np.array(['Alice','Bob','Charlie'])
c = b
for i in range(3):
    c = c[a]
    print(c)
# ['Charlie' 'Alice' 'Bob']
# ['Bob' 'Charlie' 'Alice']
# ['Alice' 'Bob' 'Charlie']
Note that I used strings for the elements of b to avoid confusing them with indices. Of course, I could have used numbers from range(n):
a = [2,0,1]
b = np.array([0,1,2])
c = b
for i in range(3):
    c = c[a]
    print(c)
# [2 0 1]
# [1 2 0]
# [0 1 2]
You might see an interesting, but unsurprising fact: The first line is equal to a; in other words, the first result of applying a to b is equal to a itself. This is because b was initialised to [0 1 2], which represents the identity permutation id; thus, the permutations that we find by repeatedly applying a to b are:
id == a^0
a
a^2
a^3 == id
Can we always go back where we started? or The rank of a permutation
It is a well-known result of algebra that if you apply the same permutation again and again, you will eventually end up on the identity permutation. In algebraic notations: for every permutation a, there exists an integer k such that a^k == id.
Can we guess the value of k?
The minimum value of k is called the rank of a permutation (most algebra texts call it the order of the permutation).
If a is a cycle, then the minimum possible k is the length of the cycle. In our previous example, a was a cycle of length 3, so it took three applications of a before we found the identity permutation again.
How about a cycle of length 2? A cycle of length 2 is just "swapping two elements". For instance, swapping elements 0 and 1:
a = [1,0,2]
b = np.array([0,1,2])
c = b
for i in range(2):
    c = c[a]
    print(c)
# [1 0 2]
# [0 1 2]
We swap 0 and 1, then we swap them back.
How about two disjoint cycles? Let's try a cycle of length 3 on the first three elements, simultaneously with swapping the last two elements:
a = [2,0,1,3,4,5,7,6]
b = np.array([0,1,2,3,4,5,6,7])
c = b
for i in range(6):
    c = c[a]
    print(c)
# [2 0 1 3 4 5 7 6]
# [1 2 0 3 4 5 6 7]
# [0 1 2 3 4 5 7 6]
# [2 0 1 3 4 5 6 7]
# [1 2 0 3 4 5 7 6]
# [0 1 2 3 4 5 6 7]
As you can see by carefully examining the intermediary results, there is a period of length 3 on the first three elements, and a period of length 2 on the last two elements. The overall period is the least common multiple of the two periods, which is 6.
What is k in general? A well-known theorem of algebra states: every permutation can be written as a product of disjoint cycles. The rank of a cycle is the length of the cycle. The rank of a product of disjoint cycles is the least common multiple of the ranks of cycles.
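The theorem can be turned into a short computation. The following is only a sketch: the function names cycle_lengths and permutation_rank are made up for this illustration, and the input is assumed to be a valid permutation of range(n).

```python
from functools import reduce
from math import gcd

def cycle_lengths(perm):
    """Return the lengths of the disjoint cycles of a permutation of range(n)."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Follow the orbit of `start` until it comes back around
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        lengths.append(length)
    return lengths

def permutation_rank(perm):
    """Least common multiple of the cycle lengths."""
    return reduce(lambda x, y: x * y // gcd(x, y), cycle_lengths(perm), 1)

print(permutation_rank([2, 0, 1]))                   # 3
print(permutation_rank([2, 0, 1, 3, 4, 5, 7, 6]))    # 6
```

Fixed points count as cycles of length 1 here, which does not change the least common multiple.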
A coincidence in your code: sorting [2,1,4,7,6,5,0,3,8,9]
Let us go back to your python code.
a = np.array([2,1,4,7,6,5,0,3,8,9])
a = a[a]
a = a[a]
print(a)
# [0 1 2 3 4 5 6 7 8 9]
How many times did you apply permutation a? Note that because of the assignment a =, array a changed between the first and the second lines a = a[a]. Let us dissipate some confusion by using a different variable name for every different value. Your code is equivalent to:
a = np.array([2,1,4,7,6,5,0,3,8,9])
a2 = a[a]
a4 = a2[a2]
print(a4)
Or equivalently:
a = np.array([2,1,4,7,6,5,0,3,8,9])
a4 = (a[a])[a[a]]
This last line looks a little bit complicated. However, a cool result of algebra is that composition of permutations is associative. You already knew that addition and multiplication were associative: x+(y+z) == (x+y)+z and x(yz) == (xy)z. Well, it turns out that composition of permutations is associative as well! Using numpy's integer-array indexing, this means that:
a[b[c]] == (a[b])[c]
Thus your python code is equivalent to:
a = np.array([2,1,4,7,6,5,0,3,8,9])
a4 = ((a[a])[a])[a]
print(a4)
Or without the unneeded parentheses:
a = np.array([2,1,4,7,6,5,0,3,8,9])
a4 = a[a][a][a]
print(a4)
Since a4 is the identity permutation, this tells us that the rank of a divides 4. Thus the rank of a is 1, 2 or 4. This tells us that a can be written as a product of swaps and length-4 cycles. The only permutation of rank 1 is the identity itself. Permutations of rank 2 are products of disjoint swaps, and we can see that this is not the case of a. Thus the rank of a must be exactly 4.
You can find the cycles by choosing an element, and following its orbit: what values is that element successively transformed into? Here we see that:
0 is transformed into 2; 2 is transformed into 4; 4 is transformed into 6; 6 is transformed into 0;
1 remains untouched;
3 becomes 7; 7 becomes 3;
5 is untouched; 8 and 9 are untouched.
Conclusion: Your numpy array represents the permutation (0 -> 2 -> 4 -> 6 -> 0)(3 <-> 7), and its rank is the least common multiple of 4 and 2, lcm(4,2) == 4.
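These claims can be checked numerically. The sketch below reuses the array from the question; the variable name identity is introduced only for the comparison.

```python
import numpy as np

a = np.array([2, 1, 4, 7, 6, 5, 0, 3, 8, 9])
identity = np.arange(len(a))

a2 = a[a]      # a applied twice
a4 = a2[a2]    # what the two "a = a[a]" lines in the question compute

print(np.array_equal(a2, identity))      # False: two applications are not enough
print(np.array_equal(a4, identity))      # True: four applications give the identity
print(np.array_equal(a4, a[a][a][a]))    # True: same result, by associativity
```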
It took some time, but I figured out a way to do it.
numpy doesn't have this feature, but pandas does.
By using df.reindex I can sort a data frame by its index:
import pandas as pd
import numpy as np
train_df = pd.DataFrame(range(10))
train_df = train_df.reindex(np.random.permutation(train_df.index))
print(train_df) # randomly ordered data frame containing all values up to 9
train_df = train_df.reindex(range(10))
print(train_df) # sorted data frame
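For comparison, a possible NumPy-only sketch: indexing a permutation with its inverse restores sorted order, and np.argsort computes that inverse. The seed and the variable names are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
a = rng.permutation(10)          # shuffled copy of 0..9

inverse = np.argsort(a)          # inverse[v] is the position of value v in a
sorted_a = a[inverse]            # indexing with the inverse undoes the shuffle

print(a)
print(sorted_a)
```

This works for any permutation of range(n), whatever its cycle structure, whereas repeated a = a[a] only reaches the identity for particular permutations.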

How to compare numpy arrays in terms of similarity

I am given two numpy arrays: one of dimensions i x m and the other of dimensions j x m. What I want to do is loop through the FirstArray and compare each of its elements with each of the elements of the SecondArray. When I say 'compare', I mean: I want to compute the Euclidean distance between the elements of FirstArray and SecondArray. Then, I want to store the index of the element of SecondArray that is closest to the corresponding element of FirstArray, and I also want to store the index of the element of SecondArray that is second closest to the element of the FirstArray.
In code this would look somewhat similar to this:
smallest = None
idx = 0
for i in range(0, FirstArrayRows):
    for j in range(0, SecondArrayRows):
        EuclideanDistance = np.sqrt(np.sum(np.square(FirstArray[i,:] - SecondArray[j,:])))
        if smallest is None or EuclideanDistance < smallest:
            smallest = EuclideanDistance
            idx_second = idx
            idx = j
    Closest[i] = idx
    SecondClosest[i] = idx_second
And I think this works. However, there are two cases when this code fails to give the correct index for the second closest element of SecondArray:
when the element of SecondArray that is closest to the element of FirstArray is at j = 0.
when the element of SecondArray that is closest to the element of FirstArray is at j = 1.
So I wonder: Is there a better way of implementing this?
I know there is. Maybe someone can help me see it?
You could use numpy's broadcasting to your advantage. Compute the Euclidean distance with all elements of the second array in a single operation. Then, you can find the two smallest distances using argpartition.
import numpy as np
i, j, m = 3, 4, 5
a = np.random.choice(10,(i,m))
b = np.random.choice(10,(j,m))
print('First array:\n',a)
print('Second array:\n',b)
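closest, second_closest = np.zeros(i), np.zeros(i)
for row in range(a.shape[0]):
    # Distances from a[row] to every row of b, via broadcasting
    dist = np.sqrt(((a[row,:] - b)**2).sum(axis=1))
    # kth=(0, 1) puts the two smallest distances, in order, at positions 0 and 1
    closest[row], second_closest[row] = np.argpartition(dist, (0, 1))[:2]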
print('Closest:', closest)
print('Second Closest:', second_closest)
Output:
First array:
[[3 9 0 2 2]
[1 2 9 9 7]
[4 0 6 6 4]]
Second array:
[[9 9 2 2 3]
[9 9 0 2 3]
[1 1 6 7 7]
[5 7 0 4 4]]
Closest: [3. 2. 2.]
Second Closest: [1. 3. 3.]
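The remaining Python loop can also be removed by broadcasting over both arrays at once. This is a sketch rather than a drop-in replacement: it reuses the sample arrays printed above, uses argsort instead of argpartition for simplicity, and builds an intermediate array of shape (i, j, m), which may be too large for big inputs.

```python
import numpy as np

# Sample arrays copied from the output above
a = np.array([[3, 9, 0, 2, 2],
              [1, 2, 9, 9, 7],
              [4, 0, 6, 6, 4]])
b = np.array([[9, 9, 2, 2, 3],
              [9, 9, 0, 2, 3],
              [1, 1, 6, 7, 7],
              [5, 7, 0, 4, 4]])

# diff[p, q, :] = a[p] - b[q], shape (i, j, m)
diff = a[:, None, :] - b[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))    # shape (i, j)

# Sort each row of distances and keep the indices of the two nearest rows of b
order = np.argsort(dist, axis=1)
closest, second_closest = order[:, 0], order[:, 1]

print('Closest:', closest)                 # Closest: [3 2 2]
print('Second Closest:', second_closest)   # Second Closest: [1 3 3]
```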

How does the following code work for Matrix Multiplication in Python

I'm trying to do a matrix multiplication in Python, and I have found the following code that I am trying to understand. (I know how to multiply matrices by hand, and I want to understand how the following code performs the same action. By that I mean the first element in BA (row 1, column 1) is calculated by doing 1*1 + 3*3 + 3*3 + 1*1, etc.)
from numpy import array
A= array([[ 1, 4, 5 ],[ 3, 2, 9], [ 3,6, 2], [ 1,6, 8]])
B=A.T
BA= array([[ 0, 0, 0 ],[ 0,0, 0], [ 0,0, 0] ])
for i in range(len(B)):
    for j in range(len(A[0])):
        for k in range(len(A)):
            BA[i][j] += B[i][k] * A[k][j]
I know that the length command for a list returns how many elements there are in that list. I am not sure how it works here since B is a matrix, I assume it returns how many rows there are.
range of len(B) would be (0,3) corresponding to row 1,2 and 3.
for i in range would correspond to i=0, i=1, i= 2
next confusing thing is for j in range len(A[0])
The first element of A is the first row, the length here would thus correspond how many elements there are in the first element of A.
Basically I have a basic understanding of what range and len etc put out for this example but I would like to get a better understand of each value of i, j, k as a result of these as well as the last line which I really don't understand.
BA[i][j] += B[i][k] * A[k][j]
Please explain as basic as possible because I am new to programming and so at this point nothing is trivial to me. Thank you for your time to help others :)
Here is the actual result from your code:
   B      *     A     =       BA
1 3 3 1       1 4 5       20 34  46
4 2 6 6       3 2 9       34 92  98
5 9 2 8       3 6 2       46 98 174
              1 6 8
Assuming i = 0 and j = 0, let's calculate BA[0][0], which is the first element of matrix BA.
BA[0][0] += B[0][k] * A[k][0]
B[0][k] walks along row 0 of matrix B as k iterates over the rows of A; the number of rows of A equals the number of columns of B.
A[k][0] walks down column 0 of matrix A.
The loop for k in range(len(A)): will reproduce:
B[0][0]*A[0][0] + B[0][1]*A[1][0] + B[0][2]*A[2][0] + B[0][3]*A[3][0]
Resulting in:
1×1 + 3×3 + 3×3 + 1×1 = 20
Which is the value for BA[0][0] resulted from your code.
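The four products can be listed explicitly with a small sketch that fixes i and j and lets only k vary. The list name terms is introduced here for illustration.

```python
from numpy import array

A = array([[1, 4, 5], [3, 2, 9], [3, 6, 2], [1, 6, 8]])
B = A.T

i, j = 0, 0
# One product per value of k, converted to plain Python ints for printing
terms = [int(B[i][k] * A[k][j]) for k in range(len(A))]

print(terms)        # [1, 9, 9, 1]
print(sum(terms))   # 20
```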
The following nested loops iterate over all columns of A as j for every row of B as i, in order to perform the multiplication for all (row) x (column) pairs:
for i in range(len(B)):
    for j in range(len(A[0])):
Consider here the array as just a more convenient representation of the list you gave to the function.
A is built on top of the list [[ 1, 4, 5 ],[ 3, 2, 9], [ 3,6, 2], [ 1,6, 8]], a list of length 4.
range(start, stop) returns a lazy range object (an iterable, not a list) that produces a sequence of integers from start up to, but not including, stop. If not provided, start defaults to 0.
B has 3 rows (it is the transpose of the 4 x 3 matrix A), so range(len(B)) is range(0, 3), which produces the integers 0, 1 and 2 when "asked" by the for loop. i will successively be 0, 1 and 2.
A[0] returns the first row of A, which has a length of 3, so the same way, j will subsequently be 0, 1 and 2 (in this case) ; and k will subsequently be 0, 1, 2 and 3 as A has a length of 4.
BA[i] returns the row of index i, which can in turn be indexed by j.
So BA[i][j] is the element at row i and column j, which we increment by the product of the element at row i and index k of matrix B and the element at row k and index j of matrix A.
In your code sample a matrix is represented as a list of sublists, where each sublist is a row.
So the most outer loop goes over the rows of B:
for i in range(len(B)):
A[0] is the first row of A, and the number of elements in it is the number of A's columns.
So the second loop goes over the columns of A:
for j in range(len(A[0])):
The innermost loop simply sums the products of the elements in the i-th row of B and the j-th column of A.
BA[i][j] += B[i][k] * A[k][j]
This adds the product to BA[i][j]. += adds its right argument to its left one.
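As a sanity check, the triple loop can be compared with NumPy's built-in matrix product. In this sketch the result matrix is sized from the inputs with np.zeros instead of being typed out by hand, which is a small departure from the question's code.

```python
import numpy as np

A = np.array([[1, 4, 5], [3, 2, 9], [3, 6, 2], [1, 6, 8]])
B = A.T

# Result has as many rows as B and as many columns as A
BA = np.zeros((len(B), len(A[0])), dtype=int)
for i in range(len(B)):            # rows of B: 0, 1, 2
    for j in range(len(A[0])):     # columns of A: 0, 1, 2
        for k in range(len(A)):    # shared dimension: 0, 1, 2, 3
            BA[i][j] += B[i][k] * A[k][j]

print(BA)
print(np.array_equal(BA, B @ A))   # True
```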

Slicing at runtime

Can someone explain to me how to slice a numpy.array at runtime?
I don't know the rank (number of dimensions) at 'coding time'.
A minimal example:
import numpy as np
a = np.arange(16).reshape(4,4) # 2D matrix
targetsize = [2,3] # desired shape
b_correct = dynSlicing(a, targetsize)
b_wrong = np.resize(a, targetsize)
print(a)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
print(b_correct)
[[0 1 2]
[4 5 6]]
print(b_wrong)
[[0 1 2]
[3 4 5]]
And my ugly dynSlicing():
def dynSlicing(data, targetsize):
    ndims = len(targetsize)
    if(ndims==1):
        return data[:targetsize[0]]  # trailing comma removed: it returned a tuple
    elif(ndims==2):
        return data[:targetsize[0], :targetsize[1]]
    elif(ndims==3):
        return data[:targetsize[0], :targetsize[1], :targetsize[2]]
    elif(ndims==4):
        return data[:targetsize[0], :targetsize[1], :targetsize[2], :targetsize[3]]
resize() will not do the job since it flattens the array before dropping elements.
Thanks,
Tebas
Passing a tuple of slice objects does the job:
def dynSlicing(data, targetsize):
    return data[tuple(slice(x) for x in targetsize)]
Simple solution:
b = a[tuple(map(slice,targetsize))]
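A short sketch showing that the tuple-of-slices approach does not depend on the rank. The helper name dyn_slicing and the 3D test array are assumptions made for this illustration.

```python
import numpy as np

def dyn_slicing(data, targetsize):
    # slice(t) is the object behind the ":t" syntax, so this builds
    # data[:t0, :t1, ...] for however many entries targetsize has
    return data[tuple(slice(t) for t in targetsize)]

a2 = np.arange(16).reshape(4, 4)       # the 2D example from the question
a3 = np.arange(24).reshape(2, 3, 4)    # a 3D array, same function

print(dyn_slicing(a2, [2, 3]))             # rows [0 1 2] and [4 5 6]
print(dyn_slicing(a3, [1, 2, 3]).shape)    # (1, 2, 3)
```

Basic slicing like this returns a view of the original data, so no elements are copied.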
You can't directly 'change' it. This is due to the nature of arrays only allowing backdrop.
Instead you can copy a section, or even better create a view of the desired shape:
Link
