Avoid nested for loops with numpy arrays - Python

I need to implement a generic operation on the elements of some NumPy 2D arrays (A, B, C). In pseudo-code:
for i in A.height:
    for j in A.width:
        A[i,j] = f(B[i,j], C[i,j])
where f() concatenates the bits of the two values by means of struct.pack() and struct.unpack():
x = struct.pack('2B', B[i, j], C[i, j])
y = struct.unpack('H', x)
This code takes a really long time to execute (0.25 s for 640*480 matrices; maybe that is normal, but I could use something faster), so I was wondering if anybody could suggest a more Pythonic way of achieving the same result that also improves performance.
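For reference, a minimal runnable version of the loop-based approach above might look like this; the concrete shapes and the uint8/uint16 dtypes are assumptions on my part, not something stated in the question:

import struct
import numpy as np

# Assumed setup: two 8-bit source arrays and a 16-bit destination.
B = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)
C = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)
A = np.empty(B.shape, dtype=np.uint16)

# Loop-based baseline: pack the two bytes and reinterpret them as one uint16.
for i in range(B.shape[0]):
    for j in range(B.shape[1]):
        x = struct.pack('2B', B[i, j], C[i, j])
        A[i, j] = struct.unpack('H', x)[0]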

Your function:
In [310]: def foo(a,b):
     ...:     x = struct.pack('2B', a,b)
     ...:     return struct.unpack('H',x)[0]
np.vectorize is a convenient way of broadcasting arrays against a scalar function: it passes scalar values to the function. It does not speed up the code (the related np.frompyfunc may give a 2x speed-up relative to plain iteration):
In [311]: fv = np.vectorize(foo)
In [312]: fv(np.arange(5)[:,None],np.arange(10))
Out[312]:
array([[   0,  256,  512,  768, 1024, 1280, 1536, 1792, 2048, 2304],
       [   1,  257,  513,  769, 1025, 1281, 1537, 1793, 2049, 2305],
       [   2,  258,  514,  770, 1026, 1282, 1538, 1794, 2050, 2306],
       [   3,  259,  515,  771, 1027, 1283, 1539, 1795, 2051, 2307],
       [   4,  260,  516,  772, 1028, 1284, 1540, 1796, 2052, 2308]])
I can replicate those values with a simple math expression on the same arrays:
In [313]: np.arange(5)[:,None]+np.arange(10)*256
Out[313]:
array([[   0,  256,  512,  768, 1024, 1280, 1536, 1792, 2048, 2304],
       [   1,  257,  513,  769, 1025, 1281, 1537, 1793, 2049, 2305],
       [   2,  258,  514,  770, 1026, 1282, 1538, 1794, 2050, 2306],
       [   3,  259,  515,  771, 1027, 1283, 1539, 1795, 2051, 2307],
       [   4,  260,  516,  772, 1028, 1284, 1540, 1796, 2052, 2308]])
This probably only works for limited ranges of values, but it gives an idea of how you can properly 'vectorize' calculations in numpy.
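If the values really are single bytes being fused into 16-bit integers (which is what struct.pack('2B') followed by struct.unpack('H') does on a little-endian machine), a fully vectorized sketch can use either integer arithmetic or a dtype view; the uint8 inputs and the little-endian assumption are mine:

import numpy as np

B = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)
C = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)

# Arithmetic version: B supplies the low byte, C the high byte, matching
# struct.pack('2B', b, c) + struct.unpack('H', ...) on a little-endian machine.
A_math = B.astype(np.uint16) + (C.astype(np.uint16) << 8)

# View version: interleave the two byte planes and reinterpret them as uint16.
packed = np.empty(B.shape + (2,), dtype=np.uint8)
packed[..., 0] = B   # low byte
packed[..., 1] = C   # high byte
A_view = packed.view('<u2')[..., 0]

assert np.array_equal(A_math, A_view)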

Depends on what 'f' does... Not sure if this is what you mean
b = np.arange(3*4).reshape(3,4)
c = np.arange(3*4).reshape(3,4)[::-1]

b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

c
array([[ 8,  9, 10, 11],
       [ 4,  5,  6,  7],
       [ 0,  1,  2,  3]])

def f(b, c):
    """some function"""
    a = b + c
    return a

a = f(b, c)

a
array([[ 8, 10, 12, 14],
       [ 8, 10, 12, 14],
       [ 8, 10, 12, 14]])

Related

How to concat corresponding elements (which are integers) of two 2D arrays of the same shape?

I have two 10x8 arrays, C and D. I need to concat the corresponding elements of these two arrays and store the result in another 10x8 array. For example, if C = [[1, 2, 3, 4, 5, 6, 7, 8],[9, 10, 11, 12, 13, 14, 15, 16],[8 elements],... [10th row which has 8 elements]] and D = [[100, 99, 98, 97, 96, 95, 94, 93],[92, 90, 89, 88, 87, 86, 85, 84],[8 elements],... [10th row which has 8 elements]]. I need another 10x8 array, E, which looks like E = [[1100, 299, 398, 497, 596, 695, 794, 893], [992, 1090, 1189, 1288, 1387, 1486, 1585, 1684],... [10th row which contains concatenation of the corresponding 8 elements in the 10th row of C and D]]. How do I obtain this? Appreciate your help!
Nested list comprehension:
>>> C = [[1, 2, 3, 4, 5, 6, 7, 8],[9, 10, 11, 12, 13, 14, 15, 16]]
>>> D = [[100, 99, 98, 97, 96, 95, 94, 93],[92, 90, 89, 88, 87, 86, 85, 84]]
>>> [[int(f'{c}{d}') for c, d in zip(lc, ld)] for lc, ld in zip(C, D)]
[[1100, 299, 398, 497, 596, 695, 794, 893],
 [992, 1090, 1189, 1288, 1387, 1486, 1585, 1684]]
Just for fun, here is a functional solution:
>>> from functools import partial
>>> from itertools import starmap
>>> list(map(list, map(partial(map, int), map(partial(starmap, '{0}{1}'.format), map(zip, C, D)))))
[[1100, 299, 398, 497, 596, 695, 794, 893],
 [992, 1090, 1189, 1288, 1387, 1486, 1585, 1684]]
You can also just run a single loop and concatenate element by element; if the two arrays have the same dimensions there is no need for two nested loops. This is a very easy method when the shapes match.
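For completeness, a NumPy sketch of the same idea: concatenating decimal representations is equivalent to shifting the left operand by the number of digits of the right operand. This assumes every value in D is a positive integer:

import numpy as np

C = np.array([[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16]])
D = np.array([[100, 99, 98, 97, 96, 95, 94, 93], [92, 90, 89, 88, 87, 86, 85, 84]])

# Number of decimal digits in each element of D (assumes D > 0).
digits = np.floor(np.log10(D)).astype(int) + 1

# "Concatenate" by shifting C left by that many decimal places and adding D.
E = C * 10**digits + D
# E[0, 0] is 1100 and E[0, 1] is 299, matching the expected output above.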

Using np.newaxis to compute sum of squared differences

In chapter 2 of "Python Data Science Handbook" by Jake VanderPlas, he computes the sum of squared differences of several 2-d points using the following code:
rand = np.random.RandomState(42)
X = rand.rand(10,2)
dist_sq = np.sum((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2, axis=-1)
Two questions:
Why is a third axis created? What is the best way to visualize what is going on?
Is there a more intuitive way to perform this calculation?
Why is a third axis created? What is the best way to visualize what is going on?
Adding new dimensions before adding/subtracting is a relatively common trick for generating all pairs via broadcasting (None is the same as np.newaxis here):
>>> a = np.arange(10)
>>> a[:,None]
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])
>>> a[None,:]
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> a[:,None] + 100*a[None,:]
array([[  0, 100, 200, 300, 400, 500, 600, 700, 800, 900],
       [  1, 101, 201, 301, 401, 501, 601, 701, 801, 901],
       [  2, 102, 202, 302, 402, 502, 602, 702, 802, 902],
       [  3, 103, 203, 303, 403, 503, 603, 703, 803, 903],
       [  4, 104, 204, 304, 404, 504, 604, 704, 804, 904],
       [  5, 105, 205, 305, 405, 505, 605, 705, 805, 905],
       [  6, 106, 206, 306, 406, 506, 606, 706, 806, 906],
       [  7, 107, 207, 307, 407, 507, 607, 707, 807, 907],
       [  8, 108, 208, 308, 408, 508, 608, 708, 808, 908],
       [  9, 109, 209, 309, 409, 509, 609, 709, 809, 909]])
Your example does the same, just with 2-vectors instead of scalars at the innermost level:
>>> X[:,np.newaxis,:].shape
(10, 1, 2)
>>> X[np.newaxis,:,:].shape
(1, 10, 2)
>>> (X[:,np.newaxis,:] - X[np.newaxis,:,:]).shape
(10, 10, 2)
Thus the 'magical subtraction' is just every row of X subtracted from every other row, i.e. all pairwise coordinate differences.
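To tie this back to the expression in the question, the remaining steps are just squaring those pairwise differences and summing over the last (coordinate) axis; a minimal sketch:

import numpy as np

rand = np.random.RandomState(42)
X = rand.rand(10, 2)

# Pairwise differences have shape (10, 10, 2); squaring and summing over the
# coordinate axis yields the (10, 10) matrix of squared distances.
diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
dist_sq = np.sum(diff ** 2, axis=-1)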
Is there a more intuitive way to perform this calculation?
Yes, use scipy.spatial.distance.pdist for pairwise distances. To get an equivalent result to your example:
from scipy.spatial.distance import pdist, squareform
dist_sq = squareform(pdist(X))**2
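As a quick sanity check (a sketch, rebuilding the same X as in the question), the broadcasting version and the pdist version agree up to floating-point tolerance:

import numpy as np
from scipy.spatial.distance import pdist, squareform

rand = np.random.RandomState(42)
X = rand.rand(10, 2)

via_broadcasting = np.sum((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2, axis=-1)
via_pdist = squareform(pdist(X)) ** 2   # or squareform(pdist(X, 'sqeuclidean'))

assert np.allclose(via_broadcasting, via_pdist)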

How to modify rows of numpy arrays stored as a list

I want to modify rows of numpy arrays stored in a list. The lengths of my numpy arrays are not the same. I have several huge numpy arrays stored as a list. This is my data (for simplicity I copied only a small list of arrays):
elements = [array([[971, 466, 697, 1, 15, 18, 28],
                   [5445, 4, 301, 2, 12, 47, 5]]),
            array([[5883, 316, 377, 2, 9, 87, 1]])]
Then, I want to replace the fourth column of each row with the last one and then delete the last column. I want to have the following result:
[array([[971, 466, 697, 1, 28, 18],
        [5445, 4, 301, 2, 5, 47]]),
 array([[5883, 316, 377, 2, 1, 87]])]
I tried the following code but it was not successful:
length = [len(i) for i in elements]  # To find the length of each array
h = sum(length)  # to find the total number of rows
for i in range(h):
    elements[:, [4, -1]] = elements[:, [-1, 4]]
    elements = np.delete(elements, [-1], 1)
I am facing the following error:
TypeError: list indices must be integers or slices, not tuple
I appreciate any help in advance.
You can do it without loops, but it's still slower (1.75 times on large data) than the accepted solution:
counts = list(map(len, elements))
arr = np.concatenate(elements)
arr[:, 4] = arr[:, -1]
new_elements = np.split(arr[:,:-1], np.cumsum(counts)[:-1])
Concatenation is quite slow in numpy.
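A quick illustration of the concatenate/cumsum/split idea on the example data from the question (a sketch; the comments are mine):

import numpy as np

elements = [np.array([[971, 466, 697, 1, 15, 18, 28],
                      [5445, 4, 301, 2, 12, 47, 5]]),
            np.array([[5883, 316, 377, 2, 9, 87, 1]])]

counts = list(map(len, elements))   # rows per array: [2, 1]
arr = np.concatenate(elements)      # one (3, 7) array
arr[:, 4] = arr[:, -1]              # overwrite column 4 with the last column
# split points at the cumulative row counts (here: after row 2)
new_elements = np.split(arr[:, :-1], np.cumsum(counts)[:-1])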
A simple inefficient solution:
import numpy as np

elements = [np.array([[971, 466, 697, 1, 15, 18, 28],
                      [5445, 4, 301, 2, 12, 47, 5]]),
            np.array([[5883, 316, 377, 2, 9, 87, 1]])]

new_elements = list()
for arr in elements:
    arr[:, 4] = arr[:, -1]
    new_elements.append(arr[:, :-1])
The new list output is:
new_elements
Out[11]:
[array([[ 971,  466,  697,    1,   28,   18],
        [5445,    4,  301,    2,    5,   47]]),
 array([[5883,  316,  377,    2,    1,   87]])]
Try this one
p = []
for x in range(len(elements)):
    for y in range(len(elements[x])):
        p.append(list(elements[x][y][:4]) + [elements[x][y][-1]] + [elements[x][y][-2]])
print(p)
[[971, 466, 697, 1, 28, 18],
 [5445, 4, 301, 2, 5, 47],
 [5883, 316, 377, 2, 1, 87]]

Can't seem to flatten numpy array

I have a numpy array which when print-ed looks like this:
print(a.shape)
(21,)
print(a)
[array([8.55570588e+03, 4.23078573e+05, 2.81254715e+07, 2.10356201e+09,
4.24558286e+05, 2.10032147e+07, 1.39638949e+09, 1.04453957e+11,
2.81593475e+07, 1.39354786e+09, 9.26480296e+10, 6.92992796e+12,
2.10047682e+09, 1.03982525e+11, 6.91296507e+12, 5.17021191e+14])
array([8.55404706e+03, 4.23328400e+05, 2.80891690e+07, 2.09651453e+09,
4.23874124e+05, 2.09628073e+07, 1.39044370e+09, 1.03745119e+11,
2.81060928e+07, 1.38935279e+09, 9.21288996e+10, 6.87207671e+12,
2.09626303e+09, 1.03584989e+11, 6.86712650e+12, 5.12107449e+14])
array([6.71569608e+03, 3.32364057e+05, 2.20526342e+07, 1.64564735e+09,
3.32826578e+05, 1.64539763e+07, 1.09116635e+09, 8.13888141e+10,
2.20612069e+07, 1.08976996e+09, 7.22409501e+10, 5.38629510e+12,
1.64474898e+09, 8.11907944e+10, 5.38026989e+12, 4.01021156e+14])
array([ 97, 120, 147, 106, 115, 151, 300, 268, 326, 454, 684,
1594, 2202, 2229, 1205, 2])
array([ 1, 0, 0, 0, 0, 1, 0, 1, 0, 2, 1,
11, 359, 1355, 3921, 4348])
array([ 1, 0, 0, 1, 0, 0, 6, 11, 31, 644, 2312,
3046, 3618, 321, 7, 2])
625.0 625.0 625.0 537178.875 1874648.75 1373895.875 1.275734191674592
2.066594119913508 1.6749058704798478 0.11276410212887233 2.55304393588347
1.1167704949278905 2.177796835501801 1.1323869527951895
1.3940068452456151]
Ideally I would want all of these values in one big array of length (3*16 + 3*16 + 15)
np.concatenate did not work, and flatten also did not bring the desired result.
One simple way is to use np.hstack to flatten a mixed list of arrays and floats. Example usage is as follows:
import numpy as np
a = [np.array([1, 2, 3]), np.array([4, 5, 6]), 7, 8, 9]
np.hstack(a)
>> array([1, 2, 3, 4, 5, 6, 7, 8, 9])
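The same call should also work directly on a 1-D object array like the one printed in the question; here is a sketch with a small made-up object array standing in for a:

import numpy as np

# Stand-in for the (21,) object array from the question: a mix of
# 16-element arrays and plain Python floats.
a = np.empty(5, dtype=object)
a[0] = np.arange(16, dtype=float)
a[1] = np.arange(16, dtype=float) * 2
a[2] = np.arange(16, dtype=float) * 3
a[3] = 625.0
a[4] = 1.275734191674592

flat = np.hstack(a)   # iterates over the object array and concatenates everything
print(flat.shape)     # (50,) here; (3*16 + 3*16 + 15,) = (111,) for the original data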

How to get list of all possible sums of n*m matrix rows

I have this 4x10 (nxm) data matrix in csv:
1, 5, 19, 23, 7, 51, 18, 20, 35, 41
15, 34, 17, 8, 11, 93, 13, 46, 3, 10
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
10, 9, 8, 7, 6, 5, 4, 3, 2, 1
First, I try to get a list of all possible sums from the first n/2 rows. With the remaining n/2 rows I do the same.
By "all possible sums of the first rows" I mean the following:
Example:
Row 1: 1, 2, 3
Row 2: 3, 2, 1
All possible sums list: 1 + [3, 2, 1]; 2 + [3, 2, 1]; 3 + [3, 2, 1]
Final list: [4, 3, 2, 5, 4, 3, 6, 5, 4]
(At the moment I do not want to remove duplicates)
For my logic I have this code:
import csv

def loadCsv(filename):
    lines = csv.reader(open(filename, "rb"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

data = loadCsv('btest2.txt')
divider = len(data)/2

firstPossibleSumsList = []
secondPossibleSumsList = []

#Possible sum list for the first n/2 rows:
for i in range(len(data[0])):
    for j in range(len(data[0])):
        firstPossibleSumsList.append(data[0][i] + data[1][j])

#Possible sum list for the last n/2 rows:
for i in range(len(data[0])):
    for j in range(len(data[0])):
        secondPossibleSumsList.append(data[2][i] + data[3][j])
The problem is that I divided the rows manually by using data[0][i], data[1][j], data[2][i] and so on. I want to do it more efficiently, involving the divider variable, but I can't figure out how. In my code I depend on the integers 0, 1, 2, 3, whereas I want to split the matrix rows into halves regardless of the matrix dimensions.
One option is to think of it as the sum of a vector and a transposed vector. Then you could do:
import numpy as np
data = np.array(loadCsv('btest2.txt'))
firstPossibleSumsArray = (data[0,:,np.newaxis] + data[1]).flatten()
#output for the first two rows:
array([  16.,   35.,   18.,    9.,   12.,   94.,   14.,   47.,    4.,   11.,
         20.,   39.,   22.,   13.,   16.,   98.,   18.,   51.,    8.,   15.,
         34.,   53.,   36.,   27.,   30.,  112.,   32.,   65.,   22.,   29.,
         38.,   57.,   40.,   31.,   34.,  116.,   36.,   69.,   26.,   33.,
         22.,   41.,   24.,   15.,   18.,  100.,   20.,   53.,   10.,   17.,
         66.,   85.,   68.,   59.,   62.,  144.,   64.,   97.,   54.,   61.,
         33.,   52.,   35.,   26.,   29.,  111.,   31.,   64.,   21.,   28.,
         35.,   54.,   37.,   28.,   31.,  113.,   33.,   66.,   23.,   30.,
         50.,   69.,   52.,   43.,   46.,  128.,   48.,   81.,   38.,   45.,
         56.,   75.,   58.,   49.,   52.,  134.,   54.,   88.,   44.,   51.])
The final flatten turns the result from a 10x10 array into a flat array of length 100, which may not even be necessary for your purposes.
A downside of using arrays is that they are not as flexible as lists when it comes to resizing/appending data.
Edit:
The full code could be something like:
div = data.shape[0] // 2   # number of rows in each half
row_len_squared = int(data.shape[1]**2)
firstPossibleSumsArray = np.empty(int(div*(div-1)/2 * row_len_squared), dtype=int)
idx = 0
for row in range(div):
    for col in range(row+1, div):
        firstPossibleSumsArray[idx:idx+row_len_squared] = \
            (data[row,:,np.newaxis] + data[col]).flatten()
        idx += row_len_squared
#repeat the process for the second possible sums array by replacing the range
#in the first loop from range(div) to range(div, 2*div)
This goes through each row and sums it with the remaining rows in that half of the matrix (row #1 + row #2, ..., row #1 + row #n, row #2 + row #3, etc.).
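To connect this back to the divider variable from the question, here is a sketch (assuming the CSV can be loaded into a float array with np.loadtxt) that builds the sum lists for both halves without hard-coding any row indices:

import numpy as np
from itertools import combinations

data = np.loadtxt('btest2.txt', delimiter=',')   # shape (4, 10) for the matrix above
divider = data.shape[0] // 2
halves = [data[:divider], data[divider:]]

# For each half, sum every row with every other row of that half via broadcasting.
sums_per_half = []
for half in halves:
    sums = [(half[i][:, np.newaxis] + half[j]).ravel()
            for i, j in combinations(range(len(half)), 2)]
    sums_per_half.append(np.concatenate(sums))

firstPossibleSumsArray, secondPossibleSumsArray = sums_per_half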
