Efficiently iterate over nested lists to find sum - python

I have an array of arrays and want to find every combination (one element from each row) whose sum equals 40. The problem is that the product array has around 270,000,000 rows, and checking them sequentially is out of the question. I have run this program overnight and it was still running in the morning. How can I make this program more efficient so that it runs reasonably fast?
Here is my code so far:
import numpy as np
def cartesianProduct(arrays):
    la = arrays.shape[0]
    arr = np.empty([la] + [a.shape[0] for a in arrays], dtype="int32")
    for i, a in enumerate(np.ix_(*arrays)):
        arr[i, ...] = a
    return arr.reshape(la, -1).T
rows = np.array(
    [
        [2, 15, 23, 19, 3, 2, 3, 27, 20, 11, 27, 10, 19, 10, 13, 10],
        [22, 9, 5, 10, 5, 1, 24, 2, 10, 9, 7, 3, 12, 24, 10, 9],
        [16, 0, 17, 0, 2, 0, 2, 0, 10, 0, 15, 0, 6, 0, 9, 0],
        [11, 27, 14, 5, 5, 7, 8, 24, 8, 3, 6, 15, 22, 6, 1, 1],
        [10, 0, 2, 0, 22, 0, 2, 0, 17, 0, 15, 0, 14, 0, 5, 0],
        [1, 6, 10, 6, 10, 2, 6, 10, 4, 1, 5, 5, 4, 8, 6, 3],
        [6, 0, 13, 0, 3, 0, 3, 0, 6, 0, 10, 0, 10, 0, 10, 0],
    ],
    dtype="int32",
)
product = cartesianProduct(rows)
combos = []
for row in product:
    if sum(row) == 40:
        combos.append(row)
print(combos)

I believe what you are trying to do is a variant of the subset sum problem, which is NP-hard in general. Look into "dynamic programming" and "subset sum".
Examples:
https://www.geeksforgeeks.org/subset-sum-problem-dp-25/
https://www.techiedelight.com/subset-sum-problem/
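A minimal sketch of the dynamic-programming idea, assuming all values are non-negative (as in the question's data) and that you first want to know how many combinations reach the target. The name count_combos is made up for illustration. Instead of enumerating every combination, it tracks how many ways each partial sum can be reached after each row:

```python
from collections import Counter


def count_combos(rows, target):
    """Count the ways to pick one value per row so the picks sum to target.

    Assumes non-negative values, so partial sums above target can be dropped.
    """
    ways = Counter({0: 1})  # one way to reach sum 0 before picking anything
    for row in rows:
        next_ways = Counter()
        for partial, n in ways.items():
            for value in row:
                total = partial + value
                if total <= target:  # safe to discard only for non-negative data
                    next_ways[total] += n
        ways = next_ways
    return ways[target]


print(count_combos([[1, 2], [3, 4]], 5))
```

The work per row is bounded by (target + 1) times the row length, so it does not grow with the size of the full Cartesian product.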

As suggested in the comments, one way to optimize this is to stop extending a partial combination as soon as its sum already exceeds your threshold (40 in this case). As a further optimization, you can sort the arrays incrementally from largest to smallest.
See heapq.nlargest() for incremental partial sorting.
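Here is a rough sketch of that pruning idea as a recursive search. The names find_combos and min_rest are made up for illustration, and note that this version sorts each row in ascending order (not largest first), so the scan of a row can stop at the first value that is too big. It yields the picked values, one per row:

```python
def find_combos(rows, target):
    """Yield tuples with one value per row whose sum equals target."""
    sorted_rows = [sorted(row) for row in rows]
    # min_rest[i] is the smallest sum obtainable from rows i, i+1, ...
    min_rest = [0] * (len(sorted_rows) + 1)
    for i in range(len(sorted_rows) - 1, -1, -1):
        min_rest[i] = min_rest[i + 1] + sorted_rows[i][0]

    def search(i, remaining, picked):
        if i == len(sorted_rows):
            if remaining == 0:
                yield tuple(picked)
            return
        for value in sorted_rows[i]:
            # Rows are ascending: once this value cannot fit, no later one can.
            if value + min_rest[i + 1] > remaining:
                break
            picked.append(value)
            yield from search(i + 1, remaining - value, picked)
            picked.pop()

    yield from search(0, target, [])


print(list(find_combos([[1, 2], [3, 4]], 5)))
```

Repeated values in a row produce repeated tuples, just as the brute-force loop over the Cartesian product would.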

Related

Filtering nested lists with python conditions

how are you?
I have a distance matrix and need to perform a filter based on another list before applying some functions.
The matrix has 10 rows and columns that represent machines and the distances between them. I need to filter it to keep only the distances between some chosen machines.
matrix = [[0, 1, 3, 17, 24, 12, 18, 16, 17, 15],
[1, 0, 2, 2, 5, 6, 13, 11, 12, 10],
[3, 2, 0, 1, 6, 12, 18, 12, 17, 15],
[17, 2, 1, 0, 3, 12, 17, 15, 16, 14],
[24, 5, 6, 3, 0, 1, 24, 22, 23, 21],
[12, 6, 12, 12, 1, 0, 12, 10, 11, 9],
[18, 13, 18, 17, 24, 12, 0, 3, 4, 5],
[16, 11, 12, 15, 22, 10, 3, 0, 1, 2],
[17, 12, 17, 16, 23, 11, 4, 1, 0, 1],
[15, 10, 15, 14, 21, 9, 5, 2, 1, 0]]
The list used for filtering, for example, is:
filter_list = [1, 2, 7, 10]
The idea is to use this list to filter both the rows and the column indices within each row to get the final matrix:
final_matrix = [[0, 1, 18, 15],
[1, 0, 13, 10],
[18, 13, 0, 5],
[15, 10, 5, 0]]
It is worth noting that the filter list elements vary. Can someone please help me?
That's what I tried:
final_matrix = []
for i in range(0, len(filter_list)):
    for j in range(0, len(filter_list[i])):
        a = filter_list[i][j]
        final_matrix.append(matrix[a-1])
print(final_matrix)
The nested loop is there because the filter_list can have sublists. This is what I get:
final_matrix = [[0, 1, 3, 17, 24, 12, 18, 16, 17, 15],
[1, 0, 2, 2, 5, 6, 13, 11, 12, 10],
[18, 13, 18, 17, 24, 12, 0, 3, 4, 5],
[15, 10, 15, 14, 21, 9, 5, 2, 1, 0]]
I could not remove the spare elements.
You forgot to filter by column ids. You can do this using nested list comprehensions.
final_matrix = [[matrix[row-1][col-1] for col in filter_list] for row in filter_list]
final_matrix = []
for i in filter_list:
    to_append = []
    for j in filter_list:
        to_append.append(matrix[i-1][j-1])
    final_matrix.append(to_append)
or with list comprehension
final_matrix = [[matrix[i-1][j-1] for j in filter_list] for i in filter_list]
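If the matrix is (or can be converted to) a NumPy array, the same row-and-column selection can be sketched with np.ix_, which builds an open mesh from the two index lists. This assumes a flat filter_list of 1-based machine numbers, as in the question; the name idx is just for illustration:

```python
import numpy as np

matrix = np.array([[0, 1, 3, 17, 24, 12, 18, 16, 17, 15],
                   [1, 0, 2, 2, 5, 6, 13, 11, 12, 10],
                   [3, 2, 0, 1, 6, 12, 18, 12, 17, 15],
                   [17, 2, 1, 0, 3, 12, 17, 15, 16, 14],
                   [24, 5, 6, 3, 0, 1, 24, 22, 23, 21],
                   [12, 6, 12, 12, 1, 0, 12, 10, 11, 9],
                   [18, 13, 18, 17, 24, 12, 0, 3, 4, 5],
                   [16, 11, 12, 15, 22, 10, 3, 0, 1, 2],
                   [17, 12, 17, 16, 23, 11, 4, 1, 0, 1],
                   [15, 10, 15, 14, 21, 9, 5, 2, 1, 0]])
filter_list = [1, 2, 7, 10]

idx = np.asarray(filter_list) - 1        # convert 1-based ids to 0-based indices
final_matrix = matrix[np.ix_(idx, idx)]  # select the rows and columns together
print(final_matrix)
```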

Get the unique value of the different part of an array

I have an array with two rows, in which each value is repeated across 4 consecutive columns.
a = np.array([[ 0, 0, 0, 0, 4, 4, 4, 4, 7, 7, 7, 7, 1, 1, 1, 1],
[ 10, 10, 10, 10, 14, 14, 14, 14, 17, 17, 17, 17, 21, 21, 21, 21]])
I want to keep a single value for each group of 4 columns, for example one 0 for the first 4 columns of the first row. I cannot use unique(). The desired output is:
b = np.array([[ 0,4, 7, 1],
[ 10,14, 17, 21]])
You can simply take every 4th column like so:
>>> a = np.array([[ 0, 0, 0, 0, 4, 4, 4, 4, 7, 7, 7, 7, 1, 1, 1, 1],
... [ 10, 10, 10, 10, 14, 14, 14, 14, 17, 17, 17, 17, 21, 21, 21, 21]])
>>> a[:,::4]
array([[ 0, 4, 7, 1],
[10, 14, 17, 21]])
For more info, see numpy slicing.
You can remove consecutive duplicates from a list:
def remove_duplicates(arr):
    """
    Remove consecutive duplicates from a list, in place.
    """
    if len(arr) == 0:
        return arr
    else:
        i = 0
        while i < len(arr) - 1:
            if arr[i] == arr[i + 1]:
                del arr[i]  # works on lists; NumPy arrays do not support del on items
            else:
                i += 1
        return arr
print(remove_duplicates([0,0,0,0,1,1,1,1,0,0,0,0]))
[0, 1, 0]
print(remove_duplicates([0,0,0,0,4,4,4,4,7,7,7,7,1,1,1,1]))
[0, 4, 7, 1]
Use np.apply_along_axis, which applies a function to each 1-D slice along the given axis (here, each row):
>>> np.apply_along_axis(lambda x: x[::4], axis=1, arr=a)
array([[ 0, 4, 7, 1],
[10, 14, 17, 21]])
Here, the function we pass in just takes every 4th element of the row (this assumes 4 is always static).
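If you are not sure that every group really holds 4 identical values, a small sketch like the following checks that before collapsing. The helper name collapse_blocks and its width parameter are made up for illustration; it assumes the number of columns is a multiple of width:

```python
import numpy as np


def collapse_blocks(a, width):
    """Keep one value per block of `width` columns, checking each block is uniform."""
    # Shape becomes (rows, number_of_blocks, width).
    blocks = a.reshape(a.shape[0], -1, width)
    # Compare every element of a block with the block's first element.
    if not (blocks == blocks[:, :, :1]).all():
        raise ValueError("some block contains differing values")
    return blocks[:, :, 0]


a = np.array([[0, 0, 0, 0, 4, 4, 4, 4, 7, 7, 7, 7, 1, 1, 1, 1],
              [10, 10, 10, 10, 14, 14, 14, 14, 17, 17, 17, 17, 21, 21, 21, 21]])
b = collapse_blocks(a, 4)
print(b)
```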
You could use itertools.groupby:
>>> import numpy as np
>>> from itertools import groupby
>>> a = np.array([[0, 0, 0, 0, 4, 4, 4, 4, 7, 7, 7, 7, 1, 1, 1, 1], [10, 10, 10, 10, 14, 14, 14, 14, 17, 17, 17, 17, 21, 21, 21, 21]])
>>> a
array([[ 0, 0, 0, 0, 4, 4, 4, 4, 7, 7, 7, 7, 1, 1, 1, 1],
[10, 10, 10, 10, 14, 14, 14, 14, 17, 17, 17, 17, 21, 21, 21, 21]])
>>> b = np.array([[k for k, _ in groupby(arr)] for arr in a])
>>> b
array([[ 0, 4, 7, 1],
[10, 14, 17, 21]])

Broadcasting 2D array in specific columns in Python

I have an array like this:
A = np.array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20]])
What I want to do is add 1 to each value in the first and last columns. I want to understand broadcasting (and avoid loops) by using it with an appropriate vector, but what I have tried doesn't work. Expected result:
A = np.array([[ 2, 2, 3, 4, 6],
[ 7, 7, 8, 9, 11],
[12, 12, 13, 14, 16],
[17, 17, 18, 19, 21]])
You can use numpy indexing to do this. Try this:
# 0 is the first and -1 is the last column
A[:,[0,-1]] = A[:,[0,-1]]+1
Or
A[:,(0,-1)] = A[:,(0,-1)]+1
Or
A[:,[0,-1]]+=1
Or
A[:,(0,-1)]+=1
Output in either case:
array([[ 2, 2, 3, 4, 6],
[ 7, 7, 8, 9, 11],
[12, 12, 13, 14, 16],
[17, 17, 18, 19, 21]])
You can use the vector [1,0,0,0,1] and NumPy will do the broadcasting for you.
b = np.array([1,0,0,0,1])
A + b
array([[ 2, 2, 3, 4, 6],
[ 7, 7, 8, 9, 11],
[12, 12, 13, 14, 16],
[17, 17, 18, 19, 21]])
If you would like to know how broadcasting works, you can simply do the broadcast by hand once.
b = np.array([1,0,0,0,1])
B = np.tile(b,(A.shape[0],1))
array([[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1]])
A + B
Same result.
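To see what shape the vector is stretched to without copying it, here is a short sketch using np.broadcast_to, which returns a read-only view with the broadcast shape. The variable names stretched and result are just for illustration:

```python
import numpy as np

A = np.arange(1, 21).reshape(4, 5)   # same values as the array in the question
b = np.array([1, 0, 0, 0, 1])        # shape (5,), lines up with the last axis of A

# b is treated as if it were repeated along the first axis: (5,) -> (4, 5)
stretched = np.broadcast_to(b, A.shape)
print(stretched.shape)

result = A + b                        # broadcasting happens implicitly here
print(result)
```

Shapes are compared from the last axis backwards, which is why a length-5 vector matches the 5 columns of A.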

Zero pad array based on other array's shape

I've got K feature vectors that all share dimension n but have a variable dimension m (n x m). They all live in a list together.
to_be_padded = []
to_be_padded.append(np.reshape(np.arange(9),(3,3)))
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
to_be_padded.append(np.reshape(np.arange(18),(3,6)))
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]])
to_be_padded.append(np.reshape(np.arange(15),(3,5)))
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
What I am looking for is a smart way to zero pad the rows of these np.arrays such that they all share the same dimension m. I've tried solving it with np.pad but I have not been able to come up with a pretty solution. Any help or nudges in the right direction would be greatly appreciated!
The result should leave the arrays looking like this:
array([[0, 1, 2, 0, 0, 0],
[3, 4, 5, 0, 0, 0],
[6, 7, 8, 0, 0, 0]])
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]])
array([[ 0, 1, 2, 3, 4, 0],
[ 5, 6, 7, 8, 9, 0],
[10, 11, 12, 13, 14, 0]])
You could use np.pad for that, which can also pad 2-D arrays using a tuple of values specifying the padding width, ((top, bottom), (left, right)). For that you could define:
def pad_to_length(x, m):
    return np.pad(x, ((0, 0), (0, m - x.shape[1])), mode='constant')
Usage
You could start by finding the ndarray with the largest number of columns. Say you have two of them, a and b:
a = np.array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
b = np.array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
m = max(i.shape[1] for i in [a,b])
# 5
And then use this parameter to pad the ndarrays:
pad_to_length(a, m)
array([[0, 1, 2, 0, 0],
[3, 4, 5, 0, 0],
[6, 7, 8, 0, 0]])
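Applied to the whole list from the question, this could look roughly like the sketch below. It rebuilds to_be_padded so that it runs on its own; the names padded and stacked are just for illustration:

```python
import numpy as np


def pad_to_length(x, m):
    # Pad only on the right side of the column axis, with zeros.
    return np.pad(x, ((0, 0), (0, m - x.shape[1])), mode="constant")


to_be_padded = [
    np.arange(9).reshape(3, 3),
    np.arange(18).reshape(3, 6),
    np.arange(15).reshape(3, 5),
]

m = max(x.shape[1] for x in to_be_padded)            # widest array in the list
padded = [pad_to_length(x, m) for x in to_be_padded]
stacked = np.stack(padded)                            # optional: one (K, n, m) array
print(stacked.shape)
```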
I believe there is no very efficient solution for this. I think you will need to loop over the list with a for loop and treat every array individually:
for i in range(len(to_be_padded)):
    padded = np.zeros((n, maxM), dtype=to_be_padded[i].dtype)
    padded[:, :to_be_padded[i].shape[1]] = to_be_padded[i]
    to_be_padded[i] = padded
where n is the shared number of rows and maxM is the largest m among the matrices in your list.

Accessing elements in array Python

This is my first time handling multidimensional arrays and I'm having problems accessing elements. I'm trying to get the red pixels of a picture but just the first 8 elements within the array. Here's the code
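from PIL import Image  # Pillow; older PIL installs used "import Image"
import numpy as np
im = Image.open(r"C:\Users\Jones\Pictures\1.jpg")  # raw string, so "\1" is not read as an escape
pix = im.load()
r, g, b = np.array(im).T
print(r[0:8])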
Since you're dealing with images, r is a 2-D array. To get the first 8 pixels in the image, try
r.flatten()[:8]
This will wrap around to the following rows automatically if the first row has fewer than 8 pixels.
Do you want the first 8 columns of every row? Try r[:,:8]
Only want the first row? Try r[0,:8]
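One thing worth checking, shown here with a synthetic array instead of a real file: np.array(im) has shape (height, width, 3), so unpacking np.array(im).T gives channels of shape (width, height), and r[0] is then the first column of the picture rather than the first row. A small sketch, where img is a made-up stand-in for np.array(im), that takes the red channel without transposing:

```python
import numpy as np

# Synthetic 2x10 RGB "image", shape (height, width, 3), standing in for np.array(im)
img = np.arange(2 * 10 * 3, dtype=np.uint8).reshape(2, 10, 3)

r = img[..., 0]               # red channel, shape (height, width)
first_row = r[0, :8]          # first 8 red values of the first row
all_rows = r[:, :8]           # first 8 columns of every row
first_eight = r.ravel()[:8]   # first 8 values in row-major order

print(r.shape, first_row)
```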
You can do it like this:
r[0][:8]
Note, however, that this returns fewer than 8 values if the first row has fewer than 8 pixels. To fix that, flatten the rows first:
from itertools import chain
r = list(chain.from_iterable(r))
r[:8]
or (if you don't want to import a module):
r = [val for element in r for val in element]
r[:8]
I think it could be simpler. This example uses a random matrix (this will be your r matrix):
In [7]: from pylab import * # convention
In [8]: r = randint(0,10,(10,10)) # this is your image
In [9]: r
array([[7, 9, 5, 5, 6, 8, 1, 4, 3, 4],
[5, 4, 4, 4, 2, 6, 2, 6, 4, 2],
[1, 4, 9, 9, 2, 6, 1, 9, 0, 6],
[5, 9, 0, 7, 9, 9, 5, 2, 0, 7],
[8, 3, 3, 9, 0, 0, 5, 9, 2, 2],
[5, 3, 7, 8, 8, 1, 6, 3, 2, 0],
[0, 2, 5, 7, 0, 1, 0, 2, 1, 2],
[4, 0, 4, 5, 9, 9, 3, 8, 3, 7],
[4, 6, 9, 9, 5, 9, 3, 0, 5, 1],
[6, 9, 9, 0, 3, 4, 9, 7, 9, 6]])
Then, extract first 8 columns and do something
In [17]: r_8 = r[:,:8] # extract columns
In [18]: r_8
Out[18]:
array([[7, 9, 5, 5, 6, 8, 1, 4],
[5, 4, 4, 4, 2, 6, 2, 6],
[1, 4, 9, 9, 2, 6, 1, 9],
[5, 9, 0, 7, 9, 9, 5, 2],
[8, 3, 3, 9, 0, 0, 5, 9],
[5, 3, 7, 8, 8, 1, 6, 3],
[0, 2, 5, 7, 0, 1, 0, 2],
[4, 0, 4, 5, 9, 9, 3, 8],
[4, 6, 9, 9, 5, 9, 3, 0],
[6, 9, 9, 0, 3, 4, 9, 7]])
In [19]: r_8 = r_8 * 2 # do something
In [20]: r_8
Out[20]:
array([[14, 18, 10, 10, 12, 16, 2, 8],
[10, 8, 8, 8, 4, 12, 4, 12],
[ 2, 8, 18, 18, 4, 12, 2, 18],
[10, 18, 0, 14, 18, 18, 10, 4],
[16, 6, 6, 18, 0, 0, 10, 18],
[10, 6, 14, 16, 16, 2, 12, 6],
[ 0, 4, 10, 14, 0, 2, 0, 4],
[ 8, 0, 8, 10, 18, 18, 6, 16],
[ 8, 12, 18, 18, 10, 18, 6, 0],
[12, 18, 18, 0, 6, 8, 18, 14]])
Now, this is the trick. Replace the first 8 columns in r using hstack:
In [21]: r = hstack((r_8, r[:,8:])) # it replaces the FIRST 8 columns, note the indexing notation
In [22]: r
Out[22]:
array([[14, 18, 10, 10, 12, 16, 2, 8, 3, 4], # it does not touch the last 2 columns
[10, 8, 8, 8, 4, 12, 4, 12, 4, 2],
[ 2, 8, 18, 18, 4, 12, 2, 18, 0, 6],
[10, 18, 0, 14, 18, 18, 10, 4, 0, 7],
[16, 6, 6, 18, 0, 0, 10, 18, 2, 2],
[10, 6, 14, 16, 16, 2, 12, 6, 2, 0],
[ 0, 4, 10, 14, 0, 2, 0, 4, 1, 2],
[ 8, 0, 8, 10, 18, 18, 6, 16, 3, 7],
[ 8, 12, 18, 18, 10, 18, 6, 0, 5, 1],
[12, 18, 18, 0, 6, 8, 18, 14, 9, 6]])
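As an alternative to rebuilding the array with hstack, a slice of a NumPy array can also be modified in place, which may be simpler when the operation keeps the dtype. A small sketch with its own random data (the seed and the name original are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.integers(0, 10, size=(10, 10))   # stand-in for the image channel
original = r.copy()                      # kept only to compare afterwards

r[:, :8] *= 2                            # doubles the first 8 columns in place

# Same result as stacking the modified columns next to the untouched ones.
rebuilt = np.hstack((original[:, :8] * 2, original[:, 8:]))
print(np.array_equal(r, rebuilt))
```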
EDIT: as DSM pointed out, the OP is in fact using a NumPy array.
I retract my answer, as nneonneo's is correct.
