Calculation of items in array by Python - python

I have some data in this format:
2,3,4
3,4,5
5,6,7
I pack the array as:
with open('house_price_data.txt') as data:
    substrings = data.read().split()
array = [map(int, substring.split(',')) for substring in substrings]
My task is to do some calculation like this for each data in the set:
(2-3)**2 + (3-3)**2 + (5-3)**2
(3-4)**2 + (4-4)**2 + (5-4)**2
My expected answer is C1 = 5 and C2 = 2
I wrote code like this:
for [a for a, b, c in array] in range (0,2):
    C1 = (([a for a, b, c in array]) - 3)**2
    C2 = (([b for a, b, c in array]) - 4)**2
But it does not work. My idea for the for loop was that it would read the values 2, 3, 5 one by one, subtract 3 from each, square each result, and sum the squares. How can I improve it?
Apart from that, I also have problems with this code:
[a for a, b, c in array]
[b for a, b, c in array]
[c for a, b, c in array]
I need to use array this way many times in the program to get items a, b and c. When I have such code in the program, I get this error message:
not enough values to unpack (expected 3, got 0)
How can I fix this?

This question is unclear and probably destined for oblivion, but if I understand correctly, which is far from certain, you are trying to do something like this.
array = [[2, 3, 5], [3, 4, 5], [5, 6, 7]]
#initialize the variables C1 and C2
C1 = 0
C2 = 0
#iterate the elements of the FIRST list in your list
#so 2,3,5 (I assume you have indicated 2,3,4 by mistake)
for element in array[0]:
    C1+=(element-3)**2
#iterate the elements of the SECOND list in your list
#so 3,4,5
for element in array[1]:
    C2+=(element-4)**2
print("C1 =", C1)
print("C2 =", C2)
Output:
C1 = 5
C2 = 2
But your example is ambiguous. Maybe 2,3,5 are the first elements of each sublist? In that case, the logic is the same.
#iterate the FIRST element in each sublist in your list
for element in array:
    C1+=(element[0]-3)**2
If that's what you want to do, then it's best for you to do it like that, with classic loops. List comprehensions (things like [x for x in array if ...]) are shortcuts for advanced Python programmers. They do exactly the same thing, but are less clear and more error prone.

If you have array = [[2,3,4],[3,4,5],[5,6,7]] and you want a = [2,3,5], that would be
a = [x[0] for x in array]
Otherwise, array[0] is [2,3,4] and you can instead do
a, b, c = array
to unpack the rows of the 2D array.
Sidenote: you seem to have a CSV file, so I would strongly suggest using pandas and NumPy for your numerical calculations.
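A likely cause of the "not enough values to unpack (expected 3, got 0)" error, assuming the code runs on Python 3: map returns a one-shot iterator there, so each row is consumed by the first comprehension and is empty the second time it is unpacked. The following is a minimal sketch under that assumption. The inline text variable is a stand-in for the file, and only C1 is computed because the question is ambiguous about which values C2 should use.

```python
# Sample data standing in for house_price_data.txt (assumption for illustration).
text = "2,3,4\n3,4,5\n5,6,7"

# list(...) materialises each row, so it can be read as often as needed.
rows = [list(map(int, line.split(","))) for line in text.split()]

# Transpose the rows into columns: a = (2, 3, 5), b = (3, 4, 6), c = (4, 5, 7).
a, b, c = zip(*rows)

# Sum of squared differences between the first column and 3.
C1 = sum((x - 3) ** 2 for x in a)
print("C1 =", C1)  # C1 = 5

# The columns can be rebuilt repeatedly without error.
a_again = [x for x, y, z in rows]
print(a_again)  # [2, 3, 5]
```

Because rows holds real lists, the comprehensions from the question can be repeated as many times as needed.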

Related

Minimize sum of product of two uneven consecutive arrays

I've got an optimization problem in which I need to minimize the sum product of two uneven but consecutive arrays, say:
A = [1, 2, 3]
B = [4, 9, 5, 3, 2, 10]
Shuffling of values is not allowed i.e. the index of the arrays must remain the same.
In other words, it is a distribution minimization of array A over array B in consecutive order.
Or: given that len(B) >= len(A), minimize the sum product of the values of array A of length n with n values of array B, without changing the order of array A or B.
In this case, the minimum would be:
min_sum = 1*4 + 2*3 + 3*2 = 16
A brute force approach to this problem would be:
from itertools import combinations
sums = [sum(a*b for a,b in zip(A,b)) for b in combinations(B,len(A))]
min_sum = min(sums)
I need to do this for many sets of arrays however. I see a lot of overlap with the knapsack problem and I have the feeling that it should be solved with dynamic programming. I am stuck however in how to write an efficient algorithm to perform this.
Any help would be greatly appreciated!
Having two lists
A = [1, 2, 3]
B = [4, 9, 5, 3, 2, 10]
the optimal sum product can be found using:
min_sum = sum(a*b for a,b in zip(sorted(A), sorted(B)[:len(A)][::-1]))
In case A is always given sorted, this simplified version can be used:
min_sum = sum(a*b for a,b in zip(A, sorted(B)[:len(A)][::-1]))
The important part(s) to note:
You need factors of A sorted. sorted(A) will do this job, without modifying the original A (in contrast to A.sort()). In case A is already given sorted, this step can be left out.
You need the N lowest values from B, where N is the length of A. This can be done with sorted(B)[:len(A)]
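In order to evaluate the minimal sum of products, you need to multiply the highest number of A with the lowest of B, the second highest of A with the second lowest of B, and so on. That is why, after getting the N lowest values of B, the order gets reversed with [::-1]. Note that this pairs values regardless of their positions in A and B, so it only answers the question as asked when reordering is acceptable.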
Output
print(min_sum)
# 16
print(A)
# [1, 2, 3] <- The original list A is not modified
print(B)
# [4, 9, 5, 3, 2, 10] <- The original list B is not modified
With Python, you can easily sort and reverse lists. The code you are looking for is
A, B = sorted(A), sorted(B)[:len(A)]
min_sum = sum([a*b for a,b in zip(A, B[::-1])])
You may need to get the values one by one from B, and keep the order of the list by having each value assigned to a key.
A = [1, 3, 2]
B = [4, 9, 5, 3, 2, 10]
#create a new dictionary with key value pairs of B array values
new_dict = {}
j=0
for k in B:
    new_dict[j] = k
    j+= 1
#create a new list of the smallest values in B up to length of array A
min_Bmany =[]
for lp in range(0,len(A)):
    #get the smallest remaining value from dictionary new_dict
    rmvky= min(zip(new_dict.values(), new_dict.keys()))
    #append this item to minimums list
    min_Bmany.append((rmvky[1],rmvky[0]))
    #delete this key from the dictionary new_dict
    del new_dict[rmvky[1]]
#sort the list by the keys(instead of the values)
min_Bmany.sort(key=lambda r: r[0])
#create list of only the values, but still in the same order as they are in original array
min_B =[]
for z in min_Bmany:
    min_B.append(z[1])
print(A)
print(min_B)
ResultStr = ""
Result = 0
#Calculate the result
for s in range(0,len(A)):
    ResultStr = ResultStr + str(A[s]) +"*" +str(min_B[s])+ " + "
    Result = Result + A[s]*min_B[s]
print(ResultStr)
print("Result = ",Result)
The output will be as follows:
A = [1, 3, 2]
B = [4, 9, 5, 3, 2, 10]
1*4 + 3*3 + 2*2 +
Result = 17
If you then change A, the output becomes:
A = [1, 2, 3]
B = [4, 9, 5, 3, 2, 10]
1*4 + 2*3 + 3*2 +
Result = 16
Not sure if this is helpful, but anyway.
This can be formulated as a mixed-integer programming (MIP) problem. Basically, an assignment problem with some side constraints.
min sum((i,j),x(i,j)*a(i)*b(j))
sum(j, x(i,j)) = 1 ∀i "each a(i) is assigned to exactly one b(j)"
sum(i, x(i,j)) ≤ 1 ∀j "each b(j) can be assigned to at most one a(i)"
v(i) = sum(j, j*x(i,j)) "position of each a(i) in b"
v(i) ≥ v(i-1)+1 ∀i>1 "maintain ordering"
x(i,j) ∈ {0,1} "binary variable"
v(i) ≥ 1 "continuous (or integer) variable"
Example output:
----     40 VARIABLE z.L  =  16.000
----     40 VARIABLE x.L  assign
            j1          j4          j5
i1       1.000
i2                   1.000
i3                               1.000
----     40 VARIABLE v.L  position of a(i) in b
i1 1.000,    i2 4.000,    i3 5.000
Cute little MIP model.
Just as an experiment I generated a random problem with len(a)=50 and len(b)=500. This leads to a MIP with 650 rows and 25k columns. Solved in 50 seconds (to proven global optimality) on my slow laptop.
It turns out using a shortest path algorithm on a directed graph is pretty fast. Erwin did a post showing a MIP model. As you can see in the comments section there, a few of us independently tried shortest path approaches, and on examples with 100 for the length of A and 1000 for the length of B we get optimal solutions in the vicinity of 4 seconds.
The graph can look like this (the figure is not reproduced here):
Nodes are labeled n(i,j) indicating that visiting the node means assigning a(i) to b(j). The costs a(i)*b(j) can be associated with any incoming (or any outgoing) arc. After that calculate the shortest path from src to snk.
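Because every arc in such a graph goes from a node n(i,j) to a node n(i+1,j') with j' > j, the shortest path can also be computed with a small dynamic program instead of a general graph library. The following is a minimal sketch of that idea, not the code used for the timings above. The function names min_ordered_sum_product and brute_force are made up for illustration.

```python
from itertools import combinations

def min_ordered_sum_product(A, B):
    """Minimum of sum(A[i] * B[j_i]) over index choices j_0 < j_1 < ..."""
    if len(A) > len(B):
        raise ValueError("B must be at least as long as A")
    inf = float("inf")
    # best[i] = cheapest way to assign the first i elements of A
    # using only the elements of B seen so far.
    best = [0] + [inf] * len(A)
    for b in B:
        # Go downwards so each b is paired with at most one element of A.
        for i in range(len(A), 0, -1):
            best[i] = min(best[i], best[i - 1] + A[i - 1] * b)
    return best[len(A)]

def brute_force(A, B):
    # combinations() keeps the original order of B, so this is a reference answer.
    return min(sum(a * b for a, b in zip(A, sub))
               for sub in combinations(B, len(A)))

B = [4, 9, 5, 3, 2, 10]
print(min_ordered_sum_product([1, 2, 3], B))  # 16
print(min_ordered_sum_product([1, 3, 2], B))  # 17
```

The running time is proportional to len(A) * len(B), and the order of both lists is preserved, which is the constraint stated in the question.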
BTW can you tell a bit about the background of this problem?

Appending an integer and a list to a list of lists as a single list

I am trying to append an integer and a list of numbers to a list.
a = 4
b = [3,1]
c = []
c.append([a,b])
The list c is [[4,[3,1]]] but instead, I wanted to append a single list so c should be [[4,3,1]]. It is difficult to use the extend function in this case because the length of c and the index to which the integer and list need to be appended are variable. Also, this operation of appending a list and integer needs to be carried out with several lists of different lengths. Can someone explain how this can be achieved?
This is one simple way to do it, as long as the number of lists and integers you are planning to append is not large,
a = 4
b = [3,1]
b2 = [5,6,7]
b3 = [2]
b.insert(0,a)
c = []
c.append([x for x in b])
c[0] += b2
c[0] += b3
print(c)
Here I extended your example to cover two additional lists. The code should print:
[[4, 3, 1, 5, 6, 7, 2]]
Basically, you prepend the integer to the first list, b. Then you use a list comprehension to create a sublist in c out of the new b. Finally, you simply concatenate all the other lists to it with +=. c[0] is the location of the inner (target) list.
This method is not very efficient, and is pretty "manual", but again, it seems suitable for small number of lists with not too many elements.
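A shorter alternative, offered as a sketch rather than a replacement for the approach above: iterable unpacking inside a list literal builds the flat sublist in one step and leaves b unmodified.

```python
a = 4
b = [3, 1]
c = []

# [a, *b] places the items of b next to a, producing one flat list.
c.append([a, *b])
print(c)  # [[4, 3, 1]]

# The same idea works for several lists of different lengths.
b2 = [5, 6, 7]
b3 = [2]
c.append([a, *b, *b2, *b3])
print(c[1])  # [4, 3, 1, 5, 6, 7, 2]
```

The expression [a] + b gives the same result as [a, *b] when b is a list.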
a = 4
b = [3,1]
c = []
for i in b:
    c.append(i)
c.append(a)
print(c)
[3, 1, 4]
This works.

Optimize testing all combinations of rows from multiple NumPy arrays

I have three NumPy arrays of ints, same number of columns, arbitrary number of rows each. I am interested in all instances where a row of the first one plus a row of the second one gives a row of the third one ([3, 1, 4] + [1, 5, 9] = [4, 6, 13]).
Here is some pseudo-code:
for i, j in rows(array1), rows(array2):
    if i + j is in rows(array3):
        somehow store the rows this occurred at (e.g. (1,2,5) if 1st row of
        array1 + 2nd row of array2 give 5th row of array3)
I will need to run this for very big matrices so I have two questions:
(1) I can write the above using nested loops but is there a quicker way, perhaps list comprehensions or itertools?
(2) What is the fastest/most memory-efficient way to store the triples? Later I will need to create a heatmap using two as coordinates and the first one as the corresponding value eg. point (2,5) has value 1 in the pseudo-code example.
Would be very grateful for any tips - I know this sounds quite simple but it needs to run fast and I have very little experience with optimization.
edit: My ugly code was requested in comments
import numpy as np
#random arrays
A = np.array([[-1,0],[0,-1],[4,1], [-1,2]])
B = np.array([[1,2],[0,3],[3,1]])
C = np.array([[0,2],[2,3]])
#triples stored as numbers with 2 coordinates in a otherwise-zero matrix
output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype = int)
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(C.shape[0]):
            if np.array_equal((A[i,] + B[j,]), C[k,]):
                output_matrix[j, k] = i+1
print(output_matrix)
We can leverage broadcasting to perform all those summations and comparisons in a vectorized manner, then use np.where on the result to get the indices of the matches, and finally index and assign:
output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype = int)
mask = ((A[:,None,None,:] + B[None,:,None,:]) == C).all(-1)
I,J,K = np.where(mask)
output_matrix[J,K] = I+1
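To make the shapes explicit, here is the same broadcasting idea as a self-contained sketch using the sample arrays from the question. The intermediate name sums is introduced only for illustration.

```python
import numpy as np

A = np.array([[-1, 0], [0, -1], [4, 1], [-1, 2]])
B = np.array([[1, 2], [0, 3], [3, 1]])
C = np.array([[0, 2], [2, 3]])

# sums[i, j, 0, :] holds A[i] + B[j]; its shape is (len(A), len(B), 1, ncols).
sums = A[:, None, None, :] + B[None, :, None, :]

# Comparing with C broadcasts over the rows of C; all(axis=-1) requires
# every column to match, giving shape (len(A), len(B), len(C)).
mask = (sums == C).all(axis=-1)

I, J, K = np.where(mask)
output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype=int)
output_matrix[J, K] = I + 1
print(output_matrix)
# [[1 0]
#  [2 0]
#  [0 4]]
```

The comparison builds a temporary array with len(A) * len(B) * len(C) * ncols elements, so for very large inputs memory may become the limiting factor.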
(1) Improvements
You can use sets for the final result in the third matrix, as a + b = c must hold identically. This already replaces one nested loop with a constant-time lookup. I will show you an example of how to do this below, but we first ought to introduce some notation.
For a set-based approach to work, we need a hashable type. Lists will thus not work, but a tuple will: it is an ordered, immutable structure. There is, however, a problem: tuple addition is defined as appending, that is,
(0, 1) + (1, 0) = (0, 1, 1, 0).
This will not do for our use-case: we need element-wise addition. As such, we subclass the built-in tuple as follows,
class AdditionTuple(tuple):
    def __add__(self, other):
        """
        Element-wise addition.
        """
        if len(self) != len(other):
            raise ValueError("Undefined behaviour!")
        return AdditionTuple(self[idx] + other[idx]
                             for idx in range(len(self)))
Where we override the default behaviour of __add__. Now that we have a data-type amenable to our problem, let's prepare the data.
You give us,
A = [[-1, 0], [0, -1], [4, 1], [-1, 2]]
B = [[1, 2], [0, 3], [3, 1]]
C = [[0, 2], [2, 3]]
To work with. I say,
from types import SimpleNamespace
A = [AdditionTuple(item) for item in A]
B = [AdditionTuple(item) for item in B]
C = {tuple(item): SimpleNamespace(idx=idx, values=[])
     for idx, item in enumerate(C)}
That is, we modify A and B to use our new data-type, and turn C into a dictionary which supports (amortised) O(1) look-up times.
We can now do the following, eliminating one loop altogether,
from itertools import product
for a, b in product(enumerate(A), enumerate(B)):
    idx_a, a_i = a
    idx_b, b_j = b
    if a_i + b_j in C:  # a_i + b_j == c_k, identically
        C[a_i + b_j].values.append((idx_a, idx_b))
Then,
>>> print(C)
{(2, 3): namespace(idx=1, values=[(3, 2)]), (0, 2): namespace(idx=0, values=[(0, 0), (1, 1)])}
Where for each value in C, you get the index of that value (as idx), and a list of tuples of (idx_a, idx_b) whose elements of A and B together sum to the value at idx in C.
Let us briefly analyse the complexity of this algorithm. Redefining the lists A, B, and C as above is linear in the length of the lists. Iterating over A and B is of course in O(|A| * |B|), and the nested condition computes the element-wise addition of the tuples: this is linear in the length of the tuples themselves, which we shall denote k. The whole algorithm then runs in O(k * |A| * |B|).
This is a substantial improvement over your current O(k * |A| * |B| * |C|) algorithm.
(2) Matrix plotting
Use a dok_matrix, a sparse SciPy matrix representation. Then you can use any heatmap-plotting library you like on the matrix, e.g. Seaborn's heatmap.

Count the element of array inside a list

I have arrays inside lists, and I want to count the number of elements in the arrays from two different lists, instead of counting the list items.
Code:
import numpy as np
def count_total(a,b):
    #count the total number of element for two arrays in different list
    x,y=len(a),len(b)
    result=[]
    for a1 in a:
        for b2 in b:
            result.append(x+y)
    return result
a=[np.array([2,2,1,2]),np.array([1,3])]
b=[np.array([4,2,1])]
c=[np.array([1,2]),np.array([4,3])]
print(count_total(a,b))
print(count_total(a,c))
print(count_total(b,c))
Actual output
[3, 3]
[4, 4, 4, 4]
[3, 3]
Desired output
[7,5]
[6,6,4,4]
[5,5]
Can anyone help ?
It looks to me from your examples that you want all the possible ways to sum the lengths of the arrays. This can be achieved with itertools.product. Here is my code:
from itertools import product
def count_total(a,b):
    return [sum(map(len, i)) for i in product(a, b)]
product returns all possible pairings of one element from a with one element from b. Then, for each pairing, we take the len of each part and add them together with sum.
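As a quick check against the desired output in the question, here is a self-contained sketch of the same approach. Plain lists stand in for the NumPy arrays (an assumption made to keep the example dependency-free); len() gives the same counts for 1-D arrays.

```python
from itertools import product

def count_total(a, b):
    # One result per (x, y) pair, with x taken from a and y taken from b.
    return [len(x) + len(y) for x, y in product(a, b)]

a = [[2, 2, 1, 2], [1, 3]]
b = [[4, 2, 1]]
c = [[1, 2], [4, 3]]

print(count_total(a, b))  # [7, 5]
print(count_total(a, c))  # [6, 6, 4, 4]
print(count_total(b, c))  # [5, 5]
```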
The bug is in line 4: x and y are assigned the list lengths rather than the array lengths.
Replace lines 4-8
x,y=len(a),len(b)
result=[]
for a1 in a:
    for b2 in b:
        result.append(x+y)
with
y= lambda x:len(x)
result=[]
for a1 in a:
    for b1 in b:
        result.append(y(a1) + y(b1))

Calculating new entries in array based on entries from another array in python

I have a question about how to "call" a specific cell in an array while looping over another array.
Assume, there is an array a:
a = [[a1 a2 a3],[b1 b2 b3]]
and an array b:
b = [[c1 c2] , [d1 d2]]
Now, I want to recalculate the values in array b using the information from array a. In detail, each value of array b has to be multiplied by the integral of the Gauss function between the borders given in array a. But for the sake of simplicity, let's forget about the integral and assume a simple calculation of the form:
c1 = c1 * (a2-a1) ; c2 = c2 * (a3 - a2) and so on,
with indices it might look like:
b[i,j] = b[i,j] * (a[i, j+1] - a[i,j])
Can anybody tell me how to solve this problem?
Thank you very much and best regards,
Marc
You can use the zip function within a nested list comprehension:
>>> [[k*(v[1]-v[0]) for k,v in zip(v,zip(s,s[1:]))] for s,v in zip(a,b)]
zip(s,s[1:]) will give you the desired pairs of elements (in Python 3, wrap it in list() to display them), for example:
>>> s =[4, 5, 6]
>>> list(zip(s,s[1:]))
[(4, 5), (5, 6)]
Demo:
>>> b =[[7, 8], [6, 0]]
>>> a = [[1,5,3],[4 ,0 ,6]]
>>> [[k*(v[1]-v[0]) for k,v in zip(v,zip(s,s[1:]))] for s,v in zip(a,b)]
[[28, -16], [-24, 0]]
You can also do this really cleanly with NumPy:
import numpy as np
a, b = np.array(a), np.array(b)
np.diff(a) * b
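For reference, a self-contained sketch of the NumPy version using the demo values from the previous answer. The name widths is introduced only for illustration.

```python
import numpy as np

a = np.array([[1, 5, 3], [4, 0, 6]])
b = np.array([[7, 8], [6, 0]])

# np.diff works along the last axis by default: widths[i, j] = a[i, j+1] - a[i, j].
widths = np.diff(a)

# Element-wise product, i.e. b[i, j] * (a[i, j+1] - a[i, j]).
result = b * widths
print(result)
# [[ 28 -16]
#  [-24   0]]
```

This matches the result of the nested list comprehension above.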
First I would split your a table into a table of lower bounds and one of upper bounds, to work with aligned tables and improve readability:
import numpy
import scipy.integrate
lowerBounds = a[...,:-1]
upperBounds = a[...,1:]
Define the Gauss function (written here as a normalized Gaussian, with the width in the denominator of the prefactor):
def f(x, gs_wdth = 1., mean=0.):
    return 1./(numpy.sqrt(2*numpy.pi)*gs_wdth) * numpy.exp(-(x-mean)**2/(2*gs_wdth**2))
Then, use an nditer (see Iterating Over Arrays) to efficiently iterate over the arrays:
it = numpy.nditer([b, lowerBounds, upperBounds],
                  op_flags=[['readwrite'], ['readonly'], ['readonly']])
for _b, _lb, _ub in it:
    multiplier = scipy.integrate.quad(f, _lb, _ub)[0]
    _b[...] *= multiplier
print(b)
This does the job required in your post, and should be computationally efficient. Note that b is modified in place: the original values are lost, but no extra copy of the array is allocated during the calculation. Because the multipliers are floats, b needs a floating-point dtype for the in-place multiplication to work.
