I want to know if it's possible to solve this problem. I have these values:
yf = (0.23561643, 0.312328767, 0.3506849315, 0.3890410958, 0.4273972602, 0.84931506)
z = (4.10592285e-05, 0.0012005020, 0.00345332906, 0.006367483, 0.0089151571, 0.01109750, 0.01718827)
I want to use this function (discount factor), but it's not going to work because of the different lengths of z and yf.
def f(x):
    res = 1/(1 + x * yf)
    return res

f(z)
output: ValueError: cannot evaluate a numeric op with unequal lengths
My question is whether there is a way to solve this. The approximate output values are:
res = (0.99923, 0.99892, 0.99837, 0.99802, 0.99763, 0.99175)
Any help with this will be perfect, and I want to thank in advance everyone who takes the time to read this or tries to help.
Do you want the arrays to broadcast to whichever is shorter? You can do this:
def f(x):
    leng = min(len(x), len(yf))
    x = x[:leng]
    new_yf = yf[:leng]  # Don't want to modify the global variable.
    res = 1/(1 + x * new_yf)
    return res
and it should work.
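A usage sketch (converting the tuples to numpy arrays first so that the elementwise arithmetic works; the values are the ones given above):

import numpy as np

yf = np.array([0.23561643, 0.312328767, 0.3506849315,
               0.3890410958, 0.4273972602, 0.84931506])
z = np.array([4.10592285e-05, 0.0012005020, 0.00345332906,
              0.006367483, 0.0089151571, 0.01109750, 0.01718827])

print(f(z))   # six discount factors; z's seventh value is dropped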
Find the minimum length and iterate. You can also convert to numpy arrays, which avoids a step of iteration:
import numpy as np
yf = (0.23561643, 0.312328767, 0.3506849315, 0.3890410958, 0.4273972602, 0.84931506)
z = (4.10592285e-05, 0.0012005020, 0.00345332906, 0.006367483, 0.0089151571, 0.01109750, 0.01718827)
x = min(len(yf), len(z))
res = 1/(1 + np.array(z[:x]) * np.array(yf[:x]))

Using numpy.multiply:

res = 1/(1 + np.multiply(np.array(z[:x]), np.array(yf[:x])))
Consider the following problem: given a set of n intervals and a set of m floating-point numbers, determine, for each floating-point number, the subset of intervals that contain it.
This problem has typically been addressed by constructing an interval tree (also called a range tree or segment tree). Implementations exist for the one-dimensional case, e.g. Python's intervaltree package. Usually, these implementations consider only one or a few floating-point numbers, i.e. a small m above.
In my problem setting, both n and m are extremely large (they come from an image processing problem). Further, I need to consider N-dimensional intervals (cuboids when N=3; I was modeling human brains with the Finite Element Method). I have implemented a simple N-dimensional interval tree in Python, but it runs in a loop and can only take one floating-point number at a time. Can anyone help improve the implementation in terms of efficiency? You can change the data structure freely.
import sys
import time
import numpy as np

# find the indices of a satisfying x > a in one dimension
def find_index_smaller(a, x):
    idx = np.argsort(a)
    ss = np.searchsorted(a, x, sorter=idx)
    res = idx[0:ss]
    return res

# find the indices of a satisfying x < a in one dimension
def find_index_larger(a, x):
    return find_index_smaller(-a, -x)

# find the indices of a satisfying amin < x < amax in one dimension
def find_intv_at(amin, amax, x):
    idx = find_index_smaller(amin, x)
    idx2 = find_index_larger(amax[idx], x)
    res = idx[idx2]
    return res

# find the indices of a satisfying amin < x < amax in N dimensions
def find_intv_at_nd(amin, amax, x):
    dim = amin.shape[0]
    res = np.arange(amin.shape[-1])
    for i in range(dim):
        idx = find_intv_at(amin[i, res], amax[i, res], x[i])
        res = res[idx]
    return res
I also have two test examples for a sanity check and performance testing:
def demo1():
    print("By default, we do a correctness test")
    n_intv = 2
    n_point = 2
    # generate the test data
    point = np.random.rand(3, n_point)
    intv_min = np.random.rand(3, n_intv)
    intv_max = intv_min + np.random.rand(3, n_intv)*8
    print("point")
    print(point)
    print("intv_min")
    print(intv_min)
    print("intv_max")
    print(intv_max)
    print("===Indexes of intervals that contain the point===")
    for i in range(n_point):
        print(find_intv_at_nd(intv_min, intv_max, point[:, i]))

def demo2():
    print("Performance:")
    n_points = 100
    n_intv = 1000000
    # generate the test data
    points = np.random.rand(n_points, 3)*512
    intv_min = np.random.rand(3, n_intv)*512
    intv_max = intv_min + np.random.rand(3, n_intv)*8
    print("point.shape = " + str(points.shape))
    print("intv_min.shape = " + str(intv_min.shape))
    print("intv_max.shape = " + str(intv_max.shape))
    starttime = time.time()
    for point in points:
        tmp = find_intv_at_nd(intv_min, intv_max, point)
    print("it took this long to run {} points, with {} intervals: {}".format(
        n_points, n_intv, time.time() - starttime))
My ideas would be:
Remove np.argsort() from the algorithm: the intervals do not change, so the sorting could be done once in pre-processing.
Vectorize over x: the algorithm currently runs a loop for each x, and it would be nice to get rid of that loop. A rough one-dimensional sketch of both ideas follows.
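This sketch is only illustrative (the variable names are made up and it only handles the lower bounds in one dimension), but it shows the pre-sorting and the vectorized searchsorted call I have in mind:

import numpy as np

amin_1d = np.random.rand(1000)      # lower bounds of the intervals
xs = np.random.rand(50)             # many query points at once

order = np.argsort(amin_1d)         # pre-processing, done only once
sorted_amin = amin_1d[order]

# positions[j] = how many lower bounds lie below xs[j]
positions = np.searchsorted(sorted_amin, xs)

# indices of intervals with amin < xs[j] for one particular query j
j = 0
idx_below = order[:positions[j]]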
Any contribution would be appreciated.
I am starting to code in Python, coming from a Matlab background, and I have a problem with a for loop that I am trying to port.
So this is my for loop from Matlab,
ix = indoor(1);
idx = indoor(2)-indoor(1);
%Initialize X apply I.C
X = [ix;idx];
for k=(1:1:287)
    X(:,k+1) = Abest*X(:,k) + Bbest*outdoor(k+1) + B1best*(cbest4/cbest1);
end
In this code Abest is a 2x2 matrix, Bbest is a 2x1 matrix, outdoor is a 288x1 vector, and B1best is a 2x1 matrix. The matrices are found from a function using the matrix exponential command. c4 and c1 are constants defined beforehand.
In Python I have been able to get the matrix exponential command to work in my function but I can't get that for loop to work.
Xo = np.array([[ix],[idx]])
num1 = range(0,276)
for k in num1:
    Xo[:,k+1] = Ae*Xo[:,k] + Be*outdoor[k+1] + Be1*(c4/c1)
Again, Ae, Be, Be1 are matrices of the same sizes as the Matlab ones, and the same goes for the outdoor vector.
I have tried everything I can think of to make it work... The only thing that worked for me was,
Xo = np.zeros(())
# Initial conditions
ix = np.array(indoor[0])
idx = np.array(indoor[1]-indoor[0])
Xo = np.array([[ix],[idx]])
# Range for the for loop
num1 = range(0,1)
for k in num1:
    Xo = Ae*Xo[k] + Be*outdoor[k+1] + Be1*(c4/c1)
Now, this will run, but it only gives me two points. If I change the range I get an error. I'm assuming this code works because my original Xo has just two states, so k goes through those two states, but that's not what I want.
If anyone could help me out, that would be very helpful! If I'm making some coding error, it's honestly because I don't understand the for loop in Python too well when it comes to data analysis, looping through the rows and incrementing the columns. Thank you for your time.
Upon request, here is my full code:
import scipy.io as sc
import math as m
import numpy as np
import matplotlib.pyplot as plt
import sys
from scipy.linalg import expm, sinm, cosm
import pandas as pd
df = pd.read_excel('datatemp.xlsx')
outdoor = np.array(df[['Outdoor']])
indoor = np.array(df[['Indoor']])
###########################. FUNCTION DEFINE. #################################################
# Progress bar
def progress(count, total, status=''):
    percents = round(100.0 * count / float(total), 1)
    sys.stdout.write(' %s%s ...%s\r' % (percents, '%', status))
    sys.stdout.flush()
# Define matrices for the model
def Matrixbuild(c1,c2,c3):
    A = np.array([[0,1],[-c3/c1,-c2/c1]])
    B = np.array([[0],[1/c1]])
    B1 = np.array([[1],[0]])
    C = np.zeros((2,2))
    D = np.zeros((2,2))
    F = np.array([[0,1,0,1],[-c3/c1,-c2/c1,1/c1,0],[0,0,0,0],[0,0,0,0]])
    R = np.array(expm(F))
    Ae = np.array([[R.item(0),R.item(1)],[R.item(4),R.item(5)]])
    Be = np.array([[R.item(2)],[R.item(6)]])
    Be1 = np.array([[R.item(3)],[R.item(7)]])
    return Ae,Be,Be1
###########################. Data. #################################################
#USED FOR JUST TRYING WITHOUT ACTUAL DATA
# outdoor = np.array([5.8115,4.394,5.094,5.1123,5.1224])
# indoor = np.array([15.595,15.2429,15.0867,14.9982,14.8993])
###########################. Model Define. #################################################
Xo = np.zeros((2,288))
ix = np.array(indoor[0])
idx = np.array(indoor[1])
err_min = m.inf
c1spam = np.linspace(0.05,0.001,30)
c2spam = np.linspace(6.2,6.5,30)
c3spam = np.linspace(7.1,7.45,30)
totalspam = len(c1spam)*len(c2spam)*len(c3spam)
ind = 0
for c1 in c1spam:
    for c2 in c2spam:
        for c3 in c3spam:
            c4 = 1.1
            # MatrixBuild function
            result = Matrixbuild(c1,c2,c3)
            Ae,Be,Be1 = result
            Xo = np.array([ix,idx])
            Datarange = range(0,len(outdoor)-1,1)
            for k in Datarange:
                Xo[:,k+1] = np.matmul(Ae,Xo[:,k]) + np.matmul(Be,outdoor[k+1]) + Be1*(c4/c1)
            ind = ind + 1
            print(Xo)
            err = np.linalg.norm(Xo[0,range(0,287)]-indoor.T)
            if err<err_min:
                err_min = err
                cbest = np.array([[c1],[c2],[c3],[c4]])
            progress(ind,totalspam,status='Done')
# print(X)
# print(err)
# print(cbest)
###########################. Model with Cbest Values. #################################################
c1 = cbest[0]
c2 = cbest[1]
c3 = cbest[2]
result2 = Matrixbuild(c1,c2,c3)
AeBest,BeBest,Be1Best = result2
Xo = np.array([ix,idx])
Datarange = np.arange(0,len(outdoor)-1)
for k in Datarange:
    Xo[:,k+1] = np.matmul(AeBest,Xo[:,k]) + np.matmul(BeBest,outdoor[k+1]) + Be1Best*(c4/c1)
err = np.linalg.norm(Xo[0,range(0,287)]-indoor.T)
print(cbest)
print(err)
###########################. Plots. #################################################
plt.figure(0)
time = np.linspace(1,2,2)
plt.scatter(time,X[0],s=15,c="blue")
plt.scatter(time,indoor[0:2],s=15,c="red")
plt.show()
And again, my error occurs at the line inside the for loop:
for k in Datarange:
    Xo[:,k+1] = np.matmul(Ae,Xo[k]) + np.matmul(Be,outdoor[k+1]) + Be1*(c4/c1)
I was trying to use np.matmul for matrix multiplication but even without it, it wasn't working.
If there are any other questions about my code, please ask. Essentially I'm trying to find the best c1, c2, c3 coefficients that fit my data (indoor temperature) using a basic second-order constant-coefficient model.
Have you tried with Xo[:,k+1] instead of Xo(:,k+1)? Python uses [] for slicing and indexing.
EDIT:
Xo = np.array([[ix],[idx]])
This creates a 1x1 array with 1 value: (ix, idx). I think you're looking for something like Xo = np.zeros((ix, idx)), which will give you an ix-by-idx array initialized to zeros. If you don't need the zeros you can use Xo = np.empty((ix, idx)).
See the docs on array creation.
By reading a bit more into how Python handles allocation for arrays/matrices, I was able to figure it out. I needed to first allocate my 'Xo' array and then fill in the initial conditions in order for the for loop to work.
Xo = np.zeros((2,num2))
Xo = np.asmatrix(Xo)
Xo[0,0] = ix
Xo[1,0] = idx
Also, for the for loop I defined the range like this,
num1 = range(0,4)
num2 = len(num1) + 1
This let me compute the total dimension of 'Xo', calling it 'num2'. It is defined like that because my for loop writes to (k+1), so the dimension would otherwise grow larger, e.g.:
for k in num1:
    Xo[:,k+1] = Ae*Xo[:,k] + Be*outdoor[k+1] + Be1*(c4/c1)
But there it is! I figured it out by comparing Matlab printouts to Python printouts and debugging one line at a time. Now I get the exact same values printed out in both codes, so it is time to start using the Python code!
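For reference, here is the whole preallocate-then-fill pattern in one place, as a minimal sketch; the sizes and the placeholder values for Ae, Be, Be1, outdoor, c4, c1, ix and idx are made up for illustration:

import numpy as np

N = 288
Ae = np.eye(2)
Be = np.ones(2)
Be1 = np.ones(2)
outdoor = np.ones(N)
c4, c1 = 1.1, 0.05
ix, idx = 15.0, -0.35

Xo = np.zeros((2, N))      # allocate the whole trajectory up front
Xo[0, 0] = ix              # first column holds the initial conditions
Xo[1, 0] = idx

for k in range(N - 1):     # each step fills the next column
    Xo[:, k+1] = np.matmul(Ae, Xo[:, k]) + Be*outdoor[k+1] + Be1*(c4/c1)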
I'm wondering if there is a faster way to do this.
"""
Structure
-data[]
-data[0]
-data[number, number, number, number, number, number, number]
- ... ect X 12000
-data[1]
-data[number, number, number, number, number, number, number]
- ... ect X 12000
-data[2]
-data[number, number, number, number, number, number, number]
- ... ect X 12000
-data[3]
-data[number, number, number, number, number, number, number]
- ... ect X 12000
x and y are the first two numbers in each data array.
"""
I need to scan each item in layers 1, 2 and 3 against each item in the first layer (0), checking whether they fall within a given search radius. This takes a while.
for i in range(len(data[0])):
    x = data[0][i][0]
    y = data[0][i][1]
    for x in range(len(data[1])):
        x1 = data[1][x][0]
        y1 = data[1][x][1]
        if math.pow((x1 - x), 2) + math.pow((y1 - y), 2) < somevalue:
            matches1.append(data[0][i])
            matches2.append(data[1][x])
            continue
        else:
            continue
Thanks for any assistance!
First, you should write more readable Python code:
for x, y in data[0]:
    for x1, y1 in data[1]:
        if (x1 - x)**2 + (y1 - y)**2 < somevalue:
            matches1.append((x, y))
            matches2.append((x1, y1))
Then you can vectorize the inner loop with numpy:
for x, y in data[0]:
    x1, y1 = data[1].T
    indices = (x1 - x)**2 + (y1 - y)**2 < somevalue
    matches.append(((x, y), data[1][indices]))
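Note that both snippets assume each data[i] has been reduced to its x, y columns (for example xy = np.asarray(data[i])[:, :2]), since each record holds seven numbers; the vectorized version additionally needs data[1] to be a numpy array for the .T and the boolean indexing to work.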
For this specific problem scipy.spatial.KDTree, or rather its Cython workalike scipy.spatial.cKDTree, would appear to be tailor-made:
import numpy as np
from scipy.spatial import cKDTree
# create some random data
data = np.random.random((4, 12000, 7))
# in each record discard all but x and y
data_xy = data[..., :2]
# build trees
trees = [cKDTree(d) for d in data_xy]
somevalue = 0.001
# find all close pairs between reference layer and other layers
pairs = []
for tree in trees[1:]:
    pairs.append(trees[0].query_ball_tree(tree, np.sqrt(somevalue)))
This example takes less than a second. Please note that the output format is different from the one your script produces: for each of the three non-reference layers it is a list of lists, where the inner list at index k contains the indices of the points that are close to point k in the reference layer.
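If you need explicit index pairs like the matches1/matches2 lists in the question, the list-of-lists output for a layer can be flattened, for example:

# (reference index, layer-1 index) pairs for the first non-reference layer
idx_pairs = [(k, j) for k, close in enumerate(pairs[0]) for j in close]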
I would suggest creating a function out of this and using the numba library with the decorator @jit(nopython=True).
Also, as suggested, you should use numpy arrays; numba focuses on utilizing numpy operations.
import math
from numba import jit

@jit(nopython=True)
def search(data):
    matches1 = []
    matches2 = []
    for i in range(len(data[0])):
        x = data[0][i][0]
        y = data[0][i][1]
        for j in range(len(data[1])):
            x1 = data[1][j][0]
            y1 = data[1][j][1]
            if math.pow((x1 - x), 2) + math.pow((y1 - y), 2) < somevalue:
                matches1.append(data[0][i])
                matches2.append(data[1][j])
    return matches1, matches2

if __name__ == '__main__':
    # Initialize
    # import your data however.
    m1, m2 = search(data)
The key is to make sure you only use functions supported by numba in nopython mode.
I have seen speed increases from 100x faster to ~300x faster.
This could also be a good place to use GPGPU computation. From Python you have PyCUDA and PyOpenCL, depending on your underlying hardware. OpenCL can also use some of the SIMD instructions on the CPU if you don't have a GPU.
If you don't want to go down the GPGPU road then numpy or numba would also be useful as mentioned before.
I am working in SageMath (Python-based). I am quite new to programming and I have the following question. In my computations I have a quadratic form x^T A x = b, where the matrix A is already defined as a symmetric matrix, and x is defined as
import itertools
X = itertools.product([0,1], repeat = n)
for x in X:
    x = vector(x)
    print x
i.e. all combinations of [0,1] repeated n times. I got a set of values for b in the following way:
import itertools
X = itertools.product([0,1], repeat = n)
results = []
for x in X:
    x = vector(x)
    x = x.row()
    v = x.transpose()
    b = x * A * v
    results.append(b[0, 0])
And then I defined:
U = set(results)
U1 = sorted(U)
A = []
for i in U1:
    U2 = round(i, 2)
    A.append(U2)
So I have a sorted list from which to take a few minimal values of my results. I need to extract the minimal values and identify which particular x corresponds to each value of b. I heard that I can use a dictionary and define preimages there, but I am really struggling to define my dictionary as {key: value}. Could someone help me solve the problem, or give me an idea of what direction I should think in? Thank you.
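For illustration, a minimal sketch of the kind of dictionary I mean, using plain numpy stand-ins for the Sage objects above (A and n here are just placeholders):

import itertools
import numpy as np

n = 3
A = np.eye(n)                # placeholder for the symmetric matrix

preimages = {}               # maps each value b to the x tuples producing it
for x in itertools.product([0, 1], repeat=n):
    xv = np.array(x)
    b = round(float(xv.dot(A).dot(xv)), 2)
    preimages.setdefault(b, []).append(x)

b_min = min(preimages)       # smallest b and the x's realizing it
print(b_min, preimages[b_min])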
This question focuses on numpy.
I have a set of matrices which all share the same number of columns and have different numbers of rows. Let's call them A, B, C, D, etc. and let their dimensions be Ia x K, Ib x K, Ic x K, etc.
What I want is to efficiently compute the Ia x Ib x Ic x ... tensor P defined as follows:
P(ia, ib, ic, id, ie, ...) = \sum_k A(ia, k) B(ib, k) C(ic, k) ...
So if I have only two factors, I end up with a simple matrix product.
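(For concreteness, the two-factor case is just A times B transposed; a quick sketch with random data:)

import numpy as np

# Two-factor case: P(ia, ib) = sum_k A(ia, k) B(ib, k), i.e. A times B.T
A = np.random.rand(5, 8)
B = np.random.rand(4, 8)
P = A.dot(B.T)     # shape (5, 4)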
Of course I can compute this "by hand" through outer products, something like:
from numpy import array

def parafac(factors, components=None):
    ndims = len(factors)
    ncomponents = factors[0].shape[1]
    total_result = array([])
    if components is None:
        components = range(ncomponents)
    for k in components:
        # for each component (to save memory)
        result = array([])
        for dim in range(ndims-1, -1, -1):
            # augment the model with the next dimension
            current_dim_slice = [slice(None, None, None)]
            current_dim_slice.extend([None]*(ndims-dim-1))
            current_dim_slice.append(k)
            if result.size:
                result = factors[dim][tuple(current_dim_slice)] * result[None, ...]
            else:
                result = factors[dim][tuple(current_dim_slice)]
        if total_result.size:
            total_result += result
        else:
            total_result = result
    return total_result
Still, I would like something much more computationally efficient, ideally relying on built-in numpy functions, but I cannot find the relevant functions. Can someone help me?
Cheers, thanks
Thank you all very much for your answers, I've spent the day on this and I eventually found the solution, so I post it here for the record
This solution requires numpy 1.6 and makes use of einsum, which is powerful voodoo magic.
Basically, if you have factors = [A, B, C, D] with A, B, C and D matrices with the same number of columns, then you would compute the parafac model using:
import numpy
P=numpy.einsum('az,bz,cz,dz->abcd',A,B,C,D)
so, one line!
In the general case, I end up with this:
import string
import numpy

def parafac(factors):
    ndims = len(factors)
    request = ''
    for temp_dim in range(ndims):
        request += string.lowercase[temp_dim] + 'z,'  # string.ascii_lowercase on Python 3
    request = request[:-1] + '->' + string.lowercase[:ndims]
    return numpy.einsum(request, *factors)
Bearing in mind that the outer product is the Kronecker product in disguise, your problem should be solved by these simple functions:
import numpy as np
from functools import reduce

def outer(vectors):
    shape = [v.shape[0] for v in vectors]
    return reduce(np.kron, vectors).reshape(shape)

def cp2Tensor(l, A):
    terms = []
    for r in xrange(A[0].shape[1]):
        term = l[r]*outer([A[n][:,r] for n in xrange(len(A))])
        terms.append(term)
    return sum(terms)
cp2Tensor takes a list of real numbers and a list of matrices.
Edited after comment by Jaime.
Ok, so the following works. First a worked out example of what's going on...
import numpy as np

a = np.random.rand(5, 8)
b = np.random.rand(4, 8)
c = np.random.rand(3, 8)
ret = np.ones((5, 4, 3, 8))
ret *= a.reshape(5, 1, 1, 8)
ret *= b.reshape(1, 4, 1, 8)
ret *= c.reshape(1, 1, 3, 8)
ret = ret.sum(axis=-1)
And a full function
def tensor(elems):
    cols = elems[0].shape[-1]
    n_elems = len(elems)
    ret = np.ones(tuple([j.shape[0] for j in elems] + [cols]))
    for j, el in enumerate(elems):
        ret *= el.reshape((1,) * j + (el.shape[0],) +
                          (1,) * (len(elems) - j - 1) + (cols,))
    return ret.sum(axis=-1)
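As a quick sanity check (a sketch reusing the shapes from the worked example above), the result should agree with the einsum one-liner:

import numpy as np

a = np.random.rand(5, 8)
b = np.random.rand(4, 8)
c = np.random.rand(3, 8)

assert np.allclose(tensor([a, b, c]),
                   np.einsum('az,bz,cz->abc', a, b, c))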