MATLAB function sparse to python - python

I have to convert these MATLAB lines to Python. I don't know exactly how to do this; all I know is that I should use csr_matrix from scipy.sparse.
A = sparse(Nx,Nx);
A = A + sparse([1 Nx],[1 Nx],[1 1],Nx,Nx);
A = A + sparse(2:Nx-1,2:Nx-1,ones(Nx-2,1)*(1+2*r),Nx,Nx);
A = A - sparse(2:Nx-1,1:Nx-2,ones(Nx-2,1)*r,Nx,Nx);
A = A - sparse(2:Nx-1,3:Nx,ones(Nx-2,1)*r,Nx,Nx);
I tried something like this:
A = sp.csr_matrix((Nx,Nx), dtype=float)
A = A + sp.csr_matrix((z,(x,y)), shape=(Nx,Nx), dtype=float)
But I don't know what to put in the x, y, z positions.
Here Nx = 1000 and r = 99.8001.
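A possible translation is sketched below. It assumes the usual correspondence between MATLAB's sparse(i, j, v, m, n) and SciPy's csr_matrix((v, (i, j)), shape=(m, n)), with every index shifted down by one because MATLAB is 1-based and NumPy is 0-based. The names ends and interior are made up for this illustration.

```python
import numpy as np
import scipy.sparse as sp

Nx = 1000
r = 99.8001

# MATLAB ranges are 1-based and inclusive; NumPy indices are 0-based.
ends = np.array([0, Nx - 1])        # MATLAB [1 Nx]
interior = np.arange(1, Nx - 1)     # MATLAB 2:Nx-1

A = sp.csr_matrix((Nx, Nx), dtype=float)
# Ones in the two corner entries of the diagonal.
A = A + sp.csr_matrix((np.ones(2), (ends, ends)), shape=(Nx, Nx))
# Main diagonal of the interior rows: 1 + 2r.
A = A + sp.csr_matrix((np.full(Nx - 2, 1 + 2 * r), (interior, interior)), shape=(Nx, Nx))
# Sub-diagonal (columns 1:Nx-2 in MATLAB) and super-diagonal (columns 3:Nx).
A = A - sp.csr_matrix((np.full(Nx - 2, r), (interior, interior - 1)), shape=(Nx, Nx))
A = A - sp.csr_matrix((np.full(Nx - 2, r), (interior, interior + 1)), shape=(Nx, Nx))
```

So z is the array of values, x the row indices, and y the column indices, each of the same length.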

Related

How to manipulate image bands as arrays of numbers

I'm new to Python, and I'm trying to deconstruct image bands as arrays of numbers by applying the Singular Value Decomposition (SVD) to them and then putting them back together with matplotlib.image and the Image module from PIL. An SVD may also be written as a sum of dyads s_1 u_1 v_1^T + ... + s_K u_K v_K^T, and the point of decomposing it this way is that a near-perfect approximation of the image can be made from just a few of those dyads, so less data is required.
There must be something wrong with the calculation, though, because result_r, result_g, and result_b look like this when converted to Images, and new_image looks like this.
For an example of what this should look like, here are the first dyads of the layers of this image. The image that I'm using (April23.jpg) is this.
import matplotlib.image as image
import numpy.linalg as la
import numpy as np
from PIL import Image
def getcolumn(j, m):
    col = []
    for i in range(len(m)):
        col.append(m[i][j])
    return col
def extractCols(U):
    Ucols = []
    for j in range(len(U[0])):
        Ucols.append(getcolumn(j, U))
    return np.asarray(Ucols)
def vectorMultiply(u, v):
    matrix = []
    for i in range(len(u)):
        newVec = []
        for j in range(len(v)):
            newVec.append(u[i] * v[j])
        matrix.append(newVec)
    return np.asarray(matrix)
im = Image.open('C:/Users/<user>/Desktop/img/April23.jpg')
im.load()
sim = Image.Image.split(im)
rsim = sim[0].save("rsim.jpg") # image bands as images
gsim = sim[1].save("gsim.jpg")
bsim = sim[2].save("bsim.jpg")
# image bands as arrays of numbers
arsim = image.imread('C:/Users/<user>/Desktop/img/rsim.jpg')
agsim = image.imread('C:/Users/<user>/Desktop/img/gsim.jpg')
absim = image.imread('C:/Users/<user>/Desktop/img/bsim.jpg')
ur, sr, vhr = la.svd(arsim, False) # SVD on each band
ug, sg, vhg = la.svd(agsim, False)
ub, sb, vhb = la.svd(absim, False)
urcols = extractCols(ur)
ugcols = extractCols(ug)
ubcols = extractCols(ub)
# calculating the first dyads
result_r = np.multiply(sr[0], vectorMultiply(urcols[0], vhr[0]))
result_g = np.multiply(sg[0], vectorMultiply(ugcols[0], vhg[0]))
result_b = np.multiply(sb[0], vectorMultiply(ubcols[0], vhb[0]))
r = Image.fromarray(result_r, "L")
g = Image.fromarray(result_g, "L")
b = Image.fromarray(result_b, "L")
new_image = Image.merge("RGB", (r, g, b))
What am I missing, here? It seems to be something with the calculations. I figured for a matrix one would have to extract the columns, say the column [1, 2, 3] from a matrix [[1,...], [2,...], [3,...]], since each element of the matrix is a row. So, I wrote extractCols() for that. numpy's matrix add and multiply seem to be fine. I wrote vectorMultiply because np.dot(), np.multiply(), and np.matmul() didn't seem to realize that u was a column and kept saying the dimensions didn't match up. I tested it and it seemed to do what I wanted it to. I was also thinking that maybe the "rows" of U are actually the columns already and don't need to be extracted, but that didn't work either. I've also tried not using np.asarray() without any luck.
Any advice is appreciated.
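For reference, here is a minimal sketch of building dyads with NumPy alone. It rests on two assumptions, since the resulting images are not shown: that the columns of U can be taken directly as u[:, i], and that the float result has to be rounded, clipped, and cast to uint8 before it is handed to an 8-bit image. The helper name first_dyads and the random stand-in band are hypothetical.

```python
import numpy as np

def first_dyads(band, k=1):
    """Approximate a 2-D band by the sum of its k leading dyads."""
    u, s, vh = np.linalg.svd(band.astype(float), full_matrices=False)
    # u[:, i] is the i-th left singular vector (a column of U),
    # vh[i] is the i-th right singular vector (a row of V^T).
    approx = sum(s[i] * np.outer(u[:, i], vh[i]) for i in range(k))
    # The dyads are floats; an 8-bit grayscale image needs uint8 in 0..255.
    return np.clip(np.rint(approx), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
band = rng.integers(0, 256, size=(6, 4)).astype(np.uint8)  # stand-in for one image band

one_dyad = first_dyads(band, k=1)    # coarse approximation
all_dyads = first_dyads(band, k=4)   # all four dyads recover the band
```

np.outer(u, v) does the same job as vectorMultiply(u, v), and u[:, i] replaces extractCols(U)[i].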

Replace torch.gather by other operator?

I have the following script, where x1 and x2 both have size 1x68x8x8:
tmp_batch, tmp_channel, tmp_height, tmp_width = x1.size()
x1 = x1.view(tmp_batch*tmp_channel, -1)
max_ids = torch.argmax(x1, 1)
max_ids = max_ids.view(-1, 1)
x2 = x2.view(tmp_batch*tmp_channel, -1)
outputs_x_select = torch.gather(x2, 1, max_ids) # size of 68 x 1
With the code above, I have trouble with torch.gather when I use an old version of ONNX. Hence, I would like an alternative that replaces torch.gather with other operators but gives the same output. Could you please give me some suggestions?
One workaround is to use the equivalent numpy method. If you include an import numpy as np statement somewhere, you could do the following.
outputs_x_select = torch.Tensor(np.take_along_axis(x2,max_ids,1))
If that gives you a grad related error, try
outputs_x_select = torch.Tensor(np.take_along_axis(x2.detach(),max_ids,1))
An approach without numpy: in this case, it seems that max_ids contains exactly one entry per row. Thus, I believe the following will work:
max_ids = torch.argmax(x1, 1) # do not reshape
x2 = x2.view(tmp_batch*tmp_channel, -1)
outputs_x_select = x2[torch.arange(tmp_batch*tmp_channel),max_ids]
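The equivalence that both suggestions rely on can be checked in plain NumPy. This is only an illustrative sketch with random stand-in data of shape (68, 64), the flattened size from the question. It does not exercise torch or ONNX export.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.random((68, 64))   # stand-in for x1.view(batch * channel, -1)
x2 = rng.random((68, 64))   # stand-in for x2.view(batch * channel, -1)

max_ids = np.argmax(x1, axis=1)                                # shape (68,)

# Gather-style selection: one column index per row, kept as a column vector.
gathered = np.take_along_axis(x2, max_ids[:, None], axis=1)    # shape (68, 1)

# Index-style selection: pair each row number with its column index.
indexed = x2[np.arange(x2.shape[0]), max_ids]                  # shape (68,)
```

Note that the indexing form returns a flat vector of length 68. To match the 68 x 1 output of gather, add a trailing axis (for example indexed[:, None] here, or .view(-1, 1) on a torch tensor).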

Vectorizing for loop using slicing in NumPy

I have this for loop:
blockSize = 5
ds = np.arange(20)
ds = np.reshape(ds, (1, len(ds)))
counts = np.zeros((1, len(ds[0]) // blockSize))
for i in range(len(counts[0])):
    counts[0, i] = np.floor(np.sum(ds[0, i*blockSize:i*blockSize+blockSize]))
I am trying to vectorize it, doing something like this:
countIndices = np.arange(len(counts[0]))
counts[0, countIndices] = np.floor(np.sum(ds[0, countIndices*blockSize:countIndices*blockSize + blockSize]))
However, this does not work and gives this error:
counts[0, countIndices] = np.floor(np.sum(ds[0, countIndices*blockSize:countIndices*blockSize + blockSize]))
TypeError: only integer scalar arrays can be converted to a scalar index
I know that something like this works:
counts[0, countIndices] = np.floor(ds[0, countIndices*blockSize]
+ ds[0, countIndices*blockSize + 2] +
... ds[0, countIndices*blockSize + blockSize])
The issue is that this is not feasible to write out for large values of blockSize, and blockSize is very large in my actual code. I am confused as to how to accomplish what I want. Any help is greatly appreciated.
You don't need to do floor if you store the result to an integer array. You can also create a fake new axis of size blockSize to fully vectorize your operation.
block_size = 5
ds = np.arange(80.0).reshape(4, -1) # Shape (4, 20)
counts = np.empty((ds.shape[0], ds.shape[1] // block_size), dtype=int)
To introduce the fake dimension and sum:
ds.reshape(ds.shape[0], -1, block_size).sum(axis=-1, out=counts)
For a contiguous array like this one, reshaping returns a view and does not copy the data, so the operation ds.reshape(ds.shape[0], -1, block_size) is extremely cheap in both time and space.
You can use -1 for one of the reshape dimensions to avoid computing/writing out long division expressions.
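As a quick check of the reshape idea, the sketch below compares the loop from the question with the reshaped sum on the question's own 20-element example. The names loop_counts and vec_counts are made up here, and the sum is returned as a new array rather than written through out=.

```python
import numpy as np

block_size = 5
ds = np.arange(20).reshape(1, -1)          # shape (1, 20), as in the question

# Loop version: sum each consecutive block of block_size elements.
loop_counts = np.zeros((1, ds.shape[1] // block_size))
for i in range(loop_counts.shape[1]):
    loop_counts[0, i] = np.sum(ds[0, i * block_size:(i + 1) * block_size])

# Vectorized version: split the last axis into (n_blocks, block_size)
# and sum over the new block axis.
vec_counts = ds.reshape(ds.shape[0], -1, block_size).sum(axis=-1)

print(vec_counts)   # [[10 35 60 85]]
```

This requires the row length to be an exact multiple of block_size; otherwise the reshape raises an error.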

How to use for-loops and arrays in python to calculate error propagation

I am trying to write a for loop to calculate error bars by using the derivative method. The formula is relatively simple, however I seem to be running into errors in my code with respect to vector/array sizes. There are a lot of defined vectors in my code, and I have checked the length of all of them. All of the inputs into the for-loop are 1x25 sized arrays.
I've tried changing the indices in the for loop from range(1,25) to range(0,24) but that doesn't seem to work.
# Creating vectors
dfdvg = np.zeros(25)
dfdxi0 = np.zeros(25)
sigsquare = np.zeros(25)
vgerr = vrs
xi0err = xi0s
Asq = np.zeros(25)
Bsq= np.zeros(25)
sig = np.zeros(25)
# calculating derivatives and error vectors
for i in range(0,24):
    dfdvg[i] = (np.multiply(rms[:,i],delta[:,i]))**-1
    dfdxi0[i] = -vr[:,i]/(vr[:,i]*(np.power(delta[:,i],2)))
    Asq[i] = np.power(np.multiply(dfdvg[i],vgerr[i]),2)
    Bsq[i] = np.power(np.multiply(dfdxi0[i],xi0err[i]),2)
    sigsquare[i] = Asq[i] + Bsq[i]
    sig[i] = np.power(sigsquare[i],0.5)
q = np.power(np.multiply(rms,delta),-1)
left = np.multiply(vg,q)
right = -(beta*H)/(3*(1+zeff))
What I want is the "sig" vector, representing the propagated error for each index.
The problem is not the size of the arrays but their shape. Unfortunately, you didn't show how all your arrays are defined. The point is that if you use arrays of shape (25,) instead of (1, 25), everything works fine:
vrs = np.random.rand(25)
vr = np.random.rand(25)
xi0s = np.random.rand(25)
rms = np.random.rand(25)
delta = np.random.rand(25)
vg = np.random.rand(25)
# Creating vectors
dfdvg = np.zeros(25)
dfdxi0 = np.zeros(25)
sigsquare = np.zeros(25)
vgerr = vrs
xi0err = xi0s
Asq = np.zeros(25)
Bsq= np.zeros(25)
sig = np.zeros(25)
# calculating derivatives and error vectors
for i in range(25):  # range(0,24) would skip the last element
    dfdvg[i] = (np.multiply(rms[i],delta[i]))**-1
    dfdxi0[i] = -vr[i]/(vr[i]*(np.power(delta[i],2)))
    Asq[i] = np.power(np.multiply(dfdvg[i],vgerr[i]),2)
    Bsq[i] = np.power(np.multiply(dfdxi0[i],xi0err[i]),2)
    sigsquare[i] = Asq[i] + Bsq[i]
    sig[i] = np.power(sigsquare[i],0.5)
q = np.power(np.multiply(rms,delta),-1)
left = np.multiply(vg,q)
(your last line of code seems unrelated)
So, in my opinion, your best option is to reshape your arrays:
vrs=vrs.reshape(25)
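Once the arrays are one-dimensional, the loop can also be dropped entirely, because NumPy applies arithmetic element by element. The sketch below uses random stand-in data in the (1, 25) shape described in the question and copies the derivative formulas from the question's code without checking them.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in inputs with the (1, 25) shape from the question (values are arbitrary).
rms = rng.random((1, 25)) + 0.5
delta = rng.random((1, 25)) + 0.5
vr = rng.random((1, 25)) + 0.5
vrs = rng.random((1, 25))
xi0s = rng.random((1, 25))

# Flatten everything to shape (25,) so indexing and arithmetic are per element.
rms, delta, vr = rms.ravel(), delta.ravel(), vr.ravel()
vgerr, xi0err = vrs.ravel(), xi0s.ravel()

# Same formulas as in the loop, applied to whole arrays at once.
dfdvg = 1.0 / (rms * delta)
dfdxi0 = -vr / (vr * delta**2)
sig = np.sqrt((dfdvg * vgerr) ** 2 + (dfdxi0 * xi0err) ** 2)
```

Here sig has shape (25,) and holds the propagated error for every index, including the last one.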

getting elements in an array1 that are not in array2

Main Problem
What is the best, most Pythonic way to retrieve the elements of one array that are not found in another array? This is what I have:
idata = [np.column_stack(data[k]) for k in range(len(data)) if data[k] not in final]
idata = np.vstack(idata)
My interest is in performance. My data is an (X,Y,Z) array of size (7000 x 3) and my gdata is an (X,Y) array of (11000 x 2)
Preamble
I am working on an octant search to find the n-number(e.g. 8) of points (+) closest to my circular point (o) in each octant. This would mean that my points (+) are reduced to only 64 (8 per octant). Then for each gdata I would save the elements that are not found in data.
import tkinter as tk
from tkinter import filedialog
import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist
from collections import defaultdict
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
data = pd.read_excel(file_path)
data = np.array(data, dtype=float)
nrow, cols = data.shape
file_path1 = filedialog.askopenfilename()
gdata = pd.read_excel(file_path1)
gdata = np.array(gdata, dtype=float)
gnrow, gcols = gdata.shape
N=8
delta = gdata - data[:,:2]
angles = np.arctan2(delta[:,1], delta[:,0])
bins = np.linspace(-np.pi, np.pi, 9)
bins[-1] = np.inf # handle edge case
octantsort = []
for j in range(gnrow):
    delta = gdata[j, ::] - data[:, :2]
    angles = np.arctan2(delta[:, 1], delta[:, 0])
    octantsort = []
    for i in range(8):
        data_i = data[(bins[i] <= angles) & (angles < bins[i+1])]
        if data_i.size > 0:
            dist_order = np.argsort(cdist(data_i[:, :2], gdata[j, ::][np.newaxis]), axis=0)
            if dist_order.size < npoint_per_octant+1:
                [octantsort.append(data_i[dist_order[:npoint_per_octant][j]]) for j in range(dist_order.size)]
            else:
                [octantsort.append(data_i[dist_order[:npoint_per_octant][j]]) for j in range(npoint_per_octant)]
            final = np.vstack(octantsort)
    idata = [np.column_stack(data[k]) for k in range(len(data)) if data[k] not in final]
    idata = np.vstack(idata)
Is there an efficient and Pythonic way to increase the performance of the last two lines of the code?
If I understand your code correctly, then I see the following potential savings:
- Dedent the final = ... line.
- Don't use arctan, it's expensive; since you only want octants, compare the coordinates to zero and to each other.
- Don't do a full argsort, use argpartition instead.
- Make your octantsort an "octantargsort", i.e. store the indices into data, not the data points themselves; this would save you the search in the last but one line and allow you to use np.delete for removing.
- Don't use append inside a list comprehension. This will produce a list of Nones that is immediately discarded. You can use list.extend outside the comprehension instead.
- Besides, these list comprehensions look like a convoluted way of converting data_i[dist_order[:npoint_per_octant]] into a list. Why not simply cast, or even keep it as an array, since you want to vstack in the end?
Here is some sample code illustrating these ideas:
import numpy as np
def discard_nearest_in_each_octant(eater, eaten, n_eaten_p_eater):
    # build octants
    # start with quadrants ...
    top, left = (eaten < eater).T
    quadrants = [np.where(v&h)[0] for v in (top, ~top) for h in (left, ~left)]
    dcoord2 = (eaten - eater)**2
    dc2quadrant = [dcoord2[q] for q in quadrants]
    # ... and split them
    oct4158 = [q[:, 0] < q[:, 1] for q in dc2quadrant]
    # main loop
    dc2octants = [[q[o], q[~o]] for q, o in zip(dc2quadrant, oct4158)]
    reloap = [[
        np.argpartition(o.sum(-1), n_eaten_p_eater)[:n_eaten_p_eater]
        if o.shape[0] > n_eaten_p_eater else None
        for o in opair] for opair in dc2octants]
    # translate indices
    octantargpartition = [q[so] if oap is None else q[np.where(so)[0][oap]]
                          for q, o, oaps in zip(quadrants, oct4158, reloap)
                          for so, oap in zip([o, ~o], oaps)]
    octantargpartition = np.concatenate(octantargpartition)
    return np.delete(eaten, octantargpartition, axis=0)
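The two central ideas, argpartition instead of a full sort and keeping indices so that np.delete can drop the selected rows, can be seen in isolation in the following reduced sketch. It ignores the octant split and uses a random stand-in for data and a single made-up target point, so it is not a replacement for the function above.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((20, 3))        # stand-in (X, Y, Z) points
target = np.array([0.5, 0.5])     # stand-in for one (X, Y) row of gdata
n = 8                             # number of nearest points to pick

# Squared planar distances are enough for ranking; no sqrt needed.
d2 = ((data[:, :2] - target) ** 2).sum(axis=1)

# Indices of the n smallest distances, in no particular order.
nearest_idx = np.argpartition(d2, n)[:n]

final = data[nearest_idx]                        # the selected points
idata = np.delete(data, nearest_idx, axis=0)     # everything else, no membership search
```

Because the selection is kept as row indices, the complement comes from a single np.delete call instead of testing every row of data against final.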
