scipy.sparse dot extremely slow in Python

The following code will not even finish on my system:
import numpy as np
from scipy import sparse
p = 100
n = 50
X = np.random.randn(p,n)
L = sparse.eye(p,p, format='csc')
X.T.dot(L).dot(X)
Is there any explanation why this matrix multiplication is hanging?

X.T.dot(L) is not, as you may think, a 50x100 matrix, but a 50x100 object array whose every element is a 100x100 sparse matrix:
>>> X.T.dot(L).shape
(50, 100)
>>> X.T.dot(L)[0,0]
<100x100 sparse matrix of type '<type 'numpy.float64'>'
with 100 stored elements in Compressed Sparse Column format>
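The problem is that X is an ndarray, and ndarray.dot doesn't know about sparse matrices: it treats L as a single opaque object and multiplies every entry of X.T by the whole of L. The following .dot(X) then has to sum products of these sparse matrices, which is why it seems to hang. So you must either convert the sparse matrix to a dense one using its todense or toarray method (the former returns a matrix object, the latter an array):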
>>> X.T.dot(L.todense()).dot(X)
matrix([[ 81.85399873, 3.75640482, 1.62443625, ..., 6.47522251,
3.42719396, 2.78630873],
[ 3.75640482, 109.45428475, -2.62737229, ..., -0.31310651,
2.87871548, 8.27537382],
[ 1.62443625, -2.62737229, 101.58919604, ..., 3.95235372,
1.080478 , -0.16478654],
...,
[ 6.47522251, -0.31310651, 3.95235372, ..., 95.72988689,
-18.99209596, 17.31774553],
[ 3.42719396, 2.87871548, 1.080478 , ..., -18.99209596,
108.90045569, -16.20312682],
[ 2.78630873, 8.27537382, -0.16478654, ..., 17.31774553,
-16.20312682, 105.37102461]])
Alternatively, sparse matrices have a dot method that knows about arrays:
>>> X.T.dot(L.dot(X))
array([[ 81.85399873, 3.75640482, 1.62443625, ..., 6.47522251,
3.42719396, 2.78630873],
[ 3.75640482, 109.45428475, -2.62737229, ..., -0.31310651,
2.87871548, 8.27537382],
[ 1.62443625, -2.62737229, 101.58919604, ..., 3.95235372,
1.080478 , -0.16478654],
...,
[ 6.47522251, -0.31310651, 3.95235372, ..., 95.72988689,
-18.99209596, 17.31774553],
[ 3.42719396, 2.87871548, 1.080478 , ..., -18.99209596,
108.90045569, -16.20312682],
[ 2.78630873, 8.27537382, -0.16478654, ..., 17.31774553,
-16.20312682, 105.37102461]])
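A minimal sketch of the second fix, wrapped in a hypothetical helper `quadratic_form` (not part of SciPy). Letting the sparse matrix perform the first product keeps every intermediate a plain ndarray, so the result matches the dense route without ever converting `L` to a dense array. The sizes and random seed are arbitrary.

```python
import numpy as np
from scipy import sparse


def quadratic_form(X, L):
    """Return X.T @ L @ X for a dense X and a scipy.sparse L (hypothetical helper)."""
    # The sparse matrix's own dot method handles the dense operand and returns an ndarray.
    LX = L.dot(X)
    # Both operands are now ordinary ndarrays, so ndarray.dot is safe here.
    return X.T.dot(LX)


p, n = 100, 50
rng = np.random.default_rng(0)
X = rng.standard_normal((p, n))
L = sparse.eye(p, p, format='csc')

result = quadratic_form(X, L)
print(result.shape)  # (50, 50)
```

The point of the design is simply the order of operations: whichever object is on the left of a `.dot` call decides how the product is computed, so the sparse operand should be the one doing the multiplying.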

Related

Interpolation for a 2D array

I was wondering if there is a way to interpolate a 2D array in Python using the same principle used to interpolate a 1D array (np.interp).
My aim is to increase the number of data points in my array, from [1000, 20] to [1000, 200] ([Time_indexing, X]).
I am looking for a function that is capable of doing that.
A = np.array([[ 0.45717218, 0.44250104, 0.47812272, 0.49092173, 0.46002069],
[ 0.29829681, 0.26408021, 0.3709202 , 0.44823109, 0.49311853],
[ 0.05469835, 0.01048596, 0.17398291, 0.30088943, 0.39783137],
[-0.20463768, -0.24610673, -0.0713164 , 0.08406331, 0.22047102],
[-0.4074527 , -0.43573695, -0.31062521, -0.15750053, -0.00222392]])
This is a [5,5] array; I want to interpolate it using a spacing of 0.01, so the final product should be [500,500].
Thank you,
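You could use interp2d (note that scipy.interpolate.interp2d was deprecated in SciPy 1.10 and removed in SciPy 1.14, so this only runs on older versions):
import numpy as np
from scipy.interpolate import interp2d
# place the 5x5 samples at coordinates 0, 100, ..., 400 on both axes
f = interp2d(np.arange(0,500,100), np.arange(0,500,100), A)
# evaluate at every integer coordinate 0..499 on both axes
f(np.arange(500), np.arange(500))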
Output:
array([[ 0.45717218, 0.45702547, 0.45687876, ..., 0.46002069,
0.46002069, 0.46002069],
[ 0.45558343, 0.45543476, 0.45528609, ..., 0.46035167,
0.46035167, 0.46035167],
[ 0.45399467, 0.45384405, 0.45369343, ..., 0.46068265,
0.46068265, 0.46068265],
...,
[-0.4074527 , -0.40773554, -0.40801839, ..., -0.00222392,
-0.00222392, -0.00222392],
[-0.4074527 , -0.40773554, -0.40801839, ..., -0.00222392,
-0.00222392, -0.00222392],
[-0.4074527 , -0.40773554, -0.40801839, ..., -0.00222392,
-0.00222392, -0.00222392]])

How to access the first items in a list of arrays without for loop in Python 3?

I have an ode equation which produces a list like below:
u=[array([ 2.06642033, -0.03448756]),
array([ 2.03964994, -0.18737285]),
array([ 1.99884859, -0.21461016]),
array([ 1.95476809, -0.2254584 ]),
array([ 1.90875336, -0.23472173]),
array([ 1.86082857, -0.24471069]),
... ]
I want to plot the first and second components of u against time. I tried to collect all first and second elements of u into two lists with l1=u[0:len(u)-1][0] and l2=u[0:len(u)-1][1], but each of these just gives me a single item of the list (a whole array).
Does anyone have a solution for it?
Thanks
You should convert your list of arrays into a single numpy array.
Also note that 2D numpy array indexing is written arr[row, column]. To take everything along a dimension, just use :.
import numpy as np
from numpy import array

u = [array([ 2.06642033, -0.03448756]), array([ 2.03964994, -0.18737285]),
     array([ 1.99884859, -0.21461016]), array([ 1.95476809, -0.2254584 ]),
     array([ 1.90875336, -0.23472173]), array([ 1.86082857, -0.24471069])]
u = np.array(u)   # shape (6, 2): one row per time step
res = u[:, 0]     # all rows, column 0
# array([ 2.06642033, 2.03964994, 1.99884859, 1.95476809, 1.90875336,
#         1.86082857])
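A short sketch of why the original slicing fails and a compact way to split both components at once; `first` and `second` are illustrative names. The time values for the x-axis are not given in the question, so plotting itself is left out.

```python
import numpy as np

u = [np.array([2.06642033, -0.03448756]), np.array([2.03964994, -0.18737285]),
     np.array([1.99884859, -0.21461016]), np.array([1.95476809, -0.2254584]),
     np.array([1.90875336, -0.23472173]), np.array([1.86082857, -0.24471069])]

# Slicing a list gives another list, so [0] just picks its first array.
# Note that u[0:len(u) - 1] also silently drops the last time step.
wrong = u[0:len(u) - 1][0]

# Stack into shape (n_steps, 2), then transpose so each component becomes one row.
first, second = np.array(u).T
print(first)   # all first components, one per time step
print(second)  # all second components
```

Each of `first` and `second` can then be passed as the y-values of a plot against the corresponding time array.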

Create 3D array from multiple 2D arrays

I have two monthly gridded data sets which I want to compare later.
The input has this layout for both data sets, and it is also the layout I want for the output.
In[4]: data1.shape
Out[4]: (444, 72, 144)
In[5]: gfz.shape
Out[5]: (155, 72, 144)
In[6]: data1
Out[6]:
array([[[ 0.98412287, 0.96739882, 0.91172796, ..., 1.12651634,
1.0682013 , 1.07681048],
[ 1.47803092, 1.44721365, 1.49585509, ..., 1.58934438,
1.66956687, 1.57198083],
[ 0.68730044, 0.76112831, 0.78218687, ..., 0.92582172,
1.07873237, 0.87490368],
...,
[ 1.00752461, 1.00758123, 0.99440521, ..., 0.94128627,
0.88981551, 0.93984401],
[ 1.03467119, 1.02640462, 0.91580886, ..., 0.88302392,
0.99204206, 0.96396238],
[ 0.8280431 , 0.82936555, 0.82637453, ..., 0.92009377,
0.77890259, 0.81065702]],
...,
[[-0.12173297, -0.06624345, -0.02809682, ..., -0.04522502,
-0.11502996, -0.22779272],
[-0.61080372, -0.61958522, -0.52239478, ..., -0.6775983 ,
-0.79460669, -0.70022893],
[-0.12011283, -0.10849079, 0.096185 , ..., -0.45782232,
-0.39763898, -0.31247514],
...,
[ 0.90601307, 0.88580155, 0.90268403, ..., 0.86414611,
0.87041426, 0.86274058],
[ 1.46445823, 1.31938004, 1.37585044, ..., 1.51378822,
1.48515761, 1.49078977],
[ 0.29749078, 0.22273554, 0.27161494, ..., 0.43205476,
0.43777165, 0.36340511]],
[[ 0.41008961, 0.44208974, 0.40928891, ..., 0.45899671,
0.39472976, 0.36803097],
[-0.13514084, -0.17332518, -0.11183424, ..., -0.22284794,
-0.2532815 , -0.15402752],
[ 0.28614867, 0.33750001, 0.48767376, ..., 0.01886483,
0.07220326, 0.17406547],
...,
[ 1.0551219 , 1.09540403, 1.19031584, ..., 1.09203815,
1.07658005, 1.08363533],
[ 1.54310501, 1.49531853, 1.56107259, ..., 1.57243073,
1.5867976 , 1.57728028],
[ 1.1034857 , 0.98658448, 1.14141166, ..., 0.97744882,
1.13562942, 1.08589089]],
[[ 1.02020931, 0.99780071, 0.87209344, ..., 1.11072564,
1.01270151, 0.9222675 ],
[ 0.93467152, 0.81068456, 0.68190312, ..., 0.95696563,
0.84669352, 0.84596157],
[ 0.97022212, 0.94228816, 0.97413743, ..., 1.06613588,
1.08708596, 1.04224277],
...,
[ 1.21519053, 1.23492992, 1.2802881 , ..., 1.33915019,
1.32537413, 1.27963519],
[ 1.32051706, 1.28170252, 1.36266208, ..., 1.29100537,
1.38395023, 1.34622073],
[ 0.86108029, 0.86364979, 0.88489276, ..., 0.81707358,
0.82471925, 0.83550251]]], dtype=float32)
So both have the same spatial resolution of 144x72 but different lengths along the time axis.
As one of them has some missing months, I made sure that only the months where both have data are selected. So I created a two-dimensional array in which the data is stored according to its longitude and latitude, if both data sets contain this month. In the end I want to have a three-dimensional array for data1 and data2 of the same length.
data1_3d = []
data2_3d = []
xy_data1 = [[0 for i in range(len(lons_data1))] for j in range(len(lats_data1))]
xy_data2 = [[0 for i in range(len(lons_data2))] for j in range(len(lats_data2))]
# comparing the time steps
for i in range(len(time_data1)):
    for j in range(len(time_data2)):
        if time_data1.year[i] == time_data2[j].year and time_data1[i].month == time_data2[j].month:
            # loop for data1 which writes the data into a 2D array
            for x in range(len(lats_data1)):
                for y in range(len(lons_data1)):
                    xy_data1[x][y] = data1[j, 0, x, y]
            # append to get an array of arrays
            xy_data1 = np.squeeze(np.asarray(xy_data1))
            data1_3d = np.append(data1_3d, [xy_data1])
            # loop for data2 which writes the data into a 2D array
            for x in range(len(lats_data2)):
                for y in range(len(lons_data2)):
                    xy_data2[x][y] = data2[i, x, y]
            # append to get an array of arrays
            xy_data2 = np.squeeze(np.asarray(xy_data2))
            data2_3d = np.append(data2_3d, [xy_data2])
The script runs without an error; however, I only get a really long 1D array.
In[3]: data1_3d
Out[3]: array([ nan, nan, nan, ..., 0.81707358,
0.82471925, 0.83550251])
How can I arrange it into a three-dimensional array?
I got it working with the following.
I defined the three-dimensional array with fixed latitude and longitude dimensions and an initially empty (length 0) time axis.
data1_3d = np.zeros((0, len(lats_data1), len(lons_data1)))
Then I appended each two-dimensional output along the time axis, adding a leading axis with np.newaxis so that its shape matches; without axis=0, np.append flattens its inputs, which is what produced the 1D result.
data1_3d = np.append(data1_3d, xy_data1[np.newaxis, :, :], axis=0)
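A loop-free sketch, assuming the two time axes can be expressed as NumPy `datetime64[M]` arrays; `time1`, `time2`, and the random data are hypothetical stand-ins for the real data sets. `np.intersect1d` with `return_indices=True` finds the shared months and their positions, and integer indexing along axis 0 keeps the result three-dimensional. Because `np.append` copies the whole array on every call, collecting slices in a list and calling `np.stack` once is usually the cheaper pattern when slices are produced one at a time.

```python
import numpy as np

# Hypothetical stand-ins for the two monthly stacks on the same 72x144 grid.
rng = np.random.default_rng(0)
time1 = np.arange('2000-01', '2001-01', dtype='datetime64[M]')  # 12 months
time2 = np.array(['2000-02', '2000-03', '2000-07', '2001-01'], dtype='datetime64[M]')
data1 = rng.random((time1.size, 72, 144), dtype=np.float32)
data2 = rng.random((time2.size, 72, 144), dtype=np.float32)

# Months present in both series, plus where each one sits in each input.
common, idx1, idx2 = np.intersect1d(time1, time2, return_indices=True)

# Integer indexing along axis 0 keeps the (time, lat, lon) layout.
data1_common = data1[idx1]
data2_common = data2[idx2]
print(data1_common.shape, data2_common.shape)  # (3, 72, 144) (3, 72, 144)

# If 2D slices are produced one at a time, collect them in a list and stack once.
slices = [data1[k] for k in idx1]
cube = np.stack(slices, axis=0)
```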

Array-Based Numpy 3d Array Assignment

Take a 2D numpy.array, let's say:
mat = numpy.random.rand(3,3)
In [153]: mat
Out[153]:
array([[ 0.16716156, 0.90822617, 0.83888038],
[ 0.89771815, 0.62627978, 0.34992542],
[ 0.11097042, 0.80858005, 0.0437299 ]])
Changing the diagonal entries to numpy.nan is quite straightforward.
One of the following works great:
In [154]: diag = numpy.diag_indices(mat.shape[0], ndim = 2)
In [155]: mat[diag] = numpy.nan
or
In [156]: numpy.fill_diagonal(mat, numpy.nan)
But let's say I have a 3D array, where I want to apply the exact same process to every 2D slice along the first axis.
mat = numpy.random.rand(3, 5, 5)
In [158]: mat
Out[158]:
array([[[ 0.65000325, 0.71059547, 0.31880388, 0.24818623, 0.57722849],
[ 0.26908326, 0.41962004, 0.78642476, 0.25711662, 0.8662998 ],
[ 0.15332566, 0.12633147, 0.54032977, 0.17322095, 0.17210078],
[ 0.81952873, 0.20751669, 0.73514815, 0.00884358, 0.89222687],
[ 0.62775839, 0.53657471, 0.99611842, 0.75051645, 0.59328044]],
[[ 0.28718216, 0.84982865, 0.27830082, 0.90604492, 0.43119512],
[ 0.43039373, 0.76557782, 0.58089787, 0.81135684, 0.39151152],
[ 0.70592711, 0.30625204, 0.9753166 , 0.32806864, 0.21947731],
[ 0.74600317, 0.33711673, 0.16203076, 0.6002213 , 0.74996638],
[ 0.63555715, 0.71719058, 0.81420001, 0.28968442, 0.01368163]],
[[ 0.06474027, 0.51966572, 0.006429 , 0.98590784, 0.35708074],
[ 0.44977222, 0.63719921, 0.88325451, 0.53820139, 0.51526687],
[ 0.98529117, 0.46219441, 0.09349748, 0.11406291, 0.47697128],
[ 0.77446136, 0.87423445, 0.71810465, 0.39019846, 0.94070077],
[ 0.09154989, 0.36295161, 0.19740833, 0.17803146, 0.6498038 ]]])
A logical way to do that (I would think), is:
mat[:, diag] = numpy.nan # doesn't do it
In fact, to accomplish this, I need to:
In [190]: rng = numpy.arange(5)
In [191]: for i in numpy.arange(mat.shape[0]):
.....: mat[i, rng, rng] = numpy.nan
.....:
In [192]: mat
Out[192]:
array([[[ nan, 0.4040426 , 0.89449522, 0.63593736, 0.94922036],
[ 0.40682651, nan, 0.30812181, 0.01726625, 0.75655994],
[ 0.23925763, 0.41476223, nan, 0.91590111, 0.18391644],
[ 0.99784977, 0.71636554, 0.21252766, nan, 0.24195636],
[ 0.41137357, 0.84705055, 0.60086461, 0.16403918, nan]],
[[ nan, 0.26183712, 0.77621913, 0.5479058 , 0.17142263],
[ 0.17969373, nan, 0.89742863, 0.65698339, 0.95817106],
[ 0.79048886, 0.16365168, nan, 0.97394435, 0.80612441],
[ 0.94169129, 0.10895737, 0.92614597, nan, 0.08689534],
[ 0.20324943, 0.91402716, 0.23112819, 0.2556875 , nan]],
[[ nan, 0.43177039, 0.76901587, 0.82069345, 0.64351534],
[ 0.14148584, nan, 0.35820379, 0.17434688, 0.78884305],
[ 0.85232784, 0.93526843, nan, 0.80981366, 0.57326785],
[ 0.82104636, 0.63453196, 0.5872653 , nan, 0.96214559],
[ 0.69959383, 0.70257404, 0.92471502, 0.50077728, nan]]])
It's for an application where speed is of the utmost importance, so if there isn't an array-based implementation of the above, I'm going to do the for-loop / assignment in Cython.
This seems to work:
diag = numpy.diag_indices(mat.shape[1], ndim = 2)
mat[:, diag[0], diag[1]] = numpy.nan
The problem is that diag is a 2-element tuple, so using it as-is in a 3D index won't work, and writing mat[:, *diag] is unfortunately invalid syntax before Python 3.11. However, you can also do this:
diag = (Ellipsis, *numpy.diag_indices(mat.shape[-1], ndim = 2))
mat[diag] = numpy.nan
In this case, diag is the three-element tuple you need to use it as an index. Ellipsis is the object that represents : repeated as many times as necessary in the index. This version works for any array with two or more dimensions, where the last two form the square matrices you want.
Using linear indexing -
m,n,r = mat.shape
mat.reshape(m,-1)[:,np.arange(r)*(r+1)] = np.nan
Using slicing and boolean indexing -
m,n,r = mat.shape
mat.reshape(m,-1)[:,np.eye(n,r,dtype=bool).ravel()] = np.nan
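A quick consistency check of the approaches above on a fresh array (seed and sizes are arbitrary). Variant 3 is a small variation on the boolean-mask answer that skips the reshape: a 2D boolean mask inside an index consumes the last two axes and is broadcast over the first.

```python
import numpy as np

rng = np.random.default_rng(0)
mat = rng.random((3, 5, 5))
n = mat.shape[-1]

# 1) Index arrays for rows and columns, full slice over the leading axis.
a = mat.copy()
rows, cols = np.diag_indices(n)
a[:, rows, cols] = np.nan

# 2) Ellipsis tuple: works for any number of leading axes.
b = mat.copy()
b[(Ellipsis, *np.diag_indices(n))] = np.nan

# 3) 2D boolean mask consuming the last two axes, broadcast over the first.
c = mat.copy()
c[:, np.eye(n, dtype=bool)] = np.nan

# 4) Linear indexing on a reshaped view (mat.copy() is C-contiguous, so reshape returns a view).
d = mat.copy()
d.reshape(d.shape[0], -1)[:, np.arange(n) * (n + 1)] = np.nan
```

Variant 4 only modifies the original array because the reshape returns a view; on a non-contiguous array, such as a transposed one, the assignment would go into a temporary copy.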

python numpy array iterate on a single axis

I have seen a few questions similar to mine, but I couldn't find one that suits me.
I want to iterate over a single axis of my array without using two for loops, to make it faster.
First, I open a bunch of pictures and append them together (converting them to a NumPy array).
Afterwards I get an array of arrays like these:
ffImageArr[0]
array([[ 45.49061198, 172.49061198, 174.49061198, ..., 30.49061198,
-71.50938802, -69.50938802],
[ 60.49061198, 169.49061198, 183.49061198, ..., 0.49061198,
-83.50938802, -66.50938802],
[ 55.49061198, 133.49061198, 135.49061198, ..., -43.50938802,
-130.50938802, -99.50938802],
...,
[ 118.49061198, 203.49061198, 195.49061198, ..., 182.49061198,
97.49061198, 132.49061198],
[ 108.49061198, 238.49061198, 197.49061198, ..., 121.49061198,
99.49061198, 133.49061198],
[ 118.49061198, 232.49061198, 196.49061198, ..., 130.49061198,
123.49061198, 145.49061198]])
ffImageArr[1]
array([[ 43.59677409, 172.59677409, 173.59677409, ..., 29.59677409,
-73.40322591, -71.40322591],
[ 60.59677409, 167.59677409, 182.59677409, ..., 0.59677409,
-86.40322591, -64.40322591],
[ 55.59677409, 133.59677409, 134.59677409, ..., -46.40322591,
-131.40322591, -102.40322591],
...,
[ 119.59677409, 201.59677409, 194.59677409, ..., 180.59677409,
98.59677409, 131.59677409],
[ 109.59677409, 238.59677409, 197.59677409, ..., 119.59677409,
98.59677409, 134.59677409],
[ 117.59677409, 231.59677409, 197.59677409, ..., 129.59677409,
122.59677409, 144.59677409]])
ffImageArr[2]
array([[ 42.16040365, 174.16040365, 177.16040365, ..., 28.16040365,
-75.83959635, -74.83959635],
[ 59.16040365, 168.16040365, 183.16040365, ..., -1.83959635,
-87.83959635, -66.83959635],
[ 54.16040365, 133.16040365, 135.16040365, ..., -47.83959635,
-133.83959635, -103.83959635],
...,
[ 119.16040365, 203.16040365, 196.16040365, ..., 182.16040365,
98.16040365, 132.16040365],
[ 108.16040365, 240.16040365, 199.16040365, ..., 121.16040365,
98.16040365, 132.16040365],
[ 116.16040365, 232.16040365, 196.16040365, ..., 129.16040365,
122.16040365, 143.16040365]])
ffImageArr[3]
array([[ 43.89271484, 174.89271484, 175.89271484, ..., 28.89271484,
-78.10728516, -75.10728516],
[ 59.89271484, 169.89271484, 183.89271484, ..., -2.10728516,
-89.10728516, -67.10728516],
[ 54.89271484, 132.89271484, 135.89271484, ..., -50.10728516,
-137.10728516, -105.10728516],
...,
[ 118.89271484, 204.89271484, 195.89271484, ..., 181.89271484,
98.89271484, 131.89271484],
[ 108.89271484, 240.89271484, 199.89271484, ..., 121.89271484,
98.89271484, 134.89271484],
[ 118.89271484, 234.89271484, 199.89271484, ..., 128.89271484,
123.89271484, 145.89271484]])
My goal is to retrieve, as fast as possible, an array containing the nth element of each of these arrays,
e.g. array = [45.49061198, 43.59677409, 42.16040365, ...]
I tried
for i in range(ffImageArr.shape[0]):
    print(ffImageArr[i, :, :])
but weirdly, [i,:,:] gives the same thing as [:,i:]
Thanks for the help and explanation!
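Edit: here is the code I wrote in the meantime; I will try to use polyfit directly as suggested:
import numpy as np

# tempArr: one x-value per image (defined elsewhere, not shown)
firstOrder0, firstOrder1 = [], []
secondOrder0, secondOrder1, secondOrder2 = [], [], []
thirdOrder0, thirdOrder1, thirdOrder2, thirdOrder3 = [], [], [], []
for k in range(ffImageArr.shape[1]):
    for i in range(ffImageArr.shape[2]):
        # collect the value at pixel (k, i) from every image
        fffunc = []
        for j in range(ffImageArr.shape[0]):
            fffunc.append(ffImageArr[j, k, i])
        fffunc = np.array(fffunc)
        # np.polyfit returns coefficients highest power first
        a = np.polyfit(tempArr, fffunc, 1)
        firstOrder0.append(a[1])
        firstOrder1.append(a[0])
        b = np.polyfit(tempArr, fffunc, 2)
        secondOrder0.append(b[2])
        secondOrder1.append(b[1])
        secondOrder2.append(b[0])
        c = np.polyfit(tempArr, fffunc, 3)
        thirdOrder0.append(c[3])
        thirdOrder1.append(c[2])
        thirdOrder2.append(c[1])
        thirdOrder3.append(c[0])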
Assuming these are grayscale images with only one band/channel and not RGB, i.e. of shape (N, D) and not (N, D, 3), you can use a list comprehension.
import numpy as np
# Generate 5 single-band images of size 8x8
ims = np.random.randn(5, 8, 8)
# Coordinates of the nth value: column x, row y
x = 1
y = 1
arr = [im[y, x] for im in ims]
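Since the images are already stacked into one 3D array, a plain slice along the first axis does the same job without a Python loop, and np.polyfit accepts a 2D y (one column per data set), so every pixel can be fitted in one call. A sketch, where `ims` and `temp_arr` (standing in for tempArr in the question's edit) are hypothetical random data:

```python
import numpy as np

# Hypothetical stand-ins: 5 single-band 8x8 images and one x-value per image.
rng = np.random.default_rng(0)
ims = rng.standard_normal((5, 8, 8))
temp_arr = np.linspace(20.0, 30.0, ims.shape[0])

# Row y, column x of every image in one slice, no Python loop.
y, x = 1, 1
pixel_series = ims[:, y, x]  # shape (5,)

# np.polyfit fits each column of a 2D y separately, so every pixel is fitted at once.
n_img, h, w = ims.shape
coeffs = np.polyfit(temp_arr, ims.reshape(n_img, h * w), 1)  # shape (2, h*w), highest power first
slope = coeffs[0].reshape(h, w)
intercept = coeffs[1].reshape(h, w)
```

The same call with deg=2 or deg=3 would replace the second- and third-order fits in the question's edit; as with the per-pixel calls, the coefficients come out highest power first.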
