Performing a random addition between two arrays - python

What I'm trying to do is get two different arrays, where the first array is just filled with zeros and second array would be populated by random numbers. I would like to perform an operation where only certain elements from the latter array are added to the array filled with zeros and the rest of elements within the former array remain as zero. I'm trying to get the addition done in a random way as well. I just added the code below as an example. I honestly don't know how to perform something like this and I would be very grateful for any help or suggestions! Thank you!
shape = (6, 3)
empty_array = np.zeros(shape)
random_array = 0.1 * np.random.randn(*empty_array)
sum = np.add(empty_array, random_array)

You can use a binary mask with the density P:
P = 0.5
# Repeat the next two lines as needed
mask = np.random.binomial(1, P, size = empty_array.size)\
.reshape(shape).astype(bool)
empty_array[mask] += random_array[mask]
If you plan to add more random elements, you may want to re-generate the mask at each further iteration.

If I understand you correctly from your comments, you want to create random numbers at random indices based on the threshold of some percent of whole array (you do not need to create a whole random array and use only a percent of it, such random number generation is usually costly in larger scales):
sz = shape[0]*shape[1]
#this is your for example 20% threshold
threshold = 0.2
#create random numbers and random indices
random_array = np.random.rand(int(threshold*sz))
random_idx = np.random.randint(0,sz,int(threshold*sz))
#now you can add this random_array to random indices of your desired array
empty_array.reshape(-1)[random_idx] += random_array
or another solution:
sz = shape[0]*shape[1]
#this is your for example 20% threshold
threshold = 0.2
random_array = np.random.rand(int(threshold*sz))
#pad with enough zeros and randomly shuffle and finally reshape it
random_array.resize(sz)
np.random.shuffle(random_array)
#now you can add this random_array to any array of your choice
empty_array += random_array.reshape(shape)
sample output:
[[0. 0. 0. ]
[0. 0. 0. ]
[0. 0. 0.7397274 ]
[0. 0. 0. ]
[0. 0. 0.79541551]
[0.75684113 0. 0. ]]

Related

Numpy appending two-dimensional arrays together

I am trying to create a function which exponentiates a 2-D matrix and keeps the result in a 3D array, where the first dimension is indexing the exponent. This is important because the rows of the matrix I am exponentiating represent information about different vertices on a graph. So for example if we have A, A^2, A^3, each is shape (50,50) and I want a matrix D = (3,50,50) so that I can go D[:,1,:] to retrieve all the information about node 1 and be able to do matrix multiplication with that. My code is currently as
def expo(times,A,n):
temp = A;
result = csr_matrix.toarray(temp)
for i in range(0,times):
temp = np.dot(temp,A)
if i == 0:
result = np.array([result,csr_matrix.toarray(temp)]) # this creates a (2,50,50) array
if i > 0:
result = np.append(result,csr_matrix.toarray(temp),axis=0) # this does not work
return result
However, this is not working because in the "i>0" case the temp array is of the shape (50,50) and cannot be appended. I am not sure how to make this work and I am rather confused by the dimensionality in Numpy, e.g. why thinks are (50,1) sometimes and just (50,) other times. Would anyone be able to help me make this code work and explain generally how these things should be done in Numpy?
Documentation reference
If you want to stack matrices in numpy, you can use the stack function.
If you also want the index to correspond to the exponent, you might want to add a unity matrix to the beginning of your output:
MWE
import numpy as np
def expo(A, n):
result =[np.eye(len(A)), A,]
for _ in range(n-1):
result.append(result[-1].dot(A))
return np.stack(result, axis=0)
# If you do not really need the 3D array,
# you could also just return the list
result = expo(np.array([[1,-2],[-2,1]]), 3)
print(result)
# [[[ 1. 0.]
# [ 0. 1.]]
#
# [[ 1. -2.]
# [ -2. 1.]]
#
# [[ 5. -4.]
# [ -4. 5.]]
#
# [[ 13. -14.]
# [-14. 13.]]]
print(result[1])
# [[ 1. -2.]
# [-2. 1.]]
Comments
As you can see, we first simply create the list of matrices, and then convert them to an array at the end. I am not sure if you really need the 3D array though, as you could also just index the list that was created, but that depends on your use case, if that is convenient or not.
I guess the axis keyword argument for a lot of numpy functions can be confusing at first, but the documentation usually has good examples that combined with same trial and error, should get you pretty far. For example for numpy.stack, the very first example is indeed exactly what you want to do.

What is the fastest way to stack numpy arrays in a loop?

I have a code that generates me within a for loop two numpy arrays (data_transform). In the first loop generates a numpy array of (40, 2) and in the second loop one of (175, 2). I want to concatenate these two arrays into one, to give me an array of (215, 2). I tried with np.concatenate and with np.append, but it gives me an error since the arrays must be the same size. Here is an example of how I am doing the code:
result_arr = np.array([])
for label in labels_set:
data = [index for index, value in enumerate(labels_list) if value == label]
for i in data:
sub_corpus.append(corpus[i])
data_sub_tfidf = vec.fit_transform(sub_corpus)
data_transform = pca.fit_transform(data_sub_tfidf)
#Append array
sub_corpus = []
I have also used np.row_stack but nothing else gives me a value of (175, 2) which is the second array I want to concatenate.
What #hpaulj was trying to say with
Stick with list append when doing loops.
is
#use a normal list
result_arr = []
for label in labels_set:
data_transform = pca.fit_transform(data_sub_tfidf)
# append the data_transform object to that list
# Note: this is not np.append(), which is slow here
result_arr.append(data_transform)
# and stack it after the loop
# This prevents slow memory allocation in the loop.
# So only one large chunk of memory is allocated since
# the final size of the concatenated array is known.
result_arr = np.concatenate(result_arr)
# or
result_arr = np.stack(result_arr, axis=0)
# or
result_arr = np.vstack(result_arr)
Your arrays don't really have different dimensions. They have one different dimension, the other one is identical. And in that case you can always stack along the "different" dimension.
Using concatenate, initializing "c":
a = np.array([[8,3,1],[2,5,1],[6,5,2]])
b = np.array([[2,5,1],[2,5,2]])
matrix = [a,b]
c = np.empty([0,matrix[0].shape[1]])
for v in matrix:
c = np.append(c, v, axis=0)
Output:
[[8. 3. 1.]
[2. 5. 1.]
[6. 5. 2.]
[2. 5. 1.]
[2. 5. 2.]]
If you have an array a of size (40, 2) and an array b of size (175,2), you can simply have a final array of size (215, 2) using np.concatenate([a,b]).

Markov Clustering in Python

As the title says, I'm trying to get a Markov Clustering Algorithm to work in Python, namely Python 3.7
Unfortunately, it's not doing much of anything, and it's driving me up the wall trying to fix it.
EDIT: First, I've made the adjustments to the main code to make each column sum to 100, even if it's not perfectly balanced. I'm going to try to account for that in the final answer.
To be clear, the biggest problem is that the numbers spiral out of control, into such easily-understandable numbers as 5.56268465e-309, and I don't know how to convert that into something understandable.
Here's the code so far:
import numpy as np
import math
## How far you'd like your random-walkers to go (bigger number -> more walking)
EXPANSION_POWER = 2
## How tightly clustered you'd like your final picture to be (bigger number -> more clusters)
INFLATION_POWER = 2
ITERATION_COUNT = 10
def normalize(matrix):
return matrix/np.sum(matrix, axis=0)
def expand(matrix, power):
return np.linalg.matrix_power(matrix, power)
def inflate(matrix, power):
for entry in np.nditer(transition_matrix, op_flags=['readwrite']):
entry[...] = math.pow(entry, power)
return matrix
def run(matrix):
#np.fill_diagonal(matrix, 1)
#print(matrix)
matrix = normalize(matrix)
print(matrix)
for _ in range(ITERATION_COUNT):
matrix = normalize(inflate(expand(matrix, EXPANSION_POWER), INFLATION_POWER))
return matrix
transition_matrix = np.array ([[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0.5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0.34,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0.33,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0.33,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0.34,0,0,0,0,0,0,0,0,0,0,0,0,0.125,0],
[0,0,0,0.33,0,0,0.5,0,0,0,0,0,0,0,0,0,0.125,1],
[0,0,0,0.33,0,0,0.5,1,1,0,0,0,0,0,0,0,0.125,0],
[0,0,0,0,0.166,0,0,0,0,0,0,0,0,0,0,0,0.125,0],
[0,0,0,0,0.166,0,0,0,0,0.2,0,0,0,0,0,0,0.125,0],
[0,0,0,0,0.167,0,0,0,0,0.2,0.25,0,0,0,0,0,0.125,0],
[0,0,0,0,0.167,0,0,0,0,0.2,0.25,0.5,0,0,0,0,0,0],
[0,0,0,0,0.167,0,0,0,0,0.2,0.25,0.5,0,1,0,0,0.125,0],
[0,0,0,0,0.167,0,0,0,0,0.2,0.25,0,1,0,1,0,0.125,0],
[0,0,0,0,0,0.34,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0.33,0,0,0,0,0,0,0,0,0,0.5,0,0],
[0,0,0,0,0,0.33,0,0,0,0,0,0,0,0,0,0.5,0,0]])
run(transition_matrix)
print(transition_matrix)
This is part of a uni assignment - I need to do this array both weighted and unweighted (though the weighted part can just wait until I've got the bloody thing working at all) any tips or suggestions?
Your transition matrix is not valid.
>>> transition_matrix.sum(axis=0)
>>> matrix([[1. , 1. , 0.99, 0.99, 0.96, 0.99, 1. , 1. , 0. , 1. ,
1. , 1. , 1. , 0. , 0. , 1. , 0.88, 1. ]])
Not only does some of your columns not sum to 1, some of them sum to 0.
This means when you try to normalize your matrix, you will end up with nan because you are dividing by 0.
Lastly, is there a reason why you are using a Numpy matrix instead of just a Numpy array, which is the recommended container for such data? Because using Numpy arrays will simplify some of the operations, such as raising each entry to a power. Also, there are some differences between Numpy matrix and Numpy array which can result in subtle bugs.

How to blur 3D array of points, while maintaining their original values? (Python)

I have a sparse 3D array of values. I am trying to turn each "point" into a fuzzy "sphere", by applying a Gaussian filter to the array.
I would like the original value at the point (x,y,z) to remain the same. I just want to create falloff values around this point... But applying the Gaussian filter changes the original (x,y,z) value as well.
I am currently doing this:
dataCube = scipy.ndimage.filters.gaussian_filter(dataCube, 3, truncate=8)
Is there a way for me to normalize this, or do something so that my original values are still in this new dataCube? I am not necessarily tied to using a Gaussian filter, if that is not the best approach.
You can do this using a convolution with a kernel that has 1 as its central value, and a width smaller than the spacing between your data points.
1-d example:
import numpy as np
import scipy.signal
data = np.array([0,0,0,0,0,5,0,0,0,0,0])
kernel = np.array([0.5,1,0.5])
scipy.signal.convolve(data, kernel, mode="same")
gives
array([ 0. , 0. , 0. , 0. , 2.5, 5. , 2.5, 0. , 0. , 0. , 0. ])
Note that fftconvolve might be much faster for large arrays. You also have to specify what should happen at the boundaries of your array.
Update: 3-d example
import numpy as np
from scipy import signal
# first build the smoothing kernel
sigma = 1.0 # width of kernel
x = np.arange(-3,4,1) # coordinate arrays -- make sure they contain 0!
y = np.arange(-3,4,1)
z = np.arange(-3,4,1)
xx, yy, zz = np.meshgrid(x,y,z)
kernel = np.exp(-(xx**2 + yy**2 + zz**2)/(2*sigma**2))
# apply to sample data
data = np.zeros((11,11,11))
data[5,5,5] = 5.
filtered = signal.convolve(data, kernel, mode="same")
# check output
print filtered[:,5,5]
gives
[ 0. 0. 0.05554498 0.67667642 3.0326533 5. 3.0326533
0.67667642 0.05554498 0. 0. ]

Saving an array inside a column of a matrix in numpy shape error

Let's say I do some calculation and I get a matrix of size 3 by 3 each time in a loop. Assume that each time, I want to save such matrix in a column of a bigger matrix, whose number of rows is equal to 9 (total number of elements in the smaller matrix). first I reshape the smaller matrix and then try to save it into one column of the big matrix. A simple code for only one column looks something like this:
import numpy as np
Big = np.zeros((9,3))
Small = np.random.rand(3,3)
Big[:,0]= np.reshape(Small,(9,1))
print Big
But python throws me the following error:
Big[:,0]= np.reshape(Small,(9,1))
ValueError: could not broadcast input array from shape (9,1) into shape (9)
I also tried to use flatten, but that didn't work either. Is there any way to create a shape(9) array from the small matrix or any other way to handle this error?
Your help is greatly appreciated!
try:
import numpy as np
Big = np.zeros((9,3))
Small = np.random.rand(3,3)
Big[:,0]= np.reshape(Small,(9,))
print Big
or:
import numpy as np
Big = np.zeros((9,3))
Small = np.random.rand(3,3)
Big[:,0]= Small.reshape((9,1))
print Big
or:
import numpy as np
Big = np.zeros((9,3))
Small = np.random.rand(3,3)
Big[:,[0]]= np.reshape(Small,(9,1))
print Big
Either case gets me:
[[ 0.81527817 0. 0. ]
[ 0.4018887 0. 0. ]
[ 0.55423212 0. 0. ]
[ 0.18543227 0. 0. ]
[ 0.3069444 0. 0. ]
[ 0.72315677 0. 0. ]
[ 0.81592963 0. 0. ]
[ 0.63026719 0. 0. ]
[ 0.22529578 0. 0. ]]
Explanation
the shape of Big you are trying to assign to is (9, ) one-dimensional. The shape you are trying to assign with is (9, 1) two-dimensional. You need to reconcile this by making the two-dim a one-dim np.reshape(Small, (9,1)) into np.reshape(Small, (9,)). Or, make the one-dim into a two-dim Big[:, 0] into Big[:, [0]]. The exception is when I assigned 'Big[:, 0] = Small.reshape((9,1))`. In this case, numpy must be checking.

Categories