Population of a Numpy array - python

I am writing code to populate a NumPy array. It's a 2D array, so I run two for loops with a specified condition. The "coords" array holds the positions of my particles. However, every time I run it through the loops, "coords" starts being populated from the second position (x=2) rather than the first (x=1), and then I get an error because the array runs out of space. Any ideas why this is happening?
Thank you for any help!
The error I am getting:
"IndexError: index 10 is out of bounds for axis 0 with size 10"
Here's my code:
import random
from math import exp, pow
import numpy as np
#Lattice and sampling parameters:
L = 10 #no of sites per one side of the lattice
nSites = L*L
nPart = 10 #no of particles
nSteps = 50000
AddRemFraction = 0.5 #acceptance ratio
# create an array of coordinates of particles:
coords = np.zeros((nPart,2), dtype=object)
# create an LxL array of particle occupancy => occ(x,y) = 1 if site is occupied
occ = np.zeros((L, L), dtype=object)
#no of particles that have been placed on the lattice so far
nPlace = 0
for x in range (1, L):
    for y in range (1, L):
        if (nPlace < nPart):
            nPlace = nPlace +1
            coords[nPlace,:] = [x,y]
            occ[x,y]=1
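Reading the code above, a likely cause is the order of the two statements in the if block: nPlace is incremented before it is used as a row index, so the first particle lands in row 1, and the tenth particle tries to write row 10 of a 10-row array, which is exactly the IndexError shown. A minimal sketch of a fix, assuming the intent is to fill rows 0 to nPart-1 and to start the lattice at site 0 (range(1, L) skips site 0, since Python and NumPy indices are 0-based). Switching dtype=object to dtype=int is my own assumption, since the arrays only hold integers:

```python
import numpy as np

L = 10      # number of sites per side of the lattice
nPart = 10  # number of particles

coords = np.zeros((nPart, 2), dtype=int)  # (x, y) of each particle
occ = np.zeros((L, L), dtype=int)         # occ[x, y] == 1 if the site is occupied

nPlace = 0  # particles placed so far; also the next free row in coords
for x in range(L):
    for y in range(L):
        if nPlace < nPart:
            coords[nPlace, :] = [x, y]  # use the current index first...
            occ[x, y] = 1
            nPlace += 1                 # ...then advance it
```

If you prefer to keep the original increment-first order, indexing with coords[nPlace - 1, :] has the same effect.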

Related

Subtracting Two dimensional arrays using numpy broadcasting

I'm new to NumPy in general, so this is probably an easy question, but I'm clueless about how to solve it.
I'm trying to implement the k-nearest-neighbors algorithm to classify a data set.
There are two arrays, new_points and points, with shapes (30, 4) and (120, 4) respectively (4 being the number of properties of each element).
I'm trying to calculate the distance between each new point and all old points using NumPy broadcasting:
def calc_no_loop(new_points, points):
    return np.sum((new_points-points)**2,axis=1)
This doesn't work; here is the log:
ValueError: operands could not be broadcast together with shapes (30,4) (120,4)
Indeed, per the broadcasting rules, arrays of shapes (30, 4) and (120, 4) are incompatible, so I would appreciate any insight on how to solve this (perhaps using .reshape, I'm not sure).
Please note that I have already implemented the same function using one and two loops, but I can't implement it without a loop:
def calc_two_loops(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            d[i, j] = np.sum((new_points[i] - points[j])**2)
    return d
def calc_one_loop(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        # axis=1 sums over the properties, giving one distance per old point
        d[i] = np.sum((new_points[i] - points)**2, axis=1)
    return d
Let's create a smaller example:
import numpy as np
nNew = 3; nOld = 5 # Number of new / old points
# New points
new_points = np.arange(100, 100 + nNew * 4).reshape(nNew, 4)
# Old points
points = np.arange(10, 10 + nOld * 8, 2).reshape(nOld, 4)
To compute the differences alone, run:
dfr = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]
So far we have the difference in each property between every new point and every old point.
The shape of dfr is (3, 5, 4):
first dimension: the index of the new point,
second dimension: the index of the old point,
third dimension: the difference in each property.
Then, to sum the squared differences over the properties of each pair of points, run:
d = np.power(dfr, 2).sum(axis=2)
and this is your result.
For my sample data, the result is:
array([[31334, 25926, 21030, 16646, 12774],
[34230, 28566, 23414, 18774, 14646],
[37254, 31334, 25926, 21030, 16646]], dtype=int32)
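Wrapped into the function from the question, the same broadcasting idea might look like the sketch below. It returns squared distances with shape (30, 120) (new points by old points) and checks itself against the two-loop version from the question. The random test data are made up for illustration; apply np.sqrt to the result if you need actual Euclidean distances.

```python
import numpy as np

def calc_no_loop(new_points, points):
    """Squared Euclidean distances between every new point and every old point.

    new_points: (m, k) array, points: (n, k) array -> returns an (m, n) array.
    """
    # (m, 1, k) - (1, n, k) broadcasts to (m, n, k)
    diff = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]
    # Sum the squared differences over the property axis -> (m, n)
    return (diff ** 2).sum(axis=2)

def calc_two_loops(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            d[i, j] = np.sum((new_points[i] - points[j]) ** 2)
    return d

rng = np.random.default_rng(0)
new_points = rng.random((30, 4))
points = rng.random((120, 4))

d = calc_no_loop(new_points, points)
print(d.shape)  # (30, 120)
assert np.allclose(d, calc_two_loops(new_points, points))
```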
So you have 30 new points and 120 old points; if I understand you correctly, you want a (120, 30) array of distances.
You could do:
import numpy as np
points = np.random.random(120*4).reshape(120,4)
new_points = np.random.random(30*4).reshape(30,4)
def calc_no_loop(new_points, points):
    res = np.zeros([len(points[:,0]),len(new_points[:,0])])
    for idx in range(len(points[:,0])):
        res[idx,:] = np.sum((points[idx,:]-new_points)**2,axis=1)
    return np.sqrt(res)
test = calc_no_loop(new_points,points)
print(np.shape(test))
print(test)
Which gives
(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
[0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
[0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
...
[0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
[0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
[1.08515826 0.64626221 0.6898687 ... 0.96882542 1.08075076 0.80144746]]
But judging from your function name, you don't want a loop? Then you could do this instead:
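def calc_no_loop(new_points, points):
    # (120, 30, 4): the new points repeated once per old point
    new_points1 = np.repeat(new_points[np.newaxis,...],len(points),axis=0)
    # (120, 30, 4): each old point repeated once per new point
    points1 = np.repeat(points[:,np.newaxis,:],len(new_points),axis=1)
    return np.sqrt(np.sum((new_points1-points1)**2 ,axis=2))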
test = calc_no_loop(new_points,points)
print(np.shape(test))
print(test)
which has output:
(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
[0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
[0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
...
[0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
[0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
[1.08515826 0.64626221 0.6898687 ... 0.96882542 1.08075076 0.80144746]]
That is, the same result. Note that I added np.sqrt() to the result, which you may have forgotten in your example above.
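Both broadcasting versions above build an intermediate array of shape (m, n, 4), which can get large for big data sets. A different, commonly used technique (not one from the answers above) expands the squared distance as ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b, so only (m, n) arrays are created. A sketch, with the function name sq_dists_dot being hypothetical; note that this form can lose some precision through cancellation, so tiny negative values are clipped to zero:

```python
import numpy as np

def sq_dists_dot(new_points, points):
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, evaluated for all pairs at once
    a2 = (new_points ** 2).sum(axis=1)[:, np.newaxis]  # (m, 1)
    b2 = (points ** 2).sum(axis=1)[np.newaxis, :]      # (1, n)
    cross = new_points @ points.T                      # (m, n)
    d2 = a2 + b2 - 2 * cross
    # Rounding can leave tiny negative values; clip before taking a sqrt
    return np.maximum(d2, 0.0)

rng = np.random.default_rng(1)
new_points = rng.random((30, 4))
points = rng.random((120, 4))

d2 = sq_dists_dot(new_points, points)
ref = ((new_points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
assert np.allclose(d2, ref)
dist = np.sqrt(d2)  # Euclidean distances, shape (30, 120)
```

Transpose the result (dist.T) if you want the (120, 30) layout used in the answer above.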

Assign 3D data value based on a 2D index profile

I have a 3D numpy array:
data0 = np.random.rand(30, 50, 50)
I have a 2D surface:
surf = np.random.rand(50, 50) * 30
surf = surf.astype(int)
Now I want to assign 0 to data0 along the surface profile. I know a for loop can achieve this:
for xx in range(50):
    for yy in range(50):
        data0[0:surf[xx, yy], xx, yy] = 0
data0 is a 3D volume of size 30 x 50 x 50, and surf is a 2D surface profile of size 50 x 50. What I am trying to do is fill the volume with 0 from the top (index 0 along axis 0) down to the surface.
Here, the for loop is very slow, which becomes a problem when data0 is very large. Could someone advise how to assign the values efficiently based on the surf profile?
If you want to use numpy, you can create a mask with z-index values below your surf values set to True, then fill those cells with zeros:
import numpy as np
np.random.seed(123)
x, y, z = 4, 5, 3
data0 = np.random.rand(z, x, y)
surf = np.random.rand(x, y) * z
surf = surf.astype(int)
#your attempt
#we copy the data just for the comparison
data_loop = data0.copy()
for xx in range(x):
    for yy in range(y):
        data_loop[0:surf[xx, yy], xx, yy] = 0
#again, we copy the data just for the comparison
data_np = data0.copy()
#masking the cells according to your index comparison criteria
mask = np.broadcast_to(np.arange(data0.shape[0])[:,None, None], data0.shape) < surf[None, :]
#set masked values to zero
data_np[mask] = 0
#check for equivalence of resulting arrays
print((data_np==data_loop).all())
I am sure there is a better, more NumPy-like way to generate the index mask. As it is, this version is not necessarily faster; that depends on the shape of your array.
For x=500, y=200, and z=3000, your loop takes 1.42 s and my numpy approach 1.94 s.
For the same array size but with shape x=5000, y=2000, and z=30, your loop approach takes 7.06 s and the numpy approach 1.95 s.
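The np.broadcast_to call is not strictly needed: comparing a depth index shaped (z, 1, 1) with the (x, y) surface already broadcasts to the full (z, x, y) mask. A sketch with small made-up sizes that checks itself against the loop; I have not timed it against the numbers above, so treat any speed-up as untested:

```python
import numpy as np

rng = np.random.default_rng(123)
z, x, y = 3, 4, 5
data0 = rng.random((z, x, y))
surf = (rng.random((x, y)) * z).astype(int)

# Reference result from the loop in the question
data_loop = data0.copy()
for xx in range(x):
    for yy in range(y):
        data_loop[0:surf[xx, yy], xx, yy] = 0

# Depth index along axis 0, shaped (z, 1, 1) so it broadcasts against surf (x, y)
depth = np.arange(z)[:, np.newaxis, np.newaxis]
mask = depth < surf  # (z, x, y): True from the top down to (excluding) the surface
data_np = data0.copy()
data_np[mask] = 0

assert np.array_equal(data_np, data_loop)
```

If you do not need to modify data0 in place, np.where(mask, 0, data0) builds the result as a new array instead.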

Generate an array of random floats summing to 1 while fixing a few elements in python

I have something like below:
random_array = np.random.random(10)
scaled_array = random_array/np.sum(random_array)
This gives me a nice array with random floats that sum to 1. However, I am trying to take this a step further and do the following:
For example, fix the 2nd and 5th elements to be 0.04 and 0.09 respectively, and generate all other elements randomly. But the sum of the whole array still needs to be exactly 1.
Taking one more step, I want to provide an upper (lower) bound for all/each element(s). For example, I still want to fix the 4th element to be 0.09 but ALSO want to force ALL elements to be LESS THAN 0.1. (They will still add up to 1 because I have more than 10 elements.)
How can I achieve this?
If you want to fix the values before scaling (after scaling they will no longer be exactly 0.04 and 0.09):
import numpy as np
random_array = np.random.random(10)
random_array[1] = 0.04
random_array[4] = 0.09
scaled_array = random_array/np.sum(random_array)
assert np.isclose(1, scaled_array.sum())
If you want fixed values after scaling:
import numpy as np
random_array = np.random.random(10)
random_array[1] = 0
random_array[4] = 0
scaled_array = (random_array/np.sum(random_array)) * (1.0 - (0.04 + 0.09))
scaled_array[1] = 0.04
scaled_array[4] = 0.09
assert np.isclose(1, scaled_array.sum())
Try the string-cutting approach with the Dirichlet distribution:
import numpy as np
N=7 # total number of elements in result
d = {2:0.04, 5:0.09} # dictionary mapping index to fixed value
fixed_sum = 0.
result = np.zeros(N) # placeholder numpy array
# Put the fixed elements in their place and calculate their sum
for k,v in d.items():
    result[k] = v
    fixed_sum = fixed_sum + v
remaining_sum = 1 - fixed_sum
# Use dirichlet distribution to get elements which sum to 1.
# Multiply with remaining_sum to get elements which sum to "remaining_sum".
remaining_arr = np.random.default_rng().dirichlet(np.ones(N-len(d)))*remaining_sum
# Get the index of result where elements are zero.
zero_indx = np.nonzero(result==0)[0]
# Place the elements of remaining_arr in the result.
result[zero_indx] = remaining_arr
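The answers above do not cover the upper-bound part of the question. One option, sketched here under my own assumptions (a strict bound applied to every element, fixed values that already satisfy it, and enough elements that the bound is feasible), is rejection sampling: draw the free elements from a flat Dirichlet as in the answer above and redraw until all of them fall below the bound. Conditioning a flat Dirichlet this way keeps the accepted samples uniformly distributed over the allowed region, but the acceptance rate drops quickly as the bound tightens or the number of elements shrinks. The function name constrained_weights is hypothetical:

```python
import numpy as np

def constrained_weights(n, fixed, upper, rng, max_tries=100_000):
    """Random weights summing to 1, with some entries fixed and all entries < upper.

    n: total number of elements; fixed: dict {index: value}; upper: strict bound.
    Assumes every fixed value is already below `upper`.
    """
    free = [i for i in range(n) if i not in fixed]
    remaining = 1.0 - sum(fixed.values())
    for _ in range(max_tries):
        # Flat Dirichlet scaled to the mass left over after the fixed entries
        sample = rng.dirichlet(np.ones(len(free))) * remaining
        if sample.max() < upper:  # accept only if every free entry meets the bound
            result = np.empty(n)
            result[free] = sample
            for i, v in fixed.items():
                result[i] = v
            return result
    raise RuntimeError("no sample met the bound; loosen it or add more elements")

rng = np.random.default_rng(42)
w = constrained_weights(30, {1: 0.04, 4: 0.09}, upper=0.1, rng=rng)
assert np.isclose(w.sum(), 1.0)
assert (w < 0.1).all()
```

Here the 2nd and 5th elements (indices 1 and 4) are fixed, as in the question; with only a handful of elements the bound may be impossible to meet, which is what the RuntimeError reports.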

Translating Matlab (Octave) group coloring code into python (numpy, pyplot)

I want to translate the following group coloring octave function to python and use it with pyplot.
Function input:
x - Data matrix (m x n)
a - A parameter.
index - A vector of size "m" with values in range [: a]
(For example, if a = 4, index can be [random.choice(range(4)) for i in range(m)].)
The i-th value in "index" is the number of the group that the i-th data point belongs to.
The function should plot all the data points from x and color them in different colors (Number of different colors is "a").
The function in octave:
p = hsv(a); % This is an a x 3 matrix
colors = p(index, :); % This is an m x 3 matrix
scatter(X(:,1), X(:,2), 10, colors);
I couldn't find a function like hsv in python, so I wrote it myself (I think I did..):
p = colors.hsv_to_rgb(numpy.column_stack((
numpy.linspace(0, 1, a), numpy.ones((a ,2)) )) )
But I can't figure out how to do the matrix selection p(index, :) in Python (NumPy),
especially because the size of "index" is bigger than "a".
Thanks in advance for your help.
So, you want to take an m x 3 of HSV values, and convert each row to RGB?
import numpy as np
import colorsys
mymatrix = np.matrix([[11,12,13],
[21,22,23],
[31,32,33]])
def to_hsv(x):
return colorsys.rgb_to_hsv(*x)
#Apply the to_hsv function to each matrix row.
print np.apply_along_axis(to_hsv, axis=1, arr=mymatrix)
This produces:
[[ 0.5 0. 13. ]
[ 0.5 0. 23. ]
[ 0.5 0. 33. ]]
Following up on your comment:
If I understand you have a matrix p that is an a x 3 matrix, and you want to randomly select rows from the matrix over and over again, until you have a new matrix that is m x 3?
Ok. Let's say you have a matrix p defined as follows:
a = 5
p = np.random.randint(5, size=(a, 3))
Now make an array of m random row indices between 0 and a-1 (NumPy indices start at 0):
m = 20
index = np.random.randint(a, size=m)
Now access the right indexes and plug them into a new matrix:
p_prime = np.matrix([p[i] for i in index])
Produces a 20 x 3 matrix.
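The list comprehension is not required: NumPy's integer-array indexing does the row selection directly, so Octave's p(index, :) becomes p[index, :] (or just p[index]), and index may be longer than a. A sketch that builds the a x 3 palette with the standard-library colorsys.hsv_to_rgb; the group count, point count and random labels are made up for illustration. I used hues np.arange(a) / a rather than the question's linspace(0, 1, a), because hue 0 and hue 1 are both red, so linspace would give the first and last groups the same color:

```python
import colorsys
import numpy as np

a = 4   # number of groups (colors)
m = 10  # number of data points

# a x 3 RGB palette: evenly spaced hues at full saturation and value
hues = np.arange(a) / a
p = np.array([colorsys.hsv_to_rgb(h, 1.0, 1.0) for h in hues])

# Group of each data point, 0-based (subtract 1 from Octave-style 1-based labels)
rng = np.random.default_rng(0)
index = rng.integers(0, a, size=m)

# NumPy equivalent of Octave's p(index, :): pick one palette row per data point
colors = p[index, :]  # shape (m, 3)
```

With matplotlib, an (m, 3) RGB array like this can be passed as the c argument, e.g. plt.scatter(x[:, 0], x[:, 1], s=10, c=colors).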

Errors with matplotlib plot, python

I get this horrible massive error when trying to plot using matplotlib:
Traceback (most recent call last):
File "24oct_specanal.py", line 90, in <module>
main()
File "24oct_specanal.py", line 83, in main
plt.plot(Svar,Sav)
File "/usr/lib64/python2.6/site-packages/matplotlib/pyplot.py", line 2458, in plot
ret = ax.plot(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/matplotlib/axes.py", line 3849, in plot
self.add_line(line)
File "/usr/lib64/python2.6/site-packages/matplotlib/axes.py", line 1443, in add_line
self._update_line_limits(line)
File "/usr/lib64/python2.6/site-packages/matplotlib/axes.py", line 1451, in _update_line_limits
p = line.get_path()
File "/usr/lib64/python2.6/site-packages/matplotlib/lines.py", line 644, in get_path
self.recache()
File "/usr/lib64/python2.6/site-packages/matplotlib/lines.py", line 392, in recache
x = np.asarray(xconv, np.float_)
File "/usr/lib64/python2.6/site-packages/numpy/core/numeric.py", line 235, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
This is the code I am using:
import numpy as np
import numpy.linalg
import random
import matplotlib.pyplot as plt
import pylab
from scipy.optimize import curve_fit
from array import array
def makeAImatrix(n):
    A=np.zeros((n,n))
    I=np.ones((n))
    for i in range(0,n):
        for j in range(i+1,n):
            A[j,i]=random.random()
    for i in range(0,n):
        for j in range(i+1,n):
            A[i,j] = A[j,i]
    for i in range(n):
        A[i,i]=1
    return (A, I)
def main():
    n=5 #number of species
    t=1 # number of matrices to check
    Aflat = []
    Aflatlist = [] #list of matrices
    Aflatav = []
    Aflatvar = []
    Aflatskew = []
    remspec = []
    Afreeze = [] #this is a LIST OF VECTORS that stores the vector corresponding to each extinct species as
    #it is taken out. it is NOT the same as the original A matrix as it is only
    #coherent in one direction. it is also NOT A SQUARE.
    Sex = [] # (Species extinct) this is a vector that corresponds to the Afreeze matrix. if a species is extinct then
    #the value stored here will be -1.
    Sav = [] # (Species average) The average value of the A coefficients for each species
    Svar = [] # (Species variance)
    for k in range (0,t):
        allpos = 0
        A, I = makeAImatrix(n)
        while allpos !=1: #while all solutions are not positive
            x = numpy.linalg.solve(A,I)
            if any(t<0 for t in x): #if any of the solutions in x are negative
                p=np.where(x==min(x)) # find the most negative solution, p is the position
                #now store the A coefficients of the extinct species in the Afreeze list
                Afreeze.append(A[p])
                Sex.append(-1) #given -1 value as species is extinct.
                x=np.delete(x, p, 0)
                A=np.delete(A, p, 0)
                A=np.delete(A, p, 1)
                I=np.delete(I, p, 0)
            else:
                allpos = 1 #set allpos to one so loop is broken
        l=len(x)
        #now fill Afreeze and Sex with the remaining species that have survived
        for m in range (0, l):
            Afreeze.append(A[m])
            Sex.append(1) # value of 1 as this species has survived
    #now time to analyse the coefficients for each species.
    for m in range (0, len(Sex)):
        X1 = sum(Afreeze[m])/len(Afreeze[m]) # this is the mean
        X2 = 0
        for p in range (len(Afreeze[m])):
            X2 = X2 + Afreeze[m][p]
        X2 = X2/len(Afreeze[m])
        Sav.append(X1)
        Svar.append(X2 - X1*X1)
    spec = []
    for b in range(0,n):
        spec.append(b)
    plt.plot(Svar,Sav)
    plt.show()
    #plt.scatter(spec, Sav)
    #plt.show()
if __name__ == '__main__':
    main()
I cannot figure this out at all! I think it was working before but then just stopped working. Any ideas?
Your problem is in this section:
if any(t<0 for t in x): #if any of the solutions in x are negative
    p=np.where(x==min(x)) # find the most negative solution, p is the position
    #now store the A coefficiants of the extinct species in the Afreeze list
    Afreeze.append(A[p])
You're indexing a 2D array with the tuple of index arrays returned by np.where, and the result is still a 2D array (of shape (1, n) when there is a single minimum). So your Afreeze gets a 2D array appended instead of a 1D array. Later, where you sum the separate elements of Afreeze, summing a 2D array results in a 1D array, and that gets added to Sav and Svar. By the time you feed these variables to plt.plot(), matplotlib gets an array as one of the elements instead of a single number, which it of course can't cope with.
You probably want:
if any(t<0 for t in x):
    p=np.where(x==min(x))
    Afreeze.append(A[p][0])
but I haven't tried to follow the logic of the script very much; that's up to you.
Perhaps good to see if this is indeed what you want: print the value of A[p][0] in the line before it gets appended to Afreeze.
I noted that because of the random.random() in the matrix creation, the if statement isn't always true, so the problem doesn't always show up. Minor detail, but could confuse people.
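To see the shape problem in isolation, here is a small sketch with made-up values for A and x. It shows that A[p] keeps a leading axis of length 1, that the suggested A[p][0] drops it, and that np.argmin (an alternative I am suggesting, not part of the original code) returns a plain integer, so indexing with it gives a 1D row directly. With ties, np.where can return several positions, while np.argmin returns the first one:

```python
import numpy as np

A = np.arange(16.0).reshape(4, 4)
x = np.array([0.5, -2.0, 1.0, -0.3])

p = np.where(x == x.min())  # a tuple of index arrays: (array([1]),)
row_2d = A[p]               # shape (1, 4): still 2D
row_1d = A[p][0]            # shape (4,): the fix suggested above
row_alt = A[np.argmin(x)]   # shape (4,): argmin returns a single integer index

print(row_2d.shape, row_1d.shape, row_alt.shape)  # (1, 4) (4,) (4,)
```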
Fix your indentation?
import numpy as np
import numpy.linalg
import random
import matplotlib.pyplot as plt
import pylab
from scipy.optimize import curve_fit
from array import array
def main():
    n=20 #number of species
    spec=np.zeros((n+1))
    for i in range(0,n+1): #possible final sizes 0..n
        spec[i]=i
    t=100 #initial number of matrices to check
    B = np.zeros((n+1)) #matrix to store the results of how big the matrices have to be
    for k in range (0,t):
        A=np.zeros((n,n))
        I=np.ones((n))
        for i in range(0,n):
            for j in range(i+1,n):
                A[j,i]=random.random()
        for i in range(0,n):
            for j in range(i+1,n):
                A[i,j] = A[j,i]
        for i in range(n):
            A[i,i]=1
        allpos = 0
        while allpos !=1: #while all solutions are not positive
            x = numpy.linalg.solve(A,I)
            if any(t<0 for t in x): #if any of the solutions in x are negative
                p=np.where(x==min(x)) # find the most negative solution, p is the position
                x=np.delete(x, p, 0)
                A=np.delete(A, p, 0)
                A=np.delete(A, p, 1)
                I=np.delete(I, p, 0)
            else:
                allpos = 1 #set allpos to one so loop is broken
        l=len(x)
        B[l] = B[l]+1 #count the matrices that ended with l surviving species
    B = B/t #fraction of the t matrices that ended at each size
    pi=3.14
    resfile=open("results.txt","w")
    for i in range (0,len(spec)):
        resfile.write("%d " % spec[i])
        resfile.write("%0.6f \n" %B[i])
    resfile.close()
    plt.hist(B, bins=n)
    plt.title("Histogram")
    plt.show()
    plt.plot(spec,B)
    plt.xlabel("final number of species")
    plt.ylabel("fraction of total matrices")
    plt.title("plot")
    plt.show()
if __name__ == '__main__':
    main()
Got this:
