New to Python and NumPy; I've searched and tried every solution I could find, with no luck.
I have a function that returns two matrices. I want to create an array of matrices that stores each matrix returned by my function. I've tried many different versions; this was the closest. I'm used to Java, not Python. With the code below, centroidsm[0] and clustersm[0] don't give me each individual array.
This is my code:
centroidsm = []
centroidsm.append([])
clustersm = []
clustersm.append([])
for k in range(2,20):
    centroids, clusters = kMeans(train, k)
    centroidsm[k].append(centroids)
    clustersm[k].append(clusters)
First, I don't know why you're appending an empty list to centroidsm and clustersm. You should do away with those lines.
Second, if centroidsm is supposed to be a list of centroid matrices, you simply need to call centroidsm.append(centroids) inside your for loop (centroidsm[k].append attempts to append to a list at index k, a list that doesn't exist).
centroidsm = []
clustersm = []
for k in range(2,20):
    centroids, clusters = kMeans(train, k)
    centroidsm.append(centroids)
    clustersm.append(clusters)
I have two NumPy arrays, each with shape (10000, 10000). One is a value array and the other is an index array.
Value=np.random.rand(10000,10000)
Index=np.random.randint(0,1000,(10000,10000))
I want to make a list (or a 1D NumPy array) by summing the entries of the value array grouped by the index array. That is, for each index i, find where the index array equals i and sum the corresponding entries of the value array:
for i in range(1000):
    NewArray[i] = np.sum(Value[np.where(Index==i)])
However, this is too slow, since I have to repeat this loop over 300,000 arrays.
I tried to come up with some logical indexing method like
NewArray[Index] += Value[Index]
But it didn't work.
The next thing I tried was using a dictionary:
for k, v in list(zip(Index.flatten(),Value.flatten())):
    NewDict[k].append(v)
and
for i in NewDict:
    NewDict[i] = np.sum(NewDict[i])
But it was slow too.
Is there any smart way to speed up?
I had two thoughts. First, try masking, it speeds this up by about 4x:
for i in range(1000):
    NewArray[i] = np.sum(Value[Index==i])
Alternatively, you can sort your arrays to put the values you're adding together in contiguous memory. Masking or using where() has to gather all your values together each time you call sum on the slice. By front-loading this gathering, you might be able to speed things up considerably:
# flatten your arrays
vals = Value.ravel()
inds = Index.ravel()
s = np.argsort(inds) # these are the indices that will sort your Index array
v_sorted = vals[s].copy() # the copy here orders the values in memory instead of just providing a view
i_sorted = inds[s].copy()
searches = np.searchsorted(i_sorted, np.arange(0, i_sorted[-1] + 2)) # 1 greater than your max, this gives you your array end...
for i in range(len(searches) - 1):
    st = searches[i]
    nd = searches[i+1]
    NewArray[i] = v_sorted[st:nd].sum()
This method takes 26 sec on my computer vs 400 using the old way. Good luck. If you want to read more about contiguous memory and performance check this discussion out.
As the title states, I want to create a function that takes a multidimensional array A and a number B, and returns the number in A that is closest to B. If B is in A, return it. If two numbers in A are equally distant from B, choose the first one, counting row by row.
This is the code I have so far:
import numpy as np

def g_C(A,B):
    A = np.asanyarray(A)
    assert A.ndim == 2  # assert that A is a 2-D array
    get = (np.abs(A-B)).argmin()
    return A[get]
However, from my understanding, I think (np.abs(A-B)).argmin() really only works reliably for sorted arrays? I'm not allowed to sort the array in this problem; I have to work with it at face value, examining row by row, and grab the first instance of the closest number to B.
So for example, g_C([[1,3,6,-8],[2,7,1,0],[4,5,2,8],[2,3,7,10]],9) should return 8
Also, I was given the hint that numpy.argmin would help, and I see that its purpose is to return the first occurrence of the minimum, which makes sense in this problem, but I'm not sure how exactly to fit it into the code I have at the moment.
EDIT
The flat suggestion works perfectly fine. Thank you everyone.
I'm trying RagingRoosevelt's second suggestion, and I'm stuck.
def g_C(A,B):
    A = np.asanyarray(A)
    D = np.full_like(A, B)        # created an array D with same qualities as array A, but just filled with values of B
    diffs = abs(D-A)              # finding absolute value differences between D and A
    close = diffs.argmin(axis=1)  # find argmin of 'diffs', row by row
    close = np.asanyarray(close)  # converted the argmins of 'diffs' into an array
    closer = close.argmin()       # the final argmin ??
    return closer
I'm trying out this suggestion because I have another related problem where I have to extract the row whose sum is closest to B. And I figure this is good practice anyway.
Your existing code is fine, except that by default argmin returns an index into the flattened array. So you could do
return A.flat[abs(A - B).argmin()]
to get the right value from A.
EDIT: For your other problem - finding the row in a 2-dimensional array A whose sum is closest to B - you can do:
return A[abs(A.sum(axis=1) - B).argmin()]
In either case I don't see any need to create an array of B.
Your problem is the same as a find-min problem. The only difference is that you're looking for min(abs(A[i]-B)) instead. So, iterate over your array. As you do so, record the smallest absolute delta and the index at which it occurred. When you find a smaller delta, update the record and then keep searching. When you've made it all the way through, return whatever value was at the recorded index.
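A minimal sketch of that find-min scan (the helper name closest_find_min is illustrative; it assumes A is a nested list or 2-D array and B is a number):

def closest_find_min(A, B):
    # Scan every element, remembering the smallest absolute delta seen so far.
    best_val = A[0][0]
    best_delta = abs(A[0][0] - B)
    for row in A:
        for val in row:
            delta = abs(val - B)
            if delta < best_delta:  # strict '<' keeps the first of equally close values
                best_delta = delta
                best_val = val
    return best_val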
Since you're working with NumPy arrays, another approach is to create an array of the same size as A but filled only with the value B. Compute the difference between the two arrays and then use argmin on each row. Assemble an array of the minimum values for each row and then do argmin again to pull out the smallest of them, as in the sketch below.
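A sketch of that second approach (assuming A is a 2-D NumPy array; this is essentially what the questioner's edit above is attempting, with a hypothetical helper name):

import numpy as np

def closest_two_stage(A, B):
    A = np.asanyarray(A)
    D = np.full_like(A, B)                # same shape (and dtype) as A, filled with B
    diffs = np.abs(A - D)                 # absolute differences
    col_idx = diffs.argmin(axis=1)        # per row: column of the closest value
    row_idx = diffs.min(axis=1).argmin()  # row whose closest value is closest overall
    return A[row_idx, col_idx[row_idx]]

For example, closest_two_stage([[1,3,6,-8],[2,7,1,0],[4,5,2,8],[2,3,7,10]], 9) returns 8.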
This will work for any 2-dimensional array with a nested for-loop, but I am not sure that this is what you want (as in it doesn't use numpy).
def g_C(A, B):
    n = A[0][0]           # best value found so far
    m = abs(B - A[0][0])  # smallest distance found so far
    for r in A:
        for i in r:
            if abs(B - i) < m:
                m = abs(B - i)
                n = i
    return n
Nevertheless, it does work:
>>> g_C([[1,3,6,-8],[2,7,1,0],[4,5,2,8],[2,3,7,10]],9)
8
I want to combine an unspecified (finite) number of matrices under a Kronecker product. In order to do this I want to save the matrices in an array, but I don't know how to do this. At the moment I have:
for i in range(LNew-2):
    for j in range(LNew-2):
        Bulk = np.empty(shape=(LNew-1,LNew-1))
        if i == j:
            Bulk[i,j]=H2
        else:
            Bulk[i,j]=idm
Here H2 and idm are both matrices, which I want to combine under a Kronecker product. But since Bulk is an ndarray, I suppose it won't accept array-like objects as its elements.
edit:
This is the function in which I want to use this idea. I am using it to build a Hamiltonian matrix for a quantum spin chain. H2 is the Hamiltonian for a two-particle chain: H2 is a 4x4 matrix and idm is the 2x2 identity matrix.
The three-particle chain is then np.kron(H2,idm) + np.kron(idm,H2), and for four particles it is np.kron(np.kron(H2,idm),idm) + np.kron(idm,np.kron(H2,idm)) + np.kron(idm,np.kron(idm,H2)), and so on.
def ExpandHN(LNew):
    idm = np.identity(2)
    H2 = GetH(2,'N')
    HNew = H2
    for i in range(LNew-2):
        for j in range(LNew-2):
            Bulk = np.empty(shape=(LNew-1,LNew-1))
            if i == j:
                Bulk[i,j]=H2
            else:
                Bulk[i,j]=idm
    i = 0
    for i in range(LNew-2):
        for j in range(LNew-3):
            HNew += np.kron(Bulk[i,j],Bulk[i,j+1]) #something like this
    return HNew
As you can see the second set of for loops hasn't been worked out.
That being said, if someone has a totally different but working solution I would be happy with that too.
If I understand correctly, your question boils down to how to create arrays of arrays with NumPy. I would suggest using the standard Python dict:
Bulk = dict()
for i in range(LNew-2):
    for j in range(LNew-2):
        if i == j:
            Bulk[(i,j)]=H2
        else:
            Bulk[(i,j)]=idm
The usage of tuples as keys allows you to maintain an array-like indexing of the matrices.
Also note that you should define Bulk outside of the two for loops (in any case).
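For the Kronecker step that "hasn't been worked out" in the question, here is a minimal sketch of one way the factors could be combined, assuming H2 is the 4x4 two-site Hamiltonian and idm the 2x2 identity described above (the function name expand_hn and its signature are illustrative, not the questioner's actual code):

import numpy as np
from functools import reduce

def expand_hn(LNew, H2, idm):
    # One term per nearest-neighbour pair: kron(idm, ..., H2, ..., idm),
    # with H2 at position pos and identities on the remaining sites.
    HNew = np.zeros((2**LNew, 2**LNew))
    for pos in range(LNew - 1):
        factors = [idm] * pos + [H2] + [idm] * (LNew - 2 - pos)
        HNew += reduce(np.kron, factors)
    return HNew

For LNew = 3 this reproduces np.kron(H2,idm) + np.kron(idm,H2), and for LNew = 4 the three-term sum from the question.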
HTH
I am trying to run the following code. energylist is a list of 2-dimensional arrays, and I want to manipulate element [k][i] of each array in energylist; this is done in the for loop with energylist[r][k][i] = (N1[s]*0.1+N2[t]*1).
from numpy import arange, zeros

N1 = xrange(0,4,1)
N2 = xrange(0,4,1)
V1 = arange(-20,20.1,10)
V2 = arange(-20,20.1,10)
energy = zeros((len(V2),len(V1)))
energylist = []
for l in xrange(0,16,1):
    energylist.append(energy)
for i in xrange(0,len(V1),1):
    for k in xrange(0,len(V2),1):
        r = 0
        for s in xrange(0,len(N1),1):
            for t in xrange(0,len(N2),1):
                energylist[r][k][i] = (N1[s]*0.1+N2[t]*1)
                r += 1
However, after running this, all of the arrays in energylist are the same, although obviously this is not reasonable, as N1 and N2 have changed. The code works if I replace the line energylist[r][k][i] = (N1[s]*0.1+N2[t]*1) with
energy=array(energylist[r])
energy[k][i]= (N1[s]*0.1+N2[t]*1)
energylist[r]=array(energy)
What is wrong with my original code?
The problem is in your line
energylist.append(energy)
Here you append only a REFERENCE to your single existing energy array. Thus you have only one array stored in memory, referenced many times. If you modify this array anywhere, every reference points at the modified data. Your situation is not that you have many arrays with the same content; you have only one array with many references.
The solution would be:
for l in xrange(0,16,1):
    energylist.append(zeros((len(V2),len(V1))))
Here, you create an explicit new zeros array in each append, so you will get 16 arrays stored in memory which you can modify separately.
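Equivalently, if you prefer to keep a single template energy array, you could append an explicit copy each time; a small sketch of the same idea:

for l in xrange(0,16,1):
    energylist.append(energy.copy())  # .copy() gives each list entry its own data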
I'm writing some modelling routines in NumPy that need to select cells randomly from a NumPy array and do some processing on them. All cells must be selected without replacement (as in, once a cell has been selected it can't be selected again, but all cells must be selected by the end).
I'm transitioning from IDL where I can find a nice way to do this, but I assume that NumPy has a nice way to do this too. What would you suggest?
Update: I should have stated that I'm trying to do this on 2D arrays, and therefore get a set of 2D indices back.
How about using numpy.random.shuffle or numpy.random.permutation if you still need the original array?
If you need to change the array in-place then you can create an index array like this:
your_array = <some numpy array>
index_array = numpy.arange(your_array.size)
numpy.random.shuffle(index_array)
print your_array[index_array[:10]]
All of these answers seemed a little convoluted to me.
I'm assuming that you have a multi-dimensional array from which you want to generate an exhaustive list of indices. You'd like these indices shuffled so you can then access each of the array elements in a random order.
The following code will do this in a simple and straight-forward manner:
#!/usr/bin/python
import numpy as np

# Define a two-dimensional array
# Use any number of dimensions, and dimensions of any size
d = np.zeros(30).reshape((5,6))

# Get a list of indices for an array of this shape
indices = list(np.ndindex(d.shape))

# Shuffle the indices in-place
np.random.shuffle(indices)

# Access array elements using the indices to do cool stuff
for i in indices:
    d[i] = 5
print d
Printing d verified that all elements have been accessed.
Note that the array can have any number of dimensions and that the dimensions can be of any size.
The only downside to this approach is that if d is large, then indices may become pretty sizable. Therefore, it would be nice to have a generator. Sadly, I can't think of how to build a shuffled iterator offhand.
Extending the nice answer from @WoLpH:
For a 2D array I think it will depend on what you want or need to know about the indices.
You could do something like this:
data = np.arange(25).reshape((5,5))
x, y = np.where(data == data)  # the condition is always True, so this yields every index pair
idx = zip(x,y)
np.random.shuffle(idx)
OR
data = np.arange(25).reshape((5,5))
grid = np.indices(data.shape)
idx = zip( grid[0].ravel(), grid[1].ravel() )
np.random.shuffle(idx)
You can then use the list idx to iterate over randomly ordered 2D array indices as you wish, and to get the values at that index out of the data which remains unchanged.
Note: You could also generate the randomly ordered indices via itertools.product too, in case you are more comfortable with this set of tools.
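For reference, a small sketch of the itertools.product variant (using the same data array as above):

import itertools
import random
import numpy as np

data = np.arange(25).reshape((5,5))
idx = list(itertools.product(range(data.shape[0]), range(data.shape[1])))
random.shuffle(idx)  # shuffle the list of (row, col) tuples in place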
Use random.sample to generate ints in 0 .. A.size with no duplicates, then split them into index pairs:
import random
import numpy as np

def randint2_nodup( nsample, A ):
    """ uniform int pairs, no dups:
        r = randint2_nodup( nsample, A )
        A[r]
        for jk in zip(*r):
            ... A[jk]
    """
    assert A.ndim == 2
    sample = np.array( random.sample( xrange( A.size ), nsample ))  # nodup ints
    return sample // A.shape[1], sample % A.shape[1]  # pairs

if __name__ == "__main__":
    import sys
    nsample = 8
    ncol = 5
    exec "\n".join( sys.argv[1:] )  # run this.py N= ...
    A = np.arange( 0, 2*ncol ).reshape((2,ncol))
    r = randint2_nodup( nsample, A )
    print "r:", r
    print "A[r]:", A[r]
    for jk in zip(*r):
        print jk, A[jk]
Let's say you have an array of data points of size 8x3
data = np.arange(50,74).reshape(8,-1)
If you truly want to sample, as you say, all the indices as 2D pairs, the most compact way to do this that I can think of is:
#generate a permutation of data's size, coerced to data's shape
idxs = divmod(np.random.permutation(data.size),data.shape[1])
#iterate over it
for x,y in zip(*idxs):
    #do something to data[x,y] here
    pass
More generally, though, one often does not need to access a 2D array as a 2D array simply to shuffle it, in which case you can be yet more compact: just make a 1D view onto the array and save yourself some index-wrangling.
flat_data = data.ravel()
flat_idxs = np.random.permutation(flat_data.size)
for i in flat_idxs:
    #do something to flat_data[i] here
    pass
This will still permute the 2d "original" array as you'd like. To see this, try:
flat_data[12] = 1000000
print data[4,0]
#returns 1000000
People using NumPy version 1.7 or later can also use the built-in function numpy.random.choice.
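For example, a sketch using numpy.random.choice to draw flat indices without replacement and convert them back to 2D pairs:

import numpy as np

data = np.arange(25).reshape((5,5))
flat = np.random.choice(data.size, size=data.size, replace=False)  # every cell exactly once, in random order
rows, cols = np.unravel_index(flat, data.shape)
for r, c in zip(rows, cols):
    pass  # process data[r, c] here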